App_api Fatal Error

I’ve been having consistent issues processing my latest dataset during refinement. I’m working with ~3 million particles, and local refinement keeps crashing anywhere from half a day to two days into the run on my single workstation. I’m trying to determine the cause.

The workstation is running CentOS 7 with CUDA 11.1 and the latest version of CryoSPARC (v4.2.1).

The computer screen freezes, but I can still access the machine via ssh, where cryosparcm status reports the database as exited and an app_api fatal error, as seen below. After restarting the computer and clearing the .sock files in /tmp so that CryoSPARC will start (restart sequence sketched below), the failed jobs are marked “Job is unresponsive - no heartbeat received in 60 seconds.”
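For reference, the restart sequence I’ve been using looks roughly like this (the exact .sock file names are just whatever stale sockets I find under /tmp on my system, so adjust as needed):

cryosparcm stop
# remove stale socket files left over from the crash before restarting
rm /tmp/cryosparc-supervisor-*.sock /tmp/mongodb-*.sock
cryosparcm start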

[cryosparc_user@c110294 ~]$ cryosparcm status

CryoSPARC System master node installed at
/home/cryosparc_user/cryosparc/cryosparc_master
Current cryoSPARC version: v4.2.1

CryoSPARC process status:
app RUNNING pid 4917, uptime 1 day, 22:47:50
app_api FATAL unknown error making dispatchers for 'app_api': EROFS
app_api_dev STOPPED Not started
app_legacy STOPPED Not started
app_legacy_dev STOPPED Not started
command_core RUNNING pid 4780, uptime 1 day, 22:48:07
command_rtp RUNNING pid 4846, uptime 1 day, 22:47:55
command_vis RUNNING pid 4827, uptime 1 day, 22:47:57
database EXITED Jul 12 09:13 PM

License is valid

global config variables:
export CRYOSPARC_LICENSE_ID="xxxx"
export CRYOSPARC_MASTER_HOSTNAME="c110294"
export CRYOSPARC_DB_PATH="/home/cryosparc_user/cryosparc/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true

The app_api log has the error below repeating:

Exception in setInterval callback: MongoServerSelectionError: connect ECONNREFUSED 127.0.0.1:39001
at Timeout._onTimeout (/home/cryosparc_user/cryosparc/cryosparc_master/cryosparc_app/api/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/sdam/topology.js:312:38)
at listOnTimeout (internal/timers.js:557:17)
at processTimers (internal/timers.js:500:7) {
reason: TopologyDescription {
type: 'ReplicaSetNoPrimary',
servers: Map(1) { 'localhost:39001' => [ServerDescription] },
stale: false,
compatible: true,
heartbeatFrequencyMS: 10000,
localThresholdMS: 15,
setName: 'meteor',
maxSetVersion: 1,
maxElectionId: new ObjectId("7fffffff00000000000000a9"),
commonWireVersion: 6,
logicalSessionTimeoutMinutes: undefined
}
}
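At this point the database on port 39001 is clearly refusing connections. A quick check I’ve been running over ssh (generic tools, nothing CryoSPARC-specific) is:

# is mongod still alive and listening on the database port?
ps xww | grep mongod
ss -tlnp | grep 39001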

On restart

[cryosparc_user@c110294 ~]$ ps xww | grep -e cryosparc -e mongo
4560 ? Ss 0:07 python /home/cryosparc_user/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /home/cryosparc_user/cryosparc/cryosparc_master/supervisord.conf
4679 ? Sl 13:51 mongod --auth --dbpath /home/cryosparc_user/cryosparc/cryosparc_database --port 39001 --oplogSize 64 --replSet meteor --nojournal --wiredTigerCacheSizeGB 4 --bind_ip_all
4791 ? Sl 3:12 python -c import cryosparc_command.command_core as serv; serv.start(port=39002)
4844 ? Sl 0:34 python -c import cryosparc_command.command_vis as serv; serv.start(port=39003)
4861 ? Sl 2:26 python -c import cryosparc_command.command_rtp as serv; serv.start(port=39005)
4950 ? Sl 3:47 /home/cryosparc_user/cryosparc/cryosparc_master/cryosparc_app/api/nodejs/bin/node ./bundle/main.js
6119 ? S 0:00 bash /home/cryosparc_user/cryosparc/cryosparc_worker/bin/cryosparcw run --project P25 --job J302 --master_hostname c110294 --master_command_core_port 39002
6141 ? Sl 0:44 python -c import cryosparc_compute.run as run; run.run() --project P25 --job J302 --master_hostname c110294 --master_command_core_port 39002
6142 ? Sl 495:59 python -c import cryosparc_compute.run as run; run.run() --project P25 --job J302 --master_hostname c110294 --master_command_core_port 39002
12159 ? S 0:00 sshd: cryosparc_user@pts/1
12531 pts/1 S+ 0:00 grep --color=auto -e cryosparc -e mongo

Welcome to the forum @T_Bird.
Please can you post any recent errors you find inside

/home/cryosparc_user/cryosparc/cryosparc_master/run/database.log

You can browse database.log with the command
cryosparcm log database
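To pull out just the error-level entries, a plain grep over that file also works (generic shell, not a CryoSPARC-specific feature), for example:

grep -iE "exception|error|fatal" /home/cryosparc_user/cryosparc/cryosparc_master/run/database.log | tail -n 40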

From the database log, the following error occurred about a day before the crash:

2023-07-11T10:53:01.840-0700 I NETWORK [conn39] Error receiving request from client: ProtocolError: Client sent an HTTP request over a native MongoDB connection. Ending connection from 127.0.0.1:46320 (connection id: 39)
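As far as I can tell, that message just means something spoke HTTP to the MongoDB port; my own untested guess is that pointing an HTTP client at port 39001 (instead of the web UI on 39000) would log something similar, e.g.:

# hitting the database port with HTTP should trigger the same kind of ProtocolError entry
curl http://localhost:39001/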

Database log from that error up to the next startup:

2023-07-11T10:53:01.841-0700 I NETWORK [conn39] end connection 127.0.0.1:46320 (29 connections now open)
2023-07-11T11:58:05.856-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 33ms
2023-07-11T12:23:06.515-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 36ms
2023-07-11T12:57:18.758-0700 I NETWORK [listener] connection accepted from 127.0.0.1:52354 #40 (30 connections now open)
2023-07-11T12:57:18.767-0700 I NETWORK [conn40] received client metadata from 127.0.0.1:52354 conn40: { driver: { name: "nodejs", version: "4.3.1" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "3.10.0-1127.19.1.el7.x86_64" }, platform: "Node.js v14.19.3, LE (unified)|Node.js v14.19.3, LE (unified)" }
2023-07-11T12:57:18.791-0700 I ACCESS [conn40] Successfully authenticated as principal cryosparc_user on admin from client 127.0.0.1:52354
2023-07-11T13:06:42.009-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 37ms
2023-07-11T16:22:43.690-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 39ms
2023-07-11T16:52:15.177-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 30ms
2023-07-11T20:26:45.857-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 64ms
2023-07-11T21:34:49.229-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 18ms
2023-07-12T00:12:58.420-0700 I COMMAND [LogicalSessionCacheRefresh] command config.$cmd command: update { update: "system.sessions", ordered: false, allowImplicitCollectionCreation: false, writeConcern: { w: "majority", wtimeout: 15000 }, $db: "config" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 14, w: 14 } }, Database: { acquireCount: { w: 14 } }, Collection: { acquireCount: { w: 7 } }, oplog: { acquireCount: { w: 7 } } } protocol:op_msg 186ms
2023-07-12T00:24:34.867-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 133ms
2023-07-12T01:54:56.385-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 141ms
2023-07-12T03:53:31.748-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 53ms
2023-07-12T06:02:57.162-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 14ms
2023-07-12T07:02:58.333-0700 I COMMAND [LogicalSessionCacheRefresh] command config.$cmd command: update { update: "system.sessions", ordered: false, allowImplicitCollectionCreation: false, writeConcern: { w: "majority", wtimeout: 15000 }, $db: "config" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 16, w: 16 } }, Database: { acquireCount: { w: 16 } }, Collection: { acquireCount: { w: 8 } }, oplog: { acquireCount: { w: 8 } } } protocol:op_msg 109ms
2023-07-12T07:17:03.998-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 12ms
2023-07-12T09:34:14.893-0700 I NETWORK [listener] connection accepted from 127.0.0.1:55914 #41 (31 connections now open)
2023-07-12T09:34:14.900-0700 I NETWORK [conn41] received client metadata from 127.0.0.1:55914 conn41: { driver: { name: "nodejs", version: "4.3.1" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "3.10.0-1127.19.1.el7.x86_64" }, platform: "Node.js v14.19.3, LE (unified)|Node.js v14.19.3, LE (unified)" }
2023-07-12T09:34:14.919-0700 I ACCESS [conn41] Successfully authenticated as principal cryosparc_user on admin from client 127.0.0.1:55914
2023-07-12T10:47:18.276-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 84ms
2023-07-12T11:42:58.503-0700 I COMMAND [LogicalSessionCacheRefresh] command config.$cmd command: update { update: "system.sessions", ordered: false, allowImplicitCollectionCreation: false, writeConcern: { w: "majority", wtimeout: 15000 }, $db: "config" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 14, w: 14 } }, Database: { acquireCount: { w: 14 } }, Collection: { acquireCount: { w: 7 } }, oplog: { acquireCount: { w: 7 } } } protocol:op_msg 280ms
2023-07-12T11:44:57.016-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 14ms
2023-07-12T12:17:58.338-0700 I COMMAND [LogicalSessionCacheRefresh] command config.$cmd command: update { update: "system.sessions", ordered: false, allowImplicitCollectionCreation: false, writeConcern: { w: "majority", wtimeout: 15000 }, $db: "config" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 14, w: 14 } }, Database: { acquireCount: { w: 14 } }, Collection: { acquireCount: { w: 7 } }, oplog: { acquireCount: { w: 7 } } } protocol:op_msg 119ms
2023-07-12T14:47:58.369-0700 I COMMAND [LogicalSessionCacheRefresh] command config.$cmd command: update { update: "system.sessions", ordered: false, allowImplicitCollectionCreation: false, writeConcern: { w: "majority", wtimeout: 15000 }, $db: "config" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 16, w: 16 } }, Database: { acquireCount: { w: 16 } }, Collection: { acquireCount: { w: 8 } }, oplog: { acquireCount: { w: 8 } } } protocol:op_msg 141ms
2023-07-12T15:20:24.887-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 156ms
2023-07-12T16:27:48.871-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 11ms
2023-07-12T20:04:06.527-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 118ms
2023-07-12T21:11:11.211-0700 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 22ms
2023-07-13T09:10:50.913-0700 I CONTROL [initandlisten] MongoDB starting : pid=4594 port=39001 dbpath=/home/cryosparc_user/cryosparc/cryosparc_database 64-bit host=c110294

@T_Bird
Please can you post

  1. the output of the command
    cryosparcm cli "get_scheduler_targets()"
    
  2. for a local refinement job that has crashed, but not subsequently been cleared or re-run, the text of the job log:
    cryosparcm joblog <project_uid> <job_uid>
    
  3. next time the “computer screen freezes”, the outputs of the commands
    free -g
    df -h /home/cryosparc_user/cryosparc/cryosparc_database
    du -sh /home/cryosparc_user/cryosparc/cryosparc_database
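If it is easier, the outputs for item 3 can be captured in one go at freeze time, for example with a generic shell one-liner like this (my own convenience suggestion, not a CryoSPARC command):

{ date; free -g; df -h /home/cryosparc_user/cryosparc/cryosparc_database; du -sh /home/cryosparc_user/cryosparc/cryosparc_database; } > ~/freeze_stats.txt 2>&1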
    
  1. cryosparcm cli "get_scheduler_targets()"

[cryosparc_user@c110294 ~]$ cryosparcm cli "get_scheduler_targets()"
[{‘cache_path’: ‘/scr/cryosparc_cache’, ‘cache_quota_mb’: None, ‘cache_reserve_mb’: 10000, ‘desc’: None, ‘gpus’: [{‘id’: 0, ‘mem’: 25446776832, ‘name’: ‘GeForce RTX 3090’}, {‘id’: 1, ‘mem’: 25447170048, ‘name’: ‘GeForce RTX 3090’}], ‘hostname’: ‘c110294’, ‘lane’: ‘default’, ‘monitor_port’: None, ‘name’: ‘c110294’, ‘resource_fixed’: {‘SSD’: True}, ‘resource_slots’: {‘CPU’: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79], ‘GPU’: [0, 1], ‘RAM’: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]}, ‘ssh_str’: ‘cryosparc_user@c110294’, ‘title’: ‘Worker node c110294’, ‘type’: ‘node’, ‘worker_bin_path’: ‘/home/cryosparc_user/cryosparc/cryosparc_worker/bin/cryosparcw’}]

  2. joblog: the repeating error at the end

Running job J303 of type new_local_refine
Running job on hostname %s c110294
Allocated Resources : {‘fixed’: {‘SSD’: True}, ‘hostname’: ‘c110294’, ‘lane’: ‘default’, ‘lane_type’: ‘node’, ‘license’: True, ‘licenses_acquired’: 1, ‘slots’: {‘CPU’: [4, 5, 6, 7], ‘GPU’: [1], ‘RAM’: [3, 4, 5]}, ‘target’: {‘cache_path’: ‘/scr/cryosparc_cache’, ‘cache_quota_mb’: None, ‘cache_reserve_mb’: 10000, ‘desc’: None, ‘gpus’: [{‘id’: 0, ‘mem’: 25446776832, ‘name’: ‘GeForce RTX 3090’}, {‘id’: 1, ‘mem’: 25447170048, ‘name’: ‘GeForce RTX 3090’}], ‘hostname’: ‘c110294’, ‘lane’: ‘default’, ‘monitor_port’: None, ‘name’: ‘c110294’, ‘resource_fixed’: {‘SSD’: True}, ‘resource_slots’: {‘CPU’: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79], ‘GPU’: [0, 1], ‘RAM’: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]}, ‘ssh_str’: ‘cryosparc_user@c110294’, ‘title’: ‘Worker node c110294’, ‘type’: ‘node’, ‘worker_bin_path’: ‘/home/cryosparc_user/cryosparc/cryosparc_worker/bin/cryosparcw’}}
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
**custom thread exception hook caught something
**** handle exception rc
**custom thread exception hook caught something
**** handle exception rc
========= sending heartbeat at 2023-07-12 21:14:03.051551
/home/cryosparc_user/cryosparc/cryosparc_worker/cryosparc_compute/plotutil.py:553: RuntimeWarning: divide by zero encountered in log
logabs = n.log(n.abs(fM))
/home/cryosparc_user/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:653: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
x = n.linalg.lstsq(w.reshape((-1,1))*A, w*b)[0]
/home/cryosparc_user/cryosparc/cryosparc_worker/cryosparc_compute/plotutil.py:553: RuntimeWarning: divide by zero encountered in log
logabs = n.log(n.abs(fM))
/home/cryosparc_user/cryosparc/cryosparc_worker/cryosparc_compute/plotutil.py:29: RuntimeWarning: invalid value encountered in sqrt
cradwn = n.sqrt(cradwn)
/home/cryosparc_user/cryosparc/cryosparc_worker/cryosparc_compute/plotutil.py:1061: RuntimeWarning: invalid value encountered in arcsin
viewdirs_elevation = n.arcsin( viewdirs[:, 2])
(the same FutureWarning / RuntimeWarning block repeats many more times, snipped for brevity)
Exception in thread Thread
Traceback (most recent call last):
File "/home/cryosparc_user/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2061, in run_with_except_hook
run_old(*args, **kw)
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2441, in cryosparc_compute.engine.newengine.process.work
File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2589, in cryosparc_compute.engine.newengine.process.work
File "cryosparc_master/cryosparc_compute/jobs/local_refine/newrun.py", line 582, in cryosparc_compute.jobs.local_refine.newrun.run_local_refine.progress
File "/home/cryosparc_user/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1606, in update_event_text
db['events'].update_one({'_id':event_id},
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pymongo/collection.py", line 1132, in update_one
self._update_retryable(
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pymongo/collection.py", line 961, in _update_retryable
return self.__database.client._retryable_write(
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1644, in _retryable_write
return self._retry_with_session(retryable, func, s, None)
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1532, in _retry_with_session
return self._retry_internal(retryable, func, session, bulk)
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1571, in _retry_internal
raise last_error
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1557, in _retry_internal
with self._get_socket(server, session) as sock_info:
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1396, in _get_socket
with server.get_socket(self.__all_credentials, handler=err_handler) as sock_info:
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pymongo/pool.py", line 1371, in get_socket
sock_info = self._get_socket(all_credentials)
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pymongo/pool.py", line 1436, in _get_socket
sock_info = self.connect(all_credentials)
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pymongo/pool.py", line 1327, in connect
_raise_connection_failure(self.address, error)
File "/home/cryosparc_user/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pymongo/pool.py", line 266, in _raise_connection_failure
raise AutoReconnect(msg)
pymongo.errors.AutoReconnect: c110294:39001: [Errno 111] Connection refused

  3. The screen is currently frozen. This local refinement actually finished after ~60 hours without crashing; the screen still displays an image and the cursor moves, but I can’t interact with the mouse or get any keyboard shortcuts to respond.

[cryosparc_user@c110294 ~]$ free -g
              total        used        free      shared  buff/cache   available
Mem:            376          61           3           6         310         307
Swap:             9           8           1
[cryosparc_user@c110294 ~]$ df -h /home/cryosparc_user/cryosparc/cryosparc_database

Filesystem Size Used Avail Use% Mounted on
/dev/sda3 870G 419G 407G 51% /
[cryosparc_user@c110294 ~]$ du -sh /home/cryosparc_user/cryosparc/cryosparc_database
64G /home/cryosparc_user/cryosparc/cryosparc_database
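Swap looks nearly exhausted there (8 of 9 GB used), so for the next run I’m planning to log memory periodically and check it after any freeze, with a simple loop like this (my own addition, nothing CryoSPARC-specific):

# append a timestamped memory snapshot every 5 minutes
while true; do { date; free -g; } >> ~/mem_log.txt; sleep 300; done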

Thanks @T_Bird.
What is the output of the command below?
host c110294
What is the full URL you use in the browser address bar to connect to the CryoSPARC UI?

[cryosparc_user@c110294 ~]$ host c110294
Host c110294 not found: 3(NXDOMAIN)
[cryosparc_user@c110294 ~]$ dig c110294

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.7 <<>> c110294
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 49752
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;c110294. IN A

;; AUTHORITY SECTION:
. 3366 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2023071801 1800 900 604800 86400

;; Query time: 0 msec
;; SERVER: x.x.x.x#53(x.x.x.x)
;; WHEN: Tue Jul 18 14:16:43 PDT 2023
;; MSG SIZE rcvd: 111

Browser URL:
http://localhost:39000/

It seems your $CRYOSPARC_MASTER_HOSTNAME cannot be resolved. I am not sure what intervention would be appropriate in your specific network environment.
If your /etc/hosts file does not already contain a line starting with 127.0.1.1, you could try adding the line
127.0.1.1 c110294
to /etc/hosts. Please confer with your network admins for the correct method given your needs and network environment.
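For example (a sketch assuming the stock CentOS 7 hosts file and that you have sudo; adjust as appropriate), the edit and a quick verification would look like:

# append the mapping and confirm the hostname now resolves locally
echo "127.0.1.1 c110294" | sudo tee -a /etc/hosts
getent hosts c110294
host c110294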

/etc/hosts doesn’t have a 127.0.1.1 line; I will confer with the admin, add it at the next restart, and report back.

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

Thanks for the help @wtempel

I’ve now used the computer for a month since adding the “127.0.1.1 c110294” line to /etc/hosts, and it has helped a fair bit: the screen no longer freezes up when running jobs, and I can use the computer directly again. I still get some occasional database-exiting issues, but they are not associated with app_api and appear to be linked to HDD problems.
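Since the remaining database exits seem tied to the HDD, I’ve also been keeping an eye on the drive with basic SMART checks (my own addition; requires smartmontools and root, and /dev/sda is just the device holding the database on this box):

sudo smartctl -H /dev/sda         # overall health assessment
sudo smartctl -l error /dev/sda   # recent drive error log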