Hi,
We have a standalone CryoSPARC server running multiple CryoSPARC instances (due to access policies / user permissions set by institute IT). In the past this worked without any major issues. Recently, instances have been crashing frequently, with the web app showing either an infinite loading bar or a 503 error.
cryosparcm status typically reports something like the following:
Current cryoSPARC version: v3.3.2
----------------------------------------------------------------------------
CryoSPARC process status:
app RUNNING pid 147261, uptime 23:00:53
app_dev STOPPED Not started
command_core RUNNING pid 31148, uptime 0:44:04
command_rtp RUNNING pid 147166, uptime 23:01:03
command_vis RUNNING pid 64164, uptime 0:00:20
database EXITED May 09 11:29 PM
liveapp STOPPED Not started
liveapp_dev STOPPED Not started
webapp RUNNING pid 147250, uptime 23:00:54
webapp_dev STOPPED Not started
----------------------------------------------------------------------------
License is valid
----------------------------------------------------------------------------
or, seemingly less frequently, something like this:
Current cryoSPARC version: v3.3.2
----------------------------------------------------------------------------
CryoSPARC process status:
app FATAL Exited too quickly (process log may have details)
app_dev STOPPED Not started
command_core RUNNING pid 127735, uptime 0:41:19
command_rtp RUNNING pid 127456, uptime 0:41:45
command_vis RUNNING pid 127450, uptime 0:41:47
database EXITED May 09 10:52 AM
liveapp STOPPED Not started
liveapp_dev STOPPED Not started
webapp RUNNING pid 127580, uptime 0:41:37
webapp_dev STOPPED Not started
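In case it is useful for monitoring, here is a minimal sketch of how status output like the above could be scanned for processes in a failed state. It assumes the "name STATE detail" line layout shown above. The function name unhealthy_processes and the list of state names (taken from supervisord's process states) are my own choices for illustration, not part of cryosparcm.

```python
import re

# Matches lines such as "database EXITED May 09 11:29 PM".
# The state names are supervisord process states (an assumption about the output format).
STATUS_LINE = re.compile(
    r"^(\w+)\s+(RUNNING|STOPPED|EXITED|FATAL|STARTING|BACKOFF|STOPPING|UNKNOWN)\b\s*(.*)$"
)

def unhealthy_processes(status_text):
    """Return {process_name: (state, detail)} for processes that are EXITED, FATAL or BACKOFF."""
    bad = {}
    for line in status_text.splitlines():
        m = STATUS_LINE.match(line.strip())
        if m and m.group(2) in {"EXITED", "FATAL", "BACKOFF"}:
            bad[m.group(1)] = (m.group(2), m.group(3))
    return bad

# Sample taken from the second status listing above.
sample = """\
app FATAL Exited too quickly (process log may have details)
app_dev STOPPED Not started
command_core RUNNING pid 127735, uptime 0:41:19
database EXITED May 09 10:52 AM
webapp RUNNING pid 127580, uptime 0:41:37
"""

print(unhealthy_processes(sample))
```

Processes that were never started (STOPPED Not started) are deliberately not reported, since that is the normal state for the *_dev entries.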
Shortly before the most recent crash last night, the database.log file contained the following:
2022-05-09T23:01:31.589+0200 I COMMAND [ftdc] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after dur: 0, after extra_info: 1592, after globalLock: 1592, after locks: 1592, after network: 1592, after opLatencies: 1592, after opcounters: 1592, after opcountersRepl: 1592, after repl: 1592, after storageEngine: 1592, after tcmalloc: 1592, after wiredTiger: 1592, at end: 1592 }
2022-05-09T23:29:07.373+0200 F - [conn12] Invalid access at address: 0x558b1f76bedd
2022-05-09T23:29:07.400+0200 F - [conn12] Got signal: 7 (Bus error).
0x558b20172ac1 0x558b20171cd9 0x558b20172346 0x7f2366ed9630 0x558b1f76bedd 0x558b1f76c0c8 0x558b1f75d47c 0x558b1f75d4c8 0x558b1f75d508 0x558b1f75d597 0x558b1f780a70 0x558b1f78157a 0x558b1f781bbb 0x558b1f781cbc 0x558b1f6bd0c5 0x558b1f69405f 0x558b1f695741 0x558b1fcaa720 0x558b1f8ae8b2 0x558b1f8b08b6 0x558b1f4b1c7d 0x558b1f4b25ad 0x558b200f2b31 0x7f2366ed1ea5 0x7f2366bfa9fd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"558B1EC3F000","o":"1533AC1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"558B1EC3F000","o":"1532CD9"},{"b":"558B1EC3F000","o":"1533346"},{"b":"7F2366ECA000","o":"F630"},{"b":"558B1EC3F000","o":"B2CEDD","s":"_ZN5mongo10LockerImplILb0EE9lockBeginENS_10ResourceIdENS_8LockModeE"},{"b":"558B1EC3F000","o":"B2D0C8","s":"_ZN5mongo10LockerImplILb0EE16_lockGlobalBeginENS_8LockModeENS_8DurationISt5ratioILl1ELl1000EEEE"},{"b":"558B1EC3F000","o":"B1E47C","s":"_ZN5mongo4Lock10GlobalLock8_enqueueENS_8LockModeEj"},{"b":"558B1EC3F000","o":"B1E4C8","s":"_ZN5mongo4Lock10GlobalLockC1EPNS_6LockerENS_8LockModeEjNS1_11EnqueueOnlyE"},{"b":"558B1EC3F000","o":"B1E508","s":"_ZN5mongo4Lock10GlobalLockC2EPNS_6LockerENS_8LockModeEj"},{"b":"558B1EC3F000","o":"B1E597","s":"_ZN5mongo4Lock6DBLockC2EPNS_6LockerENS_10StringDataENS_8LockModeE"},{"b":"558B1EC3F000","o":"B41A70","s":"_ZN5mongo9AutoGetDbC1EPNS_16OperationContextENS_10StringDataENS_8LockModeE"},{"b":"558B1EC3F000","o":"B4257A","s":"_ZN5mongo17AutoGetCollectionC2EPNS_16OperationContextERKNS_15NamespaceStringENS_8LockModeES6_NS0_8ViewModeE"},{"b":"558B1EC3F000","o":"B42BBB","s":"_ZN5mongo24AutoGetCollectionForReadC1EPNS_16OperationContextERKNS_15NamespaceStringENS_17AutoGetCollection8ViewModeE"},{"b":"558B1EC3F000","o":"B42CBC","s":"_ZN5mongo30AutoGetCollectionOrViewForReadC1EPNS_16OperationContextERKNS_15NamespaceStringE"},{"b":"558B1EC3F000","o":"A7E0C5","s":"_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE"},{"b":"558B1EC3F000","o":"A5505F","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"558B1EC3F000","o":"A56741","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"558B1EC3F000","o":"106B720","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"558B1EC3F000","o":"C6F8B2"},{"b":"558B1EC3F000","o":"C718B6","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"558B1EC3F000","o":"872C7D","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE"},{"b":"558B1EC3F000","o":"8735AD"},{"b":"558B1EC3F000","o":"14B3B31"},{"b":"7F2366ECA000","o":"7EA5"},{"b":"7F2366AFC000","o":"FE9FD","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.10", "gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-1160.25.1.el7.x86_64", "version" : "#1 SMP Wed Apr 28 21:49:45 UTC 2021", "machine" : "x86_64" }, "somap" : [ { "b" : "558B1EC3F000", "elfType" : 3, "buildId" : "D9AB5C91FBC6F740604F4BC28348FE33EC87DEC2" }, { "b" : "7FFF9D46D000", "elfType" : 3, "buildId" : "2B8B701C7F88CF0CBFE440A7E699428A9DCD8C29" }, { "b" : "7F2367A0A000", "path" : "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/libpython3.7m.so", "elfType" : 3 }, { "b" : "7F2367F11000", "path" : "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/libtiff.so", "elfType" : 3 }, { "b" : "7F2367802000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "3E44DF7055942478D052E40FDD1F5B7862B152B0" }, { "b" : "7F23675FE000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "7F2E9CB0769D7E57BD669B485A74B537B63A57C4" }, { "b" : "7F23672FC000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "7615604EAF4A068DFAE5085444D15C0DEE93DFBD" }, { "b" : "7F23670E6000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "EDF51350C7F71496149D064AA8B1441F786DF88A" }, { "b" : "7F2366ECA000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "E10CC8F2B932FC3DAEDA22F8DAC5EBB969524E5B" }, { "b" : "7F2366AFC000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "A317B42B15368ADCAE21C11107691A03EC91059D" }, { "b" : "7F2367D74000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "62C449974331341BB08DCCE3859560A22AF1E172" }, { "b" : "7F23668F9000", "path" : "/lib64/libutil.so.1", "elfType" : 3, "buildId" : "FF2196BD22A8443054C83031E0E76EB01BA1219C" }, { "b" : "7F2367E72000", "path" : "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/./libwebp.so.7", "elfType" : 3 }, { "b" : "7F2367DA7000", "path" : "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/./libzstd.so.1", "elfType" : 3 }, { "b" : "7F23668D0000", "path" : "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/./liblzma.so.5", "elfType" : 3 }, { "b" : "7F2366892000", "path" : "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/./libjpeg.so.9", "elfType" : 3 }, { "b" : "7F2366878000", "path" : "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/./libz.so.1", "elfType" : 3 } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x558b20172ac1]
mongod(+0x1532CD9) [0x558b20171cd9]
mongod(+0x1533346) [0x558b20172346]
libpthread.so.0(+0xF630) [0x7f2366ed9630]
mongod(_ZN5mongo10LockerImplILb0EE9lockBeginENS_10ResourceIdENS_8LockModeE+0x1BD) [0x558b1f76bedd]
mongod(_ZN5mongo10LockerImplILb0EE16_lockGlobalBeginENS_8LockModeENS_8DurationISt5ratioILl1ELl1000EEEE+0xB8) [0x558b1f76c0c8]
mongod(_ZN5mongo4Lock10GlobalLock8_enqueueENS_8LockModeEj+0x3C) [0x558b1f75d47c]
mongod(_ZN5mongo4Lock10GlobalLockC1EPNS_6LockerENS_8LockModeEjNS1_11EnqueueOnlyE+0x38) [0x558b1f75d4c8]
mongod(_ZN5mongo4Lock10GlobalLockC2EPNS_6LockerENS_8LockModeEj+0x18) [0x558b1f75d508]
mongod(_ZN5mongo4Lock6DBLockC2EPNS_6LockerENS_10StringDataENS_8LockModeE+0x57) [0x558b1f75d597]
mongod(_ZN5mongo9AutoGetDbC1EPNS_16OperationContextENS_10StringDataENS_8LockModeE+0x20) [0x558b1f780a70]
mongod(_ZN5mongo17AutoGetCollectionC2EPNS_16OperationContextERKNS_15NamespaceStringENS_8LockModeES6_NS0_8ViewModeE+0x6A) [0x558b1f78157a]
mongod(_ZN5mongo24AutoGetCollectionForReadC1EPNS_16OperationContextERKNS_15NamespaceStringENS_17AutoGetCollection8ViewModeE+0x4B) [0x558b1f781bbb]
mongod(_ZN5mongo30AutoGetCollectionOrViewForReadC1EPNS_16OperationContextERKNS_15NamespaceStringE+0x2C) [0x558b1f781cbc]
mongod(_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE+0x9A5) [0x558b1f6bd0c5]
mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x4FF) [0x558b1f69405f]
mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0xF81) [0x558b1f695741]
mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x240) [0x558b1fcaa720]
mongod(+0xC6F8B2) [0x558b1f8ae8b2]
mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x746) [0x558b1f8b08b6]
mongod(_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE+0x1FD) [0x558b1f4b1c7d]
mongod(+0x8735AD) [0x558b1f4b25ad]
mongod(+0x14B3B31) [0x558b200f2b31]
libpthread.so.0(+0x7EA5) [0x7f2366ed1ea5]
libc.so.6(clone+0x6D) [0x7f2366bfa9fd]
----- END BACKTRACE -----
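Since each instance has its own database.log, a minimal sketch of how the fatal entries could be pulled out of a log to compare crash times across instances. It assumes the "timestamp severity component [context] message" line layout seen in the excerpt above (severity F marks the fatal lines). The function name fatal_events is made up for illustration.

```python
import re
from datetime import datetime

# timestamp, one-letter severity, component, [context], message
LOG_LINE = re.compile(r"^(\S+)\s+([DIWEF])\s+(\S+)\s+\[([^\]]+)\]\s+(.*)$")

def fatal_events(lines):
    """Return (timestamp, context, message) for every line logged with severity F."""
    events = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(2) == "F":
            # Timestamps look like 2022-05-09T23:29:07.373+0200
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
            events.append((ts, m.group(4), m.group(5)))
    return events

# Lines copied from the log excerpt above (first one shortened).
sample_lines = [
    "2022-05-09T23:01:31.589+0200 I COMMAND [ftdc] serverStatus was very slow",
    "2022-05-09T23:29:07.373+0200 F - [conn12] Invalid access at address: 0x558b1f76bedd",
    "2022-05-09T23:29:07.400+0200 F - [conn12] Got signal: 7 (Bus error).",
    "mongod(+0x1532CD9) [0x558b20171cd9]",
]

for ts, context, message in fatal_events(sample_lines):
    print(ts.isoformat(), context, message)
```

On a real log this would be called as fatal_events(open(path)), with path pointing at the instance's database.log.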
Around the same time, command_core.log shows:
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | Job Heartbeat check failed
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | Traceback (most recent call last):
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/pool.py", line 1272, in _get_socket
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | sock_info = self.sockets.popleft()
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | IndexError: pop from an empty deque
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR |
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | During handling of the above exception, another exception occurred:
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR |
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | Traceback (most recent call last):
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/pool.py", line 1180, in connect
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | sock = _configured_socket(self.address, self.opts)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/pool.py", line 988, in _configured_socket
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | sock = _create_connection(address, options)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/pool.py", line 972, in _create_connection
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | raise err
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/pool.py", line 965, in _create_connection
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | sock.connect(sa)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | ConnectionRefusedError: [Errno 111] Connection refused
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR |
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | During handling of the above exception, another exception occurred:
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR |
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | Traceback (most recent call last):
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/cryosparc_command/command_core/__init__.py", line 237, in background_worker
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | check_heartbeats()
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/cryosparc_command/command_core/__init__.py", line 2104, in check_heartbeats
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | 'heartbeat_at' : {'$lt' : deadline} }, {'project_uid' : 1, 'uid' : 1}))
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/cursor.py", line 1207, in next
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | if len(self.__data) or self._refresh():
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/cursor.py", line 1124, in _refresh
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | self.__send_message(q)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/cursor.py", line 1001, in __send_message
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | address=self.__address)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1372, in _run_operation_with_response
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | exhaust=exhaust)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1465, in _retryable_read
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | exhaust=exhaust) as (sock_info,
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/contextlib.py", line 112, in __enter__
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | return next(self.gen)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1309, in _slaveok_for_server
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | with self._get_socket(server, session, exhaust=exhaust) as sock_info:
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/contextlib.py", line 112, in __enter__
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | return next(self.gen)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1247, in _get_socket
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | self.__all_credentials, checkout=exhaust) as sock_info:
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/contextlib.py", line 112, in __enter__
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | return next(self.gen)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/pool.py", line 1225, in get_socket
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | sock_info = self._get_socket(all_credentials)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/pool.py", line 1275, in _get_socket
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | sock_info = self.connect(all_credentials)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/pool.py", line 1187, in connect
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | _raise_connection_failure(self.address, error)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/pool.py", line 286, in _raise_connection_failure
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | raise AutoReconnect(msg)
2022-05-09 23:29:08,386 COMMAND.BG_WORKER background_worker ERROR | pymongo.errors.AutoReconnect: SERVERNAME.TLD:10181: [Errno 111] Connection refused
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | Job Heartbeat check failed
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | Traceback (most recent call last):
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/cryosparc_command/command_core/__init__.py", line 237, in background_worker
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | check_heartbeats()
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/cryosparc_command/command_core/__init__.py", line 2104, in check_heartbeats
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | 'heartbeat_at' : {'$lt' : deadline} }, {'project_uid' : 1, 'uid' : 1}))
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/cursor.py", line 1207, in next
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | if len(self.__data) or self._refresh():
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/cursor.py", line 1100, in _refresh
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | self.__session = self.__collection.database.client._ensure_session()
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1816, in _ensure_session
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | return self.__start_session(True, causal_consistency=False)
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1766, in __start_session
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | server_session = self._get_server_session()
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1802, in _get_server_session
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | return self._topology.get_server_session()
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/topology.py", line 488, in get_server_session
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | None)
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/topology.py", line 217, in _select_servers_loop
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | (self._error_message(selector), timeout, self.description))
2022-05-09 23:29:39,652 COMMAND.BG_WORKER background_worker ERROR | pymongo.errors.ServerSelectionTimeoutError: SERVERNAME.TLD:10181: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 6278d8fbb91a6cd24e181aeb, topology_type: Single, servers: [<ServerDescription ('SERVERNAME.TLD', 10181) server_type: Unknown, rtt: None, error=AutoReconnect('SERVERNAME.TLD:10181: [Errno 111] Connection refused')>]>
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | JSONRPC ERROR at get_num_active_licenses
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | Traceback (most recent call last):
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | File "/MYPATH/cryosparc_master/cryosparc_command/command_core/__init__.py", line 150, in wrapper
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | res = func(*args, **kwargs)
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | File "/MYPATH/cryosparc_master/cryosparc_command/command_core/__init__.py", line 1741, in get_num_active_licenses
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | for j in jobs_running:
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/cursor.py", line 1207, in next
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | if len(self.__data) or self._refresh():
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/cursor.py", line 1100, in _refresh
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | self.__session = self.__collection.database.client._ensure_session()
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1816, in _ensure_session
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | return self.__start_session(True, causal_consistency=False)
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1766, in __start_session
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | server_session = self._get_server_session()
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1802, in _get_server_session
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | return self._topology.get_server_session()
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/topology.py", line 488, in get_server_session
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | None)
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | File "/MYPATH/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/topology.py", line 217, in _select_servers_loop
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | (self._error_message(selector), timeout, self.description))
2022-05-09 23:30:09,768 COMMAND.MAIN wrapper ERROR | pymongo.errors.ServerSelectionTimeoutError: SERVERNAME.TLD:10181: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 6278d8fbb91a6cd24e181aeb, topology_type: Single, servers: [<ServerDescription ('SERVERNAME.TLD', 10181) server_type: Unknown, rtt: None, error=AutoReconnect('SERVERNAME.TLD:10181: [Errno 111] Connection refused')>]>
These errors keep repeating until I restart CryoSPARC.
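The "Connection refused" errors are consistent with nothing listening on the database port any more (10181 in the tracebacks above). A minimal sketch of how that could be checked from the master node, using only the Python standard library. The function name port_accepts_connections is hypothetical, and localhost and 10181 are just the values from my setup.

```python
import socket

def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False

if __name__ == "__main__":
    # Port taken from the tracebacks above; adjust per instance.
    print(port_accepts_connections("localhost", 10181))
```

This only shows whether something accepts TCP connections on that port, not whether the database behind it is healthy.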
This machine is used only for CryoSPARC. According to my logs, the last job (from a different CryoSPARC instance) had finished 2 h earlier, so the server had essentially been idle since then. In the past, however, crashes have happened at all times of day, often killing running jobs along the way. I have not been able to find any pattern in when instances crash: sometimes twice in a day, sometimes only every few days. In the last 24 h, every instance that ran crashed, but not at the same time.
I am happy to provide any additional log output or information if needed, and I would be grateful for any insights!
Best wishes,
Lukas
Update with some additional info:
cat cryosparc_worker/config.sh:
export CRYOSPARC_USE_GPU=true
export CRYOSPARC_CUDA_PATH="/usr/local/cuda"
export CRYOSPARC_DEVELOP=false
Workstation details: CentOS 7, 3x RTX 8000, 384 GB RAM, 2x Xeon 6244
global config variables:
export CRYOSPARC_DB_PATH="/MYPATH/cryosparc_database"
export CRYOSPARC_BASE_PORT=10180
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true
CRYOSPARC_FORCE_HOSTNAME=true