Hi,
After a recent storage outage on our cluster, our CryoSPARC database appears to have some issues.
cryosparcm status returns:
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/XXX/cryosparc_master
Current cryoSPARC version: v4.2.1
----------------------------------------------------------------------------
CryoSPARC is not running.
----------------------------------------------------------------------------
global config variables:
export CRYOSPARC_LICENSE_ID="XXXX"
export CRYOSPARC_MASTER_HOSTNAME="XXXXX"
export CRYOSPARC_DB_PATH="YYYYY/db"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true
export CRYOSPARC_FORCE_HOSTNAME=true
Apart from the short-term outage, nothing has changed. It is no longer possible to start the CryoSPARC services:
CryoSPARC is not already running.
If you would like to restart, use cryosparcm restart
Starting cryoSPARC System master process..
CryoSPARC is not already running.
configuring database
Warning: Could not get database status (attempt 1/3)
Warning: Could not get database status (attempt 2/3)
Warning: Could not get database status (attempt 3/3)
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/XXX/cryosparc_master/cryosparc_compute/database_management.py", line 48, in configure_mongo
initialize_replica_set()
File "/XXX/cryosparc_master/cryosparc_compute/database_management.py", line 87, in initialize_replica_set
admin_db = try_get_pymongo_admin_db(mongo_client)
File "/XXX/cryosparc_master/cryosparc_compute/database_management.py", line 249, in try_get_pymongo_admin_db
admin_db.command(({'serverStatus': 1}))
File "/XXX/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/database.py", line 827, in command
with self.__client._socket_for_reads(read_preference, session) as (sock_info, secondary_ok):
File "/XXX/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/XXX/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1478, in _socket_for_reads
server = self._select_server(read_preference, session)
File "/XXX/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1436, in _select_server
server = topology.select_server(server_selector)
File "/XXX/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/topology.py", line 250, in select_server
return random.choice(self.select_servers(selector, server_selection_timeout, address))
File "/XXX/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/topology.py", line 211, in select_servers
server_descriptions = self._select_servers_loop(selector, server_timeout, address)
File "/XXX/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/topology.py", line 226, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: localhost:39001: [Errno 111] Connection refused, Timeout: 20.0s, Topology Description: <TopologyDescription id: 64a7f5e1311935e61b62bf93, topology_type: Single, servers: [<ServerDescription ('localhost', 39001) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:39001: [Errno 111] Connection refused')>]>
[2023-07-07T13:25:28+0200] Error configuring database. Most recent database log lines:
mongod(wiredtiger_open+0x1BBA) [0x5556f4192c8a]
mongod(_ZN5mongo18WiredTigerKVEngineC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mmbbbb+0x8D6) [0x5556f415fcf6]
mongod(+0xA25AEC) [0x5556f4141aec]
mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x266) [0x5556f4351fb6]
mongod(+0xA025B8) [0x5556f411e5b8]
mongod(_ZN5mongo11mongoDbMainEiPPcS1_+0x26C) [0x5556f412163c]
mongod(main+0x9) [0x5556f40a7bc9]
libc.so.6(__libc_start_main+0xF5) [0x7f9c887303d5]
mongod(+0x9ED741) [0x5556f4109741]
ps xww | grep -e cryosparc -e mongo
returns no running processes on any of the nodes that could run CryoSPARC-related jobs.
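As a cross-check of the process listing, one can also test whether anything is listening on the database port. The following is only a minimal sketch: the helper name `port_is_open` is made up for illustration, and the host and port are taken from the traceback above (localhost:39001, one above CRYOSPARC_BASE_PORT=39000).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call, using the address reported in the traceback:
# port_is_open("localhost", 39001)
```

A result of False here would be consistent with the "Connection refused" error, i.e. mongod is not running rather than running and unresponsive.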
I did, however, find a mongod.lock file in our db folder, timestamped with the last scheduled reboot of CryoSPARC. Running fuser on it returns nothing.
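To complement the fuser check, the lock file itself could be inspected before deleting anything. This is a rough sketch under a stated assumption: that mongod.lock holds the PID of the mongod process as plain text (empty after a clean shutdown). The function name `lock_file_status` and its return labels are invented for illustration, and the check only sees processes on the node where it runs.

```python
import os
from pathlib import Path


def lock_file_status(lock_path) -> str:
    """Classify a lock file, ASSUMING it contains a PID as plain text."""
    path = Path(lock_path)
    if not path.exists():
        return "missing"
    content = path.read_text().strip()
    if not content:
        return "empty"  # nothing recorded, e.g. after a clean shutdown
    try:
        pid = int(content)
    except ValueError:
        return "unreadable"
    if pid <= 0:
        return "unreadable"
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence, sends nothing
    except ProcessLookupError:
        return "stale"  # PID recorded, but no such process on this node
    except PermissionError:
        return "in use"  # process exists but belongs to another user
    return "in use"


# Example call; the path is a placeholder for CRYOSPARC_DB_PATH:
# lock_file_status("YYYYY/db/mongod.lock")
```

A "stale" result would agree with what fuser and ps already suggest: the file is a leftover from a process that no longer exists. It would still be prudent to copy the db folder before removing the file.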
From the thread "Help! I seem to have broken cryosparc by moving the cyrosparc_user home directory to a new location and then moving it back again!", I assume the way forward is to delete the mongod.lock file and then restart the CryoSPARC services.
Would you suggest running the MongoDB recovery before or after trying to restart CryoSPARC? The last resort will of course be restoring the DB from a backup to a state prior to the outage.