Cannot start CryoSPARC: Error configuring database

Hello, a CryoSPARC instance for one of our users is currently failing to start and reports an error configuring the database. For information, the CryoSPARC instances on our cluster are each configured in a Kubernetes pod, one per user group. Several identically configured pods (with different port ranges) are still running normally, and this instance was running normally until very recently. Do you have any advice on getting it running again?

The following is the output from running cryosparcm start:

CryoSPARC is not already running.
configuring database
Warning: Could not get database status (attempt 1/3)
Warning: Could not get database status (attempt 2/3)
Warning: Could not get database status (attempt 3/3)
Traceback (most recent call last):
File “<string>”, line 1, in <module>
File “/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/cryosparc_compute/database_management.py”, line 49, in configure_mongo
initialize_replica_set()
File “/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/cryosparc_compute/database_management.py”, line 88, in initialize_replica_set
admin_db = try_get_pymongo_db(mongo_client)
File “/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/cryosparc_compute/database_management.py”, line 251, in try_get_pymongo_db
admin_db.command(({‘serverStatus’: 1}))
File “/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/database.py”, line 827, in command
with self.__client._socket_for_reads(read_preference, session) as (sock_info, secondary_ok):
File “/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/contextlib.py”, line 113, in __enter__
return next(self.gen)
File “/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/mongo_client.py”, line 1478, in _socket_for_reads
server = self._select_server(read_preference, session)
File “/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/mongo_client.py”, line 1436, in _select_server
server = topology.select_server(server_selector)
File “/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/topology.py”, line 250, in select_server
return random.choice(self.select_servers(selector, server_selection_timeout, address))
File “/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/topology.py”, line 211, in select_servers
server_descriptions = self._select_servers_loop(selector, server_timeout, address)
File “/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/topology.py”, line 226, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: localhost:30214: [Errno 111] Connection refused, Timeout: 20.0s, Topology Description: <TopologyDescription id: 66c5aba2ab4aed32941eabb6, topology_type: Single, servers: [<ServerDescription (‘localhost’, 30214) server_type: Unknown, rtt: None, error=AutoReconnect(‘localhost:30214: [Errno 111] Connection refused’)>]>
[2024-08-21T08:57:12+00:00] Error configuring database. Most recent database log lines:
mongod(wiredtiger_open+0x1BBA) [0x55c85d5e9c8a]
mongod(_ZN5mongo18WiredTigerKVEngineC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mmbbbb+0x8D6) [0x55c85d5b6cf6]
mongod(+0xA25AEC) [0x55c85d598aec]
mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x266) [0x55c85d7a8fb6]
mongod(+0xA025B8) [0x55c85d5755b8]
mongod(_ZN5mongo11mongoDbMainEiPPcS1_+0x26C) [0x55c85d57863c]
mongod(main+0x9) [0x55c85d4febc9]
libc.so.6(__libc_start_main+0xF3) [0x7f809aede083]
mongod(+0x9ED741) [0x55c85d560741]
----- END BACKTRACE -----
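
The `[Errno 111] Connection refused` in the traceback means that nothing is listening on the database port, which fits a mongod process that exits during startup. A minimal sketch for checking this independently of pymongo, assuming the host and port shown in the traceback (localhost:30214); the function name `port_is_open` is made up for illustration:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # i.e. errno 111 on Linux) when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("localhost", 30214)` inside the affected pod would be expected to return False while mongod is down. This only tells you whether the port accepts connections, not why the database failed to start.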

What are the outputs of the following commands?

/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/bin/cryosparcm log database | grep -i error | tail -n 20
cat /mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/version

Thanks for your reply. The output follows below. The 20 lines from the first command are quite long, so I have removed a lot of hex codes to bring the reply under the permitted number of characters; I can post them if needed.

/mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/bin/cryosparcm log database | grep -i error | tail -n 20

2024-08-21T08:55:59.865+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724230559:865970][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 1 of 4):
Raw: [1724230559:865970][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 1 of 4):
2024-08-21T08:55:59.866+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724230559:866371][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 2 of 4):
Raw: [1724230559:866371][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 2 of 4):
2024-08-21T08:55:59.866+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724230559:866671][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 3 of 4):
Raw: [1724230559:866671][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 3 of 4):
2024-08-21T08:55:59.867+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724230559:867002][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 4 of 4):
Raw: [1724230559:867002][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 4 of 4):
2024-08-21T08:55:59.867+0000 E STORAGE [initandlisten] WiredTiger error (-31802) [1724230559:867359][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_block_read_off, 302: WiredTiger.wt: fatal read error: WT_ERROR: non-specific WiredTiger error Raw: [1724230559:867359][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_block_read_off, 302: WiredTiger.wt: fatal read error: WT_ERROR: non-specific WiredTiger error
2024-08-21T08:55:59.867+0000 E STORAGE [initandlisten] WiredTiger error (-31804) [1724230559:867676][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1724230559:867676][4091:0x7f809aeb4500], file:WiredTiger.wt, connection: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic
2024-08-21T09:42:39.897+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724233359:897634][4259:0x7f1deea4e500], file:WiredTiger.wt, connection: __wt_block_read_off, 291: WiredTiger.wt: read checksum error for 4096B block at offset 12288: block header checksum of 1902573463 doesn’t match expected checksum of 189887979 Raw: [1724233359:897634][4259:0x7f1deea4e500], file:WiredTiger.wt, connection: __wt_block_read_off, 291: WiredTiger.wt: read checksum error for 4096B block at offset 12288: block header checksum of 1902573463 doesn’t match expected checksum of 189887979
2024-08-21T09:42:39.898+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724233359:898110][4259:0x7f1deea4e500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 1 of 4):
Raw: [1724233359:898110][4259:0x7f1deea4e500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 1 of 4):
2024-08-21T09:42:39.898+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724233359:898821][4259:0x7f1deea4e500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 3 of 4): Raw: [1724233359:898821][4259:0x7f1deea4e500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 3 of 4):
2024-08-21T09:42:39.899+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724233359:899177][4259:0x7f1deea4e500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 4 of 4):
2024-08-21T09:42:39.899+0000 E STORAGE [initandlisten] WiredTiger error (-31802) [1724233359:899405][4259:0x7f1deea4e500], file:WiredTiger.wt, connection: __wt_block_read_off, 302: WiredTiger.wt: fatal read error: WT_ERROR: non-specific WiredTiger error Raw: [1724233359:899405][4259:0x7f1deea4e500], file:WiredTiger.wt, connection: __wt_block_read_off, 302: WiredTiger.wt: fatal read error: WT_ERROR: non-specific WiredTiger error
2024-08-21T09:42:39.899+0000 E STORAGE [initandlisten] WiredTiger error (-31804) [1724233359:899611][4259:0x7f1deea4e500], file:WiredTiger.wt, connection: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1724233359:899611][4259:0x7f1deea4e500], file:WiredTiger.wt, connection: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic
2024-08-21T19:14:43.334+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724267683:333984][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_block_read_off, 291: WiredTiger.wt: read checksum error for 4096B block at offset 12288: block header checksum of 1902573463 doesn’t match expected checksum of 189887979 Raw: [1724267683:333984][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_block_read_off, 291: WiredTiger.wt: read checksum error for 4096B block at offset 12288: block header checksum of 1902573463 doesn’t match expected checksum of 189887979
2024-08-21T19:14:43.334+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724267683:334532][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 1 of 4):
2024-08-21T19:14:43.334+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724267683:334856][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 2 of 4):
Raw: [1724267683:334856][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 2 of 4):
2024-08-21T19:14:43.335+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724267683:335231][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 3 of 4):
Raw: [1724267683:335231][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 3 of 4):
2024-08-21T19:14:43.335+0000 E STORAGE [initandlisten] WiredTiger error (0) [1724267683:335581][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 4 of 4):
Raw: [1724267683:335581][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_bm_corrupt_dump, 144: {12288, 4096, 189887979}: (chunk 4 of 4):
2024-08-21T19:14:43.335+0000 E STORAGE [initandlisten] WiredTiger error (-31802) [1724267683:335848][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_block_read_off, 302: WiredTiger.wt: fatal read error: WT_ERROR: non-specific WiredTiger error Raw: [1724267683:335848][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_block_read_off, 302: WiredTiger.wt: fatal read error: WT_ERROR: non-specific WiredTiger error
2024-08-21T19:14:43.336+0000 E STORAGE [initandlisten] WiredTiger error (-31804) [1724267683:336093][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1724267683:336093][4941:0x7f00a6ea5500], file:WiredTiger.wt, connection: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic

cat /mnt/beegfs/software/structural_biology/release/cryosparc/roberts/cryosparc/cryosparc_master/version

v4.4.1
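
The database log above repeats the same checksum failure on every start attempt. A rough sketch, assuming log lines in the format pasted above, that collapses them into one entry per affected file, offset and checksum pair; the names `CHECKSUM_RE` and `summarize_checksum_errors` are illustrative and not part of CryoSPARC or MongoDB:

```python
import re
from collections import Counter

# Matches the "read checksum error" message as it appears in the log above.
# "doesn.t" accepts both a straight and a typographic apostrophe.
CHECKSUM_RE = re.compile(
    r"(?P<file>\S+): read checksum error for (?P<size>\d+)B block "
    r"at offset (?P<offset>\d+): block header checksum of (?P<found>\d+) "
    r"doesn.t match expected checksum of (?P<expected>\d+)"
)


def summarize_checksum_errors(lines):
    """Count log lines per (file, offset, found checksum, expected checksum)."""
    counts = Counter()
    for line in lines:
        match = CHECKSUM_RE.search(line)
        if match:
            key = (
                match["file"],
                int(match["offset"]),
                int(match["found"]),
                int(match["expected"]),
            )
            counts[key] += 1
    return counts
```

Feeding it the output of `cryosparcm log database` (for example via `sys.stdin`) shows at a glance whether every failure points at the same block, as it does here for WiredTiger.wt at offset 12288.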

The read checksum error reported for WiredTiger.wt indicates corruption of the database.
Do you have a recent backup of the database?

Thanks. I was afraid that might be the case. I do have a database backup - it is about a month old, but that is recent enough that no work would have been lost for this particular group.

@EdLowe I apologize for the delayed response. In case you have not yet moved ahead with recovery:

  1. You may want to identify and eliminate the cause of the corruption. In CryoSPARC v4.4, where database journaling is enabled by default, corruption should be rare and might indicate a problem with the underlying storage. Ensure suitable filesystem settings (NFS example).
  2. Before proceeding with any of the procedures below, create a backup copy of the database directory in case you want to reuse its contents for any reason. Before copying, ensure that CryoSPARC and its associated processes (including the mongod process) are shut down completely.
  3. Restoring an old database backup will likely corrupt CryoSPARC project directories that were changed after the backup was created. Consider alternatives such as:
     - rebuilding the database from scratch
     - with significant caveats: the so-called --repair option (docs)
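
The precautionary copy of the database directory mentioned in the list above could be made along these lines. This is only a sketch under stated assumptions: CryoSPARC and mongod have already been stopped, the caller supplies the real database path, and the function name `copy_database_dir` and the `_copy_<timestamp>` naming are invented for illustration. It is a plain file copy, not CryoSPARC's own backup mechanism.

```python
import shutil
from datetime import datetime
from pathlib import Path


def _file_sizes(root: Path) -> dict:
    """Map each file's path relative to root to its size in bytes."""
    return {
        p.relative_to(root): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def copy_database_dir(db_dir, backup_root) -> Path:
    """Copy db_dir into a new timestamped directory under backup_root.

    Assumes nothing is writing to db_dir while the copy runs.
    """
    src = Path(db_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"database directory not found: {src}")

    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = Path(backup_root) / f"{src.name}_copy_{stamp}"

    # copytree refuses to overwrite an existing destination directory.
    shutil.copytree(src, dest)

    # Sanity check: same set of files with the same sizes on both sides.
    if _file_sizes(src) != _file_sizes(dest):
        raise RuntimeError(f"copy at {dest} does not match {src}")
    return dest
```

The size comparison only catches truncated or missing files. It does not detect existing corruption, which is copied as-is, so the copy serves as a fallback for the recovery attempts rather than as a known-good backup.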