Interface Input to Job Hangs

We have been having problems with the web interface hanging, either after a period of time or when trying to access a job or create a new job. We are running the interface through a pipe to our campus HPC system. It looks like it could be a MongoDB issue; at least, that is what seems to pop up in the log files.

Thanks,
Len Thomas

Please describe the method/protocol you use for this pipe, including the format of the URL you use to access the interface.

Please post all output (with references to its source) that suggests the database as the root cause.

This is from app.log:
(node:22382) Warning: Accessing non-existent property 'count' of module exports inside circular dependency
(Use node --trace-warnings ... to show where the warning was created)
(node:22382) Warning: Accessing non-existent property 'findOne' of module exports inside circular dependency
(node:22382) Warning: Accessing non-existent property 'remove' of module exports inside circular dependency
(node:22382) Warning: Accessing non-existent property 'updateOne' of module exports inside circular dependency
/ourdisk/hpc/bsc/cryosparc/cryosparc_master/cryosparc_app/api/bundle/programs/server/node_modules/fibers/future.js:313
throw(ex);
^

Error: connect ECONNREFUSED 10.251.80.209:39001
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1146:16) {
name: 'MongoNetworkError',
errorLabels: [ 'TransientTransactionError' ],
[Symbol(mongoErrorContextSymbol)]: {}
}
(node:22744) Warning: Accessing non-existent property 'count' of module exports inside circular dependency
(Use node --trace-warnings ... to show where the warning was created)
(node:22744) Warning: Accessing non-existent property 'findOne' of module exports inside circular dependency
(node:22744) Warning: Accessing non-existent property 'remove' of module exports inside circular dependency
(node:22744) Warning: Accessing non-existent property 'updateOne' of module exports inside circular dependency
/ourdisk/hpc/bsc/cryosparc/cryosparc_master/cryosparc_app/api/bundle/programs/server/node_modules/fibers/future.js:313
throw(ex);

And this is from command_core.log:
2023-06-06 20:06:20,164 COMMAND.BG_WORKER background_worker ERROR | Cluster job monitor error
2023-06-06 20:06:20,164 COMMAND.BG_WORKER background_worker ERROR | Traceback (most recent call last):
2023-06-06 20:06:20,164 COMMAND.BG_WORKER background_worker ERROR | File "/ourdisk/hpc/bsc/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 187, in background_worker
2023-06-06 20:06:20,164 COMMAND.BG_WORKER background_worker ERROR | cluster_job_monitor()
2023-06-06 20:06:20,164 COMMAND.BG_WORKER background_worker ERROR | File "/ourdisk/hpc/bsc/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 8442, in cluster_job_monitor
2023-06-06 20:06:20,164 COMMAND.BG_WORKER background_worker ERROR | for job in jobs:
2023-06-06 20:06:20,164 COMMAND.BG_WORKER background_worker ERROR | File "/ourdisk/hpc/bsc/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/cursor.py", line 1280, in next
2023-06-06 20:06:20,164 COMMAND.BG_WORKER background_worker ERROR | if len(self.__data) or self._refresh():
2023-06-06 20:06:20,164 COMMAND.BG_WORKER background_worker ERROR | File "/ourdisk/hpc/bsc/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/cursor.py", line 1165, in _refresh
2023-06-06 20:06:20,164 COMMAND.BG_WORKER background_worker ERROR | self.__session = self.__collection.database.client._ensure_session()
2023-06-06 20:06:20,164 COMMAND.BG_WORKER background_worker ERROR | File "/ourdisk/hpc/bsc/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/mongo_client.py", line 2027, in _ensure_session

These 2 seem to be related.

As for the pipe, we use: ssh -N -L 39000:10.251.80.26:39000

Len
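
Thanks. For completeness, the full command presumably also names the SSH gateway host, i.e. something like the following (the gateway hostname below is a placeholder, not taken from this thread):

# forward local port 39000 to the CryoSPARC web interface on the master node;
# -N opens the tunnel without running a remote command
ssh -N -L 39000:10.251.80.26:39000 you@hpc-gateway.example.edu

after which the interface would be reached in a browser at http://localhost:39000.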

Please can you:

  • post output of
    cryosparcm status | grep -v LICENSE
  • check /ourdisk/hpc/bsc/cryosparc/cryosparc_master/run/database.log for errors
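
For context: the Error: connect ECONNREFUSED 10.251.80.209:39001 in app.log means the app layer could not connect to the database port (typically CRYOSPARC_BASE_PORT + 1), which usually indicates mongod was not running at that moment. If convenient, a quick check on the master node (a sketch; assumes ss and pgrep are available):

# is anything listening on the database port (base port 39000 + 1)?
ss -tlnp | grep 39001
# is a mongod process running at all?
pgrep -af mongod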

The status command is not available on our system; it is not recognized.

Last entries in database.log

2023-06-09T14:26:06.870+0000 E STORAGE  [WTCheckpointThread] WiredTiger error (22) [1686320766:870408][261160:0x7fdaa076d640], file:index-0--3988199227014740595.wt, WT_SESSION.checkpoint: __wt_block_checkpoint_resolve, 859: index-0--3988199227014740595.wt: the checkpoint failed, the system must restart: Invalid argument Raw: [1686320766:870408][261160:0x7fdaa076d640], file:index-0--3988199227014740595.wt, WT_SESSION.checkpoint: __wt_block_checkpoint_resolve, 859: index-0--3988199227014740595.wt: the checkpoint failed, the system must restart: Invalid argument
2023-06-09T14:26:06.870+0000 E STORAGE  [WTCheckpointThread] WiredTiger error (-31804) [1686320766:870446][261160:0x7fdaa076d640], WT_SESSION.checkpoint: __meta_track_unroll, 233: metadata unroll update file:index-0--3988199227014740595.wt to access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "filename" : 1, "uploadDate" : 1 }, "name" : "filename_1_uploadDate_1", "ns" : "meteor.fs.files" }),block_allocation=best,block_compressor=,cache_resident=false,checksum=on,collator=,columns=,dictionary=0,encryption=(keyid=,name=),format=btree,huffman_key=,huffman_value=,id=98,ignore_in_memory_cache_size=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=16k,key_format=u,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=16k,leaf_value_max=0,log=(enabled=true),memory_page_max=5MB,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=true,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1),checkpoint=(WiredTigerCheckpoint.2482=(addr="01b481e4decc1e60b581e41b24faf7b681e466b8236d808080e3049fc0e303afc0",order=2482,time=1686320746,size=253952,write_gen=5918)),checkpoint_lsn=(4294967295,2147483647): WT_PANIC: WiredTiger library panic Raw: [1686320766:870446][261160:0x7fdaa076d640], WT_SESSION.checkpoint: __meta_track_unroll, 233: metadata unroll update file:index-0--3988199227014740595.wt to access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "filename" : 1, "uploadDate" : 1 }, "name" : "filename_1_uploadDate_1", "ns" : "meteor.fs.files" }),block_allocation=best,block_compressor=,cache_resident=false,checksum=on,collator=,columns=,dictionary=0,encryption=(keyid=,name=),format=btree,huffman_key=,huffman_value=,id=98,ignore_in_memory_cache_size=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=16k,key_format=u,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=16k,leaf_value_max=0,log=(enabled=true),memory_page_max=5MB,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=true,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1),checkpoint=(WiredTigerCheckpoint.2482=(addr="01b481e4decc1e60b581e41b24faf7b681e466b8236d808080e3049fc0e303afc0",order=2482,time=1686320746,size=253952,write_gen=5918)),checkpoint_lsn=(4294967295,2147483647): WT_PANIC: WiredTiger library panic
2023-06-09T14:26:06.870+0000 F -        [WTCheckpointThread] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 420
2023-06-09T14:26:06.870+0000 F -        [WTCheckpointThread] \n\n***aborting after fassert() failure\n\n
2023-06-09T14:26:06.948+0000 F -        [conn11] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 74
2023-06-09T14:26:06.948+0000 F -        [conn11] \n\n***aborting after fassert() failure\n\n
2023-06-09T14:26:06.955+0000 F -        [conn18] Got signal: 6 (Aborted).

Please can you also post any additional error lines that immediately precede the database log excerpt that you just posted.

cryosparcm may not be in your $PATH. Please try

/ourdisk/hpc/bsc/cryosparc/cryosparc_master/bin/cryosparcm status | grep -v LICENSE

and also post the output of
df -h /ourdisk/hpc/bsc/cryosparc/

Here is some more

2023-06-09T15:27:41.399+0000 I COMMAND  [conn10] command meteor.reports command: find { find: "reports", filter: { name: "jobs_by_status", variables: "{"userId":"641cc6f92a8bbab0c71132bd","groupTime":"month"}" }, limit: 1, singleBatch: true, batchSize: 1, lsid: { id: UUID("7fa07883-7c89-4270-ba54-3ac0493c7ebe") }, $clusterTime: { clusterTime: Timestamp(1686324461, 19), signature: { hash: BinData(0, 4870CA386F08A5409181A3DB92CEECEFC846CFB6), keyId: 7186711820835487745 } }, $db: "meteor" } planSummary: COLLSCAN keysExamined:0 docsExamined:22 cursorExhausted:1 numYields:1 nreturned:1 reslen:442 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_msg 126ms
2023-06-09T15:27:41.519+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:37774 #27 (20 connections now open)
2023-06-09T15:27:41.519+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:37776 #28 (21 connections now open)
2023-06-09T15:27:41.519+0000 I NETWORK  [conn28] received client metadata from 127.0.0.1:37776 conn28: { driver: { name: "nodejs", version: "4.9.0" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "5.15.0-41-generic" }, platform: "Node.js v16.17.1, LE (unified)|Node.js v16.17.1, LE (unified)" }
2023-06-09T15:27:41.520+0000 I NETWORK  [conn27] received client metadata from 127.0.0.1:37774 conn27: { driver: { name: "nodejs", version: "4.9.0" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "5.15.0-41-generic" }, platform: "Node.js v16.17.1, LE (unified)|Node.js v16.17.1, LE (unified)" }
2023-06-09T15:27:41.522+0000 I ACCESS   [conn28] Successfully authenticated as principal cryosparc_user on admin from client 127.0.0.1:37776
2023-06-09T15:27:41.522+0000 I ACCESS   [conn27] Successfully authenticated as principal cryosparc_user on admin from client 127.0.0.1:37774
2023-06-09T15:29:26.471+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:37782 #29 (22 connections now open)
2023-06-09T15:29:26.472+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:37784 #30 (23 connections now open)
2023-06-09T15:29:26.473+0000 I NETWORK  [conn29] received client metadata from 127.0.0.1:37782 conn29: { driver: { name: "nodejs", version: "4.9.0" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "5.15.0-41-generic" }, platform: "Node.js v16.17.1, LE (unified)|Node.js v16.17.1, LE (unified)" }
2023-06-09T15:29:26.474+0000 I NETWORK  [conn30] received client metadata from 127.0.0.1:37784 conn30: { driver: { name: "nodejs", version: "4.9.0" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "5.15.0-41-generic" }, platform: "Node.js v16.17.1, LE (unified)|Node.js v16.17.1, LE (unified)" }
2023-06-09T15:29:26.475+0000 I ACCESS   [conn29] Successfully authenticated as principal cryosparc_user on admin from client 127.0.0.1:37782
2023-06-09T15:29:26.480+0000 I ACCESS   [conn30] Successfully authenticated as principal cryosparc_user on admin from client 127.0.0.1:37784
2023-06-09T15:30:59.132+0000 I COMMAND  [LogicalSessionCacheRefresh] command config.$cmd command: update { update: "system.sessions", ordered: false, allowImplicitCollectionCreation: false, writeConcern: { w: "majority", wtimeout: 15000 }, $db: "config" } numYields:0 reslen:1689 locks:{ Global: { acquireCount: { r: 30, w: 30 } }, Database: { acquireCount: { w: 30 } }, Collection: { acquireCount: { w: 15 } }, oplog: { acquireCount: { w: 15 } } } protocol:op_msg 1380ms
2023-06-09T15:35:58.814+0000 I COMMAND  [LogicalSessionCacheRefresh] command config.$cmd command: update { update: "system.sessions", ordered: false, allowImplicitCollectionCreation: false, writeConcern: { w: "majority", wtimeout: 15000 }, $db: "config" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 6, w: 6 } }, Database: { acquireCount: { w: 6 } }, Collection: { acquireCount: { w: 3 } }, oplog: { acquireCount: { w: 3 } } } protocol:op_msg 1062ms
2023-06-09T15:40:58.745+0000 I COMMAND  [LogicalSessionCacheRefresh] command config.$cmd command: update { update: "system.sessions", ordered: false, allowImplicitCollectionCreation: false, writeConcern: { w: "majority", wtimeout: 15000 }, $db: "config" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 8, w: 8 } }, Database: { acquireCount: { w: 8 } }, Collection: { acquireCount: { w: 4 } }, oplog: { acquireCount: { w: 4 } } } protocol:op_msg 993ms
2023-06-09T15:45:58.878+0000 I COMMAND  [LogicalSessionCacheRefresh] command config.$cmd command: update { update: "system.sessions", ordered: false, allowImplicitCollectionCreation: false, writeConcern: { w: "majority", wtimeout: 15000 }, $db: "config" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 6, w: 6 } }, Database: { acquireCount: { w: 6 } }, Collection: { acquireCount: { w: 3 } }, oplog: { acquireCount: { w: 3 } } } protocol:op_msg 1126ms
2023-06-09T15:50:32.500+0000 I COMMAND  [conn11] command meteor.fs.files command: find { find: "fs.files", filter: { _id: ObjectId('6436e938c0b102fcdf84c474') }, limit: 1, singleBatch: true, batchSize: 1, lsid: { id: UUID("d5213614-1a81-4ad1-93c5-308f26348d11") }, $clusterTime: { clusterTime: Timestamp(1686325832, 35), signature: { hash: BinData(0, 83BAAEC2D2A3B31F634221A642650B7D1DC86DA3), keyId: 7186711820835487745 } }, $db: "meteor" } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:419 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_msg 157ms

A fair amount of this also

2023-06-09T15:50:42.003+0000 E STORAGE  [conn16] WiredTiger error (0) [1686325842:3852][308188:0x7f3d7db2b640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 144: {45858816, 8192, 134772994}: (chunk 4 of 8): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0

And

2023-06-09T15:26:38.734+0000 I ACCESS   [conn11] Successfully authenticated as principal cryosparc_user on admin from client 127.0.0.1:37748
2023-06-09T15:26:38.734+0000 I ACCESS   [conn10] Successfully authenticated as principal cryosparc_user on admin from client 127.0.0.1:37746
2023-06-09T15:26:41.343+0000 I NETWORK  [listener] connection accepted from 10.251.80.26:45112 #12 (8 connections now open)
2023-06-09T15:26:41.344+0000 I NETWORK  [conn12] received client metadata from 10.251.80.26:45112 conn12: { driver: { name: "PyMongo", version: "3.13.0" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "5.15.0-41-generic" }, platform: "CPython 3.8.15.final.0" }
2023-06-09T15:26:41.345+0000 I NETWORK  [listener] connection accepted from 10.251.80.26:45114 #13 (9 connections now open)
2023-06-09T15:26:41.345+0000 I NETWORK  [conn13] received client metadata from 10.251.80.26:45114 conn13: { driver: { name: "PyMongo", version: "3.13.0" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "5.15.0-41-generic" }, platform: "CPython 3.8.15.final.0" }
2023-06-09T15:26:41.350+0000 I ACCESS   [conn13] Successfully authenticated as principal cryosparc_user on admin from client 10.251.80.26:45114
2023-06-09T15:27:09.544+0000 I NETWORK  [listener] connection accepted from 10.251.80.26:45118 #14 (10 connections now open)
2023-06-09T15:27:09.549+0000 I NETWORK  [conn14] received client metadata from 10.251.80.26:45118 conn14: { driver: { name: "nodejs", version: "4.3.1" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "5.15.0-41-generic" }, platform: "Node.js v14.19.3, LE (unified)|Node.js v14.19.3, LE (unified)" }
2023-06-09T15:27:09.554+0000 I NETWORK  [conn14] end connection 10.251.80.26:45118 (9 connections now open)
2023-06-09T15:27:09.556+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:37750 #15 (10 connections now open)
2023-06-09T15:27:09.556+0000 I NETWORK  [conn15] received client metadata from 127.0.0.1:37750 conn15: { driver: { name: "nodejs", version: "4.3.1" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "5.15.0-41-generic" }, platform: "Node.js v14.19.3, LE (unified)|Node.js v14.19.3, LE (unified)" }

You could try this command to collect error messages specifically

grep ' E ' /ourdisk/hpc/bsc/cryosparc/cryosparc_master/run/database.log
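
Since the fatal assertions in your excerpt are logged at severity F rather than E, a slightly broader pattern may also be worth trying (a sketch of the same idea):

grep -E ' (E|F) ' /ourdisk/hpc/bsc/cryosparc/cryosparc_master/run/database.log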

Complete database error messages (including all lines relevant to the error at 2023-06-09T14:26), as well as the outputs of the cryosparcm and df commands mentioned in Interface Input to Job Hangs - #6 by wtempel, might provide some essential troubleshooting information.

Output from status

----------------------------------------------------------------------------
CryoSPARC System master node installed at
/ourdisk/hpc/bsc/cryosparc/cryosparc_master
Current cryoSPARC version: v4.2.1
----------------------------------------------------------------------------

CryoSPARC is not running.

----------------------------------------------------------------------------

global config variables:
export CRYOSPARC_MASTER_HOSTNAME="10.251.80.26"
export CRYOSPARC_DB_PATH="/ourdisk/hpc/bsc/cryosparc/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=true
export CRYOSPARC_CLICK_WRAP=true
export CRYOSPARC_FORCE_HOSTNAME=true

From df

Filesystem                                                                                                                                        Size  Used Avail Use% Mounted on
10.251.0.30:6789,10.251.0.31:6789,10.251.0.32:6789,10.251.0.33:6789,10.251.0.40:6789:/volumes/hpc_condo/bsc/8e1cdd96-0972-47da-88cc-df5e6c16c92d   34T   28T  6.7T  81% /ourdisk/hpc/bsc

The grep ' E ' produces output that extends past the date/time stamp you asked about. I have been traveling and just returned home; I can get the output a bit later once I find a mouse, since a Mac trackpad is useless for handling large amounts of text.

Thank you for your help

Below is the grep output for the timestamp 2023-06-09T14:26. I deleted a large number of 00 strings due to character limits.

2023-06-09T14:26:06.836+0000 E STORAGE [conn18] WiredTiger error (0) [1686320766:836671][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_block_read_off, 291: collection-21--939806622404153504.wt: read checksum error for 4096B block at offset 41750528: block header checksum of 0 doesn't match expected checksum of 3783683804 Raw: [1686320766:836671][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_block_read_off, 291: collection-21--939806622404153504.wt: read checksum error for 4096B block at offset 41750528: block header checksum of 0 doesn't match expected checksum of 3783683804
2023-06-09T14:26:06.836+0000 E STORAGE [conn18] WiredTiger error (0) [1686320766:836800][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 144: {41750528, 4096, 3783683804}: (chunk 1 of 4): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Raw: [1686320766:836800][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 144: {41750528, 4096, 3783683804}: (chunk 1 of 4): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2023-06-09T14:26:06.836+0000 E STORAGE [conn18] WiredTiger error (0) [1686320766:836908][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 144: {41750528, 4096, 3783683804}: (chunk 2 of 4): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Raw: [1686320766:836908][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 144: {41750528, 4096, 3783683804}: (chunk 2 of 4): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2023-06-09T14:26:06.837+0000 E STORAGE [conn18] WiredTiger error (0) [1686320766:836998][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 144: {41750528, 4096, 3783683804}: (chunk 3 of 4): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Raw: [1686320766:836998][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 144: {41750528, 4096, 3783683804}: (chunk 3 of 4): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2023-06-09T14:26:06.837+0000 E STORAGE [conn18] WiredTiger error (0) [1686320766:837088][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 144: {41750528, 4096, 3783683804}: (chunk 4 of 4): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Raw: [1686320766:836998][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 144: {41750528, 4096, 3783683804}: (chunk 3 of 4): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2023-06-09T14:26:06.837+0000 E STORAGE [conn18] WiredTiger error (0) [1686320766:837088][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 144: {41750528, 4096, 3783683804}: (chunk 4 of 4): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Raw: [1686320766:837088][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 144: {41750528, 4096, 3783683804}: (chunk 4 of 4): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2023-06-09T14:26:06.837+0000 E STORAGE [conn18] WiredTiger error (-31802) [1686320766:837105][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_block_read_off, 302: collection-21--939806622404153504.wt: fatal read error: WT_ERROR: non-specific WiredTiger error Raw: [1686320766:837105][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_block_read_off, 302: collection-21--939806622404153504.wt: fatal read error: WT_ERROR: non-specific WiredTiger error
2023-06-09T14:26:06.837+0000 E STORAGE [conn18] WiredTiger error (-31804) [1686320766:837111][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1686320766:837111][261160:0x7fda82b27640], file:collection-21--939806622404153504.wt, WT_CURSOR.search: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic
2023-06-09T14:26:06.870+0000 E STORAGE [WTCheckpointThread] WiredTiger error (22) [1686320766:870408][261160:0x7fdaa076d640], file:index-0--3988199227014740595.wt, WT_SESSION.checkpoint: __wt_block_checkpoint_resolve, 859: index-0--3988199227014740595.wt: the checkpoint failed, the system must restart: Invalid argument Raw: [1686320766:870408][261160:0x7fdaa076d640], file:index-0--3988199227014740595.wt, WT_SESSION.checkpoint: __wt_block_checkpoint_resolve, 859: index-0--3988199227014740595.wt: the checkpoint failed, the system must restart: Invalid argument
2023-06-09T14:26:06.870+0000 E STORAGE [WTCheckpointThread] WiredTiger error (-31804) [1686320766:870446][261160:0x7fdaa076d640], WT_SESSION.checkpoint: __meta_track_unroll, 233: metadata unroll update file:index-0--3988199227014740595.wt to access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "filename" : 1, "uploadDate" : 1 }, "name" : "filename_1_uploadDate_1", "ns" : "meteor.fs.files" }),block_allocation=best,block_compressor=,cache_resident=false,checksum=on,collator=,columns=,dictionary=0,encryption=(keyid=,name=),format=btree,huffman_key=,huffman_value=,id=98,ignore_in_memory_cache_size=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=16k,key_format=u,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=16k,leaf_value_max=0,log=(enabled=true),memory_page_max=5MB,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=true,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1),checkpoint=(WiredTigerCheckpoint.2482=(addr="01b481e4decc1e60b581e41b24faf7b681e466b8236d808080e3049fc0e303afc0",order=2482,time=1686320746,size=253952,write_gen=5918)),checkpoint_lsn=(4294967295,2147483647): WT_PANIC: WiredTiger library panic Raw: [1686320766:870446][261160:0x7fdaa076d640], WT_SESSION.checkpoint: __meta_track_unroll, 233: metadata unroll update file:index-0--3988199227014740595.wt to access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "filename" : 1, "uploadDate" : 1 }, "name" : "filename_1_uploadDate_1", "ns" : "meteor.fs.files" }),block_allocation=best,block_compressor=,cache_resident=false,checksum=on,collator=,columns=,dictionary=0,encryption=(keyid=,name=),format=btree,huffman_key=,huffman_value=,id=98,ignore_in_memory_cache_size=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=16k,key_format=u,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=16k,leaf_value_max=0,log=(enabled=true),memory_page_max=5MB,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=true,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1),checkpoint=(WiredTigerCheckpoint.2482=(addr="01b481e4decc1e60b581e41b24faf7b681e466b8236d808080e3049fc0e303afc0",order=2482,time=1686320746,size=253952,write_gen=5918)),checkpoint_lsn=(4294967295,2147483647): WT_PANIC: WiredTiger library panic

The database may be corrupted.

You may want to investigate possible causes of a putative database corruption, such as:

  • did the volume that holds /ourdisk/hpc/bsc/cryosparc/cryosparc_database run out of space at some point?
  • is the WiredTiger MongoDB storage engine supported on the filesystem? To find out the type of the filesystem, you can run
    stat -f /ourdisk/hpc/bsc/cryosparc/cryosparc_database (see the note after this list).
  • did the computer shut down due to a power failure?
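
On the filesystem question: the df output above looks like a CephFS mount (the :6789 addresses are Ceph's default monitor port), and network/distributed filesystems are a known risk for WiredTiger, so the type is worth confirming. An illustrative run (the output shown is an example, not from this system; the Type: field is the part that matters):

stat -f /ourdisk/hpc/bsc/cryosparc/cryosparc_database
#   File: "/ourdisk/hpc/bsc/cryosparc/cryosparc_database"
#     ID: ... Namelen: 255 Type: ceph
# a local type such as ext4 or xfs is the safest home for the database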

To recover, you may want to restore the database from a recent backup.
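
If no recent backup is available, a repair attempt is sometimes possible. A minimal sketch, assuming the paths below and a mongod binary compatible with the database files (always work on a copy, never the original):

# stop all CryoSPARC processes first
/ourdisk/hpc/bsc/cryosparc/cryosparc_master/bin/cryosparcm stop
# copy the database directory and attempt the repair on the copy
cp -a /ourdisk/hpc/bsc/cryosparc/cryosparc_database /ourdisk/hpc/bsc/cryosparc/cryosparc_database_copy
mongod --dbpath /ourdisk/hpc/bsc/cryosparc/cryosparc_database_copy --repair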