Database failure


#1

Hi - our cryoSPARC database suddenly died on us earlier this week. Everything seemed to be running fine, then the web UI stopped populating items, and when I refreshed the page it crashed. When I try to restart cryoSPARC, I get a database spawn error:

[hansenbry@D01985551 ~]$ cryosparcm restart
CryoSPARC is not already running.
If you would like to restart, use cryosparcm restart
Starting cryoSPARC System master process..
CryoSPARC is not already running.
database: ERROR (spawn error)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1149, in database_names
    "listDatabases")["databases"]]
  File "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/database.py", line 491, in command
    with client._socket_for_reads(read_preference) as (sock_info, slave_ok):
  File "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/mongo_client.py", line 859, in _socket_for_reads
    with self._get_socket(read_preference) as sock_info:
  File "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/mongo_client.py", line 823, in _get_socket
    server = self._get_topology().select_server(selector)
  File "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/topology.py", line 214, in select_server
    address))
  File "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/topology.py", line 189, in select_servers
    self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: localhost:39001: [Errno 111] Connection refused

When I checked the status it listed a database failure:

[hansenbry@D01985551 Proc1-cryosparc]$ cryosparcm status
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master
Current cryoSPARC version: v2.11.2-live_privatebeta
----------------------------------------------------------------------------

cryosparcm process status:

app                              STOPPED   Not started
app_dev                          STOPPED   Not started
command_core                     STOPPED   Not started
command_proxy                    STOPPED   Not started
command_rtp                      STOPPED   Not started
command_vis                      STOPPED   Not started
database                         FATAL     Exited too quickly (process log may have details)
watchdog_dev                     STOPPED   Not started
webapp                           STOPPED   Not started
webapp_dev                       STOPPED   Not started

----------------------------------------------------------------------------

global config variables:

export CRYOSPARC_LICENSE_ID="xxxxxxxxxx"
export CRYOSPARC_MASTER_HOSTNAME="xxxxxxxxxx"
export CRYOSPARC_DB_PATH="xxxxxxxxxx"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false

I'm not sure how to interpret the log and was hoping someone here could help me figure out how to recover from the crash, or whether I just need to reinstall:

[hansenbry@D01985551 Proc1-cryosparc]$ cryosparcm log database
 0x55d2f56e4d6b 0x55d2f56e3fe6 0x55d2f56e44c3 0x7f044d6525f0 0x7f044d2ab337 0x7f044d2aca28 0x55d2f4903887 0x55d2f53db597 0x55d2f490de75 0x55d2f490e08c 0x55d2f490e2f5 0x55d2f60db9d7 0x55d2f60da1d9 0x55d2f608f893 0x55d2f611a974 0x55d2f611af1f 0x55d2f611b1cc 0x55d2f609d8a9 0x55d2f6110618 0x55d2f60d98fe 0x55d2f60d99eb 0x55d2f608be6a 0x55d2f53c0685 0x55d2f53b89ca 0x55d2f529e50a 0x55d2f490fbab 0x7f044d297505 0x55d2f4973cff
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55D2F42C4000","o":"1420D6B","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55D2F42C4000","o":"141FFE6"},{"b":"55D2F42C4000","o":"14204C3"},{"b":"7F044D643000","o":"F5F0"},{"b":"7F044D275000","o":"36337","s":"gsignal"},{"b":"7F044D275000","o":"37A28","s":"abort"},{"b":"55D2F42C4000","o":"63F887","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"55D2F42C4000","o":"1117597"},{"b":"55D2F42C4000","o":"649E75","s":"__wt_eventv"},{"b":"55D2F42C4000","o":"64A08C","s":"__wt_err"},{"b":"55D2F42C4000","o":"64A2F5","s":"__wt_panic"},{"b":"55D2F42C4000","o":"1E179D7","s":"__wt_turtle_read"},{"b":"55D2F42C4000","o":"1E161D9","s":"__wt_metadata_search"},{"b":"55D2F42C4000","o":"1DCB893","s":"__wt_conn_btree_open"},{"b":"55D2F42C4000","o":"1E56974","s":"__wt_session_get_btree"},{"b":"55D2F42C4000","o":"1E56F1F","s":"__wt_session_get_btree"},{"b":"55D2F42C4000","o":"1E571CC","s":"__wt_session_get_btree_ckpt"},{"b":"55D2F42C4000","o":"1DD98A9","s":"__wt_curfile_open"},{"b":"55D2F42C4000","o":"1E4C618"},{"b":"55D2F42C4000","o":"1E158FE","s":"__wt_metadata_cursor_open"},{"b":"55D2F42C4000","o":"1E159EB","s":"__wt_metadata_cursor"},{"b":"55D2F42C4000","o":"1DC7E6A","s":"wiredtiger_open"},{"b":"55D2F42C4000","o":"10FC685","s":"_ZN5mongo18WiredTigerKVEngineC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb"},{"b":"55D2F42C4000","o":"10F49CA"},{"b":"55D2F42C4000","o":"FDA50A","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"55D2F42C4000","o":"64BBAB","s":"main"},{"b":"7F044D275000","o":"22505","s":"__libc_start_main"},{"b":"55D2F42C4000","o":"6AFCFF"}],"processInfo":{ "mongodbVersion" : "3.4.10-4-g67ee356c6b", "gitVersion" : "67ee356c6be377cda547d16423daef5beb4e8377", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-1062.4.1.el7.x86_64", "version" : "#1 SMP Fri Oct 18 17:15:30 UTC 2019", "machine" : "x86_64" }, "somap" : [ { "b" : "55D2F42C4000", "elfType" : 3, 
"buildId" : "A14732EAAA95508462633A5030B501E08B1B106F" }, { "b" : "7FFE1EEF1000", "elfType" : 3, "buildId" : "7F945B2385125E0B9A6C187543A8C96F8F9400C0" }, { "b" : "7F044E7FA000", "path" : "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/lib/libpython2.7.so.1.0", "elfType" : 3 }, { "b" : "7F044E544000", "path" : "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/bin/../lib/libssl.so.1.0.0", "elfType" : 3 }, { "b" : "7F044E103000", "path" : "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/bin/../lib/libcrypto.so.1.0.0", "elfType" : 3 }, { "b" : "7F044DED9000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "4749697BF078337576C4629F0D30B296A0939779" }, { "b" : "7F044DCD5000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "18113E6E83D8E981B8E8D808F7F3DBB23F950A1D" }, { "b" : "7F044DB61000", "path" : "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/bin/../lib/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F044D85F000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5681C054FDABCF789F4DDA66E94F1F6ED1747327" }, { "b" : "7F044E7E3000", "path" : "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/bin/../lib/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F044D643000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "8B33F7F8C86F8D544C63C5541A8E42B3DDFEF8B1" }, { "b" : "7F044D275000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "398944D32CF16A67AF51067A326E6C0CC14F90ED" }, { "b" : "7F044D072000", "path" : "/lib64/libutil.so.1", "elfType" : 3, "buildId" : "E0D39E293DC99997E7B4C9B6203301E6CD904B50" }, { "b" : "7F044E7BA000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "5CC1A53B747A7E4D21198723C2B633E54F3C06D9" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x3B) [0x55d2f56e4d6b]
 mongod(+0x141FFE6) [0x55d2f56e3fe6]
 mongod(+0x14204C3) [0x55d2f56e44c3]
 libpthread.so.0(+0xF5F0) [0x7f044d6525f0]
 libc.so.6(gsignal+0x37) [0x7f044d2ab337]
 libc.so.6(abort+0x148) [0x7f044d2aca28]
 mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x55d2f4903887]
 mongod(+0x1117597) [0x55d2f53db597]
 mongod(__wt_eventv+0x3E4) [0x55d2f490de75]
 mongod(__wt_err+0xA0) [0x55d2f490e08c]
 mongod(__wt_panic+0x2F) [0x55d2f490e2f5]
 mongod(__wt_turtle_read+0x227) [0x55d2f60db9d7]
 mongod(__wt_metadata_search+0x99) [0x55d2f60da1d9]
 mongod(__wt_conn_btree_open+0x73) [0x55d2f608f893]
 mongod(__wt_session_get_btree+0xE4) [0x55d2f611a974]
 mongod(__wt_session_get_btree+0x68F) [0x55d2f611af1f]
 mongod(__wt_session_get_btree_ckpt+0x14C) [0x55d2f611b1cc]
 mongod(__wt_curfile_open+0x169) [0x55d2f609d8a9]
 mongod(+0x1E4C618) [0x55d2f6110618]
 mongod(__wt_metadata_cursor_open+0x6E) [0x55d2f60d98fe]
 mongod(__wt_metadata_cursor+0x4B) [0x55d2f60d99eb]
 mongod(wiredtiger_open+0x183A) [0x55d2f608be6a]
 mongod(_ZN5mongo18WiredTigerKVEngineC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb+0x815) [0x55d2f53c0685]
 mongod(+0x10F49CA) [0x55d2f53b89ca]
 mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x69A) [0x55d2f529e50a]
 mongod(main+0xE9B) [0x55d2f490fbab]
 libc.so.6(__libc_start_main+0xF5) [0x7f044d297505]
 mongod(+0x6AFCFF) [0x55d2f4973cff]
-----  END BACKTRACE  -----
2019-11-13T21:22:53.528-0700 I CONTROL  [initandlisten] MongoDB starting : pid=21956 port=39001 dbpath=/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_database 64-bit host=D01985551.niaid.nih.gov
2019-11-13T21:22:53.528-0700 I CONTROL  [initandlisten] db version v3.4.10-4-g67ee356c6b
2019-11-13T21:22:53.528-0700 I CONTROL  [initandlisten] git version: 67ee356c6be377cda547d16423daef5beb4e8377
2019-11-13T21:22:53.528-0700 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2p  14 Aug 2018
2019-11-13T21:22:53.528-0700 I CONTROL  [initandlisten] allocator: tcmalloc
2019-11-13T21:22:53.528-0700 I CONTROL  [initandlisten] modules: none
2019-11-13T21:22:53.528-0700 I CONTROL  [initandlisten] build environment:
2019-11-13T21:22:53.528-0700 I CONTROL  [initandlisten]     distarch: x86_64
2019-11-13T21:22:53.528-0700 I CONTROL  [initandlisten]     target_arch: x86_64
2019-11-13T21:22:53.528-0700 I CONTROL  [initandlisten] options: { net: { port: 39001 }, replication: { oplogSizeMB: 64, replSet: "meteor" }, storage: { dbPath: "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_database", journal: { enabled: false }, wiredTiger: { engineConfig: { cacheSizeGB: 4.0 } } } }
2019-11-13T21:22:53.529-0700 W -        [initandlisten] Detected unclean shutdown - /net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_database/mongod.lock is not empty.
2019-11-13T21:22:53.548-0700 I -        [initandlisten] Detected data files in /net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_database created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2019-11-13T21:22:53.549-0700 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2019-11-13T21:22:53.550-0700 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=4096M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),,log=(enabled=false),
2019-11-13T21:22:53.564-0700 E STORAGE  [initandlisten] WiredTiger error (0) [1573705373:564064][21956:0x7f3d3def2d40], file:WiredTiger.wt, connection: WiredTiger.turtle: encountered an illegal file format or internal value
2019-11-13T21:22:53.564-0700 E STORAGE  [initandlisten] WiredTiger error (-31804) [1573705373:564117][21956:0x7f3d3def2d40], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2019-11-13T21:22:53.564-0700 I -        [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2019-11-13T21:22:53.564-0700 I -        [initandlisten] 

***aborting after fassert() failure


2019-11-13T21:22:53.568-0700 F -        [initandlisten] Got signal: 6 (Aborted).

 0x5606a60f5d6b 0x5606a60f4fe6 0x5606a60f54c3 0x7f3d3cd665f0 0x7f3d3c9bf337 0x7f3d3c9c0a28 0x5606a5314887 0x5606a5dec597 0x5606a531ee75 0x5606a531f08c 0x5606a531f2f5 0x5606a6aec9d7 0x5606a6aeb1d9 0x5606a6aa0893 0x5606a6b2b974 0x5606a6b2bf1f 0x5606a6b2c1cc 0x5606a6aae8a9 0x5606a6b21618 0x5606a6aea8fe 0x5606a6aea9eb 0x5606a6a9ce6a 0x5606a5dd1685 0x5606a5dc99ca 0x5606a5caf50a 0x5606a5320bab 0x7f3d3c9ab505 0x5606a5384cff
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"5606A4CD5000","o":"1420D6B","s":"_ZN5mongo15printStackTraceERSo"},{"b":"5606A4CD5000","o":"141FFE6"},{"b":"5606A4CD5000","o":"14204C3"},{"b":"7F3D3CD57000","o":"F5F0"},{"b":"7F3D3C989000","o":"36337","s":"gsignal"},{"b":"7F3D3C989000","o":"37A28","s":"abort"},{"b":"5606A4CD5000","o":"63F887","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"5606A4CD5000","o":"1117597"},{"b":"5606A4CD5000","o":"649E75","s":"__wt_eventv"},{"b":"5606A4CD5000","o":"64A08C","s":"__wt_err"},{"b":"5606A4CD5000","o":"64A2F5","s":"__wt_panic"},{"b":"5606A4CD5000","o":"1E179D7","s":"__wt_turtle_read"},{"b":"5606A4CD5000","o":"1E161D9","s":"__wt_metadata_search"},{"b":"5606A4CD5000","o":"1DCB893","s":"__wt_conn_btree_open"},{"b":"5606A4CD5000","o":"1E56974","s":"__wt_session_get_btree"},{"b":"5606A4CD5000","o":"1E56F1F","s":"__wt_session_get_btree"},{"b":"5606A4CD5000","o":"1E571CC","s":"__wt_session_get_btree_ckpt"},{"b":"5606A4CD5000","o":"1DD98A9","s":"__wt_curfile_open"},{"b":"5606A4CD5000","o":"1E4C618"},{"b":"5606A4CD5000","o":"1E158FE","s":"__wt_metadata_cursor_open"},{"b":"5606A4CD5000","o":"1E159EB","s":"__wt_metadata_cursor"},{"b":"5606A4CD5000","o":"1DC7E6A","s":"wiredtiger_open"},{"b":"5606A4CD5000","o":"10FC685","s":"_ZN5mongo18WiredTigerKVEngineC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb"},{"b":"5606A4CD5000","o":"10F49CA"},{"b":"5606A4CD5000","o":"FDA50A","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"5606A4CD5000","o":"64BBAB","s":"main"},{"b":"7F3D3C989000","o":"22505","s":"__libc_start_main"},{"b":"5606A4CD5000","o":"6AFCFF"}],"processInfo":{ "mongodbVersion" : "3.4.10-4-g67ee356c6b", "gitVersion" : "67ee356c6be377cda547d16423daef5beb4e8377", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-1062.4.1.el7.x86_64", "version" : "#1 SMP Fri Oct 18 17:15:30 UTC 2019", "machine" : "x86_64" }, "somap" : [ { "b" : "5606A4CD5000", "elfType" : 3, 
"buildId" : "A14732EAAA95508462633A5030B501E08B1B106F" }, { "b" : "7FFEC46E6000", "elfType" : 3, "buildId" : "7F945B2385125E0B9A6C187543A8C96F8F9400C0" }, { "b" : "7F3D3DF0E000", "path" : "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/lib/libpython2.7.so.1.0", "elfType" : 3 }, { "b" : "7F3D3DC58000", "path" : "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/bin/../lib/libssl.so.1.0.0", "elfType" : 3 }, { "b" : "7F3D3D817000", "path" : "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/bin/../lib/libcrypto.so.1.0.0", "elfType" : 3 }, { "b" : "7F3D3D5ED000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "4749697BF078337576C4629F0D30B296A0939779" }, { "b" : "7F3D3D3E9000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "18113E6E83D8E981B8E8D808F7F3DBB23F950A1D" }, { "b" : "7F3D3D275000", "path" : "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/bin/../lib/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F3D3CF73000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5681C054FDABCF789F4DDA66E94F1F6ED1747327" }, { "b" : "7F3D3DEF7000", "path" : "/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_master/deps/anaconda/bin/../lib/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F3D3CD57000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "8B33F7F8C86F8D544C63C5541A8E42B3DDFEF8B1" }, { "b" : "7F3D3C989000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "398944D32CF16A67AF51067A326E6C0CC14F90ED" }, { "b" : "7F3D3C786000", "path" : "/lib64/libutil.so.1", "elfType" : 3, "buildId" : "E0D39E293DC99997E7B4C9B6203301E6CD904B50" }, { "b" : "7F3D3DECE000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "5CC1A53B747A7E4D21198723C2B633E54F3C06D9" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x3B) [0x5606a60f5d6b]
 mongod(+0x141FFE6) [0x5606a60f4fe6]
 mongod(+0x14204C3) [0x5606a60f54c3]
 libpthread.so.0(+0xF5F0) [0x7f3d3cd665f0]
 libc.so.6(gsignal+0x37) [0x7f3d3c9bf337]
 libc.so.6(abort+0x148) [0x7f3d3c9c0a28]
 mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x5606a5314887]
 mongod(+0x1117597) [0x5606a5dec597]
 mongod(__wt_eventv+0x3E4) [0x5606a531ee75]
 mongod(__wt_err+0xA0) [0x5606a531f08c]
 mongod(__wt_panic+0x2F) [0x5606a531f2f5]
 mongod(__wt_turtle_read+0x227) [0x5606a6aec9d7]
 mongod(__wt_metadata_search+0x99) [0x5606a6aeb1d9]
 mongod(__wt_conn_btree_open+0x73) [0x5606a6aa0893]
 mongod(__wt_session_get_btree+0xE4) [0x5606a6b2b974]
 mongod(__wt_session_get_btree+0x68F) [0x5606a6b2bf1f]
 mongod(__wt_session_get_btree_ckpt+0x14C) [0x5606a6b2c1cc]
 mongod(__wt_curfile_open+0x169) [0x5606a6aae8a9]
 mongod(+0x1E4C618) [0x5606a6b21618]
 mongod(__wt_metadata_cursor_open+0x6E) [0x5606a6aea8fe]
 mongod(__wt_metadata_cursor+0x4B) [0x5606a6aea9eb]
 mongod(wiredtiger_open+0x183A) [0x5606a6a9ce6a]
 mongod(_ZN5mongo18WiredTigerKVEngineC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb+0x815) [0x5606a5dd1685]
 mongod(+0x10F49CA) [0x5606a5dc99ca]
 mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x69A) [0x5606a5caf50a]
 mongod(main+0xE9B) [0x5606a5320bab]
 libc.so.6(__libc_start_main+0xF5) [0x7f3d3c9ab505]
 mongod(+0x6AFCFF) [0x5606a5384cff]
-----  END BACKTRACE  -----

I cropped the log to just the first round of the backtrace error. If more of the log is needed, I can add it; I ran out of characters when I tried to post the whole log.

Thanks for any help!

Bryan


#2

Hi @hansenbry,

Unfortunately we haven’t come across an error like this before. Do you know if there were any changes to your filesystem configuration around the time you received this error?

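It seems the main error in the backtrace is Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361. The log lines just before it show WiredTiger panicking (WT_PANIC) because WiredTiger.turtle, one of its metadata files, "encountered an illegal file format or internal value". WiredTiger is MongoDB's default storage engine, so this points to some sort of storage or filesystem malfunction that damaged the database files. Our best guess is to search for this assertion error in the context of MongoDB.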
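To pull the root-cause lines out of a long database log, here is a minimal sketch, assuming the MongoDB 3.4 plain-text layout visible in the log above (timestamp, one-letter severity, component, [context], message). It keeps only error (E) and fatal (F) entries and skips the raw backtrace frames; the function name is hypothetical.

```python
import re

# Assumed MongoDB 3.4 text-log layout, as seen in the log above:
# <ISO timestamp> <severity I/W/E/F> <component or "-"> [<context>] <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+)\s+(?P<sev>[IWEF])\s+(?P<comp>\S+)\s+"
    r"\[(?P<ctx>[^\]]+)\]\s*(?P<msg>.*)$"
)


def severe_entries(lines, severities=("E", "F")):
    """Yield (timestamp, severity, message) for log lines at the given severities.

    Lines that do not match the layout (e.g. backtrace frames) are ignored.
    """
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("sev") in severities:
            yield match.group("ts"), match.group("sev"), match.group("msg")
```

Fed the lines of a saved copy of the database log, this surfaces the two WiredTiger errors about WiredTiger.turtle and the "Got signal: 6 (Aborted)" line. Note that the "Fatal Assertion 28558" line itself is tagged I in this log, so it is not included by default.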

Please let us know if you have any additional questions; we'll try our best to help resolve the issue.

Regards,
Suhail


#3

Thanks for the suggestion @sdawood! I'll pass this along to our IT group here. This was my feeling too, but I wanted to confirm.


#4

Hi @hansenbry, please let us know if there has been any progress on this. The logs also mention that the file

/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_database/mongod.lock

is not empty, which indicates an unclean shutdown. You can stop cryoSPARC, remove the .lock file, and try to start cryoSPARC again; this often fixes unclean-shutdown issues.
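Before deleting the lock file, it can help to check whether a process still owns it. A minimal sketch for Linux, assuming mongod writes its PID into mongod.lock while running and leaves the file empty after a clean shutdown (which is what the "mongod.lock is not empty" warning in the log relies on); the function name and status labels are hypothetical.

```python
import os


def lock_status(lock_path):
    """Classify a mongod.lock file as 'clean', 'running', or 'stale'.

    Assumes mongod stores its PID in the lock file while running and
    truncates the file to empty on a clean shutdown.
    """
    with open(lock_path) as f:
        content = f.read().strip()
    if not content:
        return "clean"  # empty file: last shutdown was clean
    try:
        pid = int(content)
    except ValueError:
        return "stale"  # contents are not a PID: treat as left over
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks the PID exists
    except ProcessLookupError:
        return "stale"  # no such process: lock left behind by a dead mongod
    except PermissionError:
        return "running"  # PID exists but belongs to another user
    return "running"  # some process has this PID; confirm with ps that it is mongod
```

A "running" result only means a process with that PID exists; PIDs get reused, so confirm with `ps` before killing anything.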


#5

Hi @apunjani - We're working to restore the cryosparc2_database folder from a daily backup taken before the issue started. I did try running cryosparcm stop, removing the mongod.lock file, and restarting, but I still get the database spawn error, and the mongod.lock file gets recreated.
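For the folder restore itself, a minimal sketch, assuming cryoSPARC and its mongod are fully stopped first; the function name and the backup location are hypothetical. It moves the damaged directory aside under a timestamped name instead of deleting it, so it can still be inspected later.

```python
import shutil
import time
from pathlib import Path


def restore_database(db_dir, backup_dir):
    """Move the damaged database directory aside, then copy the backup into place.

    Returns the path the damaged directory was moved to.
    Run only while cryoSPARC (and its mongod) is stopped.
    """
    db_dir, backup_dir = Path(db_dir), Path(backup_dir)
    if not backup_dir.is_dir():
        raise FileNotFoundError(f"backup not found: {backup_dir}")
    # Keep the damaged copy under a timestamped sibling name rather than deleting it.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    aside = db_dir.with_name(f"{db_dir.name}.damaged-{stamp}")
    db_dir.rename(aside)
    # copytree's default copy2 preserves timestamps and permission bits.
    shutil.copytree(backup_dir, db_dir)
    return aside
```

For example, `restore_database("/net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_database", "/path/to/backup/cryosparc2_database")`, where the second path stands in for wherever the daily backup lives.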


#6

Hi @hansenbry,

This might be caused by the following series of events:

  1. The initial sock file goes missing (this can happen if the /tmp directory is cleared)
  2. The original mongo process is still running, but supervisor doesn't know that it's running because there is no sock file
  3. The backup/restore commands use supervisor to start the database if it's not already running
  4. Supervisor reports to the backup/restore commands that the mongo process is not running
  5. The backup/restore commands turn on the database using supervisor, creating a new sock file
  6. The mongo database throws exceptions because two mongod processes are trying to acquire the lock on the same database
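Step 2 can be checked from outside supervisor: if something still accepts connections on the database port while `cryosparcm status` reports the database as not running, a leftover mongod is a likely suspect. A minimal sketch, assuming the database listens on CRYOSPARC_BASE_PORT + 1 (39001 here, matching the port mongod was started on in the log); the function name is hypothetical.

```python
import socket

BASE_PORT = 39000  # CRYOSPARC_BASE_PORT from the config in the first post
DB_PORT = BASE_PORT + 1  # the port mongod was started on in the log


def port_in_use(port, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError ([Errno 111], as in the traceback) and
        # timeouts are both OSError subclasses.
        return False
```

A True result from `port_in_use(DB_PORT)` with the database shown as stopped suggests an orphaned process; the ps/kill steps in this post are the way to find and stop it. A False result matches the "Connection refused" in the original traceback.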

Please kill the mongo processes, delete the lock files, and restart cryoSPARC.

ps -ax | grep "mongod"
kill <process_pid>
# delete the .lock file, e.g.:
rm /net/ai-rmlhpc/gs1/RTS/EM/Software/Proc1-cryosparc/cryosparc2_database/mongod.lock
cryosparcm start


#7

Hi @sarulthasan, I got the database spawn error to clear up. However, now when I try to log in to the UI, it says 'User not found'. Did something get badly mucked up, to the point where I would be better off reinstalling?