CryoSPARC can't start after system crash

Hello,

Our CryoSPARC instance crashed because the storage folder it uses was full. We have had this issue in the past, and it resolved itself once the storage allocation was increased. This time, however, startup gets stuck on “Warning: Could not get database status (attempt 3/3)”.

FYI: another group (GRUBER) also uses CryoSPARC on the cluster; our group is pnavarr1.

CryoSPARC instance information

Type: Cluster

(base) [agregor@curnagl cryosparc_master]$ cryosparcm status
CryoSPARC System master node installed at
/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master
Current cryoSPARC version: v4.6.2

CryoSPARC process status:

app                              STOPPED   Not started
app_api                          STOPPED   Not started
app_api_dev                      STOPPED   Not started
command_core                     STOPPED   Not started
command_rtp                      STOPPED   Not started
command_vis                      STOPPED   Not started
database                         STOPPED   Not started

License is valid

global config variables:
export CRYOSPARC_LICENSE_ID="XXXXXXXXXXX"
export CRYOSPARC_MASTER_HOSTNAME="curnagl"
export CRYOSPARC_DB_PATH="/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/database"
export CRYOSPARC_BASE_PORT=45031
export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000
export CRYOSPARC_INSECURE=false
export CRYOSPARC_DB_ENABLE_AUTH=true
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000
export CRYOSPARC_PROJECT_DIR_PREFIX='CS-'
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_CLICK_WRAP=true
(base) [agregor@curnagl cryosparc_master]$ uname -a && free -g
Linux curnagl 5.14.0-427.37.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 13 12:41:50 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
               total        used        free      shared  buff/cache   available
Mem:             503         153          45           9         316         349
Swap:              0           0           0

CryoSPARC worker environment

(base) [agregor@curnagl ~] eval $(/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_worker/bin/cryosparcw env)
env | grep PATH
/sbin/ldconfig -p | grep -i cuda
uname -a
free -g
nvidia-smi
# commands below apply only to CryoSPARC versions older than v4.4
which nvcc
nvcc --version
python -c "import pycuda.driver; print(pycuda.driver.get_version())"
STACK_20240704_MODULEPATH=/dcsrsoft/spack//20240704/spack/opt/modules/Core:/dcsrsoft/spack//20240704/spack/opt/modules/gcc/11.4.0
STACK_20241118_MODULEPATH=/dcsrsoft/spack//20241118/spack/opt/modules/Core:/dcsrsoft/spack//20241118/spack/opt/modules/gcc/12.3.0:/dcsrsoft/spack//20241118/spack/opt/modules/gcc/9.5.0
CRYOSPARC_PATH=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_worker/bin
__LMOD_REF_COUNT_MODULEPATH=/dcsrsoft/spack/20241118/spack/opt/modules/Core:1;/dcsrsoft/spack/20241118/spack/opt/modules/gcc/12.3.0:1;/dcsrsoft/spack/20241118/spack/opt/modules/gcc/9.5.0:1
MANPATH=/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/singularityce-4.1.0-mt3k5udjdeyhxtvkci4sgwuialkaln2j/share/man::
__LMOD_REF_COUNT_CMAKE_PREFIX_PATH=/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/singularityce-4.1.0-mt3k5udjdeyhxtvkci4sgwuialkaln2j:1
__LMOD_REF_COUNT_PATH=/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/cryptsetup-2.3.5-mge72p7wl35jtj3ejpgryy6xa6ujtmmt/sbin:1;/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/singularityce-4.1.0-mt3k5udjdeyhxtvkci4sgwuialkaln2j/bin:1;/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/bin:1;/work/FAC/FBM/DMF/pnavarr1/default/tools/miniconda/bin:1;/work/FAC/FBM/DMF/pnavarr1/default/tools/miniconda/condabin:1;/users/agregor/.local/bin:1;/users/agregor/bin:1;/usr/lpp/mmfs/bin:1;/usr/local/bin:1;/usr/bin:1;/usr/local/sbin:1;/usr/sbin:1;/dcsrsoft/bin:1
__LMOD_REF_COUNT_GOPATH=/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/singularityce-4.1.0-mt3k5udjdeyhxtvkci4sgwuialkaln2j:1
CMAKE_PREFIX_PATH=/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/singularityce-4.1.0-mt3k5udjdeyhxtvkci4sgwuialkaln2j
PYTHONPATH=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_worker
NUMBA_CUDA_INCLUDE_PATH=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include
STACK_20240303_MODULEPATH=/dcsrsoft/spack//20240303/spack/opt/modules/Core:/dcsrsoft/spack//20240303/spack/opt/modules/gcc/11.4.0
__LMOD_REF_COUNT_MANPATH=/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/singularityce-4.1.0-mt3k5udjdeyhxtvkci4sgwuialkaln2j/share/man:1;:1
LD_LIBRARY_PATH=/work/FAC/FBM/DMF/pnavarr1/default/tools/cuda/usr/local/cuda-12.6/lib64:
PATH=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_worker/bin:/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_worker/deps/anaconda/condabin:/work/FAC/FBM/DMF/pnavarr1/default/tools/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/CTFfind5/cisTEM:/work/FAC/FBM/DMF/pnavarr1/default/tools/cryocare/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/cuda/usr/local/cuda-12.6/bin:/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/cryptsetup-2.3.5-mge72p7wl35jtj3ejpgryy6xa6ujtmmt/sbin:/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/singularityce-4.1.0-mt3k5udjdeyhxtvkci4sgwuialkaln2j/bin:/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/miniconda/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/miniconda/condabin:/users/agregor/.local/bin:/users/agregor/bin:/usr/lpp/mmfs/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/dcsrsoft/bin
MODULEPATH=/dcsrsoft/spack/20241118/spack/opt/modules/Core:/dcsrsoft/spack/20241118/spack/opt/modules/gcc/12.3.0:/dcsrsoft/spack/20241118/spack/opt/modules/gcc/9.5.0
GOPATH=/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/singularityce-4.1.0-mt3k5udjdeyhxtvkci4sgwuialkaln2j
        libicudata.so.67 (libc6,x86-64) => /lib64/libicudata.so.67
        libcuda_wrapper.so.0 (libc6,x86-64) => /lib64/libcuda_wrapper.so.0
        libcuda_wrapper.so (libc6,x86-64) => /lib64/libcuda_wrapper.so
Linux curnagl 5.14.0-427.37.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 13 12:41:50 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
               total        used        free      shared  buff/cache   available
Mem:             503         153          45           9         316         349
Swap:              0           0           0
-bash: nvidia-smi: command not found
/usr/bin/which: no nvcc in (/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_worker/bin:/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_worker/deps/anaconda/condabin:/work/FAC/FBM/DMF/pnavarr1/default/tools/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/CTFfind5/cisTEM:/work/FAC/FBM/DMF/pnavarr1/default/tools/cryocare/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/cuda/usr/local/cuda-12.6/bin:/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/cryptsetup-2.3.5-mge72p7wl35jtj3ejpgryy6xa6ujtmmt/sbin:/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/singularityce-4.1.0-mt3k5udjdeyhxtvkci4sgwuialkaln2j/bin:/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/miniconda/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/miniconda/condabin:/users/agregor/.local/bin:/users/agregor/bin:/usr/lpp/mmfs/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/dcsrsoft/bin)
-bash: nvcc: command not found
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pycuda'

In /run/database.log I see:

2025-03-14T10:34:08.469+0100 I CONTROL  [initandlisten] MongoDB starting : pid=1476778 port=45032 dbpath=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/database 64-bit host=curnagl
2025-03-14T10:34:08.482+0100 I CONTROL  [initandlisten] db version v3.6.23
2025-03-14T10:34:08.482+0100 I CONTROL  [initandlisten] git version: d352e6a4764659e0d0350ce77279de3c1f243e5c
2025-03-14T10:34:08.482+0100 I CONTROL  [initandlisten] allocator: tcmalloc
2025-03-14T10:34:08.482+0100 I CONTROL  [initandlisten] modules: none
2025-03-14T10:34:08.482+0100 I CONTROL  [initandlisten] build environment:
2025-03-14T10:34:08.482+0100 I CONTROL  [initandlisten]     distarch: x86_64
2025-03-14T10:34:08.482+0100 I CONTROL  [initandlisten]     target_arch: x86_64
2025-03-14T10:34:08.482+0100 I CONTROL  [initandlisten] options: { net: { port: 45032 }, replication: { oplogSizeMB: 64, replSet: "meteor" }, storage: { dbPath: "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/database" } }
2025-03-14T10:34:08.498+0100 W -        [initandlisten] Detected unclean shutdown - /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/database/mongod.lock is not empty.
2025-03-14T10:34:08.498+0100 E STORAGE  [initandlisten] Failed to set up listener: SocketException: Address already in use
2025-03-14T10:34:08.499+0100 I CONTROL  [initandlisten] now exiting
2025-03-14T10:34:08.499+0100 I CONTROL  [initandlisten] shutting down with code:48
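For context, MongoDB's exit code 48 means it could not bind its port ("Address already in use"). Before retrying, a quick diagnostic sketch like the following may show what is still holding the port (assumes the database port is 45032, i.e. CRYOSPARC_BASE_PORT + 1; tool availability varies by system):

```shell
# Diagnostic sketch: find what is holding the CryoSPARC database port.
# Show any listener on port 45032; print a note if the port is free.
ss -tlnp 2>/dev/null | grep ':45032 ' || echo "nothing listening on 45032"
# List your own surviving mongod processes. The "[m]ongod" pattern keeps
# the grep command itself out of the match.
ps -u "$(whoami)" -o pid,args | grep '[m]ongod' || true
```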

I moved “mongod.lock” to another folder. Now, after running `cryosparcm start`, I get:

Starting CryoSPARC System master process...
CryoSPARC is not already running.
configuring database...
Warning: Could not get database status (attempt 1/3)
Warning: Could not get database status (attempt 2/3)
Warning: Could not get database status (attempt 3/3)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/cryosparc_compute/database_management.py", line 47, in configure_mongo
    initialize_replica_set()
  File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/cryosparc_compute/database_management.py", line 84, in initialize_replica_set
    admin_db = try_get_pymongo_db(mongo_client)
  File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/cryosparc_compute/database_management.py", line 251, in try_get_pymongo_db
    admin_db.command(({'serverStatus': 1}))
  File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/_csot.py", line 108, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/database.py", line 893, in command
    with self.__client._conn_for_reads(read_preference, session, operation=command_name) as (
  File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1375, in _conn_for_reads
    server = self._select_server(read_preference, session, operation)
  File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1322, in _select_server
    server = topology.select_server(
  File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 368, in select_server
    server = self._select_server(
  File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 346, in _select_server
    servers = self.select_servers(
  File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 253, in select_servers
    server_descriptions = self._select_servers_loop(
  File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 303, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: localhost:45032: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 20.0s, Topology Description: <TopologyDescription id: 67d40426802ff195f5b95076, topology_type: Single, servers: [<ServerDescription ('localhost', 45032) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:45032: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
[2025-03-14T11:26:53+01:00] Error configuring database. Most recent database log lines:
2025-03-14T11:25:40.320+0100 I CONTROL  [initandlisten] git version: d352e6a4764659e0d0350ce77279de3c1f243e5c
2025-03-14T11:25:40.320+0100 I CONTROL  [initandlisten] allocator: tcmalloc
2025-03-14T11:25:40.320+0100 I CONTROL  [initandlisten] modules: none
2025-03-14T11:25:40.320+0100 I CONTROL  [initandlisten] build environment:
2025-03-14T11:25:40.320+0100 I CONTROL  [initandlisten]     distarch: x86_64
2025-03-14T11:25:40.320+0100 I CONTROL  [initandlisten]     target_arch: x86_64
2025-03-14T11:25:40.320+0100 I CONTROL  [initandlisten] options: { net: { port: 45032 }, replication: { oplogSizeMB: 64, replSet: "meteor" }, storage: { dbPath: "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/database" } }
2025-03-14T11:25:40.341+0100 E STORAGE  [initandlisten] Failed to set up listener: SocketException: Address already in use
2025-03-14T11:25:40.341+0100 I CONTROL  [initandlisten] now exiting
2025-03-14T11:25:40.341+0100 I CONTROL  [initandlisten] shutting down with code:48
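One hedged guess at what keeps reproducing code 48: processes from the crashed instance (supervisord, the command_* services, possibly a zombie mongod) may still be running and holding the port. A cleanup sketch, assuming the install prefix shown above (verify every PID before killing anything):

```shell
# Hypothetical cleanup sketch: stop the instance, then hunt for survivors.
cryosparcm stop || true
# List leftovers belonging to *this* install. Filtering on the install path
# avoids touching the other group's instance on the same login node.
ps -u "$(whoami)" -o pid,args | grep '[p]navarr1/default/CryoSPARC' || true
# Then, for each listed PID:   kill <PID>
# and, if present, remove the stale supervisor socket, e.g.:
# rm -f /tmp/cryosparc-supervisor-*.sock
```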

From /run/database.log, when the storage issue first appeared:

2025-03-13T11:59:46.377+0100 I NETWORK  [conn1734] received client metadata from 10.203.101.85:41108 conn1734: { driver: { name: "PyMongo", version: "4.8.0" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "5.14.0-427.37.1.el9_4.x86_64" }, platform: "CPython 3.10.14.final.0" }
2025-03-13T11:59:46.377+0100 I NETWORK  [conn1735] received client metadata from 10.203.101.85:41122 conn1735: { driver: { name: "PyMongo", version: "4.8.0" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "5.14.0-427.37.1.el9_4.x86_64" }, platform: "CPython 3.10.14.final.0" }
2025-03-13T11:59:46.382+0100 I ACCESS   [conn1735] Successfully authenticated as principal cryosparc_user on admin from client 10.203.101.85:41122
2025-03-13T11:59:46.382+0100 I ACCESS   [conn1734] Successfully authenticated as principal cryosparc_user on admin from client 10.203.101.85:41108
2025-03-13T12:00:00.334+0100 I STORAGE  [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 3ms
2025-03-13T12:02:27.724+0100 I STORAGE  [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 2ms
2025-03-13T12:05:13.765+0100 I STORAGE  [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 1ms
2025-03-13T12:08:34.152+0100 I STORAGE  [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 2ms
2025-03-13T12:10:11.816+0100 E STORAGE  [WTCheckpointThread] WiredTiger error (122) [1741864211:685065][2569531:0x7fcb59dbd640], file:collection-115-2422639577907128585.wt, WT_SESSION.checkpoint: __posix_file_write, 579: /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/database/collection-115-2422639577907128585.wt: handle-write: pwrite: failed to write 278528 bytes at offset 4567437312: Disk quota exceeded Raw: [1741864211:685065][2569531:0x7fcb59dbd640], file:collection-115-2422639577907128585.wt, WT_SESSION.checkpoint: __posix_file_write, 579: /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/database/collection-115-2422639577907128585.wt: handle-write: pwrite: failed to write 278528 bytes at offset 4567437312: Disk quota exceeded
2025-03-13T12:10:11.848+0100 E STORAGE  [WTCheckpointThread] WiredTiger error (22) [1741864211:848416][2569531:0x7fcb59dbd640], file:index-116-2422639577907128585.wt, WT_SESSION.checkpoint: __wt_block_checkpoint_resolve, 859: index-116-2422639577907128585.wt: the checkpoint failed, the system must restart: Invalid argument Raw: [1741864211:848416][2569531:0x7fcb59dbd640], file:index-116-2422639577907128585.wt, WT_SESSION.checkpoint: __wt_block_checkpoint_resolve, 859: index-116-2422639577907128585.wt: the checkpoint failed, the system must restart: Invalid argument
2025-03-13T12:10:11.848+0100 E STORAGE  [WTCheckpointThread] WiredTiger error (-31804) [1741864211:848455][2569531:0x7fcb59dbd640], file:index-116-2422639577907128585.wt, WT_SESSION.checkpoint: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1741864211:848455][2569531:0x7fcb59dbd640], file:index-116-2422639577907128585.wt, WT_SESSION.checkpoint: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic
2025-03-13T12:10:11.848+0100 F -        [WTCheckpointThread] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 420
2025-03-13T12:10:11.848+0100 F -        [WTCheckpointThread] \n\n***aborting after fassert() failure\n\n
2025-03-13T12:10:11.875+0100 F -        [WTJournalFlusher] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 74
2025-03-13T12:10:11.875+0100 F -        [WTJournalFlusher] \n\n***aborting after fassert() failure\n\n
2025-03-13T12:10:11.948+0100 F -        [conn73] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 74
2025-03-13T12:10:11.948+0100 F -        [conn73] \n\n***aborting after fassert() failure\n\n
2025-03-13T12:10:11.989+0100 F -        [conn1729] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 74
2025-03-13T12:10:11.989+0100 F -        [conn1729] \n\n***aborting after fassert() failure\n\n
2025-03-13T12:10:11.998+0100 F -        [conn7] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 74
2025-03-13T12:10:11.998+0100 F -        [conn7] \n\n***aborting after fassert() failure\n\n
2025-03-13T12:10:12.000+0100 F -        [ftdc] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 74
2025-03-13T12:10:12.000+0100 F -        [ftdc] \n\n***aborting after fassert() failure\n\n
2025-03-13T12:10:12.054+0100 F -        [conn1708] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 74
2025-03-13T12:10:12.054+0100 F -        [conn1708] \n\n***aborting after fassert() failure\n\n
2025-03-13T12:10:12.166+0100 F -        [conn1683] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 74
2025-03-13T12:10:12.166+0100 F -        [conn1683] \n\n***aborting after fassert() failure\n\n
2025-03-13T12:10:12.166+0100 F -        [conn1686] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 74
2025-03-13T12:10:12.166+0100 F -        [conn1686] \n\n***aborting after fassert() failure\n\n
2025-03-13T12:10:12.292+0100 F -        [WTCheckpointThread] Got signal: 6 (Aborted).

 0x55a4f763ef21 0x55a4f763e139 0x55a4f763e61d 0x7fcb5fe126f0 0x7fcb5fe5f94c 0x7fcb5fe12646 0x7fcb5fdfc7f3 0x55a4f5d22dec 0x55a4f5dfdd76 0x55a4f5e6fad1 0x55a4f5cbfa94 0x55a4f5cbfeb4 0x55a4f5f42695 0x55a4f5e31eb2 0x55a4f5e82b0e 0x55a4f5e83953 0x55a4f5e68f8a 0x55a4f5de0193 0x55a4f75287c0 0x55a4f774fc10 0x7fcb5fe5dc02 0x7fcb5fee2c40
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55A4F5399000","o":"22A5F21","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55A4F5399000","o":"22A5139"},{"b":"55A4F5399000","o":"22A561D"},{"b":"7FCB5FDD4000","o":"3E6F0"},{"b":"7FCB5FDD4000","o":"8B94C"},{"b":"7FCB5FDD4000","o":"3E646","s":"raise"},{"b":"7FCB5FDD4000","o":"287F3","s":"abort"},{"b":"55A4F5399000","o":"989DEC","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"55A4F5399000","o":"A64D76"},{"b":"55A4F5399000","o":"AD6AD1"},{"b":"55A4F5399000","o":"926A94","s":"__wt_err_func"},{"b":"55A4F5399000","o":"926EB4","s":"__wt_panic"},{"b":"55A4F5399000","o":"BA9695","s":"__wt_block_checkpoint_resolve"},{"b":"55A4F5399000","o":"A98EB2","s":"__wt_meta_track_off"},{"b":"55A4F5399000","o":"AE9B0E"},{"b":"55A4F5399000","o":"AEA953","s":"__wt_txn_checkpoint"},{"b":"55A4F5399000","o":"ACFF8A"},{"b":"55A4F5399000","o":"A47193","s":"_ZN5mongo18WiredTigerKVEngine26WiredTigerCheckpointThread3runEv"},{"b":"55A4F5399000","o":"218F7C0","s":"_ZN5mongo13BackgroundJob7jobBodyEv"},{"b":"55A4F5399000","o":"23B6C10"},{"b":"7FCB5FDD4000","o":"89C02"},{"b":"7FCB5FDD4000","o":"10EC40"}],"processInfo":{ "mongodbVersion" : "3.6.23", "gitVersion" : "d352e6a4764659e0d0350ce77279de3c1f243e5c", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "5.14.0-427.37.1.el9_4.x86_64", "version" : "#1 SMP PREEMPT_DYNAMIC Fri Sep 13 12:41:50 EDT 2024", "machine" : "x86_64" }, "somap" : [ { "b" : "55A4F5399000", "elfType" : 3, "buildId" : "B0818C001F2B63D4533D208F68F08AE2A599CA9E" }, { "b" : "7FFE363FC000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "B78F3F86198BFC7FBE33898DEDE69799CBE8530D" }, { "b" : "7FCB60101000", "path" : "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/libpython3.10.so", "elfType" : 3 }, { "b" : "7FCB600E4000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "8E61C4327C4D5757C08D7FA962EFD71683EBBFDC" }, { "b" : "7FCB600DF000", "path" : 
"/lib64/librt.so.1", "elfType" : 3, "buildId" : "1A11E03063E9160803AB6A87CECE2AD25346F20F" }, { "b" : "7FCB600DA000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "CE96631C1B7EA31412DA4D6E3C735BBCEE781C9D" }, { "b" : "7FCB5FFFF000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "BCE2C9260F603AB2C12D8EE28632B13D43C8AE61" }, { "b" : "7FCB5FFE2000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "EF4C928F1372AD155FEA761F0E840ECD264FB153" }, { "b" : "7FCB5FFDD000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "9A19B839A71005671BC715C96CB4FB040B4649E9" }, { "b" : "7FCB5FDD4000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8C3B90B6DFAC32E7E7DA24C75B450EF3BE7D48DA" }, { "b" : "7FCB604A0000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "A42D94EABFE701AC16B767E5971B6D08FCB01DF8" }, { "b" : "7FCB5FDCF000", "path" : "/lib64/libutil.so.1", "elfType" : 3, "buildId" : "C231C69EC0248CD17D9B3A1E2883A0386DDA53FB" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55a4f763ef21]
 mongod(+0x22A5139) [0x55a4f763e139]
 mongod(+0x22A561D) [0x55a4f763e61d]
 libc.so.6(+0x3E6F0) [0x7fcb5fe126f0]
 libc.so.6(+0x8B94C) [0x7fcb5fe5f94c]
 libc.so.6(raise+0x16) [0x7fcb5fe12646]
 libc.so.6(abort+0xD3) [0x7fcb5fdfc7f3]
 mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x55a4f5d22dec]
 mongod(+0xA64D76) [0x55a4f5dfdd76]
 mongod(+0xAD6AD1) [0x55a4f5e6fad1]
 mongod(__wt_err_func+0x90) [0x55a4f5cbfa94]
 mongod(__wt_panic+0x3F) [0x55a4f5cbfeb4]
 mongod(__wt_block_checkpoint_resolve+0x145) [0x55a4f5f42695]
 mongod(__wt_meta_track_off+0x312) [0x55a4f5e31eb2]
 mongod(+0xAE9B0E) [0x55a4f5e82b0e]
 mongod(__wt_txn_checkpoint+0x1C3) [0x55a4f5e83953]
 mongod(+0xACFF8A) [0x55a4f5e68f8a]
 mongod(_ZN5mongo18WiredTigerKVEngine26WiredTigerCheckpointThread3runEv+0x243) [0x55a4f5de0193]
 mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x160) [0x55a4f75287c0]
 mongod(+0x23B6C10) [0x55a4f774fc10]
 libc.so.6(+0x89C02) [0x7fcb5fe5dc02]
 libc.so.6(+0x10EC40) [0x7fcb5fee2c40]
-----  END BACKTRACE  -----
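Since the original panic was triggered by "Disk quota exceeded" during a WiredTiger checkpoint, it may be worth confirming the quota problem is genuinely resolved before the next start attempt. A sketch, assuming the database path from the config above (the right quota tool is site-specific):

```shell
# Hypothetical check: is there actually free space/quota on the DB path?
DB=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/database
df -h "$DB" || true          # filesystem-level free space
du -sh "$DB" 2>/dev/null || true   # current size of the database directory
# Quota tooling varies by cluster; on Lustre something like "lfs quota",
# on GPFS something like "mmlsquota", reports the real per-group limit.
```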

The output of “ps ax -U $(whoami) | grep mongod” (note: cryoem_gruber is another group's CryoSPARC install on the cluster, not ours; we are pnavarr1):

(base) [agregor@curnagl cryosparc_master]$ ps ax -U $(whoami) | grep mongod
 478581 ?        Sl     2:20 mongod --auth --dbpath /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/database --port 45002 --oplogSize 64 --replSet meteor --wiredTigerCacheSizeGB 4 --bind_ip_all
1871659 pts/307  S+     0:00 grep --color=auto mongod

Other debugging outputs:

(base) [agregor@curnagl cryosparc_master] whoami
agregor
(base) [agregor@curnagl cryosparc_master] stat bin/cryosparcm
  File: bin/cryosparcm
  Size: 76852           Blocks: 160        IO Block: 4194304 regular file
Device: 31h/49d Inode: 212937732   Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (225181/ agregor)   Gid: (183921/pi_pnavarr1_101419-pr-g)
Access: 2025-03-14 10:29:46.130245254 +0100
Modify: 2024-11-18 16:19:01.000000000 +0100
Change: 2025-02-04 15:41:32.642020992 +0100
 Birth: -
(base) [agregor@curnagl cryosparc_master] hostname
curnagl
(base) [agregor@curnagl cryosparc_master] grep curnagl config.sh
export CRYOSPARC_MASTER_HOSTNAME="curnagl"
(base) [agregor@curnagl cryosparc_master] ps xww | grep -e cryosparc -e mongo
 660108 ?        Sl    17:41 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:45034 -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
1851839 ?        Ss     0:00 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/supervisord.conf
1938537 pts/307  S+     0:00 grep --color=auto -e cryosparc -e mongo
2569007 ?        Ss     5:00 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/supervisord.conf
2569835 ?        S      2:23 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:45033 cryosparc_command.command_core:start() -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
2569985 ?        Sl    71:03 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:45033 cryosparc_command.command_core:start() -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
2570302 ?        S      2:21 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:45034 -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
2570344 ?        S      2:10 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:45036 -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
2570355 ?        Sl    68:05 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:45036 -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
2570880 ?        Sl    87:21 /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/cryosparc_app/nodejs/bin/node ./bundle/main.js
3886350 ?        Ss     0:08 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/supervisord.conf

Also

(base) [agregor@curnagl cryosparc_master]$ curl localhost:45032
curl: (7) Failed to connect to localhost port 45032: Connection refused

And

(base) [agregor@curnagl cryosparc_master]$ cryosparcm log webapp
Invalid service: webapp
Usage:
    cryosparcm log SERVICE
Where SERVICE is one of:
    app
    app_api
    command_core
    command_rtp
    command_vis
    database
    supervisord

I appreciate any and all help. Thank you very much.

Best,
Aurélien.


And

(base) [agregor@curnagl cryosparc_master]$ cryosparcm log command_core | tail -n 40
cryosparcm log supervisord | tail -n 40
cryosparcm env | grep -v LICENSE_ID
2025-03-14 12:15:26,619 background_worker    ERROR    | pymongo.errors.ServerSelectionTimeoutError: Could not reach any servers in [('localhost', 45032)]. Replica set is configured with internal hostnames or IPs?, Timeout: 30s, Topology Description: <TopologyDescription id: 67b5ab7d84ec8a796584656d, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('localhost', 45032) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:45032: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
2025-03-14 12:15:56,682 background_worker    ERROR    | Cluster job monitor error
2025-03-14 12:15:56,682 background_worker    ERROR    | Traceback (most recent call last):
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/cryosparc_command/command_core/__init__.py", line 194, in background_worker
2025-03-14 12:15:56,682 background_worker    ERROR    |     cluster_job_monitor()
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/cryosparc_command/command_core/__init__.py", line 9168, in cluster_job_monitor
2025-03-14 12:15:56,682 background_worker    ERROR    |     for job in jobs:
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/cursor.py", line 1243, in next
2025-03-14 12:15:56,682 background_worker    ERROR    |     if len(self.__data) or self._refresh():
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/cursor.py", line 1160, in _refresh
2025-03-14 12:15:56,682 background_worker    ERROR    |     self.__send_message(q)
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/cursor.py", line 1039, in __send_message
2025-03-14 12:15:56,682 background_worker    ERROR    |     response = client._run_operation(
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/_csot.py", line 108, in csot_wrapper
2025-03-14 12:15:56,682 background_worker    ERROR    |     return func(self, *args, **kwargs)
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1431, in _run_operation
2025-03-14 12:15:56,682 background_worker    ERROR    |     return self._retryable_read(
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1540, in _retryable_read
2025-03-14 12:15:56,682 background_worker    ERROR    |     return self._retry_internal(
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/_csot.py", line 108, in csot_wrapper
2025-03-14 12:15:56,682 background_worker    ERROR    |     return func(self, *args, **kwargs)
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1507, in _retry_internal
2025-03-14 12:15:56,682 background_worker    ERROR    |     ).run()
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 2353, in run
2025-03-14 12:15:56,682 background_worker    ERROR    |     return self._read() if self._is_read else self._write()
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 2483, in _read
2025-03-14 12:15:56,682 background_worker    ERROR    |     self._server = self._get_server()
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 2439, in _get_server
2025-03-14 12:15:56,682 background_worker    ERROR    |     return self._client._select_server(
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1322, in _select_server
2025-03-14 12:15:56,682 background_worker    ERROR    |     server = topology.select_server(
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 368, in select_server
2025-03-14 12:15:56,682 background_worker    ERROR    |     server = self._select_server(
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 346, in _select_server
2025-03-14 12:15:56,682 background_worker    ERROR    |     servers = self.select_servers(
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 253, in select_servers
2025-03-14 12:15:56,682 background_worker    ERROR    |     server_descriptions = self._select_servers_loop(
2025-03-14 12:15:56,682 background_worker    ERROR    |   File "/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 303, in _select_servers_loop
2025-03-14 12:15:56,682 background_worker    ERROR    |     raise ServerSelectionTimeoutError(
2025-03-14 12:15:56,682 background_worker    ERROR    | pymongo.errors.ServerSelectionTimeoutError: Could not reach any servers in [('localhost', 45032)]. Replica set is configured with internal hostnames or IPs?, Timeout: 30s, Topology Description: <TopologyDescription id: 67b5ab7d84ec8a796584656d, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('localhost', 45032) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:45032: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
2025-02-19 10:59:27,977 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-19 10:59:28,158 INFO spawned: 'command_rtp' with pid 2570344
2025-02-19 10:59:29,448 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-19 10:59:45,310 INFO spawned: 'app' with pid 2570830
2025-02-19 10:59:46,312 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-19 10:59:46,543 INFO spawned: 'app_api' with pid 2570880
2025-02-19 10:59:47,839 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-03-13 12:10:14,408 WARN exited: database (terminated by SIGABRT (core dumped); not expected)
2025-03-13 16:29:56,757 INFO RPC interface 'supervisor' initialized
2025-03-13 16:29:56,757 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-03-13 16:29:56,759 INFO daemonizing the supervisord process
2025-03-13 16:29:56,858 INFO supervisord started with pid 2855493
2025-03-13 16:38:50,122 INFO RPC interface 'supervisor' initialized
2025-03-13 16:38:50,122 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-03-13 16:38:50,124 INFO daemonizing the supervisord process
2025-03-13 16:38:50,126 INFO supervisord started with pid 2933714
2025-03-13 16:48:20,919 INFO RPC interface 'supervisor' initialized
2025-03-13 16:48:20,919 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-03-13 16:48:20,921 INFO daemonizing the supervisord process
2025-03-13 16:48:20,922 INFO supervisord started with pid 3156376
2025-03-13 17:22:42,704 INFO RPC interface 'supervisor' initialized
2025-03-13 17:22:42,704 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-03-13 17:22:42,706 INFO daemonizing the supervisord process
2025-03-13 17:22:42,708 INFO supervisord started with pid 3886350
2025-03-13 17:29:05,795 INFO RPC interface 'supervisor' initialized
2025-03-13 17:29:05,796 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-03-13 17:29:05,797 INFO daemonizing the supervisord process
2025-03-13 17:29:05,814 INFO supervisord started with pid 3928568
2025-03-14 10:33:59,618 INFO RPC interface 'supervisor' initialized
2025-03-14 10:33:59,618 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-03-14 10:33:59,619 INFO daemonizing the supervisord process
2025-03-14 10:33:59,677 INFO supervisord started with pid 1476034
2025-03-14 10:40:31,941 INFO RPC interface 'supervisor' initialized
2025-03-14 10:40:31,941 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-03-14 10:40:31,943 INFO daemonizing the supervisord process
2025-03-14 10:40:32,026 INFO supervisord started with pid 1524936
2025-03-14 11:25:32,221 INFO RPC interface 'supervisor' initialized
2025-03-14 11:25:32,221 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-03-14 11:25:32,223 INFO daemonizing the supervisord process
2025-03-14 11:25:32,226 INFO supervisord started with pid 1851839
export "CRYOSPARC_MASTER_HOSTNAME=curnagl"
export "CRYOSPARC_PATH=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/external/mongodb/bin:/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/bin"
export "CRYOSPARC_MONGO_EXTRA_FLAGS="
export "CRYOSPARC_INSECURE=false"
export "CRYOSPARC_DB_ENABLE_AUTH_FLAG=--auth"
export "CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000"
export "CRYOSPARC_MONGO_CACHE_GB=4"
export "CRYOSPARC_COMMAND_VIS_PORT=45034"
export "CRYOSPARC_MONGO_FCV=3.6"
export "CRYOSPARC_COMMAND_RTP_PORT=45036"
export "CRYOSPARC_HTTP_APP_PORT=45031"
export "CRYOSPARC_ROOT_DIR=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master"
export "CRYOSPARC_FORCE_USER=false"
export "CRYOSPARC_HOSTNAME_CHECK=curnagl"
export "CRYOSPARC_MONGO_PORT=45032"
export "CRYOSPARC_DB_PATH=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/database"
export "CRYOSPARC_CLICK_WRAP=true"
export "CRYOSPARC_HTTP_LIVEAPP_LEGACY_PORT=45037"
export "CRYOSPARC_LIVE_ENABLED=true"
export "CRYOSPARC_COMMAND_CORE_PORT=45033"
export "CRYOSPARC_SUPERVISOR_SOCK_FILE=/tmp/cryosparc-supervisor-a34cb30d28451fffcb5bc7a5ac8ec625.sock"
export "CRYOSPARC_BASE_PORT=45031"
export "CRYOSPARC_DEVELOP=false"
export "CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10"
export "CRYOSPARC_DB_ENABLE_AUTH=true"
export "CRYOSPARC_HEARTBEAT_SECONDS=180"
export "CRYOSPARC_CONDA_ENV=cryosparc_master_env"
export "CRYOSPARC_PROJECT_DIR_PREFIX=CS-"
export "CRYOSPARC_FORCE_HOSTNAME=false"
export "CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000"
export "PATH=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/external/mongodb/bin:/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/bin:/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin:/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/condabin:/work/FAC/FBM/DMF/pnavarr1/default/tools/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/CTFfind5/cisTEM:/work/FAC/FBM/DMF/pnavarr1/default/tools/cryocare/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/cuda/usr/local/cuda-12.6/bin:/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/cryptsetup-2.3.5-mge72p7wl35jtj3ejpgryy6xa6ujtmmt/sbin:/dcsrsoft/spack/20241118/spack/opt/spack/linux-rhel9-zen2/gcc-12.3.0/singularityce-4.1.0-mt3k5udjdeyhxtvkci4sgwuialkaln2j/bin:/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/miniconda/bin:/work/FAC/FBM/DMF/pnavarr1/default/tools/miniconda/condabin:/users/agregor/.local/bin:/users/agregor/bin:/usr/lpp/mmfs/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/dcsrsoft/bin"
export "LD_LIBRARY_PATH=/work/FAC/FBM/DMF/pnavarr1/default/tools/cuda/usr/local/cuda-12.6/lib64:"
export "LD_PRELOAD=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/libpython3.10.so"
export "PYTHONPATH=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master"
export "PYTHONNOUSERSITE=true"
export "CONDA_EXE=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/bin/conda"
export "CONDA_PREFIX=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env"
export "CONDA_PROMPT_MODIFIER=(cryosparc_master_env)"
export "CONDA_ENVS_DIRS=/work/FAC/FBM/DMF/pnavarr1/default/tools/conda-envs"
export "CONDA_SHLVL=1"
export "CONDA_DIR=/work/FAC/FBM/DMF/pnavarr1/default/tools/miniconda"
export "CONDA_PYTHON_EXE=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/bin/python"
export "CONDA_DEFAULT_ENV=cryosparc_master_env"
(base) [agregor@curnagl cryosparc_master]$

Also: (we are agregor/pnavarr1; sgruber1 is another group)

(base) [agregor@curnagl CryoSPARC]$ ps -eo user,pid,ppid,start,command | grep 'cryosparc'
yli18     477935       1 08:36:47 python /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/supervisord.conf
yli18     478581  477935 08:36:52 mongod --auth --dbpath /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/database --port 45002 --oplogSize 64 --replSet meteor --wiredTigerCacheSizeGB 4 --bind_ip_all
yli18     478993  477935 08:36:56 python /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:45003 cryosparc_command.command_core:start() -c /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/gunicorn.conf.py
yli18     479786  478993 08:37:01 python /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:45003 cryosparc_command.command_core:start() -c /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/gunicorn.conf.py
yli18     481546  477935 08:37:17 python /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:45004 -c /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/gunicorn.conf.py
yli18     481560  481546 08:37:17 python /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:45004 -c /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/gunicorn.conf.py
yli18     481800  477935 08:37:18 python /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:45006 -c /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/gunicorn.conf.py
yli18     481881  481800 08:37:18 python /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:45006 -c /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/gunicorn.conf.py
yli18     484440  477935 08:37:39 /work/FAC/FBM/DMF/sgruber1/cryoem_gruber/cryosparc/cryosparc_master/cryosparc_app/nodejs/bin/node ./bundle/main.js
agregor   660108 2570302   Feb 26 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:45034 -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
agregor  1851839       1 11:25:31 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/supervisord.conf
agregor  2551177 1504901 13:07:41 grep --color=auto cryosparc
agregor  2569007       1   Feb 19 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/supervisord.conf
agregor  2569835 2569007   Feb 19 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:45033 cryosparc_command.command_core:start() -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
agregor  2569985 2569835   Feb 19 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:45033 cryosparc_command.command_core:start() -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
agregor  2570302 2569007   Feb 19 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:45034 -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
agregor  2570344 2569007   Feb 19 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:45036 -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
agregor  2570355 2570344   Feb 19 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:45036 -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/gunicorn.conf.py
agregor  2570880 2569007   Feb 19 /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/cryosparc_app/nodejs/bin/node ./bundle/main.js
agregor  3886350       1 17:22:41 python /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/cryosparc_master/supervisord.conf

Update:

Managed to solve this by:

  1. cryosparcm stop

  2. killing all the processes I could that were listed by

ps -eo user,pid,ppid,start,command | grep -e cryosparc_ -e mongo

  3. cryosparcm restart
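Spelled out, the recovery might look like the sketch below. The install prefix is this instance's (adjust for yours), and the destructive commands are left commented so nothing is killed blindly:

```shell
# Recovery sketch for the steps above. PREFIX is this instance's install
# path; adjust it for your own installation.
PREFIX=/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC

# 1. cryosparcm stop

# 2. List leftover processes that belong to THIS instance only; filtering
#    on the install prefix avoids touching the other group's instance.
ps -eo user,pid,ppid,start,command \
  | grep -e cryosparc_ -e mongo \
  | grep "$PREFIX" \
  | grep -v grep \
  || echo "no leftover processes under $PREFIX"

# 3. kill <pid> for each leftover process found above, then:
#    cryosparcm restart
```

If `cryosparcm restart` then complains about the supervisor, the stale socket file referenced by `CRYOSPARC_SUPERVISOR_SOCK_FILE` in the environment dump above may also need to be removed first.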

Welcome to the forum @AurelienGG, and thanks for posting the resolution. Glad to learn that your database may have survived the storage depletion. Because running out of storage can cause more severe damage to the database, and recovery can be tedious, you may want to continuously monitor available capacity, especially on the database volume/quota.

A caution for future visitors facing a similar situation: some servers run multiple CryoSPARC instances (subject to certain constraints), so the ps command may show processes belonging to different CryoSPARC instances. It is therefore important to ensure that one only terminates processes belonging to the specifically applicable CryoSPARC instance.

Can we set a quota for the database in the master/worker config.sh?

There is no CryoSPARC-specific setting that would protect the database from malfunction due to insufficient storage space. We recommend combining:

  1. continuous monitoring of storage capacity for the database volume
  2. and, after deletion of unneeded projects or jobs, compacting the database

to ensure the database never runs out of space.
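For point 1, a minimal capacity check is easy to script. The database path and the 90% threshold below are illustrative assumptions, not CryoSPARC settings; something like this could be wired into cron or an existing monitoring system:

```shell
# Warn when the filesystem holding the CryoSPARC database approaches full.
# DB_PATH and THRESHOLD are illustrative; adjust for your instance.
DB_PATH=${DB_PATH:-/work/FAC/FBM/DMF/pnavarr1/default/CryoSPARC/database}
THRESHOLD=90   # warn at >= 90% usage

# df -P emits one POSIX-format line per filesystem; field 5 is "Use%".
usage=$(df -P "$DB_PATH" 2>/dev/null | awk 'NR==2 {gsub("%","",$5); print $5}')

if [ -z "$usage" ]; then
  echo "WARN: cannot stat $DB_PATH"
elif [ "$usage" -ge "$THRESHOLD" ]; then
  echo "WARN: database volume ${usage}% full (threshold ${THRESHOLD}%)"
else
  echo "OK: database volume ${usage}% full"
fi
```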
