CryoSPARC timeout getting database status

CryoSPARC instance information

  • Type: cluster
  • Software version
  • $ uname -a && free -g
    Linux cryosparc-prod 3.10.0-1160.119.1.el7.tuxcare.els13.x86_64 #1 SMP Fri Nov 22 06:29:45 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
                  total        used        free      shared  buff/cache   available
    Mem:             46           1          41           0           4          45
    Swap:             1           0           1

Issue

  • When attempting ‘cryosparcm start’ we get the error shown below. No other user processes were running, and we ran the ‘cryosparcm start’ and ‘cryosparcm stop’ commands from the account CryoSPARC was installed with:
[lab_name@cryosparc-prod bin]$ ./cryosparcm start
Starting CryoSPARC System master process...
CryoSPARC is not already running.
configuring database...
Warning: Could not get database status (attempt 1/3)
Warning: Could not get database status (attempt 2/3)
Warning: Could not get database status (attempt 3/3)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/path/to/www/cryosparc/cryosparc2_master/cryosparc_compute/database_management.py", line 47, in configure_mongo
    initialize_replica_set()
  File "/path/to/www/cryosparc/cryosparc2_master/cryosparc_compute/database_management.py", line 84, in initialize_replica_set
    admin_db = try_get_pymongo_db(mongo_client)
  File "/path/to/www/cryosparc/cryosparc2_master/cryosparc_compute/database_management.py", line 251, in try_get_pymongo_db
    admin_db.command(({'serverStatus': 1}))
  File "/path/to/www/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/_csot.py", line 108, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/path/to/www/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/database.py", line 893, in command
    with self.__client._conn_for_reads(read_preference, session, operation=command_name) as (
  File "/path/to/www/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1375, in _conn_for_reads
    server = self._select_server(read_preference, session, operation)
  File "/path/to/www/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1322, in _select_server
    server = topology.select_server(
  File "/path/to/www/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 368, in select_server
    server = self._select_server(
  File "/path/to/www/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 346, in _select_server
    servers = self.select_servers(
  File "/path/to/www/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 253, in select_servers
    server_descriptions = self._select_servers_loop(
  File "/path/to/www/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/site-packages/pymongo/topology.py", line 303, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: localhost:8622: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 20.0s, Topology Description: <TopologyDescription id: 678a62c896279ba04a16fbf1, topology_type: Single, servers: [<ServerDescription ('localhost', 8622) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:8622: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
[2025-01-17T09:02:55-0500] Error configuring database. Most recent database log lines:
2025-01-17T09:01:41.802-0500 I - [initandlisten] Detected data files in /path/to/db/cryosparc_db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2025-01-17T09:01:41.805-0500 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=23548M,cache_overflow=(file_max=0M),session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),compatibility=(release="3.0",require_max="3.0"),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2025-01-17T09:01:42.601-0500 E STORAGE [initandlisten] WiredTiger error (11) [1737122502:601419][120298:0x7f9294e84a40], wiredtiger_open: __posix_file_lock, 410: /path/to/db/cryosparc_db/WiredTiger.lock: handle-lock: fcntl: Resource temporarily unavailable Raw: [1737122502:601419][120298:0x7f9294e84a40], wiredtiger_open: __posix_file_lock, 410: /path/to/db/cryosparc_db/WiredTiger.lock: handle-lock: fcntl: Resource temporarily unavailable
2025-01-17T09:01:42.601-0500 E STORAGE [initandlisten] WiredTiger error (16) [1737122502:601478][120298:0x7f9294e84a40], wiredtiger_open: __conn_single, 1720: WiredTiger database is already being managed by another process: Device or resource busy Raw: [1737122502:601478][120298:0x7f9294e84a40], wiredtiger_open: __conn_single, 1720: WiredTiger database is already being managed by another process: Device or resource busy
2025-01-17T09:01:42.601-0500 E - [initandlisten] Assertion: 28595:16: Device or resource busy src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 488
2025-01-17T09:01:42.602-0500 I STORAGE [initandlisten] exception in initAndListen: Location28595: 16: Device or resource busy, terminating
2025-01-17T09:01:42.603-0500 I NETWORK [initandlisten] shutdown: going to close listening sockets...
2025-01-17T09:01:42.603-0500 I NETWORK [initandlisten] removing socket file: /tmp/mongodb-8622.sock
2025-01-17T09:01:42.603-0500 I CONTROL [initandlisten] now exiting
2025-01-17T09:01:42.603-0500 I CONTROL [initandlisten] shutting down with code:100 
  • We are also including the following commands and their output, since they were requested in similar (but not identical) threads:
grep -v LICENSE_ID /n/www/cryosparc-lab_name.cluster.school.edu/cryosparc2_master/config.sh
ps -eo user:12,pid,ppid,start,command | grep -e cryosparc_ -e mongo
ls -l /tmp/mongo*.sock /tmp/cryosparc*.sock /path/to/lab_name/db/cryosparc_db/WiredTiger.lock


export CRYOSPARC_MASTER_HOSTNAME="cryosparc-prod.cluster.school.edu"
export CRYOSPARC_DB_PATH="/path/to/lab_name/db/cryosparc_db"
export CRYOSPARC_BASE_PORT=8621
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true
export CRYOSPARC_FORCE_HOSTNAME=true
export CRYOSPARC_SSD_PATH=/n/scratch/users/l/lab_name
lab_name 3171 3168 09:26:42 grep -e cryosparc_ -e mongo
-rw-rw-r-- 1 lab_name lab_name 21 Mar 30 2021 /path/to/lab_name/db/cryosparc_db/WiredTiger.lock
srwx------ 1 lab2_name lab2_name 0 Jan 10 13:23 /tmp/cryosparc-supervisor-50c2a9c444f39d0cdea59524ee190f8c.sock
srwx------ 1 lab3_name lab3_name 0 Jan 9 17:16 /tmp/cryosparc-supervisor-cd41cca6e6fba8d21b1c763548378f8e.sock
srwx------ 1 lab3_name lab3_name 0 Jan 9 17:16 /tmp/mongodb-8602.sock
srwx------ 1 lab2_name lab2_name 0 Jan 10 13:24 /tmp/mongodb-8672.sock
2025-01-09 17:16:20,162 INFO supervisord started with pid 8856
2025-01-09 17:16:34,023 INFO spawned: 'database' with pid 10453
2025-01-09 17:16:35,087 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-01-09 17:16:42,337 INFO spawned: 'command_core' with pid 11357
2025-01-09 17:16:48,300 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2025-01-09 17:17:00,906 INFO spawned: 'command_vis' with pid 11865
2025-01-09 17:17:01,915 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-01-09 17:17:02,712 INFO spawned: 'command_rtp' with pid 12050
2025-01-09 17:17:03,714 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-01-09 17:17:22,553 INFO spawned: 'app' with pid 12606
2025-01-09 17:17:23,567 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-01-09 17:17:25,657 INFO spawned: 'app_api' with pid 12708
2025-01-09 17:17:26,654 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-01-16 11:26:27,593 INFO RPC interface 'supervisor' initialized
2025-01-16 11:26:27,593 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-01-16 11:26:27,595 INFO daemonizing the supervisord process
2025-01-16 11:26:27,609 INFO supervisord started with pid 2419
2025-01-16 11:27:47,700 WARN received SIGTERM indicating exit request
2025-01-16 13:48:01,744 INFO RPC interface 'supervisor' initialized
2025-01-16 13:48:01,744 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-01-16 13:48:01,745 INFO daemonizing the supervisord process
2025-01-16 13:48:01,760 INFO supervisord started with pid 74475
2025-01-16 13:52:20,716 INFO RPC interface 'supervisor' initialized
2025-01-16 13:52:20,716 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-01-16 13:52:20,717 INFO daemonizing the supervisord process
2025-01-16 13:52:20,738 INFO supervisord started with pid 76000
2025-01-16 14:16:19,096 INFO RPC interface 'supervisor' initialized
2025-01-16 14:16:19,096 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-01-16 14:16:19,098 INFO daemonizing the supervisord process
2025-01-16 14:16:19,115 INFO supervisord started with pid 89894
2025-01-16 14:19:19,321 WARN received SIGTERM indicating exit request
2025-01-16 14:19:58,809 INFO RPC interface 'supervisor' initialized
2025-01-16 14:19:58,809 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-01-16 14:19:58,811 INFO daemonizing the supervisord process
2025-01-16 14:19:58,821 INFO supervisord started with pid 91193
2025-01-17 09:01:39,859 INFO RPC interface 'supervisor' initialized
2025-01-17 09:01:39,859 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2025-01-17 09:01:39,860 INFO daemonizing the supervisord process
2025-01-17 09:01:39,876 INFO supervisord started with pid 120269
2025-01-17 09:14:06,738 WARN received SIGTERM indicating exit request
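
The supervisord log above contains several "supervisord started" entries that are not preceded by a "received SIGTERM" entry. Below is a minimal sketch for listing such starts. The function name `unclean_restarts` is hypothetical, and a missing SIGTERM entry is only a hint that the previous supervisord may not have been stopped cleanly, not proof.

```shell
# Hypothetical helper: read supervisord log text on stdin and flag every
# "supervisord started" entry that follows an earlier start without a
# "received SIGTERM" entry logged in between.
unclean_restarts() {
    awk '
        /supervisord started with pid/ {
            if (running)
                print "no SIGTERM logged before start at " $1 " " $2 " (previous pid " pid ")"
            running = 1
            pid = $NF          # last field is the new supervisord pid
        }
        /received SIGTERM/ { running = 0 }
    '
}

# Usage (the log path is a placeholder, replace with the actual file):
#   unclean_restarts < /path/to/supervisord.log
```

Applied to the excerpt above, this would flag the starts at 2025-01-16 11:26:27, 13:52:20 and 14:16:19, and at 2025-01-17 09:01:39.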

Welcome to the forum @calvin4cryo and thanks for posting the command outputs.
The absence of any mongod process, combined with the presence of /tmp/mongodb-*.sock files, is unexpected. It suggests that processes belonging to your CryoSPARC instance and to other CryoSPARC instances on the cryosparc-prod server were disrupted.
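
One way to check for leftover socket files is sketched below: for each socket file in /tmp, report whether any process is still listening on it. This is only a sketch; the helper name `sock_has_listener` is hypothetical, and it assumes a Linux host with `ss` (iproute2) and GNU `stat` installed.

```shell
# Hypothetical helper: succeed if some process is listening on the given
# UNIX socket path. Assumes `ss` from iproute2 is available.
sock_has_listener() {
    ss -lx 2>/dev/null | grep -qF -- "$1"
}

for sock in /tmp/mongodb-*.sock /tmp/cryosparc-supervisor-*.sock; do
    [ -S "$sock" ] || continue            # skip unmatched globs and non-sockets
    owner=$(stat -c %U -- "$sock")        # user that owns the socket file
    if sock_has_listener "$sock"; then
        echo "$sock ($owner): in use"
    else
        echo "$sock ($owner): no listener, possibly stale"
    fi
done
```

A socket file reported as having no listener was most likely left behind by a process that exited without cleaning up.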

  1. Are the CryoSPARC master processes under the control of the cluster’s workload manager?
  2. Could cryosparc master processes have been sent SIGKILL, which should be avoided?
  3. What are the outputs of these commands:
    wtlock=/path/to/db/cryosparc_db/WiredTiger.lock # replace with actual path
    sudo lsof $wtlock
    ls -l $wtlock
    grep "$(df $wtlock | tail -n 1 | awk '{print $NF}') " /proc/mounts
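
As an unprivileged complement to the commands above, the sketch below looks up the lock file's inode in /proc/locks and prints the /proc/mounts entry of the filesystem that holds it. The function names are hypothetical, and the sketch assumes Linux with GNU `stat`. Note that /proc/locks lists only locks held by processes on this host, so a lock held through another host on a network filesystem would not appear, and matching by inode number alone can in principle collide across filesystems.

```shell
# Hypothetical helper: print /proc/locks entries whose inode matches the file.
# Each entry carries an identifier of the form MAJOR:MINOR:INODE.
local_locks_on() {
    inode=$(stat -c %i -- "$1") || return 1
    awk -v ino="$inode" '{
        for (i = 1; i <= NF; i++)
            if ($i ~ ("^[0-9a-f]+:[0-9a-f]+:" ino "$")) { print; next }
    }' /proc/locks
}

# Hypothetical helper: print the /proc/mounts entry (device, mount point,
# filesystem type, options) for the filesystem that contains the path.
mount_entry_for() {
    mount_point=$(df -P -- "$1" | awk 'NR == 2 { print $NF }') || return 1
    awk -v mp="$mount_point" '$2 == mp' /proc/mounts
}

# Usage:
#   wtlock=/path/to/db/cryosparc_db/WiredTiger.lock  # replace with actual path
#   local_locks_on "$wtlock"
#   mount_entry_for "$wtlock"
```

If `local_locks_on` prints nothing while mongod still reports the lock as busy, the filesystem type shown by `mount_entry_for` indicates whether a lock held from another host is a possibility.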