Jobs Failing Sporadically - pymongo.errors.ServerSelectionTimeoutError

Hello,

Recently we have been seeing sporadic job failures with ServerSelectionTimeoutError. I have been working with our IT staff, but they have not been able to identify the cause.

Our CryoSPARC instance (v4.7.1) runs on an HPC cluster (Red Hat Linux 9.4).

The following error is from a 2D classification job, but the same error has occurred across many job types. Any help troubleshooting it would be greatly appreciated.

[2025-08-26 13:20:54.27]
[CPU:   5.08 GB]
Traceback (most recent call last):
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2306, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 136, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 137, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/jobs/class2D/newrun.py", line 670, in cryosparc_master.cryosparc_compute.jobs.class2D.newrun.class2D_engine_run.work
  File "cryosparc_master/cryosparc_compute/jobs/class2D/newrun.py", line 232, in cryosparc_master.cryosparc_compute.jobs.class2D.newrun.run_class_2D.progress
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1826, in update_event_text
    db['events'].update_one({'_id':event_id},
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/collection.py", line 1077, in update_one
    self._update_retryable(
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/collection.py", line 872, in _update_retryable
    return self.__database.client._retryable_write(
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1575, in _retryable_write
    return self._retry_with_session(retryable, func, s, bulk, operation, operation_id)
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1461, in _retry_with_session
    return self._retry_internal(
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/_csot.py", line 108, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1507, in _retry_internal
    ).run()
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 2353, in run
    return self._read() if self._is_read else self._write()
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 2456, in _write
    self._server = self._get_server()
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 2439, in _get_server
    return self._client._select_server(
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1322, in _select_server
    server = topology.select_server(
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/topology.py", line 368, in select_server
    server = self._select_server(
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/topology.py", line 346, in _select_server
    servers = self.select_servers(
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/topology.py", line 253, in select_servers
    server_descriptions = self._select_servers_loop(
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/pymongo/topology.py", line 303, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: reichow-cs2.ohsu.edu:61001: timed out (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30.0s, Topology Description: <TopologyDescription id: 68ae13dc5334b0f9ab8a00bb, topology_type: Single, servers: [<ServerDescription ('reichow-cs2.ohsu.edu', 61001) server_type: Unknown, rtt: None, error=NetworkTimeout('reichow-cs2.ohsu.edu:61001: timed out (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>

In case of this error, one may want to check:

  1. Is the CryoSPARC master server overloaded?
  2. Does the output of the command cryosparcm log database indicate any problems just before 2025-08-26 13:20:54.27?
  3. Is access from the worker node to port 61001 on the CryoSPARC master server blocked or intercepted, for example by a network security measure such as an HTTP proxy?
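For the second check, one could narrow a saved copy of the database log down to the minutes before a failure. The following is a minimal sketch, not a CryoSPARC tool: it assumes each relevant log line contains an ISO-8601 timestamp such as 2025-08-26T13:20:50 (this should match both plain-text and JSON-style MongoDB log lines) and it ignores timezone offsets, so the job log and the database log must be compared on the same clock. The script name log_window.py and the 120-second window are hypothetical choices.

```python
import re
import sys
from datetime import datetime, timedelta

# First ISO-8601-style timestamp anywhere in a line, e.g. 2025-08-26T13:20:50.
# Fractional seconds and timezone offsets after it are ignored.
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def lines_before(lines, failure_time, window_s=120):
    """Yield lines stamped within window_s seconds before failure_time (inclusive)."""
    start = failure_time - timedelta(seconds=window_s)
    for line in lines:
        match = TS_PATTERN.search(line)
        if match is None:
            continue  # no timestamp, e.g. a wrapped continuation line
        try:
            stamp = datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            continue  # looked like a timestamp but was not a valid date
        if start <= stamp <= failure_time:
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python log_window.py DATABASE_LOG_FILE "2025-08-26 13:20:54"
    failure = datetime.strptime(sys.argv[2], "%Y-%m-%d %H:%M:%S")
    with open(sys.argv[1], errors="replace") as fh:
        for line in lines_before(fh, failure):
            print(line)
```

Entries such as slow operations, connection floods or restarts in that window would be worth posting alongside the job error.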

One may also allow more time before the database connection times out by adding the following line to
cryosparc_worker/config.sh:

export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=60000

One may set values even higher than 60000 if a longer timeout helps and is needed.
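To judge whether a longer timeout would actually help, one could record over an extended period how long TCP connections from a worker node to port 61001 on the master take, since the failures are sporadic and a single check will likely pass. The sketch below is illustrative only: it opens and closes a plain TCP connection, so it tests reachability and connect latency rather than database health. The host and port are taken from the error message; the sampling interval, sample count and 60-second per-attempt timeout are hypothetical choices. If connections usually finish in milliseconds but occasionally fail outright, a longer timeout is less likely to help than finding the network interruption; if they are sometimes slow but eventually succeed, raising CRYOSPARC_DB_CONNECTION_TIMEOUT_MS is more plausible.

```python
import socket
import sys
import time
from datetime import datetime

HOST = "reichow-cs2.ohsu.edu"  # master host and port from the error message
PORT = 61001


def connect_time(host, port, timeout_s):
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.monotonic() - start
    except OSError:  # covers timeouts, refused connections and DNS errors
        return None


def probe(host, port, samples, interval_s=10.0, timeout_s=60.0):
    """Probe `samples` times, printing and returning (timestamp, seconds or None)."""
    results = []
    for i in range(samples):
        elapsed = connect_time(host, port, timeout_s)
        stamp = datetime.now().isoformat(timespec="seconds")
        status = "FAILED" if elapsed is None else f"{elapsed * 1000:.1f} ms"
        print(f"{stamp} {host}:{port} {status}")
        results.append((stamp, elapsed))
        if i < samples - 1:
            time.sleep(interval_s)
    return results


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage on a worker node: python probe_db_port.py 360  (about 1 h at 10 s)
    probe(HOST, PORT, samples=int(sys.argv[1]))
```

Timestamps of any FAILED or multi-second lines can then be compared with the times of failed jobs.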