Jobs sporadically hanging at launch after update to 5.0.6

Since updating to version 5.0.6 on our cluster we have started to have some jobs hang at launch.

The affected jobs fail but remain stuck in the launch state, with a slurm_load_jobs error in the event log and a ServerSelectionTimeoutError in the job log.

Our IT group hasn’t been able to identify a problem.

Thank you for your help.

Event Log Error:

[2026-06-18 14:52:15.15] [CPU: 637.5 MB]Cluster job status update for Job P94-J323 (Heterogeneous Refinement) failed (41 status update request retries): 'squeue -j 10006470': Command failed (code 1)

Output: slurm_load_jobs error: Invalid job id specified

Error: 

Job Log Error:

================= CRYOSPARC =================
Project P94 Job J323
Master reichow-cs2.ohsu.edu Port 61000
===========================================================================
MAIN PROCESS PID 312104
========= updating job startup information at 2026-06-18 14:52:16.091007
================= CRYOSPARC =================
Project P94 Job J323
Master reichow-cs2.ohsu.edu Port 61000
===========================================================================
MAIN PROCESS PID 312104
Process Process-1:
Traceback (most recent call last):
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "cli/run.py", line 25, in cli.run.start_and_update_job_runtime_info
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/core/core.py", line 61, in startup
    self.mongo = get_pymongo_client(conf.mongo_db_name, conf)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/core/database_management.py", line 227, in get_pymongo_client
    assert client[database_name].list_collection_names() is not None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/database.py", line 1226, in list_collection_names
    return self._list_collection_names(session, filter, comment, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/database.py", line 1191, in _list_collection_names
    result["name"] for result in self._list_collections_helper(session=session, **kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/database.py", line 1138, in _list_collections_helper
    return self._client._retryable_read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/mongo_client.py", line 1863, in _retryable_read
    return self._retry_internal(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/_csot.py", line 119, in csot_wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/mongo_client.py", line 1830, in _retry_internal
    ).run()
      ^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/mongo_client.py", line 2554, in run
    return self._read() if self._is_read else self._write()
           ^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/mongo_client.py", line 2689, in _read
    self._server = self._get_server()
                   ^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/mongo_client.py", line 2645, in _get_server
    return self._client._select_server(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/mongo_client.py", line 1649, in _select_server
    server = topology.select_server(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 398, in select_server
    server = self._select_server(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 376, in _select_server
    servers = self.select_servers(
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 283, in select_servers
    server_descriptions = self._select_servers_loop(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/exacloud/gscratch/reichowlab/local/cryosparc/reichow-cs2.ohsu.edu/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 333, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: reichow-cs2.ohsu.edu:61001: timed out (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 60.0s, Topology Description: <TopologyDescription id: 6a346890e6e9394c6c892a09, topology_type: Single, servers: [<ServerDescription ('reichow-cs2.ohsu.edu', 61001) server_type: Unknown, rtt: None, error=NetworkTimeout('reichow-cs2.ohsu.edu:61001: timed out (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>

Thanks @DXLee for this report.
The ServerSelectionTimeoutError may be the cause of the slurm_load_jobs error: Invalid job id specified. To help us analyze the ServerSelectionTimeoutError, could you please post the output of the following command, run on the reichow-cs2 computer:

ps -eo user,pid,ppid,start,cmd | grep -e cryosparc_ -e mongo

and the outputs of the following commands, run on the cluster node where job P94.J323 failed:

uname -a
curl reichow-cs2.ohsu.edu:61001

The relevant cluster node can be displayed by running the following command on reichow-cs2:

cryosparcm cli "api.jobs.find_one('P94', 'J323').instance_information.platform_node"

Please also send us:

  • the job report for job P94.J323
  • the tgz file created with the command
    cryosparcm snaplogs

I will send you a forum PM with the email address to use.
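
Because the hangs are sporadic, a single curl from the cluster node may succeed even if the connection problem is real. As a rough sketch only (this is not part of CryoSPARC; the function names probe_tcp and probe_repeatedly, the attempt count, and the interval are illustrative assumptions), the following Python script repeats a plain TCP connection to the database port from the cluster node and prints how long each attempt takes. The host, port, and 20 second timeout are taken from the ServerSelectionTimeoutError message above. It only checks that the port accepts connections; it does not speak the MongoDB protocol.

```python
import socket
import time


def probe_tcp(host, port, timeout=20.0):
    """Try one TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host name and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts, refusals, and DNS failures
        return False, time.monotonic() - start, str(exc)


def probe_repeatedly(host, port, attempts=10, interval=5.0, timeout=20.0):
    """Probe several times, since the failure is sporadic; return the ok flags."""
    results = []
    for i in range(attempts):
        ok, elapsed, err = probe_tcp(host, port, timeout)
        print(f"attempt {i + 1}: ok={ok} elapsed={elapsed:.3f}s {err}")
        results.append(ok)
        if i + 1 < attempts:
            time.sleep(interval)  # pause between attempts
    return results


# Example call, run on the cluster node (host and port from the error message):
# probe_repeatedly("reichow-cs2.ohsu.edu", 61001)
```

If some attempts fail or take close to the timeout while others connect quickly, that would point to an intermittent network or name resolution issue between the cluster node and reichow-cs2 rather than to the job itself.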