Started a new CryoSPARC database, and now we cannot queue jobs

Hello. We have CryoSPARC installed on a BeeGFS file system. We use node 8 (gpu008) to run jobs.

Recently, we had to migrate all of our data, including the cryosparc_database, from one location to another. One effect of this was that the database (and all backups) became corrupted, so we decided to start a new database. However, one issue that now arises is that we cannot queue jobs on the new CryoSPARC database. When I try to run a job, the "Queue" button cannot be pressed.

Any help would be greatly appreciated, and I am more than happy to provide any information that I can.

Thank you.

Have you already registered any worker resources (aka scheduler lanes and scheduler targets)? A new database initially does not include information about the compute resources that are available for jobs. Lanes and targets can be registered in the database by:

  1. installing CryoSPARC with the --standalone option (in the single-workstation case),
  2. connecting worker node(s) after CryoSPARC installation, or
  3. connecting worker cluster(s) after CryoSPARC installation.

Currently registered resources can be displayed with the command

cryosparcm cli "get_scheduler_targets()"
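For option 2, the usual approach is to run cryosparcw connect on the worker node itself, from the cryosparc_worker installation directory. A rough sketch only; the hostname, port, and paths below are placeholders, not values from your installation, and the worker connection documentation has the full set of options:

# run on the worker node, as the cryosparc user
cd /path/to/cryosparc_worker            # placeholder: worker installation directory
./bin/cryosparcw connect \
  --master master.example.org \         # placeholder: hostname of the CryoSPARC master
  --port 39000 \                        # CRYOSPARC_BASE_PORT of the master instance
  --worker $(hostname -f) \             # this worker's hostname
  --ssdpath /scratch/cryosparc_cache    # placeholder SSD cache path; use --nossd if there is no SSD cache

After a successful connection, get_scheduler_targets() should list the new worker instead of returning an empty list.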

No, we have not done that. I was not involved with the initial install, so my knowledge is somewhat limited.

[cryosparc@gates cryosparc_worker_v4.1]$ cryosparcm cli "get_scheduler_targets()"

[]

The command returns an empty list, so I assume I have to proceed with the next steps.

You are correct; [] is the expected output before any workers are connected.

Thank you. That makes sense.

When I run:
[cryosparc@gates cryosparc_database]$ cryosparcm cluster connect

cluster_info.json file does not exist in current directory

I get this response. Do you have any idea what might be causing this?

Please review

  1. the overview to confirm that your intended installation type is Clusters and, if so,
  2. documentation for the cryosparcm cluster connect command and the files required by that command.
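For reference, cryosparcm cluster connect looks for a cluster_info.json file (and a cluster_script.sh job template) in the directory it is run from. Below is a minimal sketch for a SLURM cluster; every value is a placeholder that would need to be adapted to your site, and the cluster integration documentation has the authoritative field list:

# run in the directory from which cryosparcm cluster connect will be invoked
cat > cluster_info.json <<'EOF'
{
    "name": "example-slurm-lane",
    "worker_bin_path": "/path/to/cryosparc_worker/bin/cryosparcw",
    "cache_path": "/scratch/cryosparc_cache",
    "send_cmd_tpl": "{{ command }}",
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "qinfo_cmd_tpl": "sinfo"
}
EOF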

Hi. Is there a way I can definitively confirm which type of installation we previously had? I assumed it was a cluster installation because we have a cluster. Is it possible that it is not?

There are different ways of running CryoSPARC on a cluster, some of which do not involve the
cryosparcm cluster connect command.

Will you always use node 8, and no other nodes, to run CryoSPARC jobs?

Yes. We will always use node 8.

When I try to run cryosparcw connect, I get the following errors. I have tried a couple of different IP addresses for the location of the master.

[cryosparc@gates cryosparc_worker]$ ./bin/cryosparcw connect \
--master gates.shapirolab.zi.columbia.edu \
--port 39000 \
--worker gpu008 \
--ssh cryosparc@gpu008 \
--gpus 0,1,2,3,4,5,6,7

CRYOSPARC CONNECT --------------------------------------------

Attempting to register worker gpu008 to command gates.shapirolab.zi.columbia.edu:39002
Connecting as unix user cryosparc
Will register using ssh string: cryosparc@gpu008
If this is incorrect, you should re-run this command with the flag --sshstr

Connected to master.

Current connected workers:

Worker will be registered with 64 CPUs.
Autodetecting available GPUs…
Traceback (most recent call last):
  File "/cm/shared/apps/cryosparc/cryosparc_worker/cryosparc_compute/nvidia_smi_util.py", line 49, in run_nvidia_smi_query
    memory_use_info = output_to_list(subprocess.check_output(command.split(), stderr=subprocess.STDOUT))
  File "/cm/shared/apps/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/cm/shared/apps/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['nvidia-smi', '--query-gpu=driver_version', '--format=csv,noheader,nounits']' returned non-zero exit status 9.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/cm/shared/apps/cryosparc/cryosparc_worker/bin/connect.py", line 233, in <module>
    gpu_devidxs = check_gpus()
  File "/cm/shared/apps/cryosparc/cryosparc_worker/bin/connect.py", line 95, in check_gpus
    driver_version = get_driver_version()
  File "/cm/shared/apps/cryosparc/cryosparc_worker/cryosparc_compute/nvidia_smi_util.py", line 65, in get_driver_version
    return run_nvidia_smi_query({"driver_version": "driver_version"})[0]["driver_version"]
  File "/cm/shared/apps/cryosparc/cryosparc_worker/cryosparc_compute/nvidia_smi_util.py", line 55, in run_nvidia_smi_query
    raise RuntimeError(
RuntimeError: command '['nvidia-smi', '--query-gpu=driver_version', '--format=csv,noheader,nounits']' returned with error (code 9): b"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.\n\n"
[cryosparc@gates cryosparc_worker]$

What is the output of the command
nvidia-smi on the gpu008 computer?

see below:
[cryosparc@gpu008 ~]$ nvidia-smi
Mon Sep 15 12:14:45 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX A4000 On | 00000000:06:00.0 Off | 0 |
| 41% 38C P8 16W / 140W | 2MiB / 15352MiB | 0% E. Process |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A4000 On | 00000000:07:00.0 Off | 0 |
| 41% 34C P8 15W / 140W | 2MiB / 15352MiB | 0% E. Process |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA RTX A4000 On | 00000000:45:00.0 Off | 0 |
| 41% 33C P8 14W / 140W | 2MiB / 15352MiB | 0% E. Process |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA RTX A4000 On | 00000000:46:00.0 Off | 0 |
| 41% 31C P8 12W / 140W | 2MiB / 15352MiB | 0% E. Process |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA RTX A4000 On | 00000000:89:00.0 Off | 0 |
| 41% 35C P8 16W / 140W | 2MiB / 15352MiB | 0% E. Process |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA RTX A4000 On | 00000000:8A:00.0 Off | 0 |
| 41% 34C P8 14W / 140W | 2MiB / 15352MiB | 0% E. Process |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA RTX A4000 On | 00000000:C5:00.0 Off | 0 |
| 41% 36C P8 17W / 140W | 2MiB / 15352MiB | 0% E. Process |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA RTX A4000 On | 00000000:C6:00.0 Off | 0 |
| 41% 33C P8 17W / 140W | 2MiB / 15352MiB | 0% E. Process |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
[cryosparc@gpu008 ~]$

I only now noticed that cryosparcw connect was run on the gates computer, whereas the command should be run on the worker that is to be connected to CryoSPARC, gpu008 in this case.
You should also ensure that running the command
ssh cryosparc@gpu008 on gates connects you to gpu008 without requiring a password.
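For example, a minimal sketch of both steps (the key type and cache handling below are just common defaults, to be adjusted for your environment):

# on gates, as the cryosparc user: set up passwordless SSH to the worker
ssh-keygen -t ed25519            # accept the defaults; an empty passphrase allows non-interactive login
ssh-copy-id cryosparc@gpu008     # install the public key on the worker
ssh cryosparc@gpu008             # should now log in without a password prompt

# then on gpu008, from the worker installation directory:
cd /cm/shared/apps/cryosparc/cryosparc_worker
./bin/cryosparcw connect \
  --master gates.shapirolab.zi.columbia.edu \
  --port 39000 \
  --worker gpu008 \
  --sshstr cryosparc@gpu008 \
  --gpus 0,1,2,3,4,5,6,7         # add --ssdpath <cache dir> or --nossd depending on whether gpu008 has an SSD cache

Since nvidia-smi works on gpu008 itself, GPU autodetection should succeed when the command is run there.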