Running CryoSPARC locally without queueing SLURM jobs

Hi,

I am running CryoSPARC 4.5.3 (we still need to update it) on a cluster that uses the SLURM job scheduler. Submitting CryoSPARC jobs through the “Queue to Lane” option works well. In addition to queued submission, we now also let users log in directly to the compute nodes, start a CryoSPARC session there, and use the GPUs available on that node. With the “Queue to Lane” option, however, they would not use the resources of that node but would instead queue the job to another compute node.

I read that one could use the “Run on specific GPU” option, but in that case the “Queue” button is not active. I think CryoSPARC is aware of the GPUs on that node, because when I run “cryosparcw gpulist” I see the correct output for the H100 cards on that compute node.

Something that has worked so far is typing, in a terminal on the compute node, the CryoSPARC worker command with the name of the job (here for the Patch Motion Correction step of https://guide.cryosparc.com/processing-data/get-started-with-cryosparc-introductory-tutorial#step-3-download-the-tutorial-dataset):

$CRYOSPARC_WORKER_BIN_PATH run --project P3 --job J3 --master_hostname compute.node --master_command_core_port 38000

The job is then picked up by the session in the web browser on the compute node. Although this works, I wanted to know whether you have a better and cleaner solution, perhaps by tweaking some variables to get the “Run on specific GPU” option working in the GUI. Or maybe there is another way to work directly with the resources (CPU/GPU) of the compute nodes?
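For reference, this is roughly how I determine the values I pass to that command (a sketch; the path is a placeholder, and the port arithmetic assumes a default installation):

# On the node where the master runs, check the configured hostname and base port:
cryosparcm status | grep -E 'CRYOSPARC_MASTER_HOSTNAME|CRYOSPARC_BASE_PORT'

# Point at the worker CLI (adjust to the actual cryosparc_worker install path):
export CRYOSPARC_WORKER_BIN_PATH=/path/to/cryosparc_worker/bin/cryosparcw

# In a default layout, command_core listens on the base port + 2
# (e.g. 39002 for the default base port 39000); 38000 is what works on our setup.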

Welcome to the forum, @pedro.

How do users “start a CryoSPARC session”? Do users start a CryoSPARC (master) instance on the compute node?

Initially, users could only log into the so-called “login nodes”, where they could start CryoSPARC in a terminal with “cryosparcm start” and submit jobs to the SLURM queue. However, they had no access to GPUs on the login nodes, and they also shouldn’t run expensive jobs there, since others use those nodes only to connect to the cluster.

Now we support Open OnDemand (OOD), which lets users connect to the compute nodes, where they have access to a Linux terminal, a graphical environment, CPUs/GPUs, etc. In an OOD session, one starts CryoSPARC as usual in a terminal and opens the session in a browser running on that compute node. In this case, it doesn’t make sense to queue the job to an existing lane, because the node where CryoSPARC is running already has GPUs. That’s why I wanted to see whether there is any option to run CryoSPARC using the local resources.
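For context, the workflow in an OOD session is essentially the following (a sketch; 39000 is the default base port of a standard install, ours may differ):

# On the compute node allocated by OOD, start the master:
cryosparcm start
# Check the hostname and base port the web interface listens on:
cryosparcm status
# Then open http://<compute-node-hostname>:39000 in the browser running on that node.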

By the way, I tried another approach besides the one I mentioned in my first post (running $CRYOSPARC … in the terminal). I created a lane called “lane_local” and used this option:

"send_cmd_tpl" : "bash {{ command }}",

in the cluster_info.json file. Then, in cluster_script.sh, I wrote only these lines (no #SBATCH options):

#!/usr/bin/env bash
{{ run_cmd }}

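For completeness, the rest of my cluster_info.json looks roughly like this (a sketch: the paths are placeholders, and the remaining *_cmd_tpl values are simply what I put there, so they may well be part of the problem); I registered the lane with “cryosparcm cluster connect” from the directory containing the two files:

{
    "name" : "lane_local",
    "worker_bin_path" : "/path/to/cryosparc_worker/bin/cryosparcw",
    "cache_path" : "/path/to/local/ssd/cache",
    "send_cmd_tpl" : "bash {{ command }}",
    "qsub_cmd_tpl" : "bash {{ script_path_abs }}",
    "qstat_cmd_tpl" : "true",
    "qdel_cmd_tpl" : "true",
    "qinfo_cmd_tpl" : "true"
}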
Then, in the CryoSPARC GUI, I submit the job with the “Queue” button using the “lane_local” lane. Although the job starts running and I see some progress, it fails with this error:

Error occurred while processing J1/imported/014280599065487012748_14sep05c_c_00003gr_00014sq_00005hl_00002es.frames.tif
Traceback (most recent call last):
  File "/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 59, in exec
    return self.process(item)
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 402, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 1559 terminating self.

Marking J1/imported/014280599065487012748_14sep05c_c_00003gr_00014sq_00005hl_00002es.frames.tif as incomplete and continuing…

/Pedro

@pedro Do I understand correctly that your intended use case is similar to running a Single Workstation-type CryoSPARC instance as described here?