SLURM - multiple jobs in a single node

heejongkim · August 12, 2021, 3:21pm

Hello,

I would like to switch the queue system from cryoSPARC’s internal one to SLURM in order to consolidate the queues of the different software packages on the cluster. A normal SLURM setup has worked in many different situations, but I would like to set up multiple jobs running concurrently on a single node.

By tweaking the SLURM configuration, I was able to submit multiple jobs via cryoSPARC that run concurrently in SLURM. However, the issue is that cryoSPARC doesn’t recognize the jobs currently running in SLURM and keeps assigning the very first resources to each new job.

For example, if I submit an ab-initio modeling job three times, all three get GPU 0, CPU 0,1, and RAM 0, whereas cryoSPARC’s internal queue system would distribute them across GPU 0, 1, 2, and so on.
Without those tweaks in SLURM, the three jobs are distributed across 3 nodes using only one GPU each, which wastes resources and delays other jobs.

Is it possible to accomplish this under the current cryoSPARC configuration somehow?

Thank you.

best,
heejong Kim

stephan · August 13, 2021, 3:45pm

Hi @heejongkim,

This would have to be configured in your SLURM cluster itself. Our example cluster_script.sh provides a small loop that sets the CUDA_VISIBLE_DEVICES environment variable, which allows SLURM to manage GPUs. Can you ensure this is in your script?

https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/cryosparc-cluster-integration-script-examples
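
For readers without the guide at hand, here is a minimal sketch of the kind of loop described above. It is not a copy of the official cluster_script.sh. The helper names gpu_pids and idle_gpus and the GPU count of 4 are assumptions made for illustration. The idea is to ask nvidia-smi which GPUs have no compute process and to expose only those through CUDA_VISIBLE_DEVICES.

```shell
# Hypothetical helper: print the PIDs of compute processes on GPU index $1.
# Output is empty when nothing is running there (or nvidia-smi is missing).
gpu_pids() {
    nvidia-smi -i "$1" --query-compute-apps=pid --format=csv,noheader 2>/dev/null
}

# Hypothetical helper: print a comma-separated list of GPU indices in
# 0..($1 - 1) that currently have no compute process.
# The body runs in a subshell, so its variables do not leak to the caller.
idle_gpus() (
    max="$1"
    devs=""
    idx=0
    while [ "$idx" -lt "$max" ]; do
        if [ -z "$(gpu_pids "$idx")" ]; then
            # Prepend a comma only when the list is already non-empty.
            devs="${devs:+$devs,}$idx"
        fi
        idx=$((idx + 1))
    done
    printf '%s\n' "$devs"
)

# Assumption: the node has 4 GPUs; adjust to the actual hardware.
CUDA_VISIBLE_DEVICES="$(idle_gpus 4)"
export CUDA_VISIBLE_DEVICES
```

Note that a check like this only looks at what is running at the moment the script starts, so two jobs launched at the same instant could both see the same GPU as idle.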