Connecting SLURM cluster without a shared filesystem

Hi everyone,

I have the following configuration:

  • cryosparc_master running on a local workstation (where I have sudo)
  • a university HPC cluster with wonderful GPUs (where I don’t have sudo)

I want to connect the HPC cluster as a separate lane for our local CryoSPARC. However, when I try using the example cluster_info.json, I fail miserably with something like:

subprocess.CalledProcessError: Command 'ssh cluster_account sbatch /data/cryosparc_projects/' returned non-zero exit status 1.

where the cluster_info.json looks like this:

    {
        "name" : "hpccluster",
        "worker_bin_path" : "/real/path/to/cryosparcw",
        "cache_path" : "/real/path/to/ssd/cache",
        "send_cmd_tpl" : "ssh cluster_account {{ command }}",
        "qsub_cmd_tpl" : "sbatch {{ script_path_abs }}",
        "qstat_cmd_tpl" : "squeue -j {{ cluster_job_id }}",
        "qdel_cmd_tpl" : "scancel {{ cluster_job_id }}",
        "qinfo_cmd_tpl" : "sinfo"
    }
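For what it's worth, here is a sketch of why this fails without a shared filesystem. Assuming Jinja-style substitution (as the `{{ }}` placeholders suggest), `qsub_cmd_tpl` is rendered with the path of a job script that the master writes into the project directory, and the result is wrapped in `send_cmd_tpl`. The composed command below uses the (truncated) path from the error message:

```shell
# Compose the command the way the two templates would (illustrative only;
# the real script path would point at a queue_sub script inside the
# project directory, which exists on the master but not on the cluster).
script_path_abs="/data/cryosparc_projects/"   # truncated path from the error
command="sbatch $script_path_abs"             # qsub_cmd_tpl rendered
full="ssh cluster_account $command"           # send_cmd_tpl rendered
echo "$full"
# -> ssh cluster_account sbatch /data/cryosparc_projects/
# The cluster-side sbatch cannot find that path, exits non-zero, and the
# master surfaces it as subprocess.CalledProcessError.
```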

My question is: is there a good way to make this work without a shared filesystem, or are the requirements for the cluster the same as for any other worker node?

And if we do get to mount our local drive on the cluster, we obviously won’t have all the folders mounted at the same paths as on our local workstation. Is there a way to specify a worker-specific prefix for the projects directory?

My suspicion is that the answer is “no”, but I will chime in: for a while we were in a similar situation, and the best solution I was able to find was to run a CryoSPARC master on the HPC login node as well, and then use project detach/attach to move projects between the two as needed.

What we did in the long term was work with our HPC folks to set things up properly: network storage accessible from the HPC, and a CryoSPARC master running in a VM that also has access to job submission. We were able to add our workstations to that as well, giving us a setup with both cluster access and workstation access.

Could the different project paths be consolidated under a symlink that resolves to the same path on the two systems? A bit hacky.

$ pwd    # on the local workstation
$ ln -s local_path/to/data shared_path/to/symlink
$ ls -l shared_path/to/symlink
shared_path/to/symlink -> local_path/to/data

$ pwd    # on the cluster
$ ln -s cluster_path/to/data shared_path/to/symlink
$ ls -l shared_path/to/symlink
shared_path/to/symlink -> cluster_path/to/data



Hm, this is something, thanks! I totally forgot about links.

Although mounting would still be an issue, the exact mount point could indeed be in a completely different place on each system.

More hackiness. Not sure if it’s the prescribed method, but in theory either the cryosparc_worker/ scripts or your .rc (if it gets loaded?) could be used to automate the process of mounting (and linking?) on the cluster. It’s a bit unwieldy but may work.

