Use existing IBM LSF cluster as worker nodes

Hi,
We are trying to configure a CryoSPARC 4.3.1 cluster integration.

  1. Configured the CryoSPARC scheduler on a standalone node.

  2. We have an existing LSF cluster with common storage across its nodes and GPU compute nodes in it.
    For that reason, we set up the cluster configuration folder with the two files below:

$ cat cluster_info.json
{
"name": "lsfcluster1",
"title": "lsfcluster",
"send_cmd_tpl": " {{ command }}",
"qsub_cmd_tpl": "/opt/ibm/lsfsuite/lsf/10.1/linux2.6-glibc2.3-x86_64/bin/bsub < {{ script_path_abs }}",
"qstat_cmd_tpl": "/opt/ibm/lsfsuite/lsf/10.1/linux2.6-glibc2.3-x86_64/bin/bjobs -l {{ cluster_job_id }}",
"qdel_cmd_tpl": "/opt/ibm/lsfsuite/lsf/10.1/linux2.6-glibc2.3-x86_64/bin/bkill {{ cluster_job_id }}",
"qinfo_cmd_tpl": "/opt/ibm/lsfsuite/lsf/10.1/linux2.6-glibc2.3-x86_64/bin/bqueues",
"cache_path": "/tmp",
"cache_quota_mb": null,
"cache_reserve_mb": 10000
}
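
For reference, here is a minimal sketch of how the lane is registered with the master (assuming cryosparcm is on the PATH of the Linux account that owns the CryoSPARC installation, and /path/to/cluster_folder is only a placeholder for the directory that holds the two files):

$ cd /path/to/cluster_folder
$ cryosparcm cluster connect   # reads cluster_info.json and cluster_script.sh and creates/updates the lane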

$ cat cluster_script.sh
#!/bin/bash

#BSUB -J cryosparc_{{ project_uid }}{{ job_uid }}{{ cryosparc_username }}
#BSUB -G test-dev
#BSUB -q lsf_queue1
#BSUB -e {{ job_dir_abs }}/%J.err
#BSUB -oo {{ job_dir_abs }}/%J.out
#BSUB -n 1
#BSUB -R "span[ptile=1]"
#BSUB -R "rusage[mem={{ ram_gb }}]"
#BSUB -gpu "num=1:j_exclusive=yes:mode=shared"
#BSUB -W 36:00
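
For comparison, CryoSPARC's example cluster script templates end with the {{ run_cmd }} template variable, which expands to the actual worker command, so that the batch job submitted by bsub has something to execute after the #BSUB directives; a minimal sketch of that last line (from the generic template, shown here only as a hedged reference):

{{ run_cmd }}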

Testing

  1. Using the test data, when we run the job on the local master, it works.
  2. When we clone the same job to run it on the cluster lane shown in the web UI, the job reports that it is launching, but we don't see anything on the compute cluster or on any of its worker nodes.

Questions

  1. Do we need to specify anywhere on the CryoSPARC master how to reach the LSF scheduler?
  2. Do we need common storage shared between the CryoSPARC scheduler and the LSF compute nodes?
  3. Do we need to make the CryoSPARC master part of the LSF cluster, so that it knows about the cluster configuration?

Any help is appreciated.

Thank you
Chakri

Are you using the terms CryoSPARC scheduler and CryoSPARC master interchangeably?

In the simplest case, the Linux account that runs CryoSPARC can submit an LSF job by running bsub on the computer that also runs cryosparc_master processes.
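
For example, this can be verified by hand from the master as that account (a sketch; the bsub path and queue name are taken from the configuration above, and /tmp/%J.out is just a throwaway output location):

$ /opt/ibm/lsfsuite/lsf/10.1/linux2.6-glibc2.3-x86_64/bin/bsub -q lsf_queue1 -n 1 -o /tmp/%J.out "hostname"
$ /opt/ibm/lsfsuite/lsf/10.1/linux2.6-glibc2.3-x86_64/bin/bjobs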

At minimum, the CryoSPARC master and LSF compute nodes should share

  • a Linux account that runs CryoSPARC processes, with matching user ids.
  • write access to shared storage for project directories
  • a common path to each project directory

It may simplify the configuration if the CryoSPARC master computer can directly submit LSF jobs, as mentioned earlier.
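
As a quick sanity check of the points above (a sketch; /shared/cryosparc_projects is only a placeholder for your actual project storage path), the following should return the same answers on the master and on an LSF compute node:

$ id cryosparc                          # same uid/gid for the CryoSPARC account on every node
$ ls -ld /shared/cryosparc_projects     # same path, owned and writable by that account
$ touch /shared/cryosparc_projects/.write_test && rm /shared/cryosparc_projects/.write_test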

Are you using the terms CryoSPARC scheduler and CryoSPARC master interchangeably?
Yes, sorry for that; by "CryoSPARC scheduler" I meant the CryoSPARC master.

In the simplest case, the Linux account that runs CryoSPARC can submit an LSF job by running bsub on the computer that also runs cryosparc_master processes.

No, that is not the case, but I can create a consistent "cryosparc" account across the nodes.

At minimum, the CryoSPARC master and LSF compute nodes should share

  • a Linux account that runs CryoSPARC processes, with matching user ids.
  • write access to shared storage for project directories
  • a common path to each project directory

Ok will configure

It may simplify the configuration if the CryoSPARC master computer can directly submit LSF jobs, as mentioned earlier.

Will add

Let me try the above changes.

Thank you
Chakri

Instead of the default cryosparc user account, is there any place in the configuration where we can specify a different account to submit an LSF job?

I am not sure right now that there is. Even if there were a way, wouldn't this result in inconsistent file ownership inside the job directories, between files created by the master (which creates the job directory) and files created by the worker (the processing output)?