Hello, thanks for the replies.
@Hossein:
I think that from SLURM’s perspective both commands will give your job num_cpu cores. The -n {{ num_cpu }} form is a bit risky, though, since the tasks can be spread across multiple nodes, whereas your solution limits them to a single node.
Yes, it limits the job submission to a single node, but GRES:GPU in SLURM is also allocated per node, so if you ask for 4 GPUs in cryoSPARC (even without specifying the number of nodes in the script), the scheduler will look for a node with 4 GPUs anyway.
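To make the difference concrete, here is a minimal sketch of the two ways of requesting the cores (the {{ num_cpu }} / {{ num_gpu }} values are whatever cryoSPARC injects for the job):

# Option A: tasks only; SLURM is free to spread them over several nodes
#SBATCH --ntasks={{ num_cpu }}

# Option B: pin the job to one node and take the cores (and GPUs) there
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}   # GRES is granted per node, so that node must have num_gpu GPUs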
Here is my actual configuration:
cluster_info.json:
{
"qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
"worker_bin_path": "/home/cryosparc_user/cryosparc2_worker/bin/cryosparcw",
"title": "debug_cluster",
"cache_path": "/ssd/tmp",
"qinfo_cmd_tpl": "sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E'",
"qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
"qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
"cache_quota_mb": null,
"send_cmd_tpl": "{{ command }}",
"cache_reserve_mb": 10000,
"name": "debug_cluster"
}
cluster_script.sh:
#!/bin/bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --partition=debug
#SBATCH --output={{ job_log_path_abs }}
#SBATCH --error={{ job_log_path_abs }}
#SBATCH --nodes=1
#SBATCH --mem={{ (ram_gb*1000)|int }}M
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --gres-flags=enforce-binding
srun {{ run_cmd }}
CUDA_VISIBLE_DEVICES is set by the srun command, so you don’t need to handle it yourself as in the example script.
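For reference, the two files are registered with the master using the standard cluster connect command, run from the directory that contains them (the path below is just an example):

cd /home/cryosparc_user/cluster_config   # example directory holding cluster_info.json and cluster_script.sh
cryosparcm cluster connect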
@Ali:
The job is a 2D classification with 2 GPUs:
Launching job on lane debug_cluster target debug_cluster ...
License is valid.
Launching job on cluster debug_cluster
====================== Cluster submission script: ========================
==========================================================================
#!/bin/bash
#SBATCH --job-name=cryosparc_P1_J16
#SBATCH --partition=debug
#SBATCH --output=/home/rnavaza/csparc_PTO/P1/J16/job.log
#SBATCH --error=/home/rnavaza/csparc_PTO/P1/J16/job.log
#SBATCH --nodes=1
#SBATCH --mem=24000M
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --gres=gpu:2
#SBATCH --gres-flags=enforce-binding
srun /home/cryosparc_user/cryosparc2_worker/bin/cryosparcw run --project P1 --job J16 --master_hostname master.example.org --master_command_core_port 39002 > /home/rnavaza/csparc_PTO/P1/J16/job.log 2>&1
==========================================================================
==========================================================================
-------- Submission command:
sbatch /home/rnavaza/csparc_PTO/P1/J16/queue_sub_script.sh
-------- Cluster Job ID:
203
-------- Queued at 2018-10-05 21:20:35.942541
-------- Job status at 2018-10-05 21:20:35.961971
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
203 debug cryospar cryospar PD 0:00 1 (None)
Project P1 Job J16 Started
Master running v2.3.0, worker running v2.3.0
Running on lane debug_cluster
Resources allocated:
Worker: debug_cluster
CPU : [0, 1]
GPU : [0, 1]
RAM : [0, 1, 2]
SSD : True
--------------------------------------------------------------
Importing job module for job type class_2D...
Job ready to run
***************************************************************
Using random seed of 1555189427
Loading a ParticleStack with 89761 items...
SSD cache : cache successfuly synced in_use
SSD cache : cache successfuly synced, found 22440.37MB of files on SSD.
SSD cache : cache successfuly requested to check 127 files.
SSD cache : cache requires 0.00MB more on the SSD for files to be downloaded.
SSD cache : cache has enough available space.
SSD cache : cache starting transfers to SSD.
SSD cache : complete, all requested files are available on SSD.
Done.
Windowing particles
Done.
Using 300 classes.
Computing 2D class averages:
Volume Size: 128 (voxel size 2.42A)
Zeropadded Volume Size: 256
Data Size: 256 (pixel size 1.21A)
Using Resolution: 6.00A (51.0 radius)
Windowing only corners of 2D classes at each iteration.
Using random seed for initialization of 1735495459
Done in 1.148s.
Start of Iteration 0
I’m not sure how to resolve the GPU binding problem. I can try to set up a “heterogeneous job” in SLURM to work around it. Can you confirm that cryoSPARC needs one MPI process per GPU and “num_cpu / num_gpu” threads per MPI process? Or does it need “num_cpu” MPI processes?
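In case it helps the discussion, here is a rough, untested sketch of what I have in mind for the heterogeneous job (on SLURM 17.11/18.08 the component separator is “#SBATCH packjob”; it was renamed to “#SBATCH hetjob” in 19.05). Note that this would launch one copy of {{ run_cmd }} per component, which may or may not be what cryoSPARC expects, hence my question above:

#!/bin/bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --partition=debug
# component 0: one process bound to its own GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH packjob
# component 1: a second process with its own GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
# launch across both components (on newer SLURM: srun --het-group=0,1)
srun --pack-group=0,1 {{ run_cmd }}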