We're using SLURM as the job manager on our HPC system, and we're seeing some odd behavior that we don't know where to start optimizing, so we were hoping someone here might be able to help. A motioncorr job was submitted with the following SLURM settings:
License is valid.
Launching job on lane bs2 target bs2 ...
Launching job on cluster bs2
====================== Cluster submission script: ========================
==========================================================================
#!/usr/bin/env bash
#### cryoSPARC cluster submission script template for SLURM
## Available variables:
## /gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw run --project P2 --job J277 --master_hostname ****** --master_command_core_port 39002 > /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J277/job.log 2>&1 - the complete command string to run the job
## 6 - the number of CPUs needed
## 1 - the number of GPUs needed.
## Note: the code will use this many GPUs starting from dev id 0
## the cluster scheduler or this script have the responsibility
## of setting CUDA_VISIBLE_DEVICES so that the job code ends up
## using the correct cluster-allocated GPUs.
## 16.0 - the amount of RAM needed in GB
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J277 - absolute path to the job directory
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2 - absolute path to the project dir
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J277/job.log - absolute path to the log file for the job
## /gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw - absolute path to the cryosparc worker command
## --project P2 --job J277 --master_hostname ****** --master_command_core_port 39002 - arguments to be passed to cryosparcw run
## P2 - uid of the project
## J277 - uid of the job
## Bryan Hansen - name of the user that created the job (may contain spaces)
## hansenbry@niaid.nih.gov - cryosparc username of the user that created the job (usually an email)
##
#### What follows is a simple SLURM script:
#SBATCH --job-name cryosparc_P2_J277
#SBATCH --gres=gpu:1
#SBATCH -p gpu
#SBATCH --cpus-per-task=6
#SBATCH --mem=48384MB
available_devs=""
for devidx in $(seq 0 15); do
    if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]]; then
        if [[ -z "$available_devs" ]]; then
            available_devs=$devidx
        else
            available_devs=$available_devs,$devidx
        fi
    fi
done
export CUDA_VISIBLE_DEVICES=$available_devs
/gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw run --project P2 --job J277 --master_hostname ***** --master_command_core_port 39002 > /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J277/job.log 2>&1
==========================================================================
==========================================================================
-------- Submission command:
sbatch /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J277/queue_sub_script.sh
-------- Cluster Job ID:
264933
-------- Queued on cluster at 2021-08-24 09:06:08.419585
-------- Job status at 2021-08-24 09:06:08.450041
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
264933 gpu cryospar cryo1 PD 0:00 6 (None)
This resulted in 6 nodes being taken, despite only 1 GPU being requested. The GPU nodes also have 16 cores each, so --cpus-per-task=6 should not have spread the job across 6 nodes. The reason we don't know where to start looking is that a CTF job with the following SLURM settings
License is valid.
Launching job on lane bs2 target bs2 ...
Launching job on cluster bs2
====================== Cluster submission script: ========================
==========================================================================
#!/usr/bin/env bash
#### cryoSPARC cluster submission script template for SLURM
## Available variables:
## /gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw run --project P2 --job J276 --master_hostname ***** --master_command_core_port 39002 > /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J276/job.log 2>&1 - the complete command string to run the job
## 2 - the number of CPUs needed
## 1 - the number of GPUs needed.
## Note: the code will use this many GPUs starting from dev id 0
## the cluster scheduler or this script have the responsibility
## of setting CUDA_VISIBLE_DEVICES so that the job code ends up
## using the correct cluster-allocated GPUs.
## 8.0 - the amount of RAM needed in GB
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J276 - absolute path to the job directory
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2 - absolute path to the project dir
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J276/job.log - absolute path to the log file for the job
## /gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw - absolute path to the cryosparc worker command
## --project P2 --job J276 --master_hostname ***** --master_command_core_port 39002 - arguments to be passed to cryosparcw run
## P2 - uid of the project
## J276 - uid of the job
## Bryan Hansen - name of the user that created the job (may contain spaces)
## hansenbry@niaid.nih.gov - cryosparc username of the user that created the job (usually an email)
##
#### What follows is a simple SLURM script:
#SBATCH --job-name cryosparc_P2_J276
#SBATCH --gres=gpu:1
#SBATCH -p gpu
#SBATCH --cpus-per-task=2
#SBATCH --mem=24192MB
available_devs=""
for devidx in $(seq 0 15); do
    if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]]; then
        if [[ -z "$available_devs" ]]; then
            available_devs=$devidx
        else
            available_devs=$available_devs,$devidx
        fi
    fi
done
export CUDA_VISIBLE_DEVICES=$available_devs
/gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw run --project P2 --job J276 --master_hostname ***** --master_command_core_port 39002 > /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J276/job.log 2>&1
==========================================================================
==========================================================================
-------- Submission command:
sbatch /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J276/queue_sub_script.sh
-------- Cluster Job ID:
264932
only pulled 1 node as expected, and not 2 as the pattern from the motioncorr job would have suggested. We're currently running cryoSPARC v3.2.0. Also, I was asked to remove the hostname from this report, which is why it's masked above.
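
In case it's useful, here is what we were planning to run next to see what SLURM actually recorded for each job (job IDs taken from the output above; the exact field names in the scontrol output can differ between SLURM versions):

scontrol show job 264933 | grep -E 'NumNodes|NumCPUs|NumTasks|TresPerNode'
scontrol show job 264932 | grep -E 'NumNodes|NumCPUs|NumTasks|TresPerNode'

We have also been wondering whether adding something like the lines below to the cluster submission template would pin each job to a single node, but we haven't tested it yet, so please treat it as a sketch rather than a fix:

#SBATCH --nodes=1
#SBATCH --ntasks=1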