Passing --nodelist to SLURM

Hi all,
We have a cluster of 4 nodes and submit jobs using SLURM. Occasionally, two jobs that each require >50% of the scratch space get sent to the same node, and the second job has to wait for hours until the scratch is available. My two workarounds are to keep resubmitting the waiting job until it lands on a different node, or to set up additional lanes corresponding to each node “for emergency use only”, but there must be a better way. Is it possible to pass the --nodelist option through to SLURM in these cases, or is there some other solution?

Thanks for any advice
Kevin

I would suggest modifying the SLURM configuration.
For example, by adding a new partition:

PartitionName=cryosparc Nodes=csone,cstwo State=UP MinNodes=1 MaxNodes=UNLIMITED
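
A job could then be steered onto those nodes by targeting the partition at submission time; a minimal sketch, assuming the partition name above:

sbatch --partition=cryosparc cluster_script.sh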

Then in your gres file, if you want to reserve “only” the second GPU…

gres.conf 
Name=gpu Type=gtx1080  File=/dev/nvidia1
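
A job would then request that GPU type explicitly through the gres flag; a rough sketch, assuming the Type defined above:

#SBATCH --gres=gpu:gtx1080:1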

Or something like this. Of course, a “native” cryoSPARC solution would be great as well…

Hi,

You can make use of Jinja statements and/or custom variables when formatting cluster_script.sh. Some examples below.

Example 1:

...
{%- if node %}
#SBATCH --nodelist={{ node }}
{%- endif %}
...
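
If the node variable is defined, say as csone (borrowing a node name from the reply above, purely for illustration), the rendered header picks up the corresponding line; if it is left unset, the {%- if %} guard drops the line entirely:

#SBATCH --nodelist=csone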

Example 2:

...
#SBATCH --partition=gpu {{ extra_param }}
...

In the above example, the variable extra_param, appended to the end of an arbitrary header line, can be defined as “-w node[2-4]” or “-x node1” at submission time to either include or exclude certain nodes, respectively.

It can also be set to a string of slurm flags, e.g. “-w node2 --mem=60G --constraint=intel”.
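
For instance, with extra_param set to “-x node1”, the line in Example 2 would render as:

#SBATCH --partition=gpu -x node1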

Cheers,
Yang
