available_devs=""
for devidx in $(seq 0 15);
do
if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then
if [[ -z “$available_devs” ]] ; then
available_devs=$devidx
else
available_devs=$available_devs,$devidx
fi
fi
done
export CUDA_VISIBLE_DEVICES=$available_devs
{{ run_cmd }}
Based on this, I still don’t see why two jobs would land on the same GPU. What else can I check?
Are the GPUs on the node managed under cgroups, and has your cluster admin set ConstrainDevices=yes in cgroup.conf?
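If you have access to the node, a quick way to check is to look for that setting directly (the /etc/slurm path below is just the common default and an assumption on my part; your site may keep the file elsewhere):

# On the compute node, look for the device-constraint setting
grep -i ConstrainDevices /etc/slurm/cgroup.conf
# Expected when device constraining is enabled:
# ConstrainDevices=yes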
The script supplied in the default template depends, in part, on which GPUs the cluster environment has made visible to the job. The fact that processes from separate SLURM jobs are landing on the same GPU suggests that every job scheduled on the node can see all of the GPUs, regardless of its --gres=gpu request. In my opinion, this is suboptimal.
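A rough way to confirm this from the user side is to request a single GPU and list what the job can actually see (a sketch only; the output ends up in the usual slurm-<jobid>.out file):

# Request one GPU and list the devices visible inside the job
sbatch --gres=gpu:1 --wrap 'nvidia-smi -L'
# With device constraining in place, a single GPU should be listed;
# if every GPU on the node appears, all jobs can see (and grab) all devices.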
The script essentially queries, via nvidia-smi, whether any compute processes are running on each GPU. The GPUs that appear idle at the time of the query are then collated and exposed via CUDA_VISIBLE_DEVICES for the job to use. However, this does not cover the race in which job A is, for example, still staging data to SSD and has not yet touched its GPU when job B runs its query, sees that GPU idling, and grabs it.
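To make the race concrete, this is the per-GPU check the script relies on, with hypothetical output:

nvidia-smi -i 0 --query-compute-apps=pid --format=csv,noheader   # prints e.g. "12345" -> GPU 0 skipped
nvidia-smi -i 1 --query-compute-apps=pid --format=csv,noheader   # prints nothing    -> GPU 1 treated as free
# If job A was allocated GPU 1 but has not yet created its CUDA context,
# job B's query still sees GPU 1 as idle and both jobs land on the same device.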
It's probably more robust to configure cgroup.conf so that each job can access only the GPUs it has actually been allocated.
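A minimal sketch of the relevant settings (exact values are site-specific; treat this as an outline rather than a drop-in config):

# slurm.conf -- have tasks managed by the cgroup plugin
TaskPlugin=task/cgroup
# cgroup.conf -- restrict each job to the devices it was allocated
ConstrainDevices=yes
# slurmd on the nodes typically needs a restart for the constraint to take effect.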
This is a significant limitation of combining nvidia-smi --query-compute-apps and CUDA_VISIBLE_DEVICES inside the cluster job. I agree that control of GPU access via cluster resource management is preferable.