Discrete GPU usage


We have a cluster running CryoSPARC via SLURM. Nodes each have two GPUs.

Occasionally we have an error like:

pycuda._driver.MemoryError: cuMemAlloc failed: out of memory

Upon inspection, it appears that jobs by different users are using the same GPU.

How can we confirm this, and if so, how can we configure CryoSPARC so that each job runs on its own GPU(s)?

Thanks in advance!
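One way to confirm GPU sharing is to query the per-GPU compute processes with nvidia-smi and count processes per device. The PIDs and UUIDs below are made-up sample output, purely for illustration of the counting step:

```shell
# On the node, list running compute processes per GPU:
#   nvidia-smi --query-compute-apps=gpu_uuid,pid --format=csv,noheader
# Sample (hypothetical) output; a GPU UUID appearing more than once
# means multiple processes are sharing that device:
sample='GPU-aaaa, 1111
GPU-aaaa, 2222
GPU-bbbb, 3333'
# Print the UUIDs of GPUs hosting more than one compute process
shared=$(echo "$sample" | cut -d, -f1 | sort | uniq -c | awk '$1 > 1 {print $2}')
echo "$shared"
```

You can then match the offending PIDs to users and SLURM jobs (e.g. with `ps -o user= -p <pid>` or `scontrol listpids` on the node) to see whether the processes belong to different jobs.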

See SLURM Job Submission: Configuring node count
for an example where idle GPUs are assigned to CUDA_VISIBLE_DEVICES,

and another user’s opinion on the subject:

Hmmm, I have this in our cluster_script.sh, but I’m still seeing two jobs land on the same GPU, confirmed with nvidia-smi during job runs.

Here is cluster_script.sh:

#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH -n {{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH -p defq
#SBATCH --mem={{ (ram_gb*1000)|int }}MB
#SBATCH -o {{ job_dir_abs }}/out.txt
#SBATCH -e {{ job_dir_abs }}/err.txt

available_devs=""
for devidx in $(seq 0 15);
do
    # Collect GPUs with no running compute processes
    if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then
        if [[ -z "$available_devs" ]] ; then
            available_devs=$devidx
        else
            available_devs=$available_devs,$devidx
        fi
    fi
done
export CUDA_VISIBLE_DEVICES=$available_devs

{{ run_cmd }}

Based on this, I still don’t see why two jobs would land on the same GPU. What else can I check?

Are the node GPUs managed under cgroups, and has your cluster admin configured ConstrainDevices=yes?

The script supplied in the default template is dependent, in part, on what GPUs have been made visible in the cluster environment. The fact that multiple processes from disparate SLURM jobs are landing on the same GPU suggests that all jobs scheduled on the node have access to all the GPUs regardless of the --gres=gpu parameter. This is suboptimal in my opinion.
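One way to test this hypothesis, assuming interactive jobs are permitted on your cluster: request a single GPU and list what the job can actually see. If more devices appear than were requested, the --gres limit is not being enforced by cgroups. The listing below is fabricated sample output for illustration:

```shell
# Inside a job submitted with --gres=gpu:1, run:
#   nvidia-smi -L
# Sample (hypothetical) output on an unconstrained two-GPU node:
listing='GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-aaaa)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-bbbb)'
# Count the devices visible to the job
visible=$(echo "$listing" | grep -c '^GPU ')
echo "$visible"  # 2 devices visible despite requesting 1: not constrained
```

With device constraints active, the same command should list exactly one GPU.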

The script essentially queries, via nvidia-smi, whether any processes are running on a given GPU. GPUs that appear idle at the time of the query are then collated and made available via CUDA_VISIBLE_DEVICES for assignment. However, this does not cover the edge case in which job A is, for example, still caching to SSD (and has not yet engaged its GPU) when job B runs its query, sees the GPU in question idling, and grabs it.

It’s probably more reliable to configure cgroup.conf so that jobs can access only the GPUs that SLURM has scheduled for them.
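For reference, a minimal sketch of the relevant settings (file paths and values assumed; confirm against your site's SLURM configuration and documentation):

# /etc/slurm/cgroup.conf (assumed path)
CgroupAutomount=yes
ConstrainDevices=yes        # restrict jobs to the devices granted by --gres

# slurm.conf must use the cgroup task plugin for this to take effect:
TaskPlugin=task/cgroup

# gres.conf on each node must enumerate the GPU device files, e.g.:
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1

With ConstrainDevices=yes, each job sees only the GPUs it was allocated (renumbered from 0 inside the job's cgroup), which makes the nvidia-smi polling loop in cluster_script.sh unnecessary.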


This is a significant limitation of combining nvidia-smi --query-compute-apps and CUDA_VISIBLE_DEVICES inside the cluster job. I agree that control of GPU access via cluster resource management is preferable.

Very good, I will control this outside of CryoSPARC via SLURM/cgroup constraints or masking scripts.

Thank you for the helpful advice.