Hi,
I have a cryoSPARC v4.2.1 instance running on a node with 4x NVIDIA A40 GPUs (i.e. devices 0, 1, 2, 3). I configured it to see only GPUs 1 and 2, and this works fine most of the time.
However, when running a 3D Flex Training job, I noticed that the process sits on both GPUs 0 and 1, even though it was allocated only GPU 1:
Info from the job's startup log:
License is valid.
Launching job on lane default target worker08.cluster.bc2.ch ...
Running job on master node hostname worker08.cluster.bc2.ch
[CPU: 167.0 MB  Avail: 1988.64 GB]  Job J455 Started
[CPU: 167.1 MB  Avail: 1988.64 GB]  Master running v4.2.1, worker running v4.2.1
[CPU: 167.1 MB  Avail: 1988.64 GB]  Working in directory: /scicore/home/engel0006/GROUP/pool-engel/Cryosparc_projects/CS-mitoribo-cf/J455
[CPU: 167.1 MB  Avail: 1988.64 GB]  Running on lane default
[CPU: 167.1 MB  Avail: 1988.64 GB]  Resources allocated:
[CPU: 167.1 MB  Avail: 1988.64 GB]    Worker:  worker08.cluster.bc2.ch
[CPU: 167.1 MB  Avail: 1988.64 GB]    CPU   :  [0, 1, 2, 3]
[CPU: 167.1 MB  Avail: 1988.64 GB]    GPU   :  [1]
[CPU: 167.1 MB  Avail: 1988.64 GB]    RAM   :  [0, 1, 2, 3, 4, 5, 12, 13]
[CPU: 167.1 MB  Avail: 1988.64 GB]    SSD   :  False
[CPU: 167.1 MB  Avail: 1988.64 GB]  --------------------------------------------------------------
[CPU: 167.1 MB  Avail: 1988.64 GB]  Importing job module for job type flex_train...
[CPU: 344.6 MB  Avail: 1988.41 GB]  Job ready to run
[CPU: 344.6 MB  Avail: 1988.41 GB]  ***************************************************************
[CPU: 422.7 MB  Avail: 1988.33 GB]  ====== 3D Flex Training Model Setup =======
[CPU: 422.7 MB  Avail: 1988.33 GB]  Loading mesh...
The output from nvidia-smi:
Fri Apr 21 10:05:04 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A40          On   | 00000000:23:00.0 Off |                    0 |
|  0%   36C    P0    76W / 300W |    326MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A40          On   | 00000000:41:00.0 Off |                    0 |
|  0%   52C    P0   111W / 300W |   3830MiB / 46068MiB |     18%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A40          On   | 00000000:A1:00.0 Off |                    0 |
|  0%   32C    P8    32W / 300W |     25MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A40          On   | 00000000:C1:00.0 Off |                    0 |
|  0%   29C    P8    30W / 300W |     25MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      5890      G   /usr/bin/X                         23MiB |
|    0   N/A  N/A     36522      C   python                            257MiB |
|    0   N/A  N/A     55304      G   ...nuxAMD64-Optimize/Amira3D       43MiB |
|    1   N/A  N/A      5890      G   /usr/bin/X                         22MiB |
|    1   N/A  N/A     36522      C   python                           3805MiB |
|    2   N/A  N/A      5890      G   /usr/bin/X                         22MiB |
|    3   N/A  N/A      5890      G   /usr/bin/X                         22MiB |
+-----------------------------------------------------------------------------+
So the python process with PID 36522 holds a context on both GPU 0 and GPU 1, and it is the cryoSPARC job J455 from above:
diogori 36522 543 1.8 84883072 39639236 ? Rl Apr20 3739:36 python -c import cryosparc_compute.run as run; run.run() --project P3 --job J455 --master_hostname worker08.cluster.bc2.ch --master_command_core_port 39002
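For what it's worth, the same double context shows up programmatically. Here is a minimal sketch using the NVML Python bindings (this assumes the pynvml / nvidia-ml-py package; the hard-coded PID is just the one from ps above):

```python
import pynvml

pynvml.nvmlInit()
target_pid = 36522  # the cryoSPARC job's PID, from ps above
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    # Compute processes correspond to the "C" type entries in nvidia-smi
    for p in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
        if p.pid == target_pid:
            print(f"PID {target_pid} has a compute context on GPU {i}")
pynvml.nvmlShutdown()
```

This prints a line for both GPU 0 and GPU 1 for the job in question.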
It's true that only GPU 1 is doing the heavy lifting (the job holds just an idle 257 MiB context on GPU 0), so maybe it's not too bad, but cryoSPARC should not even know that GPU 0 exists. I reserve that GPU for other graphical work and don't want cryoSPARC jobs running there.
How does it find this GPU? Could it be related to the external modules of the 3D Flex job?
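My only guess at the mechanism, as a minimal sketch (this assumes the PyTorch stack that, as far as I understand, 3D Flex is built on, plus a hypothetical stray allocation on device 0; I have not checked the actual cryoSPARC code):

```python
import torch

# The job pins its assigned GPU, so the real work lands on cuda:1...
torch.cuda.set_device(1)
a = torch.zeros(1, device="cuda")    # uses the current device -> cuda:1

# ...but any stray call addressing an explicit or hard-coded device 0 is
# enough to create a few-hundred-MiB CUDA context there, which would match
# the idle 257 MiB python entry on GPU 0 in the nvidia-smi output above.
b = torch.zeros(1, device="cuda:0")  # hypothetical stray allocation
print(a.device, b.device)            # cuda:1 cuda:0
```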
And more importantly, can I prevent this somehow?
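The only workaround I can think of is masking the devices before the process ever initializes CUDA, e.g. via CUDA_VISIBLE_DEVICES. A sketch of the idea (the mask must be set before any CUDA-using library is imported; the indices are the physical nvidia-smi IDs, which CUDA then renumbers from 0):

```python
import os

# Hide GPUs 0 and 3 from this process entirely; only the physical GPUs 1
# and 2 remain enumerable, and CUDA renumbers them as devices 0 and 1.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"

import torch  # must be imported only after the mask is set

print(torch.cuda.device_count())      # 2
print(torch.cuda.get_device_name(0))  # physical GPU 1, now seen as cuda:0
```

But I'd rather not hand-patch worker environments if there is a supported way to do this.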
Thank you!