Hi all,
I am wondering if anyone has experience running cryoSPARC on virtual GPUs provided by NVIDIA MIG (Multi-Instance GPU).
So far I have not been able to get cryoSPARC to communicate with the MIG instances.
Best,
Tarek
Hi @tarek
Have you identified each MIG device by its unique UUID,
and then added the environment variables?
Cheers,
qitsweauca
Yes.
user@gpu:~$ nvidia-smi -L
GPU 0: A100-PCIE-40GB (UUID: GPU-90134985-26d8-39db-81bd-61f9a31864fe)
MIG 2g.10gb Device 0: (UUID: MIG-GPU-90134985-26d8-39db-81bd-61f9a31864fe/3/0)
MIG 2g.10gb Device 1: (UUID: MIG-GPU-90134985-26d8-39db-81bd-61f9a31864fe/4/0)
MIG 2g.10gb Device 2: (UUID: MIG-GPU-90134985-26d8-39db-81bd-61f9a31864fe/5/0)
GPU 1: A100-PCIE-40GB (UUID: GPU-0d407aac-ca34-f203-ab5b-b57b784d5074)
And here is what cryosparcw reports when all three MIG instances are made visible:
user@gpu:~$ CUDA_VISIBLE_DEVICES=MIG-GPU-90134985-26d8-39db-81bd-61f9a31864fe/3/0,MIG-GPU-90134985-26d8-39db-81bd-61f9a31864fe/4/0,MIG-GPU-90134985-26d8-39db-81bd-61f9a31864fe/5/0 /srv/public/cryosparc/cryosparc_worker/bin/cryosparcw gpulist
Detected 1 CUDA devices.
id pci-bus name
---------------------------------------------------------------
0 0000:01:00.0 A100-PCIE-40GB MIG 2g.10gb
---------------------------------------------------------------
Am I missing something?
I just found out that this is a current limitation of CUDA 11: a CUDA application can enumerate only a single MIG instance. It seems we have to wait for future CUDA releases…
### [CUDA Device Enumeration](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#cuda-visible-devices)
MIG supports running CUDA applications by specifying the CUDA device on which the application should be run. With CUDA 11, only enumeration of a single MIG instance is supported.
CUDA applications treat a CI and its parent GI as a single CUDA device. CUDA is limited to use a single CI and will pick the first one available if several of them are visible. To summarize, there are two constraints:
1. CUDA can only enumerate a single compute instance
2. CUDA will not enumerate non-MIG GPU if any compute instance is enumerated on any other GPU
Note that these constraints may be relaxed in future NVIDIA driver releases for MIG.
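Given constraint 1, the usual way to use several MIG instances is one process per instance, each with `CUDA_VISIBLE_DEVICES` set to a single MIG UUID. Below is a minimal Python sketch of that idea, assuming the `nvidia-smi -L` format shown earlier in this thread. The helper names (`parse_mig_uuids`, `per_instance_envs`, `run_per_instance`) are hypothetical and not part of cryoSPARC or any NVIDIA tool, and whether cryoSPARC can schedule jobs across instances set up this way is not confirmed here.

```python
import os
import re
import subprocess

# Sample `nvidia-smi -L` output, copied from earlier in this thread.
SAMPLE = """\
GPU 0: A100-PCIE-40GB (UUID: GPU-90134985-26d8-39db-81bd-61f9a31864fe)
  MIG 2g.10gb Device 0: (UUID: MIG-GPU-90134985-26d8-39db-81bd-61f9a31864fe/3/0)
  MIG 2g.10gb Device 1: (UUID: MIG-GPU-90134985-26d8-39db-81bd-61f9a31864fe/4/0)
  MIG 2g.10gb Device 2: (UUID: MIG-GPU-90134985-26d8-39db-81bd-61f9a31864fe/5/0)
GPU 1: A100-PCIE-40GB (UUID: GPU-0d407aac-ca34-f203-ab5b-b57b784d5074)
"""

# Matches only UUIDs that start with "MIG-", so plain GPU UUIDs are skipped.
MIG_UUID = re.compile(r"\(UUID:\s*(MIG-[^\s)]+)\)")


def parse_mig_uuids(listing):
    """Return the MIG device UUIDs found in `nvidia-smi -L` style text."""
    return MIG_UUID.findall(listing)


def per_instance_envs(uuids, base_env=None):
    """Build one environment per MIG instance, each exposing exactly one UUID."""
    base = dict(os.environ if base_env is None else base_env)
    return [{**base, "CUDA_VISIBLE_DEVICES": uuid} for uuid in uuids]


def run_per_instance(cmd, uuids):
    """Hypothetical helper: run `cmd` once per MIG instance, one process each."""
    return [
        subprocess.run(cmd, env=env, capture_output=True, text=True)
        for env in per_instance_envs(uuids)
    ]


# Show the per-process setting that would be used for each instance.
for env in per_instance_envs(parse_mig_uuids(SAMPLE), base_env={}):
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"])
```

To check that each instance is visible on its own, one could call `run_per_instance(["/srv/public/cryosparc/cryosparc_worker/bin/cryosparcw", "gpulist"], uuids)` and inspect each result's `stdout`; each run should then report a single CUDA device.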