Our cluster includes nodes that each have four 80 GB A100 GPUs, as well as older nodes that each have four 32 GB V100 GPUs. We are considering splitting up some of the 80 GB A100s using MIG (Multi-Instance GPU). However, looking at the jobs run on the A100 nodes this year, about 95% of them used more than 40 GB of GPU memory. (These jobs include CryoSPARC, RELION, AlphaFold, and a few others.)
This is surprising, because when we only had the 32 GB V100 nodes, we never saw jobs run out of GPU memory.
Is CryoSPARC able to detect how much GPU memory is available and configure each job accordingly? For example, will it process fewer images at a time when only 40 GB is available, and more when 80 GB is available? That would explain why everything worked on the 32 GB V100s.
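To be clear about the kind of behavior I mean, here is a minimal sketch of "query free GPU memory, then scale the batch size" using pynvml. This is just an illustration of what I'm asking about, not a claim about how CryoSPARC is actually implemented, and the per-image memory footprint is a made-up number:

```python
# Illustration only: pick a batch size from the free memory on the first GPU.
# (Not CryoSPARC's actual logic -- just the adaptive behavior I'm asking about.)
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first visible GPU
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
free_gb = mem.free / 1024**3
pynvml.nvmlShutdown()

GB_PER_IMAGE = 0.05   # hypothetical per-image footprint, purely for illustration
batch_size = max(1, int(free_gb * 0.8 / GB_PER_IMAGE))   # leave ~20% headroom
print(f"{free_gb:.1f} GB free -> process {batch_size} images at once")
```

If CryoSPARC does something along these lines internally, that would explain why the same job types fit on 32 GB V100s but grow to fill 80 GB on the A100s.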
Thanks,
Matthew