cuMemHostAlloc & Out-of-Memory Errors

Environment

  • CryoSPARC v4.7.1 on a GPU cluster (SLURM)
  • Each node: 4× NVIDIA A40 GPUs (48 GB each), 256 GB system RAM
  • THP (Transparent Huge Pages) enabled (will test disabling)
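To confirm whether Transparent Huge Pages (THP) is actually on before and after the change, each node's active mode can be read from the standard Linux sysfs file. A minimal Python sketch, assuming a Linux kernel that exposes /sys/kernel/mm/transparent_hugepage/enabled (the helper names are hypothetical):

```python
from pathlib import Path
from typing import Optional

# Standard Linux sysfs file for the THP setting. Its contents look like
# "always [madvise] never"; the bracketed word is the active mode.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active (bracketed) mode from the THP sysfs file contents."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode found in {text!r}")


def current_thp_mode(path: Path = THP_PATH) -> Optional[str]:
    """Read the live THP mode, or return None if the file cannot be read."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        # e.g. a non-Linux host or a container that hides /sys
        return None


if __name__ == "__main__":
    print("THP mode on this node:", current_thp_mode())
```

Running it inside a SLURM job (for example via srun on the target node) reports the setting on the node that actually runs the CryoSPARC job rather than on the login node; "never" means THP is disabled.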

Dataset

  • 500K particles
  • 800×800 px box

Errors Encountered

  1. cuMemHostAlloc failures during NU-Refine, Homo-Refine, Local-Refine, and 3D Classification
  2. Setting CRYOSPARC_NO_PAGELOCK=true changed the error to:

DIE: allocate: out of memory (reservation insufficient)

What We’ve Tried

  • Toggling CRYOSPARC_NO_PAGELOCK
  • Reserving the entire node for single-GPU jobs (occasionally succeeds, but not reliably)
  • Reducing the GPU batch size of images to 100 and 50 (occasional success)
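As a rough sense of scale for the batch-size change, the host memory needed to stage one batch of images grows linearly with the batch size and with the square of the box size. A minimal Python sketch, assuming float32 pixels, no padding, and one buffer per batch (a hypothetical model, not CryoSPARC's documented memory layout):

```python
# Hypothetical helper: host memory for one batch of particle images.
# ASSUMPTION: float32 pixels, no padding, one buffer per batch; this is not
# CryoSPARC's documented memory layout.
BYTES_PER_FLOAT32 = 4


def batch_buffer_bytes(batch_size: int, box_px: int,
                       bytes_per_px: int = BYTES_PER_FLOAT32) -> int:
    """Bytes needed to hold batch_size square images of side box_px."""
    return batch_size * box_px * box_px * bytes_per_px


if __name__ == "__main__":
    for batch in (100, 50):  # batch sizes tried in this thread
        mb = batch_buffer_bytes(batch, box_px=800) / 1e6
        print(f"{batch:>3} images of 800x800 px: {mb:.1f} MB")
```

Under these assumptions a batch of 100 images needs 256 MB and a batch of 50 needs 128 MB. That is small next to 256 GB of node RAM, which may be why shrinking the batch helps only occasionally.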

Observations & Questions

  • Identical jobs completed successfully on earlier CryoSPARC versions
  • CryoSPARC’s reported memory requirements for NU-Refine and Homo-Refine often underestimate actual needs
  • We often must reserve all 256 GB of node RAM to avoid OOM errors

Any insights or suggestions would be greatly appreciated!

This has been known, reported and complained about quite a bit since CryoSPARC 4.4 was released. :wink:

As a sanity check, please try downsampling your particles to 500-pixel boxes with Fourier cropping. That shouldn’t run out of memory.
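Cropping from 800 to 500 pixels saves more memory than the linear change suggests, because per-image memory scales with the square of the box size and per-volume memory with its cube. A small Python sketch of that arithmetic (the function name is hypothetical, and it ignores any fixed overheads a real job has):

```python
from typing import Tuple


def fourier_crop_ratios(old_box: int, new_box: int) -> Tuple[float, float]:
    """Memory ratios after Fourier cropping a box from old_box to new_box pixels.

    2-D images scale with box**2 and 3-D volumes with box**3, so this
    returns (image_ratio, volume_ratio) relative to the original box.
    """
    if not 0 < new_box <= old_box:
        raise ValueError("new_box must be positive and no larger than old_box")
    linear = new_box / old_box
    return linear ** 2, linear ** 3


if __name__ == "__main__":
    img, vol = fourier_crop_ratios(800, 500)
    print(f"per-image memory: {img:.1%} of original")   # 39.1%
    print(f"per-volume memory: {vol:.1%} of original")  # 24.4%
```

So each 500-pixel volume needs roughly a quarter of the memory of an 800-pixel one.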

However, there are some jobs for which 256 GB of system RAM will never be sufficient. :frowning: NU refine with large boxes (>600 pixels) often consumes a lot of system memory; I’ve seen ~440 GB used in some cases, and Low Memory Mode is only of limited help with system memory utilisation.
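One reason large boxes are so expensive is that every 3-D volume a refinement keeps in memory grows with the cube of the box size. The Python sketch below estimates the size of a single volume. The 2× padding and the complex64 Fourier grid are assumptions (common in cryo-EM reconstruction code, but not confirmed here for NU refine), and a real job holds several such arrays at once (for example half-maps and intermediate grids), so actual totals are a multiple of these figures:

```python
def volume_gb(box_px: int, bytes_per_voxel: int = 4, pad: int = 1) -> float:
    """Decimal GB for one cubic volume of side box_px * pad voxels.

    bytes_per_voxel=4 models a real float32 map; 8 models a complex64
    Fourier-space grid. pad is an ASSUMED oversampling factor, not a
    documented CryoSPARC setting.
    """
    side = box_px * pad
    return side ** 3 * bytes_per_voxel / 1e9


if __name__ == "__main__":
    for box in (500, 600, 800):
        real_map = volume_gb(box)
        padded_ft = volume_gb(box, bytes_per_voxel=8, pad=2)
        print(f"{box} px: {real_map:.3f} GB real map, "
              f"{padded_ft:.3f} GB padded complex grid")
```

Going from 600 to 800 pixels multiplies each volume by (800/600)³ ≈ 2.37, which is consistent with memory use climbing steeply above ~600 pixels.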

Thank you for your input. We also tested 512×512 px particles. The success rate seemed higher, but the same cuMemHostAlloc error still occurred. It looks like the listed RAM requirement is way off:

Non-uniform Refinement

Requires 4 CPU cores, 24 GB RAM, 1 GPU

Even assuming this figure is per core (4 × 24 GB = 96 GB), that would not be sufficient for large particles.

I did some testing previously, which might be a useful guide:

Edit: Going by a run I currently have in progress, NU refine with an 800-pixel box in Low Memory Mode eats 182 GB of RAM at 1.7 Å. So with 256 GB of RAM, you’ll only be able to run one NU refine at a time.
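The scheduling consequence can be written out explicitly: a node fits floor(usable RAM / per-job RAM) copies of a job. A Python sketch comparing the 24 GB listed requirement, the 182 GB observed above, and the ~440 GB worst case mentioned earlier in the thread (the function and its reserve_gb parameter are hypothetical):

```python
import math


def concurrent_jobs(node_ram_gb: float, per_job_gb: float,
                    reserve_gb: float = 0.0) -> int:
    """How many identical jobs fit in a node's RAM at once.

    reserve_gb is a hypothetical allowance for the OS and other processes.
    """
    if per_job_gb <= 0:
        raise ValueError("per_job_gb must be positive")
    usable = max(0.0, node_ram_gb - reserve_gb)
    return math.floor(usable / per_job_gb)


if __name__ == "__main__":
    node = 256  # GB of RAM per node in this thread
    for label, need in [("listed requirement", 24),
                        ("observed, 800 px Low Memory Mode", 182),
                        ("worst case seen", 440)]:
        print(f"{label} ({need} GB): {concurrent_jobs(node, need)} job(s)")
```

By the listed figure a 256 GB node would appear to fit ten NU refine jobs, while the observed usage allows only one, which matches the need to reserve the whole node.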

Thanks again for sharing the information. Based on those numbers, it appears we’re still encountering stability or configuration issues: our A40 GPUs (48 GB) and system RAM should be sufficient. We’re also seeing similar failures in Homo-refine and Local-refine.