cuMemHostAlloc & Out-of-Memory Errors

Environment

  • CryoSPARC v4.7.1 on GPU cluster (SLURM)
  • Nodes: 4× NVIDIA A40 GPUs, 256 GB RAM each
  • THP (Transparent Huge Pages) enabled (will test disabling)

Dataset

  • 500 K particles
  • 800×800 px

Errors Encountered

  1. cuMemHostAlloc failures during NU-Refine, Homo-Refine, Local-Refine, and 3D Classification
  2. Setting CRYOSPARC_NO_PAGELOCK=true changed the error to:

DIE: allocate: out of memory (reservation insufficient)

What We’ve Tried

  • Toggling CRYOSPARC_NO_PAGELOCK (see the config sketch after this list)
  • Reserving an entire node for single-GPU jobs (occasionally succeeds, but not reliably)
  • Reducing the GPU batch size of images to 100 and then to 50 (occasional success)
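For reference, a minimal sketch of one way to toggle that variable (we assume the stock cryosparc_worker/config.sh location; adjust the path for your install):

  # cryosparc_worker/config.sh -- worker environment overrides
  export CRYOSPARC_NO_PAGELOCK=true   # remove or comment out this line to revert to page-locked host allocations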

Observations & Questions

  • Identical jobs completed successfully on earlier CryoSPARC versions
  • CryoSPARC’s reported memory requirements for NU-Refine and Homo-Refine often underestimate actual needs
  • Must often reserve all 256 GB to avoid OOM

Any insights or suggestions would be greatly appreciated!

This has been known, reported and complained about quite a bit since CryoSPARC 4.4 was made public. :wink:

As a sanity check, please try downsampling your particles to 500-pixel boxes with Fourier cropping. That shouldn’t run out of memory.

However, there are some jobs for which 256 GB of system RAM will never be sufficient. :frowning: NU refine with large boxes (>600 pixels) often consumes a lot of system memory. I’ve seen ~440 GB used in some cases… and Low Memory Mode is only of limited help with system memory utilisation.

1 Like

Thank you for your input. We also tested 512×512 px particles. The success rate seemed higher, but the same cuMemHostAlloc error still occurred. It looks like the listed RAM requirement is way off:

Non-uniform Refinement

Requires 4 CPU cores, 24 GB RAM, 1 GPU

Even assuming this number is per CPU core (i.e., 4 × 24 GB = 96 GB in total), that would not be sufficient for large particles.

I did some testing previously. Might be a useful guide:

edit: Grabbing some info off a run I currently have going: NU refine with an 800-pixel box in Low Memory Mode eats 182 GB of RAM at 1.7 Ang. So with 256 GB of RAM, you’ll only be able to run one NU refine at a time.
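For anyone who wants to watch this on their own node, a rough sampler like the one below works (a sketch only; it assumes the CryoSPARC worker python processes are the only large python processes on the node, and summing RSS over-counts pages shared between processes):

  # print the summed resident memory of python processes every 30 s (rough, in GiB)
  while true; do
      ps -o rss= -C python | awk '{sum+=$1} END {printf "%.1f GiB\n", sum/1024/1024}'
      sleep 30
  done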

1 Like

Thanks again for sharing the information. Based on those numbers, it appears we’re still encountering stability or configuration issues: our A40 GPUs (48 GB) and system RAM should be sufficient. We’re also seeing similar failures in Homo-refine and Local-refine.

You are likely referring to a job type-specific, fixed estimate that does not take into account the requirements of your specific data set. If you already use {{ ram_gb }} to request the RAM allocation for a cluster job, you may wish to modify the cluster script template to allow customized augmentation of the memory allocation using custom variables (example 1, example 2).
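For illustration, a minimal sketch of such a template fragment for SLURM. The {{ num_cpu }}, {{ num_gpu }}, {{ ram_gb }} and {{ run_cmd }} variables follow the stock template; ram_gb_multiplier is a hypothetical custom variable name chosen here, with a default of 1 so the stock behaviour is preserved when it is not set:

  #!/usr/bin/env bash
  ## cluster_script.sh fragment (sketch only, not a drop-in template)
  #SBATCH --cpus-per-task={{ num_cpu }}
  #SBATCH --gres=gpu:{{ num_gpu }}
  ## scale the fixed per-job RAM estimate by an optional custom multiplier
  #SBATCH --mem={{ (ram_gb * (ram_gb_multiplier|default(1, true)|float))|int }}G

  {{ run_cmd }}

A job that needs more than the estimate could then be resubmitted with ram_gb_multiplier set to, say, 2 or 4, without editing the template again.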

We would be interested in examples where jobs with identical parameters, inputs and cluster resource allocations completed successfully with an older CryoSPARC version but failed with a newer one.
Which job types were affected?
Would you be willing to share with the CryoSPARC developers job reports for pairs of successful and failed jobs with identical parameters, inputs and cluster resource allocations?

1 Like

Some updates: The cuMemHostAlloc errors were resolved after we disabled THP. Enabling low-memory mode helps mitigate the OOM error, though this ultimately depends on available system memory. In our case, 256 GB is just sufficient for 800×800 px boxes when running in low-memory mode.
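For anyone hitting the same issue, this is roughly how THP can be checked and turned off on a node (a sketch; it needs root, only lasts until the next reboot, and the persistent setting depends on your distribution, e.g. adding transparent_hugepage=never to the kernel command line):

  # show the active THP policy (the bracketed value is the current one)
  cat /sys/kernel/mm/transparent_hugepage/enabled

  # disable THP until the next reboot (run as root on each worker node)
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  echo never > /sys/kernel/mm/transparent_hugepage/defrag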

Regarding the jobs with identical parameters: once THP was disabled, those runs completed successfully.

Thank you all for following up—happy to provide any additional details if helpful.

1 Like