This has been known, reported and complained about quite a bit since CryoSPARC 4.4 was made public.
As a sanity check, please try downsampling your particles to 500-pixel boxes with Fourier cropping. That shouldn’t run out of memory.
However, there are some jobs for which 256 GB of system RAM will never be sufficient. NU refine with large boxes (>600 pixels) often consumes a lot of system memory. I’ve seen ~440 GB used in some cases… and Low Memory Mode is only of limited help with system memory utilisation.
Thank you for your input. We also tested 512 x 512 px particles. The success rate seemed to be higher, but the same cuMemHostAlloc error still occurred. It looks like the listed RAM requirement is way off:
Non-uniform Refinement
Requires 4 CPU cores, 24 GB RAM, 1 GPU
Even assuming this number is per core, 96 GB would not be sufficient for large particles.
I did some testing previously. Might be a useful guide:
edit: Grabbing some info off a run I currently have going: NU refine with an 800-pixel box in Low Memory Mode will eat 182 GB of RAM when at 1.7 Ang. So with 256 GB RAM, you’ll only be able to run one NU refine at a time.
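As a rough back-of-envelope (assuming single-precision volumes and ~2× Fourier padding, which is my guess at why these numbers get so large): an unpadded 800³ float32 volume is 800³ × 4 B ≈ 2 GB, and a 2×-padded copy is 1600³ × 4 B ≈ 16 GB, so holding even a handful of padded volumes/half-maps plus cached particle data in host RAM quickly lands in the hundreds of GB.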
Thanks again for sharing the information. Based on these numbers, it appears we’re still encountering stability or configuration issues: our A40 GPUs (48 GB) and system RAM should be sufficient. We’re also seeing similar failures in Homogeneous refinement and Local refinement.
You are likely referring to a job type-specific, fixed estimate that does not take into consideration the requirements of your specific dataset. If you already use {{ ram_gb }} to request the allocation of RAM for a cluster job, you may wish to modify the cluster script template to allow customized augmentation of the memory allocation, using custom variables (example 1, example 2).
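For illustration, a minimal sketch of what that could look like in a SLURM-style cluster_script.sh template; ram_gb_multiplier is a hypothetical custom variable name chosen here, which CryoSPARC would expose as a per-job setting once it appears in the template:

```bash
#!/usr/bin/env bash
## Sketch of the relevant header lines of a cluster_script.sh template (SLURM assumed).
## {{ ram_gb }} is the built-in per-job RAM estimate; {{ ram_gb_multiplier }} is a
## hypothetical custom variable used to scale the memory request per job instead of
## relying on the fixed estimate alone.
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --mem={{ (ram_gb * ram_gb_multiplier)|int }}G

{{ run_cmd }}
```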
We would be interested in examples where jobs with identical parameters, inputs and cluster resource allocations completed successfully with an older CryoSPARC version and failed with a newer one.
Which were the affected job types?
Would you be willing to share with the CryoSPARC developers job reports for pairs of successful and failed jobs with identical parameters, inputs and cluster resource allocations?
Some updates: the cuMemHostAlloc errors were resolved after we disabled THP (transparent huge pages). Enabling low-memory mode helps mitigate the OOM errors, though this ultimately depends on available system memory. In our case, 256 GB is just sufficient for 800 x 800 px boxes when running in low-memory mode.
For jobs with identical parameters, once THP was disabled, the run completed successfully.
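In case it helps anyone else, this is roughly how we checked and disabled THP on our compute nodes (a runtime change made as root; it does not persist across reboots, so a kernel boot parameter is needed to make it permanent):

```bash
# Show the current THP setting; the active value is shown in brackets, e.g. [always]
cat /sys/kernel/mm/transparent_hugepage/enabled

# Disable THP at runtime (run as root); this is lost after a reboot
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# For a persistent setting, add transparent_hugepage=never to the kernel
# command line (e.g. GRUB_CMDLINE_LINUX) and regenerate the GRUB config.
```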
Thank you all for following up—happy to provide any additional details if helpful.