SLURM memory allocation

Hi all,

we have some trouble with the memory allocation on our SLURM cluster.

We have SLURM nodes with up to 8 GPUs and have already figured out that, for example in the proteasome tutorial, the default memory allocation is a bottleneck (though I also pushed the number of classes in 2D classification to actually hit the limit).

Because of this initial experience, we added a very generous 4x multiplier to the memory allocation:

#SBATCH --mem={{ (4 * ram_gb) | int }}G

With this, the majority of jobs ran successfully. But I have a major issue with this allocation: overbooking of resources. At the moment the cluster is still in its testing phase, so the workload is manageable. But as soon as all our users have access, we would like to distribute the resources optimally.

Today, one of my testers again hit a memory-related error:
2026-02-12 11:58:40,194 core heartbeat INFO | ========= Updating heartbeat
DIE: allocate: out of memory (reservation insufficient)

This was a NU-refinement with 100k particles and a box size of 768 px. I know this is huge, but the data we will acquire requires box sizes this large (or even bigger), especially if one wants to push the details using EER upsampling.

Checking the job log, NU-Refinement by default requests only 24G of memory:

template args: {
    "job_type": "nonuniform_refine_new",
    "num_gpu": 1,
    "num_cpu": 4,
    "ram_gb": 24
}

And the actual submission script, due to the 4x multiplier, requested 96G:
#SBATCH --mem=96G

Of course, I can now increase the multiplier further, but this again might overbook the resources.
For example, a Patch Motion job using 8 GPUs already requested (after the multiplier) 480G of memory. Increasing it even further might reach a level where I actually exceed the physical limit of the system.
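One guard I could add against exceeding the physical limit (just a sketch; the 1000G cap is an assumption and would need to match the actual RAM of each node type) is to clamp the computed request in the template:

```
{% set mem_mult = 4 %}
{# clamp the request to the node's physical RAM; 1000G is a placeholder value #}
{% set mem_gb = [ (mem_mult * ram_gb) | int, 1000 ] | min %}
#SBATCH --mem={{ mem_gb }}G
```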

So my question is: are there plans to optimize the default memory allocation? I think a “general value” per job type might not work, as the actual requirements are input-dependent.

For this special case, I now define this logic:
{% set mem_mult = 8 if job_type == "nonuniform_refine_new" else 4 %}
#SBATCH --mem={{ (mem_mult * ram_gb) | int }}G
to allocate 8x the memory for NU-refinement jobs, but keep the 4x for all other job types.
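The same idea could be generalized to a lookup table with a default, so more job types can be tuned without nesting conditionals (a sketch; any job-type strings besides nonuniform_refine_new would need to match the actual template args):

```
{% set multipliers = {"nonuniform_refine_new": 8} %}
{# fall back to the 4x default for any job type not listed above #}
{% set mem_mult = multipliers.get(job_type, 4) %}
#SBATCH --mem={{ (mem_mult * ram_gb) | int }}G
```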

But as I said before, this is not how one should use SLURM resources.

I am grateful for any input.

Best
Christian

edit:
I checked the system again to verify that it is really not GPU memory, but actually system memory allocation.

free -h shows that it goes up to 65G (so even a 2x multiplier would have failed).
nvidia-smi shows up to 20G loaded on the GPU, but as this is an H200 with 141G, this is not an issue at all.

edit2:
I tried again, and can reproduce ~65G memory usage and 20G VRAM usage.
The error is still DIE: allocate: out of memory (reservation insufficient), but I never see this in free -h - which I expected, as the job dies at allocation, not usage.
Additionally, I can see in SLURM that the process switches from R to CG, and the cryoSPARC GUI needs 2-3 minutes until it realizes that the job is actually dead. But I think this is due to the heartbeat interval.

edit3:
I now tried a 20x multiplier, giving 480G of memory for this job.
I can see that the GPU spiked to 115G, and then the job dies as before.
At this point, I am actually not sure if this is a SLURM error, or if there is an issue with NU-refinement. I will try again with an even bigger multiplier.

edit4:
Last test with a 60x multiplier:
#SBATCH --mem=1440G
and again
DIE: allocate: out of memory (reservation insufficient)

At this point, I am clueless.

Hi @ctueting,

Thank you for your question, and for providing all of that information about your tests with NU refinement on your SLURM cluster with different memory allocations!

Upon investigation, it appears that the error you are seeing, DIE: allocate: out of memory (reservation insufficient), is likely because the NU Refine job in CryoSPARC effectively limits the box size (to around 600 pixels) that can be used with the new (faster) version of the code.

To overcome this error and allow jobs with large box sizes to run, you can try enabling low-memory mode in the NU Refine job, which uses the older version of the code that does not have this limitation.

Hi @hbridges1

thanks for the clarification. I will try the low memory mode just out of curiosity.

But then it would be good to implement at least a warning in the job log when NU refine is used with boxes bigger than 600 px, so the user is aware of this. I think boxes that big are not unusual, both because of advances in camera development and due to accessibility. New users unfortunately do not know all the tricks (e.g., binning; avoiding accidental EER upsampling, as the default is factor 2), but simply click and try.

Best

Christian

Hi, @ctueting; this is a little out of date, but still gives a reasonable idea:

Various threads throughout the forum have requested that the guide be updated with more realistic maxima for different jobs.

But it’s a larger-scale issue which is difficult to address generally, as (for example) 1050-pixel-box refinements will run in NU refine on 48 GB GPUs but will demand >440 GB of system RAM during various steps (usage is more reasonable throughout most of the run, but some calculations really spike RAM usage). I posted a screenshot immediately below the above post demonstrating this.

On a joint cluster, this is a recipe for disaster as I doubt most places will spec systems with 4TB+ of RAM “just in case”.

Oh wow, thank you for this piece of information. That explains a lot.

Is this only NU refinement, or are there other jobs that spike really hard? I was thinking of adding a small piece of logic to my SLURM script to give those jobs a very generous RAM multiplier, just in case (like this: {% set mem_mult = 8 if job_type == "nonuniform_refine_new" else 4 %}).

Thank you @ctueting and @rbs_sci for your feedback and suggestions; these have been noted.