Hi all,
We are having some trouble with memory allocation on our SLURM cluster.
We have SLURM nodes with up to 8 GPUs each, and we already figured out that, for example in the proteasome tutorial, the default memory request is a bottleneck (though I also had to push the number of classes in 2D classification to actually hit the limit).
Because of this initial experience, we added a very generous 4x multiplier to the memory allocation:
#SBATCH --mem={{ (4 * ram_gb) | int }}G
With this, the majority of jobs ran successfully. But this allocation has a major downside: overbooking of resources. At the moment the cluster is still in a testing phase, so the workload is manageable. But as soon as all our users have access, we would like to distribute the resources optimally.
Today, one of my testers again hit a memory-related error:
2026-02-12 11:58:40,194 core heartbeat INFO | ========= Updating heartbeat
DIE: allocate: out of memory (reservation insufficient)
This was a NU-Refinement with 100k particles and a box size of 768 px. I know this is huge, but the data we will acquire requires box sizes this large (or even bigger), especially if one wants to push the details using EER upsampling.
Checking the job log, NU-Refinement by default requests only 24G of memory:
template args: {
…
"job_type": "nonuniform_refine_new",
"num_gpu": 1,
"num_cpu": 4,
"ram_gb": 24
…
}
And the actual execution script, due to the 4x multiplier, requested 96G:
#SBATCH --mem=96G
Of course, I can now increase the multiplier further, but this again might overbook resources.
For example, a Patch Motion job using 8 GPUs already requested (after the multiplier) 480G of memory. Increasing the multiplier even further might reach a point where I actually exceed the physical memory of the node.
So my question is: are there plans to optimize the default memory allocation? I think a “general value” per job type might not work, as the actual requirements are input-dependent.
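To put a rough number on that input dependence: a single dense float32 volume alone scales cubically with box size, so a fixed ram_gb per job type cannot fit all inputs. A back-of-the-envelope sketch (my own numbers, not cryoSPARC's actual memory model - a refinement holds several such volumes plus particle data):

```python
def volume_gib(box_px: int, bytes_per_voxel: int = 4) -> float:
    """Footprint of one dense float32 3D volume, in GiB."""
    return box_px ** 3 * bytes_per_voxel / 2 ** 30

for box in (256, 512, 768):
    print(f"{box:4d} px -> {volume_gib(box):5.2f} GiB per volume")
# 768 px -> 1.69 GiB per volume, and a refinement keeps multiple
# volumes (half-maps, masks, filtered copies) in host memory at once.
```

Going from a 256 px to a 768 px box is a 27x increase per volume, which a flat default obviously cannot anticipate.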
For this special case, I now define this logic:
{% set mem_mult = 8 if job_type == "nonuniform_refine_new" else 4 %}
#SBATCH --mem={{ (mem_mult * ram_gb) | int }}G
to allocate 8x the memory to NU-Refinement jobs, while keeping the 4x for all other job types.
But as I said before, this is not how one should use SLURM resources.
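In the meantime, a per-job-type multiplier table at least keeps the workaround in one place. A sketch of the cluster-script fragment (the job type name is the one from my log above; 4x stays the fallback, and further entries would need their own tuning):

```jinja
{% set mem_mult = {
    "nonuniform_refine_new": 8,
} %}
#SBATCH --mem={{ (mem_mult.get(job_type, 4) * ram_gb) | int }}G
```

This only scales the symptom, though; the underlying ram_gb estimate is still input-independent.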
I am grateful for any input.
Best
Christian
edit:
I checked the system again to verify that it is really system memory, not GPU memory, that runs out.
free -h shows that it goes up to 65G (so even a 2x multiplier would have failed).
nvidia-smi shows up to 20G loaded on the GPU, but as this is an H200 with 141G, this is not an issue at all.
edit2:
I tried again, and can reproduce ~65G memory usage and 20G VRAM usage.
The error is still DIE: allocate: out of memory (reservation insufficient), but I never see this usage in free -h - which I expected, as the job dies at allocation (reservation), not at actual usage.
Additionally, I can see in SLURM that the job switches from R to CG, and the cryoSPARC GUI needs 2-3 minutes until it realizes that the job is actually dead. But I think this is just the heartbeat interval.
edit3:
I now tried a 20x multiplier, giving 480G of memory for this job.
I can see that the GPU memory spiked to 115G, and then the job died as before.
At this point, I am not sure whether this is actually a SLURM error or an issue with nu_refinement itself. I will try again with an even bigger multiplier.
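One way to tell the two cases apart (assuming SLURM accounting is enabled on the cluster): if the cgroup memory limit killed the job, sacct reports the state as OUT_OF_MEMORY; if the state is FAILED while MaxRSS stayed well below ReqMem, the DIE came from inside the application, not from SLURM.

```shell
# Compare what the job actually used against what was reserved
# (replace <jobid> with the SLURM job id of the failed step):
#   State = OUT_OF_MEMORY            -> SLURM/cgroup killed it
#   State = FAILED, MaxRSS << ReqMem -> error raised inside the job
sacct -j <jobid> --format=JobID,State,ExitCode,ReqMem,MaxRSS,Elapsed
```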
edit4:
Last test, with a 60x multiplier:
#SBATCH --mem=1440G
and again
DIE: allocate: out of memory (reservation insufficient)
At this point, I am clueless.