Memory usage SLURM

Dear cryoSPARC team,

Yet another SLURM-related post from me. Sorry! :wink:

CryoSPARC seems to have predefined memory settings for different job types, e.g. “Homogeneous Refinement (NEW!)” always sets “{{ ram_gb }}” to 24 GB.
Depending on the job’s box/pixel size and final resolution, this is not always enough, which becomes a problem when you have a SLURM setup like mine where cgroups is enabled. cgroups works as a resource “jail”: a job cannot go beyond its allocated resources. Super smart, and it makes all resources on processing nodes much more modular.
Thus, a job submitted via cryoSPARC that requires more RAM than was allocated at submission will crash when it reaches the maximum allowed RAM usage.
Therefore I have made some extra lanes which allocate an additional 8 or 16 GB of RAM to jobs.
It is a workable workaround, but not ideal, as users submit many jobs that crash due to lack of RAM.
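For anyone wanting to set up similar padded lanes: this can be done by cloning the default lane and adjusting the memory request in its `cluster_script.sh`. A minimal sketch, assuming your cryoSPARC version renders cluster templates with Jinja2 (the 16 GB pad and the partition name are placeholders, not recommendations):

```shell
#!/usr/bin/env bash
# Sketch of a "+16 GB" lane: identical to the default lane's template,
# except the memory request pads cryoSPARC's built-in estimate.
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --partition=emcluster            # placeholder partition name
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --mem={{ (ram_gb|int) + 16 }}G   # pad the built-in estimate by 16 GB

{{ run_cmd }}
```

With cgroups enforcement, the `--mem` line above is the hard ceiling the job will be killed at, so the pad only needs to cover the gap between cryoSPARC's estimate and actual peak usage.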

Would it be possible to make a more accurate and dynamic estimate of RAM usage?
I assume so, since it should be possible to calculate usage from the pixel size, box size, and Nyquist limit of each job.
It would be a big relief if one could submit cryoSPARC jobs where the allocated resources were more precisely calculated before submission.
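As a back-of-the-envelope illustration of what such an estimate could look like (my own rough sketch, not cryoSPARC's actual formula): the dominant memory cost of FFT-based refinement scales with the box size cubed, so even a crude box-size-driven request would beat a fixed 24 GB. Here `n_volumes` is a hypothetical working-set multiplier, and `bytes_per_voxel=8` assumes complex64 voxels:

```shell
#!/usr/bin/env bash
# Rough RAM estimate from box size alone (illustration only, not cryoSPARC's formula).
box=640            # box size in pixels
bytes_per_voxel=8  # complex64 voxel during FFTs (assumption)
n_volumes=20       # hypothetical number of full-size volumes held in RAM at once
ram_gb=$(( box**3 * bytes_per_voxel * n_volumes / 1024**3 ))
echo "estimated ram: ${ram_gb} GB"   # prints "estimated ram: 39 GB"
```

The real constant would have to come from profiling each job type, but the point is that the box size is known before submission, so the estimate could be too.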

//Jesper

+1 for this.

I ended up having to double the memory requested for jobs like Local Refinement to overcome this.

Hi @jelka,

Thanks for reporting this. This is on our radar, and we’ll hopefully be able to re-profile our jobs soon. For the time being, the best way to get around this is to allocate additional memory manually. Sorry for the inconvenience!

@stephan, do you have any more details or time frame for this type of enhancement?

We have a user trying to do a large 3D helical refinement. The job type seems to default to 48 GB ram_gb, which we triple in our SLURM job submission template (to 144 GB), but the job ultimately used 383 GB before crashing on a numpy/FFTW memory allocation failure. We don’t really want to increase the SLURM memory multiplier for all jobs, as this doesn’t seem necessary, but being able to do so for some job types and not others would be nice.

Thanks,
-Andrew

Would also like to inquire about this. I have some particles that I would like to downsample, but it appears that their box size is too large, and the downsampling job crashes once it runs out of memory (the default is 16 GB for this job type).

I would also like to see a more accurate memory calculation, and I want to add that large-box reconstructions require much more RAM than I would expect. I have a 640-px NU refinement that requires at least 160 GB of RAM; RELION, on the other hand, does not require nearly this much RAM for a reconstruction of the same particle set. Needing 160 GB of RAM for one GPU is quite restrictive when trying to use the other three GPUs. It would be beneficial if jobs could use less RAM where possible.