CPU and Memory Management with SLURM

We have integrated CryoSPARC with our HPC cluster via SLURM. However, jobs often exceed their memory allocation when they are not running on a node exclusively. We read the documentation, but there is not much discussion of CPU-core and memory allocation for a job. The cluster script contains several placeholders for SLURM job parameters, such as {{ ram_gb }} and {{ num_cpu }}. Could someone help with two questions:

(1) How are these variables evaluated when a job is built?

(2) Is there a way for the user to choose the memory size or the number of CPU cores, as can be done for GPUs?

Thanks!

Hi @yuxing,

(1) How are these variables evaluated when a job is built?

Each job type specifies its own set of resources, which are passed to the cluster script as parameters. If you have a rummage around cryosparc_master/cryosparc_compute/jobs/xxx/*build.py, you’ll find the resources a job will request listed under recompute_resources(job).

e.g.

job.set_resources_needed(4, 1, 16000, params.get('compute_use_ssd', True))

… would translate to 4 CPUs, 1 GPU, and 16 GB of RAM. These are also listed near the start of the joblog.
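To illustrate how those resource values end up in the submitted script, here is a minimal Python sketch. The variable names (num_cpu, num_gpu, ram_gb) match the cluster-script placeholders; the rendering function is a simplified stand-in for the real template engine (CryoSPARC uses Jinja2), and only handles plain {{ name }} substitutions:

```python
import re

# Hypothetical resource values, mirroring set_resources_needed(4, 1, 16000, ...):
# 4 CPUs, 1 GPU, 16000 MB of RAM.
resources = {"num_cpu": 4, "num_gpu": 1, "ram_gb": 16}

def render(template, values):
    # Replace each {{ name }} placeholder with the corresponding value.
    # (A simplified stand-in; the real engine also supports expressions
    # such as {{ ram_gb*2 }}.)
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(values[m.group(1)]),
        template,
    )

script = (
    "#SBATCH --cpus-per-task={{ num_cpu }}\n"
    "#SBATCH --gres=gpu:{{ num_gpu }}\n"
    "#SBATCH --mem={{ ram_gb }}G\n"
)

print(render(script, resources))
# #SBATCH --cpus-per-task=4
# #SBATCH --gres=gpu:1
# #SBATCH --mem=16G
```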

(2) Is there a way for the user to choose the memory size or the number of CPU cores, as can be done for GPUs?

Unfortunately, AFAIK, no.

At least where memory requirements are concerned, the @team have indicated in the past that a re-work is on their to-do list. I interpret this to mean either a re-estimation of the current values or a refactoring of how memory requirements are calculated, e.g. based partly on input image and stack size.

In the meantime, you could either change these values for specific job types that you’re having trouble with, or apply multipliers to the relevant cluster-script parameters, e.g. {{ ram_gb*2 }}G; the latter applies lane-wide. Alternatively, depending on your setup and users’ requirements, it may also be convenient to quantise memory requests as a function of the number of GPUs, e.g. #SBATCH --mem={{ num_gpu*40 }}G.
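For context, a sketch of where such a multiplier would sit within a SLURM cluster script. The surrounding lines are a generic example rather than the full CryoSPARC template, and the 2x factor is purely illustrative, not a recommendation:

```shell
#!/usr/bin/env bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
# Double the job's estimated memory request, lane-wide:
#SBATCH --mem={{ ram_gb*2 }}G

{{ run_cmd }}
```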

Cheers,
Yang

With the changes in version 4.1, is it now possible to specify, e.g., the memory allocation when queuing a job?

Hi,

Yes. There are many possible implementations with custom variables. Below is one simple Slurm example.

...
{%- if custom_mem %}
#SBATCH --mem={{ custom_mem }}G
{%- else %}
#SBATCH --mem={{ ram_gb }}G
{%- endif %}
...

In this example, cryoSPARC submits the job normally unless custom_mem is defined. Alternatively, one could incorporate the custom variable as a multiplier, as illustrated here.
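A sketch of what such a multiplier variant might look like, assuming a custom variable named custom_mem_multiplier (a name invented here for illustration):

```shell
{%- if custom_mem_multiplier %}
# Scale the estimated memory request by a user-supplied factor:
#SBATCH --mem={{ ram_gb*custom_mem_multiplier }}G
{%- else %}
# Fall back to the job's own estimate:
#SBATCH --mem={{ ram_gb }}G
{%- endif %}
```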

Cheers,
Yang


@yuxing @leetleyang @daniel.s.d.larsson The relevant topic in our guide has recently been updated.
