Out of memory in large datasets

bsobol · April 6, 2022, 9:11am

Hello,
We are facing issues with out-of-memory errors in various job types.
The ad hoc solution we apply is to modify the build.py scripts for those job types.

I wonder whether it would be possible to add a configurable option for the amount of memory, as there already is for CPU cores and the number of GPUs, or some other solution that doesn’t require modifying the cryoSPARC codebase again with every future release?

leetleyang · April 6, 2022, 11:19am

Hi,

Are you referring to cluster submissions? If so, have you tried creating a bespoke lane that applies a multiplier to the ram_gb variable in cluster_script.sh?

e.g.

#SBATCH --mem={{ ram_gb*2 }}G

Cheers,
Yang
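To make the effect of such a multiplier concrete, here is a minimal Python sketch of what it does to the rendered line. It assumes ram_gb is the per-job RAM estimate in GB that CryoSPARC passes to cluster_script.sh; the function name and the rounding up to whole gigabytes are choices of this sketch, not CryoSPARC behaviour.

```python
import math


def sbatch_mem_directive(ram_gb: float, multiplier: float = 2.0) -> str:
    """Mimic '#SBATCH --mem={{ ram_gb*2 }}G' for a lane-wide multiplier.

    Assumption: ram_gb is CryoSPARC's per-job RAM estimate in GB. The result
    is rounded up so the directive always carries a whole number of GB.
    """
    return f"#SBATCH --mem={math.ceil(ram_gb * multiplier)}G"


print(sbatch_mem_directive(24))    # #SBATCH --mem=48G
print(sbatch_mem_directive(12.5))  # #SBATCH --mem=25G
```

Because the multiplier is fixed per lane, every job type submitted through that lane is scaled the same way.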

bsobol · April 6, 2022, 12:14pm

Yes, I’m referring to cluster submissions, and we are aware that we can do it this way, but this solution has its own downsides, especially in a computing centre with multiple users who have different needs.

It would be really great to have this configurable per job, at the job-building step.

bsobol · September 15, 2022, 11:01am

@team I would suggest a very simple solution for this kind of issue.

If the job_type variable were available in the submission script, we could adjust the required resources per job type rather than per lane, which is drastically more convenient for both users and admins (of course, making this configurable in the web interface would be ideal, but I understand that it would require more development).

This is a single-line modification (3.3.2-220824):

--- cryosparc_master/cryosparc_command/command_core/__init__.py.orig
+++ cryosparc_master/cryosparc_command/command_core/__init__.py	
@@ -2289,2 +2289,3 @@
                     'cryosparc_username' : job_username,
+                    'job_type' : job_doc['job_type'],
                 }

wtempel · September 15, 2022, 2:00pm

One way of accommodating those different needs would be to allow users to choose from a number of cluster lanes with independently specified ram_gb multipliers.

wtempel · September 15, 2022, 2:06pm

Could you please illustrate how you would use the proposed job_type variable inside cluster_script.sh?

bsobol · September 15, 2022, 2:48pm

Something like

{% set mem_size = ram_gb %}
{% if job_type == 'topaz_train' %}
    {% set mem_size = ram_gb*2 %}
{% endif %}
[...]
#SBATCH --mem={{ mem_size }}G

or to conditionally choose a different Slurm partition, etc.

Users often don’t know (and don’t want to know or remember) which jobs need more memory, or on which lane and partition a job is best run. With this variable, I can write simple Jinja-based logic in the default cluster-wide lane that covers 90% of the most common cases and hides all of this from users, saving both their time and mine.
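A per-job-type policy like this can be mocked up in Python to check it before it goes into a template. This is a hedged sketch: it assumes the proposed job_type variable exists and that ram_gb is the per-job estimate in GB; the lookup tables, the partition names highmem and general, and the function names are all hypothetical. Keeping the choices in tables rather than a chain of if blocks makes the default explicit: any unlisted job type gets ram_gb unchanged and the default partition.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical per-job-type policy maintained by an admin. Only 'topaz_train'
# comes from the thread; the partition names are placeholders.
MEM_MULTIPLIER = {"topaz_train": 2.0}
PARTITION = {"topaz_train": "highmem"}
DEFAULT_PARTITION = "general"


@dataclass
class Resources:
    mem_gb: float
    partition: str


def resources_for(job_type: Optional[str], ram_gb: float) -> Resources:
    """Pick memory and partition for one job.

    Any job type not listed (or a missing job_type, e.g. on an install
    without the proposed patch) falls back to the unscaled ram_gb and the
    default partition.
    """
    multiplier = MEM_MULTIPLIER.get(job_type, 1.0)
    partition = PARTITION.get(job_type, DEFAULT_PARTITION)
    return Resources(mem_gb=ram_gb * multiplier, partition=partition)


def sbatch_lines(res: Resources) -> List[str]:
    """Render the sbatch directives, rounding memory up to whole GB."""
    return [
        f"#SBATCH --partition={res.partition}",
        f"#SBATCH --mem={math.ceil(res.mem_gb)}G",
    ]


if __name__ == "__main__":
    # 'some_other_job' is a hypothetical job type used to show the fallback.
    for jt in ("topaz_train", "some_other_job"):
        print(jt, sbatch_lines(resources_for(jt, 24)))
```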

wtempel · December 12, 2022, 9:28pm

@bsobol CryoSPARC version 4.1 includes additional cluster configuration options.