Hello,
We are running into out-of-memory errors in various job types.
The ad hoc solution we apply is to modify the build.py scripts for those job types.
I wonder if it would be possible to add a configurable option for the amount of memory, just as there is for the number of CPU cores or GPUs, or some other solution that doesn't require changing the cryoSPARC codebase with every future release?
Are you referring to cluster submissions? If so, have you tried creating a bespoke lane that incorporates a multiplier on the ram_gb variable in cluster_script.sh?
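For example, a dedicated high-memory lane could carry a fixed multiplier in its cluster_script.sh. A rough sketch, assuming the standard num_cpu / num_gpu / ram_gb / run_cmd template variables, with the 1.5 factor as a placeholder:
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --mem={{ (ram_gb*1.5)|int }}G
[...]
{{ run_cmd }}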
Yes, I’m referring to cluster submissions, and we are aware that we can do it this way, but this solution also has its downsides, especially in a computing centre with multiple users who have different needs.
It would be really great to have this configurable at the job-building step.
One way of accommodating those different needs would be to allow users to choose from a number of cluster lanes with independently specified ram_gb multipliers.
@team I would suggest a very simple solution for this kind of issue.
If we had a job_type variable available in the submission script, we would be able to adjust the required resources at the job_type level rather than at the lane level, which is drastically more convenient for both users and admins (of course, having this configurable in the web interface would be ideal, but I understand that would require more development).
This is a small modification (3.3.2-220824):
{% if job_type == 'topaz_train' %}
{% set mem_size = ram_gb*2 %}
{% else %}
{# fall back to the job's requested memory for all other job types #}
{% set mem_size = ram_gb %}
{% endif %}
[...]
#SBATCH --mem={{ mem_size }}GB
The same approach can be used to conditionally choose a different Slurm partition, etc.
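For instance, partition selection might look something like this (just a sketch; 'highmem' and 'standard' are hypothetical partition names, and it again assumes a job_type variable is exposed to the template):
{% if job_type == 'topaz_train' %}
#SBATCH --partition=highmem
{% else %}
#SBATCH --partition=standard
{% endif %}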
Users often don’t know (and don’t want to know or remember) which job will need more memory, or on which lane and partition it is best to run. With this, I can put simple Jinja-based logic into the default cluster-wide lane that covers 90% of the most common cases, hide all of this from users, and save both their time and mine.
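As an illustration, the per-job_type logic for such a default lane could be kept in a single mapping (again only a sketch: the job-type names and multipliers are placeholders, and it assumes job_type is available in the template):
{# hypothetical per-job-type memory multipliers; anything not listed falls back to 1 #}
{% set mem_factor = {'topaz_train': 2, 'patch_motion_correction_multi': 1.5}.get(job_type, 1) %}
#SBATCH --mem={{ (ram_gb * mem_factor)|int }}GB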