Insufficient ram: 2GB available but 16GB requested. Job will continue, but may fail if the system runs out of memory

I installed cryoSPARC on a high-performance computing (HPC) cluster. However, when I try to launch patch motion correction, I encounter the error message below and all micrographs come out incomplete.

Job J8 Started
[CPU:   90.5 MB  Avail:   2.14 GB]

Master running v4.7.1, worker running v4.7.1
[CPU:   90.8 MB  Avail:   2.14 GB]

Working in directory: /mnt/.../cryosparc/CryoEM_data/CS-test/J8
[CPU:   90.8 MB  Avail:   2.14 GB]

Running on lane default
[CPU:   90.8 MB  Avail:   2.14 GB]

Resources allocated: 
[CPU:   90.8 MB  Avail:   2.14 GB]

Worker:  ****.research.chop.edu
[CPU:   90.8 MB  Avail:   2.14 GB]

  CPU   :  [0, 1, 2, 3, 4, 5]
[CPU:   90.8 MB  Avail:   2.14 GB]

  GPU   :  [0]
[CPU:   90.8 MB  Avail:   2.14 GB]

  RAM   :  [0, 1]
[CPU:   90.8 MB  Avail:   2.14 GB]

  SSD   :  False
[CPU:   90.8 MB  Avail:   2.14 GB]

--------------------------------------------------------------
[CPU:   90.8 MB  Avail:   2.14 GB]

Insufficient ram: 2GB available but 16GB requested. Job will continue, but may fail if the system runs out of memory.
[CPU:   90.8 MB  Avail:   2.14 GB]

Importing job module for job type patch_motion_correction_multi...
[CPU:  250.2 MB  Avail:   2.07 GB]

Job ready to run
[CPU:  250.2 MB  Avail:   2.07 GB]

***************************************************************
[CPU:  250.2 MB  Avail:   2.07 GB]

Transparent hugepages are enabled. You may encounter stalls or performance problems with CryoSPARC jobs.
[CPU:  250.2 MB  Avail:   2.07 GB]

Job will process this many movies:  20
[CPU:  250.2 MB  Avail:   2.07 GB]

Job will output denoiser training data for this many movies:  20
[CPU:  250.2 MB  Avail:   2.07 GB]

Random seed: 1977843023
[CPU:  250.2 MB  Avail:   2.07 GB]

parent process is 337458
[CPU:  167.8 MB  Avail:   2.05 GB]

Calling CUDA init from 337661
[CPU:  252.8 MB  Avail:   2.06 GB]

Child process with PID 337661 terminated unexpectedly with exit code 1.
[CPU:  252.8 MB  Avail:   2.06 GB]

['uid', 'movie_blob/path', 'movie_blob/shape', 'movie_blob/psize_A', 'movie_blob/is_gain_corrected', 'movie_blob/format', 'movie_blob/has_defect_file', 'movie_blob/import_sig', 'micrograph_blob/path', 'micrograph_blob/idx', 'micrograph_blob/shape', 'micrograph_blob/psize_A', 'micrograph_blob/format', 'micrograph_blob/is_background_subtracted', 'micrograph_blob/vmin', 'micrograph_blob/vmax', 'micrograph_blob/import_sig', 'micrograph_blob_non_dw/path', 'micrograph_blob_non_dw/idx', 'micrograph_blob_non_dw/shape', 'micrograph_blob_non_dw/psize_A', 'micrograph_blob_non_dw/format', 'micrograph_blob_non_dw/is_background_subtracted', 'micrograph_blob_non_dw/vmin', 'micrograph_blob_non_dw/vmax', 'micrograph_blob_non_dw/import_sig', 'micrograph_blob_non_dw_AB/path', 'micrograph_blob_non_dw_AB/idx', 'micrograph_blob_non_dw_AB/shape', 'micrograph_blob_non_dw_AB/psize_A', 'micrograph_blob_non_dw_AB/format', 'micrograph_blob_non_dw_AB/is_background_subtracted', 'micrograph_blob_non_dw_AB/vmin', 'micrograph_blob_non_dw_AB/vmax', 'micrograph_blob_non_dw_AB/import_sig', 'micrograph_thumbnail_blob_1x/path', 'micrograph_thumbnail_blob_1x/idx', 'micrograph_thumbnail_blob_1x/shape', 'micrograph_thumbnail_blob_1x/format', 'micrograph_thumbnail_blob_1x/binfactor', 'micrograph_thumbnail_blob_1x/micrograph_path', 'micrograph_thumbnail_blob_1x/vmin', 'micrograph_thumbnail_blob_1x/vmax', 'micrograph_thumbnail_blob_2x/path', 'micrograph_thumbnail_blob_2x/idx', 'micrograph_thumbnail_blob_2x/shape', 'micrograph_thumbnail_blob_2x/format', 'micrograph_thumbnail_blob_2x/binfactor', 'micrograph_thumbnail_blob_2x/micrograph_path', 'micrograph_thumbnail_blob_2x/vmin', 'micrograph_thumbnail_blob_2x/vmax', 'background_blob/path', 'background_blob/idx', 'background_blob/binfactor', 'background_blob/shape', 'background_blob/psize_A', 'rigid_motion/type', 'rigid_motion/path', 'rigid_motion/idx', 'rigid_motion/frame_start', 'rigid_motion/frame_end', 'rigid_motion/zero_shift_frame', 'rigid_motion/psize_A', 
'spline_motion/type', 'spline_motion/path', 'spline_motion/idx', 'spline_motion/frame_start', 'spline_motion/frame_end', 'spline_motion/zero_shift_frame', 'spline_motion/psize_A']
[CPU:  253.0 MB  Avail:   2.06 GB]

--------------------------------------------------------------
[CPU:  253.0 MB  Avail:   2.06 GB]

Compiling job outputs...
[CPU:  253.0 MB  Avail:   2.06 GB]

Passing through outputs for output group micrographs from input group movies
[CPU:  253.0 MB  Avail:   2.06 GB]

This job outputted results ['micrograph_blob_non_dw', 'micrograph_blob_non_dw_AB', 'micrograph_thumbnail_blob_1x', 'micrograph_thumbnail_blob_2x', 'movie_blob', 'micrograph_blob', 'background_blob', 'rigid_motion', 'spline_motion']
[CPU:  253.0 MB  Avail:   2.06 GB]

  Loaded output dset with 0 items
[CPU:  253.0 MB  Avail:   2.06 GB]

Passthrough results ['gain_ref_blob', 'mscope_params']
[CPU:  253.0 MB  Avail:   2.06 GB]

  Loaded passthrough dset with 20 items
[CPU:  253.0 MB  Avail:   2.06 GB]

  Intersection of output and passthrough has 0 items
[CPU:  253.0 MB  Avail:   2.06 GB]

  Output dataset contains:  ['mscope_params', 'gain_ref_blob']
[CPU:  253.0 MB  Avail:   2.06 GB]

  Outputting passthrough result gain_ref_blob
[CPU:  253.0 MB  Avail:   2.06 GB]

  Outputting passthrough result mscope_params
[CPU:  253.0 MB  Avail:   2.05 GB]

Passing through outputs for output group micrographs_incomplete from input group movies
[CPU:  253.0 MB  Avail:   2.05 GB]

This job outputted results ['micrograph_blob']
[CPU:  253.0 MB  Avail:   2.05 GB]

  Loaded output dset with 20 items
[CPU:  253.0 MB  Avail:   2.05 GB]

Passthrough results ['movie_blob', 'gain_ref_blob', 'mscope_params']
[CPU:  253.0 MB  Avail:   2.05 GB]

  Loaded passthrough dset with 20 items
[CPU:  253.0 MB  Avail:   2.05 GB]

  Intersection of output and passthrough has 20 items
[CPU:  253.0 MB  Avail:   2.05 GB]

  Output dataset contains:  ['mscope_params', 'movie_blob', 'gain_ref_blob']
[CPU:  253.0 MB  Avail:   2.05 GB]

  Outputting passthrough result movie_blob
[CPU:  253.0 MB  Avail:   2.05 GB]

  Outputting passthrough result gain_ref_blob
[CPU:  253.0 MB  Avail:   2.05 GB]

  Outputting passthrough result mscope_params
[CPU:  253.0 MB  Avail:   2.05 GB]

Checking outputs for output group micrographs
[CPU:  253.0 MB  Avail:   2.05 GB]

Checking outputs for output group micrographs_incomplete
[CPU:  253.3 MB  Avail:   2.05 GB]

Updating job size...
[CPU:  253.5 MB  Avail:   2.05 GB]

Exporting job and creating csg files...
[CPU:  253.5 MB  Avail:   2.05 GB]

***************************************************************
[CPU:  253.5 MB  Avail:   2.05 GB]

Job complete. Total time 30.53s

Here is the cryoSPARC instance information:

$ cryosparcm status
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/mnt/.../cryosparc/cryosparc_master
Current cryoSPARC version: v4.7.1
----------------------------------------------------------------------------

CryoSPARC process status:

app                              RUNNING   pid 337026, uptime 0:07:52
app_api                          RUNNING   pid 337063, uptime 0:07:50
app_api_dev                      STOPPED   Not started
command_core                     RUNNING   pid 336419, uptime 0:08:21
command_rtp                      RUNNING   pid 336501, uptime 0:08:09
command_vis                      RUNNING   pid 336478, uptime 0:08:10
database                         RUNNING   pid 336297, uptime 0:08:29

----------------------------------------------------------------------------
License is valid
----------------------------------------------------------------------------

$ uname -a && free -g
Linux ****.research.chop.edu 5.14.0-162.6.1.el9_1.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 30 07:36:03 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
               total        used        free      shared  buff/cache   available
Mem:               5           3           0           0           2           1
Swap:              7           0           7

Welcome to the forum @omkarshinde.

The current minimum requirement for CryoSPARC worker software is 32 GB of system RAM.

I am trying to run cryoSPARC on a cluster. How do I request 32 GB of RAM when running a job on the cluster? I am currently trying to run cryoSPARC on the cluster with cluster_info.json and cluster_script.sh in the cryoSPARC working directories. Could you please have a look at the files below?

cluster_info.json

{
"name": "Master node",
"worker_bin_path": "/mnt/isilon/sgourakis_lab_storage/main/cryosparc/cryosparc_worker/bin/cryosparcw",
"send_cmd_tpl": "srun --mem=32G {{ script_path }}",
"qsub_cmd_tpl": "/mnt/isilon/sgourakis_lab_storage/main/cryosparc/cluster_script.sh",
"qstat_cmd_tpl": "/mnt/isilon/sgourakis_lab_storage/main/cryosparc/CryoEM_data/CS-test -j J1",
"qdel_cmd_tpl": "/mnt/isilon/sgourakis_lab_storage/main/cryosparc/CryoEM_data/CS-test J1",
"qinfo_cmd_tpl": "mnt/isilon/sgourakis_lab_storage/main/cryosparc/CryoEM_data/CS-test"
}

cluster_script.sh

#!/usr/bin/env bash

#SBATCH --p gpuq
#SBATCH --job-name cryosparc
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:2
#SBATCH --mem=256G
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --time=4:00:00

cryosparcm start

Thanks for posting the cluster_info.json and cluster_script.sh files. The files’ contents differ significantly from the sample configuration that I am familiar with.
Is your own configuration based on instructions for running CryoSPARC specifically on your institution’s HPC platform?

I made the configurations myself with the help of a colleague in the lab; there are no instructions from our institution's HPC center for running cryoSPARC. Can you please help me with the script? Also, for installing cryoSPARC on a cluster, I should follow the Master Node CryoSPARC Installation protocol and not the Single Workstation CryoSPARC Installation, correct?

Ask whoever manages the HPC cluster at your institute. Safer to do that than to put a lot of time in and then get a grumpy e-mail from the cluster team about misconfiguring something! 🙂


That depends on how you plan to run CryoSPARC jobs on the cluster. Alternative approaches could include:

  • Alternative A: a CryoSPARC master installation that operates outside SLURM and submits jobs to SLURM, where CryoSPARC worker software is available on the compute nodes. In this case one would
    1. coordinate with HPC support to identify a suitable server that can support the CryoSPARC master workload, including interactive jobs, outside SLURM. The server, as well as the compute nodes, would need access to the prerequisite common Linux identity and shared filesystem(s) for CryoSPARC project directories and raw data.
    2. perform a Master Node Only installation and a separate installation of the CryoSPARC worker package, with the installed worker package possibly being shared between multiple compute nodes.
    3. use the cryosparcm cluster connect command to create a cluster scheduler lane inside CryoSPARC
  • Alternative B: A single workstation-style CryoSPARC instance is installed and operated within the confines of a SLURM job. This approach may be simpler to implement initially, but managing the life cycle of the CryoSPARC instance (and database) “between” SLURM jobs can be a complex task (example). Moreover, SLURM task management may abruptly terminate CryoSPARC master processes, increasing the risk of corruption to the CryoSPARC database.
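For Alternative A, the two cluster integration files could look roughly like the sketch below, modeled on the sample SLURM configuration in the CryoSPARC guide. The lane name, partition (gpuq), and cache path here are placeholders to adapt for your site, and the {{ ... }} tokens are template variables that CryoSPARC substitutes per job. Note that, unlike the files posted above, qsub_cmd_tpl, qstat_cmd_tpl, and qdel_cmd_tpl should be scheduler commands (sbatch/squeue/scancel), not project paths, and the script should end with {{ run_cmd }} rather than cryosparcm start.

cluster_info.json:

```json
{
  "name": "slurm-gpu",
  "worker_bin_path": "/mnt/isilon/sgourakis_lab_storage/main/cryosparc/cryosparc_worker/bin/cryosparcw",
  "cache_path": "/scratch/cryosparc_cache",
  "send_cmd_tpl": "{{ command }}",
  "qsub_cmd_tpl": "sbatch {{ script_path }}",
  "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
  "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
  "qinfo_cmd_tpl": "sinfo"
}
```

cluster_script.sh:

```shell
#!/usr/bin/env bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --partition=gpuq
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --mem={{ ram_gb }}G
#SBATCH --output={{ job_log_path_abs }}
#SBATCH --error={{ job_log_path_abs }}

# CryoSPARC substitutes run_cmd with the actual worker command for this job
{{ run_cmd }}
```

With both files in one directory, you would then register the lane by running cryosparcm cluster connect from that directory (step 3 above).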

These are not the only possible approaches. As rbs_sci already suggested, you may want to consult with your HPC support. You may want to share with them the architectural overview, prerequisites and installation instructions and post in this forum any questions that arise.
