Ab initio (HR-HAIR) is taking too long (not using GPUs?)

I am performing a 3-class Ab Initio reconstruction on ~200k particles with the “HR-HAIR” method proposed by @olibclarke (https://www.biorxiv.org/content/10.1101/2025.09.08.674935v1). It’s giving great results (shoutout to Clarke Lab :slight_smile:). However, I’m having some compute issues: the expected time to complete the job (~16k iterations) is about 9 days, which seems off.

When I run the nvidia-smi command on the node that is running my Ab Initio job, the GPU does not seem to be utilized (output below). Other people have reported similar GPU allocation behavior (1, 2). I’ve increased the number of CPUs and the amount of RAM requested, and the job runs fine; however, it’s taking too long to complete. (The maximum request time for a GPU is only 7 days.) Is there a fix for this? Thank you all for the help.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.163.01             Driver Version: 550.163.01     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                     On  |   00000000:46:00.0 Off |                    0 |
|  0%   40C    P0             71W /  300W |     275MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     52805      C   python                                        266MiB |
+-----------------------------------------------------------------------------------------+
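To rule out catching the GPU between bursts of work, utilization can also be logged over time rather than sampled once; here’s a minimal sketch using standard nvidia-smi query flags (the 5-second interval is arbitrary):

# log GPU utilization and memory to CSV every 5 seconds
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 5
# list compute processes currently resident on the GPU
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

The second command lists the compute processes attached to the card, so the job’s python PID (52805 in the output above) should appear there if the worker is actually bound to this GPU.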

@MetabolicNerd What version of CryoSPARC do you use? (I’m asking so I can propose the correct command to extract some relevant job information.)


Hi @wtempel, thank you in advance for any suggestions. I’m currently using CryoSPARC v4.7.1.

In this case, please can you post the outputs of these commands:

project_uid="P99" # replace with actual id
job_uid="J999" # replace with actual id
cryosparcm cli  "get_job('$project_uid', '$job_uid', 'type', 'version', 'params_spec', 'instance_information')"
cryosparcm eventlog $project_uid $job_uid | head -n 40

Sure thing, thank you. Here are the outputs:

[user@node14 ~] (job 12345678) $ cryosparcm cli "get_job('$project_uid', '$job_uid', 'type', 'version', 'params_spec', 'instance_information')"
{'_id': '69f9518c4ace2ea289c26fba', 'instance_information': {'CUDA_version': '11.8', 'available_memory': '225.77GB', 'cpu_model': 'AMD EPYC 7543P 32-Core Processor', 'driver_version': '12.4', 'gpu_info': [{'id': 0, 'mem': 47608692736, 'name': 'NVIDIA A40', 'pcie': '0000:46:00'}], 'ofd_hard_limit': 131072, 'ofd_soft_limit': 131072, 'physical_cores': 32, 'platform_architecture': 'x86_64', 'platform_node': 'node02.int', 'platform_release': '3.10.0-1160.139.1.el7.tuxcare.els4.x86_64', 'platform_version': '#1 SMP Wed Jan 7 16:57:28 UTC 2026', 'total_memory': '251.42GB', 'used_memory': '22.98GB'}, 'params_spec': {'abinit_K': {'value': 3}, 'abinit_center': {'value': False}, 'abinit_init_res': {'value': 5}, 'abinit_max_res': {'value': 2.3}, 'abinit_minisize': {'value': 1000}, 'abinit_minisize_init': {'value': 300}, 'abinit_radwn_step': {'value': 0.005}}, 'project_uid': 'P1', 'type': 'homo_abinit', 'uid': 'J353', 'version': 'v4.7.1'}
[user@node14 ~] (job 12345678) $ cryosparcm eventlog $project_uid $job_uid | head -n 40
[Wed, 06 May 2026 00:40:35 GMT]  License is valid.
[Wed, 06 May 2026 00:40:35 GMT]  Launching job on lane CLUSTER_OWNERS_GPU target CLUSTER_OWNERS_GPU …
[Wed, 06 May 2026 00:40:35 GMT]  Launching job on cluster CLUSTER_OWNERS_GPU
[Wed, 06 May 2026 00:40:35 GMT]
====================== Cluster submission script: ========================

#!/usr/bin/bash
# =============================================================
# CryoSPARC SLURM Submission Script (GPU Nodes)
# Configured for GPU node requests on cluster's "owners" queue
# Author: Lab Admin
# Last updated: 2025-07-08
# =============================================================

# Partition (queue): normal, gpu, owners, labqueue
#SBATCH --partition=owners

# Resources: Nodes and CPUs
#SBATCH --nodes=1
#SBATCH --ntasks=6

# Runtime limit (HH:MM:SS)
# Note: Maximum runtime for "owners" queue is 48 hours
#SBATCH --time=48:00:00

# GPU resources
#SBATCH --gres=gpu:1
#SBATCH --constraint="GPU_CC:8.6"
#SBATCH --gpu_cmode=shared  # Options: shared (default NVIDIA), exclusive (cluster default), prohibited

# Memory request (GB)
# Adjust memory using "ram_gb_multiplier" if needed
#SBATCH --mem=48G

# Output and error files
#SBATCH --output=/scratch/groups/labname/cryoem/PROJECT01/cryosparc/CS-project01/J353/job.out
#SBATCH --error=/scratch/groups/labname/cryoem/PROJECT01/cryosparc/CS-project01/J353/job.err

# Job name

Traceback (most recent call last):
  File "", line 9, in 
BrokenPipeError: [Errno 32] Broken pipe
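(Side note: the BrokenPipeError at the end looks like head closing the pipe after 40 lines rather than a problem with the job itself. If it’s a nuisance, writing the log to a file first and taking the head of that should avoid it; a minimal sketch, with an arbitrary temp filename:)

# dump the event log to a file, then read the first 40 lines
cryosparcm eventlog $project_uid $job_uid > /tmp/eventlog_${job_uid}.txt
head -n 40 /tmp/eventlog_${job_uid}.txt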

Sorry, forgot to tag you. Thank you again for the guidance :folded_hands: