Ab initio (HR-HAIR) is taking long (not using GPUs?)

I am performing a 3-class Ab Initio reconstruction on ~200k particles with the “HR-HAIR” method proposed by @olibclarke (https://www.biorxiv.org/content/10.1101/2025.09.08.674935v1). It’s giving great results. (Shoutout to Clarke Lab :slight_smile:) I’m having some compute issues however. The expected time to complete the job (~16k iterations) is about 9 days, which seems off.

When I run nvidia-smi on the node that is running my Ab Initio job, the GPU does not seem to be utilized. Other people have reported this strange GPU allocation behavior (1, 2). I’ve increased the number of CPUs and RAM requested, and the job runs fine, but it is taking too long to complete. (The maximum request time for a GPU is only 7 days.) Is there a fix for this? Thank you all for the help.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.163.01             Driver Version: 550.163.01     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                     On  |   00000000:46:00.0 Off |                    0 |
|  0%   40C    P0             71W /  300W |     275MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     52805      C   python                                        266MiB |
+-----------------------------------------------------------------------------------------+
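
A single nvidia-smi snapshot like the one above can miss short bursts of GPU activity, so sampling utilization for a minute or so may give a clearer picture. Here is a minimal sketch, assuming `nvidia-smi` is on the PATH of the compute node. The `--query-gpu` and `--format` options are standard nvidia-smi flags; the function names, the 30-sample count and the 2-second interval are arbitrary choices for illustration.

```python
import subprocess
import time

# Standard nvidia-smi query flags: one CSV row per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse 'util, mem' CSV rows (one per GPU) into (util_percent, mem_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        rows.append((int(util), int(mem)))
    return rows


def sample_gpu(samples=30, interval_s=2.0):
    """Poll nvidia-smi repeatedly and return one list of readings per sample."""
    readings = []
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        readings.append(parse_gpu_csv(out))
        time.sleep(interval_s)
    return readings


def summarize(readings, gpu_index=0):
    """Return (mean, max) utilization in percent for one GPU across all samples."""
    utils = [reading[gpu_index][0] for reading in readings]
    return sum(utils) / len(utils), max(utils)
```

Calling `summarize(sample_gpu())` on the node while the job is running returns the mean and peak utilization. If both stay near 0% while the job log keeps advancing, that would be consistent with the work being done on the CPU.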

@MetabolicNerd What version of CryoSPARC do you use? (I’m asking so I can propose the correct command to extract some relevant job information.)

Hi @wtempel, thank you for any suggestions. I’m currently using CryoSPARC v4.7.1.

In this case, could you please post the outputs of these commands:

project_uid="P99" # replace with actual id
job_uid="J999" # replace with actual id
cryosparcm cli  "get_job('$project_uid', '$job_uid', 'type', 'version', 'params_spec', 'instance_information')"
cryosparcm eventlog $project_uid $job_uid | head -n 40

Sure thing, thank you. Here are the outputs:

[user@node14 ~] (job 12345678) $ cryosparcm cli  "get_job('$project_uid', '$job_uid', 'type', 'version', 'params_spec', 'instance_information')"
{'_id': '69f9518c4ace2ea289c26fba', 'instance_information': {'CUDA_version': '11.8', 'available_memory': '225.77GB', 'cpu_model': 'AMD EPYC 7543P 32-Core Processor', 'driver_version': '12.4', 'gpu_info': [{'id': 0, 'mem': 47608692736, 'name': 'NVIDIA A40', 'pcie': '0000:46:00'}], 'ofd_hard_limit': 131072, 'ofd_soft_limit': 131072, 'physical_cores': 32, 'platform_architecture': 'x86_64', 'platform_node': 'node02.int', 'platform_release': '3.10.0-1160.139.1.el7.tuxcare.els4.x86_64', 'platform_version': '#1 SMP Wed Jan 7 16:57:28 UTC 2026', 'total_memory': '251.42GB', 'used_memory': '22.98GB'}, 'params_spec': {'abinit_K': {'value': 3}, 'abinit_center': {'value': False}, 'abinit_init_res': {'value': 5}, 'abinit_max_res': {'value': 2.3}, 'abinit_minisize': {'value': 1000}, 'abinit_minisize_init': {'value': 300}, 'abinit_radwn_step': {'value': 0.005}}, 'project_uid': 'P1', 'type': 'homo_abinit', 'uid': 'J353', 'version': 'v4.7.1'}
[user@node14 ~] (job 12345678) $ cryosparcm eventlog $project_uid $job_uid | head -n 40
[Wed, 06 May 2026 00:40:35 GMT]  License is valid.
[Wed, 06 May 2026 00:40:35 GMT]  Launching job on lane CLUSTER_OWNERS_GPU target CLUSTER_OWNERS_GPU …
[Wed, 06 May 2026 00:40:35 GMT]  Launching job on cluster CLUSTER_OWNERS_GPU
[Wed, 06 May 2026 00:40:35 GMT]
====================== Cluster submission script: ========================

#!/usr/bin/bash

# =============================================+===============
# CryoSPARC SLURM Submission Script (GPU Nodes)
# Configured for GPU node requests on cluster's "owners" queue
# Author: Lab Admin
# Last updated: 2025-07-08
# ======================================================+======

# Partition (queue): normal, gpu, owners, labqueue
#SBATCH --partition=owners

# Resources: Nodes and CPUs
#SBATCH --nodes=1
#SBATCH --ntasks=6

# Runtime limit (HH:MM:SS)
# Note: Maximum runtime for "owners" queue is 48 hours
#SBATCH --time=48:00:00

# GPU resources
#SBATCH --gres=gpu:1
#SBATCH --constraint="GPU_CC:8.6"
#SBATCH --gpu_cmode=shared  # Options: shared (default NVIDIA), exclusive (cluster default), prohibited

# Memory request (GB)
# Adjust memory using "ram_gb_multiplier" if needed
#SBATCH --mem=48G

# Output and error files
#SBATCH --output=/scratch/groups/labname/cryoem/PROJECT01/cryosparc/CS-project01/J353/job.out
#SBATCH --error=/scratch/groups/labname/cryoem/PROJECT01/cryosparc/CS-project01/J353/job.err

# Job name

Traceback (most recent call last):
  File "<string>", line 9, in <module>
BrokenPipeError: [Errno 32] Broken pipe
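
The BrokenPipeError at the end of this output is most likely a side effect of piping into `head -n 40`, which closes the pipe once it has read its 40 lines, rather than a problem with the job itself. Here is a minimal sketch of one way to avoid it, assuming `cryosparcm` is on the PATH: let the event log command run to completion, then truncate the captured text. The helper names are hypothetical; only the `cryosparcm eventlog` command comes from this thread.

```python
import subprocess


def first_lines(text, n=40):
    """Return the first n lines of text as a single string."""
    return "\n".join(text.splitlines()[:n])


def eventlog_head(project_uid, job_uid, n=40):
    """Run `cryosparcm eventlog` to completion, then keep only the first n lines."""
    result = subprocess.run(
        ["cryosparcm", "eventlog", project_uid, job_uid],
        capture_output=True,
        text=True,
        check=True,
    )
    return first_lines(result.stdout, n)


# Example call (replace with actual ids):
#   print(eventlog_head("P99", "J999"))
```

Because the whole log is captured before it is cut down, the writer never sees a closed pipe. The cost is that the full log is held in memory for a moment.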

Sorry, forgot to tag you. Thank you again for the guidance :folded_hands:

Jumping on this to say that the designated job in v5 (ab-initio refinement) is also almost paralyzingly slow, and I’m not sure if that is by design or if something is wrong. I’m running two jobs now with default parameters, 1 class (since that’s the only option), 145k particles in one and 245k particles in another. Both have been running for 6 days and are almost exactly halfway there, so each will take an estimated 13 days to complete on our A4500s (:scream:).

GPUs do seem to be utilized in this case, so just wondering – is this typical behavior you see in your benchmarking? I started another job three days ago and it didn’t even make it past the initialization step in the three days, so I just killed it (this happened twice, so I think it is consistent behavior). The other two jobs are marching along, but so slowly that I may forget about them before they finally finish! I know this protocol isn’t the intended use of CS, so I am happy to be patient if that is the solution, but seeing as we now have a specific job for it, I was hoping it might be a bit faster, even with the half-sets addition.

Thanks!

It takes a lot of time. It’s not unusual for these jobs to take several days.

Got it, thanks! I would characterize 2 weeks as a bit more than “several days”, but if that is the expected behavior, it’s good for us to know.

Have you done any benchmarking on how run time varies with the number of particles? It seems to be exactly the same with 150k and 250k particles, but if it does increase substantially with particle count, I shudder to think how long it would take with a larger data set, such as one from a Krios. We have monthly computer restarts, and at this point we already have to start these jobs in the first half of the month so they don’t hit the restart period.

We are excited that the functionality exists and appreciate the half-sets for scientific rigor even though I am sure that increases run time / complexity.

To add onto this, I also found that the number of particles did not correlate with run time. I ran one 3-class job with 210k particles and another with 800k particles, and the run times were approximately the same (expected to be ~2 weeks). We’re also having issues because the maximum queue time on our HPC cluster is currently set at 7 days, so we’re trying to find any alternatives that could help.
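
To make the wall-time problem concrete, the figures from the original post (~16k iterations, ~9 days estimated, 7-day GPU limit) can be turned into a rough per-iteration cost. This is back-of-the-envelope arithmetic that assumes every iteration takes the same time, which may not hold in practice; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_iteration(total_days, iterations):
    """Average wall-clock seconds per iteration, assuming a uniform rate."""
    return total_days * SECONDS_PER_DAY / iterations


def iterations_within(limit_days, total_days, iterations):
    """Whole iterations that fit in limit_days at the same uniform rate."""
    return iterations * limit_days // total_days


per_iter = seconds_per_iteration(9, 16_000)   # about 48.6 s per iteration
done = iterations_within(7, 9, 16_000)        # 12444 of 16000 iterations
print(f"{per_iter:.1f} s/iteration, {done} iterations in 7 days")
```

Under that uniform-rate assumption, a 7-day allocation would cover roughly 78% of the iterations. The job cannot finish within the limit unless the per-iteration cost drops or the run can be split.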

Two weeks is quite long, but I have seen some take that long. We find it depends a lot on the number of particles, the number of classes, and the box size. We have done it on 1-2 million particles, but typically we try to reduce the stack with het refine before trying these jobs.

Our particular situation is for the ab-initio refine job (new in v5 so that you can actually have GS half-sets), not the standard ab-initio reconstruction job. So there is no option for anything except a single class. So we are finding 2 weeks for 1 class, 150k particles, which is partially why this seems somewhat alarming.

I know the original question was for the ab-initio reconstruction, but I had jumped on since the refine job came out of the original HR-HAIR protocol. Perhaps I just confused things unnecessarily, though, if there isn’t the same underlying issue.

Yes, that seems to be a very long run time on any modern GPU.

Agreed, and we are running this only on our A4500s, skipping over the older 1080Tis which probably would take even longer.