Multiple GPU and thread usage for ab initio reconstruction

Hello,
I am using CryoSPARC version 4.2.1. I have a very large number of particles that I'm reconstructing into 3 ab initio classes. I've left all other parameters at their defaults, apart from disabling the SSD cache and increasing the number of classes.
My ab initio reconstruction has been running for a long time (~12 hours so far) on 1 GPU, and it has only worked through about a third of the particles.
Is there a way to make this job run faster without reducing the number of particles or classes?
There isn't an option to use multiple GPUs or multiple threads for this job type. I see that multi-GPU and multi-thread support was suggested a few years ago, but I'm wondering whether there's a way to make this job run faster in the current version.

Thanks!

Is your SSD cache too small? You should only disable it if really necessary. Also, are your particles downsampled? Ab initio (by default) stops at 12 Å, so there's no reason to have a Nyquist limit finer than that at this stage. If you are using typical pixel sizes, just under 1 Å/px, I recommend a 4-6x downsample during extraction. E.g. extract a 432 px box with Fourier cropping to 72 px (good numbers for a ~300 Å particle at ~0.85 Å/px). Using a smaller particle box will make it much faster.
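The arithmetic behind that suggestion can be checked with a short sketch. The helper `fourier_crop_stats` is hypothetical (not part of CryoSPARC); it assumes Fourier cropping scales the pixel size by the ratio of the original to the cropped box, and that the Nyquist limit is twice the resulting pixel size.

```python
def fourier_crop_stats(apix: float, box: int, cropped_box: int) -> dict:
    """Pixel size and Nyquist limit after Fourier cropping (hypothetical helper)."""
    factor = box / cropped_box          # downsampling factor
    new_apix = apix * factor            # Å/px after cropping
    return {
        "factor": factor,
        "box_extent_A": apix * box,     # physical box edge; cropping does not change it
        "new_apix": new_apix,
        "nyquist_A": 2.0 * new_apix,    # finest spacing the cropped box can represent
    }

stats = fourier_crop_stats(apix=0.85, box=432, cropped_box=72)
for key, value in stats.items():
    print(f"{key}: {value:.2f}")
```

For the numbers in the post this gives a 6x factor, 5.1 Å/px, a ~367 Å box and a Nyquist limit of about 10.2 Å, just finer than the 12 Å default ab initio limit.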

I’ve found multi-class ab initio to be really slow with CS4.2… if I can see obvious heterogeneity in a sample (e.g.: co-purification, obviously different complexes) I can pull things out into different 2D sets and run multiple single-class ab initio runs in <30 minutes, while a multi-class ab initio is still going 24 hours later.

I don't bin down quite so hard, but it really depends on the dataset; I like to have a sampling limit of ~6-8 Å, although for really big things I'll bin down toward 20 Å.
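Going the other way, from a target sampling limit to a Fourier-crop box size, can be sketched as follows. `crop_box_for_nyquist` is a hypothetical helper, and rounding up to an even box size is an assumption about what extraction expects; rounding up keeps the resulting Nyquist limit at or finer than the target.

```python
import math

def crop_box_for_nyquist(apix: float, box: int, target_nyquist_A: float) -> int:
    """Smallest even cropped box whose Nyquist is at or finer than the target (hypothetical)."""
    target_apix = target_nyquist_A / 2.0      # Nyquist = 2 x pixel size
    exact = box * apix / target_apix          # box extent / desired pixel size
    cropped = math.ceil(exact)                # round up: never coarser than the target
    if cropped % 2:
        cropped += 1                          # assumed even-box requirement
    return min(cropped, box)                  # cannot crop to a larger box than we have

print(crop_box_for_nyquist(apix=0.85, box=432, target_nyquist_A=7.0))   # 106
print(crop_box_for_nyquist(apix=0.85, box=432, target_nyquist_A=20.0))  # 38
```

With the 432 px / 0.85 Å/px example from earlier in the thread, a ~7 Å sampling limit needs a crop to about 106 px, while ~20 Å needs only about 38 px.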

Hi, @DanielAsarnow, @rbs_sci
Yes, I have very little space on the SSD cache.
No, I didn't downsample my particles; I didn't anticipate how long the ab initio job would run.
Thank you both; it seems downsampling is the solution.
I’ll downsample and rerun.

Thanks!

Bumping this helpful thread to follow up on @rbs_sci's note above about slow multi-class ab initio. @rbs_sci, can you supply a bit more information about this behaviour? How large is the full particle set you're using for multi-class ab initio? How many classes? Was this faster in versions before 4.2? Thanks!

Hi @vperetroukhin

Full particle set is ~500,000 particles. Originally 768x768 but binned to 256x256. I saw the behaviour with both 6 and 10 classes (10 was obviously overkill). We don’t run SSD caching as our CS systems are all-SSD storage and projects get detached and moved when complete.
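For a sense of scale, the raw size of a particle stack at each of those box sizes can be estimated from the numbers above. This sketch assumes uncompressed 32-bit float pixels and ignores file headers; the helper `stack_size_bytes` is hypothetical.

```python
def stack_size_bytes(n_particles: int, box: int, bytes_per_pixel: int = 4) -> int:
    """Raw pixel payload of a square particle stack, headers ignored (hypothetical)."""
    return n_particles * box * box * bytes_per_pixel

n = 500_000
for box in (768, 256):
    size = stack_size_bytes(n, box)
    print(f"{box} px box: {size / 1e9:.1f} GB")  # 1179.6 GB, then 131.1 GB
```

Binning 768 px to 256 px leaves each image with one ninth of the pixels, so under these assumptions the stack shrinks by the same factor of 9.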

I’ve not seen it since the latest patch. I’ll see if I can trigger it again.