Accelerate heterogeneous refinement with large numbers of particles

parrot · July 27, 2023, 6:55pm

Hi,

We have a heterogeneous sample with 7-10 different populations and ~5 million well-polished particles for a heterogeneous refinement. Because the populations are not evenly distributed, we had to increase the Batch size per class to 50000 to get good results. As a consequence, the job takes 3-5 days to complete. Is there any way to accelerate it?
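A back-of-envelope sketch of why uneven populations push the batch size up. It assumes (this is not taken from the CryoSPARC documentation) that each iteration draws a random batch of `batch_per_class * n_classes` particles from the whole stack. The function name `batch_stats` and the 1% minority fraction are hypothetical, chosen only for illustration.

```python
import math


def batch_stats(n_particles, n_classes, batch_per_class, minority_fraction):
    """Rough numbers for one mini-batch iteration (hypothetical helper).

    Assumption: each iteration draws a random batch of
    batch_per_class * n_classes particles from the full stack.
    """
    batch_total = batch_per_class * n_classes
    # Expected number of particles from a rare population in one batch
    expected_minority = batch_total * minority_fraction
    # Iterations needed to touch every particle once
    iters_per_pass = math.ceil(n_particles / batch_total)
    return batch_total, expected_minority, iters_per_pass


for bpc in (1_000, 10_000, 50_000):
    total, minority, iters = batch_stats(5_000_000, 10, bpc, 0.01)
    print(f"batch/class={bpc:>6}: batch={total:>7}, "
          f"~{minority:.0f} rare particles, {iters} iterations per full pass")
```

Under this assumption, a larger batch gives a rare class more particles to update from in each iteration, but if the number of iterations stays fixed, every extra particle per batch is extra work, which is consistent with the longer runtimes reported here.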

Our workstation has 4x RTX 3090 GPUs, 512 GB RAM and 2x 4 TB SSDs.

Thank you for any suggestions!

rbs_sci · July 28, 2023, 3:58am

Beyond Fourier cropping down to a Nyquist limit of, say, ~4 Å, using the cropped particles to work out which particles belong to which class, and then moving forward with multiple homogeneous refinements, throwing that many particles at that many populations is always going to take some time.
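The Fourier-cropping arithmetic can be sketched as follows: the Nyquist limit is twice the pixel size, and cropping keeps the physical box extent while shrinking the box in pixels. The box size (400 px) and pixel size (1.0 Å/px) in the example are hypothetical; in CryoSPARC the crop itself would be done with a job such as Downsample Particles, which this helper does not call.

```python
def fourier_crop_box(box_px, apix, target_nyquist_A):
    """Return (cropped box size, new pixel size, new Nyquist in Å).

    Fourier cropping keeps the physical box extent (box_px * apix) but
    discards high frequencies, so the pixel size grows as the box shrinks.
    Nyquist resolution = 2 * pixel size.
    """
    target_apix = target_nyquist_A / 2.0
    # Round down so the resulting Nyquist is never finer than requested
    new_box = int(box_px * apix / target_apix)
    new_box -= new_box % 2  # keep an even box size
    new_apix = box_px * apix / new_box
    return new_box, new_apix, 2.0 * new_apix


print(fourier_crop_box(400, 1.0, 4.0))  # (200, 2.0, 4.0)
```

Halving the box from 400 to 200 px cuts the number of pixels per image by a factor of four, which is where most of the speed-up from cropping comes from.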

3-5 days total isn’t so bad. 3 days per iteration is bad.

You could consider a faster GPU (an RTX 4090?), but the Ada Lovelace cards seem to have some teething problems with CryoSPARC, IIRC.

parrot · July 28, 2023, 4:28am

Thank you for your suggestions! I am just wondering what the bottleneck is. Is it possible to use multiple GPUs for a single heterogeneous refinement (I didn't see the option)? Or does it truly rely on the performance of a single GPU, in which case an A100 might make it notably faster?

rbs_sci · July 28, 2023, 5:01am

Multiple GPUs are currently not supported for homogeneous, heterogeneous, local and non-uniform refinement.

I've no experience with the A100… maybe? From a (very brief) search, some RELION benchmarks don't show the A100 as significantly faster than a 3090, so it is almost certainly not worth the cost unless you already have one.

The bottleneck is likely just the sheer number of particles, each of which has to be tested against all possible orientations for every population. Increase the number of populations and the work grows with it, roughly in proportion, since every particle is scored against every class.
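A crude scaling model makes this concrete: cost ∝ particles x classes x poses x pixels per image. This is an assumption for comparing settings against each other, not CryoSPARC's actual algorithm (real refinement code prunes the pose search), and `relative_cost` is a hypothetical helper.

```python
def relative_cost(n_particles, n_classes, box_px, n_poses=1.0):
    """Crude relative cost: particles x classes x poses x pixels per image.

    Assumption: every particle is compared against every class, and each
    comparison scales with the number of pixels (box_px ** 2). Only useful
    for comparing settings, not for predicting wall-clock time.
    """
    return n_particles * n_classes * n_poses * box_px ** 2


baseline = relative_cost(5_000_000, 10, 400)
print(relative_cost(5_000_000, 7, 400) / baseline)   # 7 instead of 10 classes: 0.7
print(relative_cost(5_000_000, 10, 200) / baseline)  # box cropped 400 -> 200 px: 0.25
```

Under this model, Fourier cropping is the cheaper lever: halving the box reduces per-comparison work by about 4x, while dropping from 10 to 7 classes saves only 30%.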

If you haven't tried it already, a consensus map (all particles in a single homogeneous refinement) followed by 3D classification, which does not change the angular assignments, might tease out the different populations (use Force hard classification). Each class can then be refined individually, or fed into its own heterogeneous refinement, before grouping similar structures together and re-refining. I've had success with that strategy in the past; that dataset was processed in RELION rather than CryoSPARC, but the general workflow should translate easily enough.
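The hard-classification split step of that workflow can be illustrated with a toy example: each particle goes to exactly one class (its most probable one), and the resulting groups are what you would refine separately. This is only a sketch of the idea with made-up probabilities, not CryoSPARC's implementation; `hard_split` is a hypothetical name.

```python
from collections import defaultdict


def hard_split(class_probs):
    """Group particle indices by their most probable class.

    class_probs: one list of per-class probabilities per particle.
    Ties go to the lowest class index (max() returns the first maximum).
    """
    groups = defaultdict(list)
    for idx, probs in enumerate(class_probs):
        best = max(range(len(probs)), key=probs.__getitem__)
        groups[best].append(idx)
    return dict(groups)


probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
print(hard_split(probs))  # {0: [0, 2], 1: [1], 2: [3]}
```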

parrot · July 28, 2023, 4:58pm

Thank you for your advice! I will try 3D classification.