Poor 2D class average

I’m glad to hear that the 2D classification improved and that you got a good ab initio! Can you expand on what you mean by the results of heterogeneous refinement not being good? How many classes are you using, and what are you using for the reference volumes? Is heterogeneous refinement not pulling out junk particles effectively? You might want to refer to this tutorial page written by user olibclarke on how to set up decoy classification with heterogeneous refinement: Case Study: Exploratory data processing by Oliver Clarke | CryoSPARC Guide

To summarize, set up a heterogeneous refinement job with your good ab initio and 3-8 “decoy” classes. The decoy classes are meant to absorb junk particles. My preferred way to generate decoy classes is to start an ab initio job with ~12 classes, then kill the job as soon as the first iteration finishes, which results in 12 classes of random noise. If you mark the job as complete, then you can use these classes as decoys for heterogeneous refinement.

Once the heterogeneous refinement finishes, take only the particles that went into the good class and do another round of decoy classification to pull out more junk. I repeat this process until the junk particles make up less than ~5% of the total particle stack. I then perform another round of 2D classification, because now that you’ve removed most of the junk, you’ll get more diverse classes of your particle and potentially identify rare views.
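The iterative cleanup above can be sketched as a toy model. Here I assume each round of decoy classification captures a fixed fraction of the remaining junk (the 0.7 capture rate and the particle counts are illustrative assumptions, not CryoSPARC parameters or numbers from this thread):

```python
# Toy model of iterative decoy classification: repeat heterogeneous
# refinement, keeping only the good class, until junk is under the cutoff.
def rounds_until_clean(n_good, n_junk, capture_rate=0.7, junk_cutoff=0.05):
    """Count cleanup rounds until junk falls below `junk_cutoff` of the stack.

    Assumes each round sends `capture_rate` of the remaining junk into the
    decoy classes while all good particles stay in the good class.
    """
    rounds = 0
    while n_junk / (n_good + n_junk) >= junk_cutoff:
        n_junk = int(n_junk * (1 - capture_rate))  # junk escaping the decoys
        rounds += 1
    return rounds

# e.g. a 200k-particle stack that starts ~30% junk
print(rounds_until_clean(n_good=140_000, n_junk=60_000))  # 2
```

In practice the number of rounds depends on how well the decoys absorb junk, which is why I check the class distributions after each round rather than fixing the round count in advance.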

The strategy above is the “basic” version. I sometimes make the following modifications:

  • If the “good” class after a round of heterogeneous refinement starts to look noisy or has overfitting artifacts, it can help to redo ab initio on this cleaner stack of particles to generate a better reference.
  • Even though the decoy references are mostly random noise, I’ve noticed that a given decoy reference tends to produce similar-looking volumes at the end of the refinement. To pull out as much diverse junk as possible, I usually rotate through different decoy references with each iteration of heterogeneous refinement.
  • If you want to be really gentle with pulling out junk (e.g. if you’re worried about throwing away rare views in the junk classes), you can set up multiple clones of the same heterogeneous refinement job in parallel (one for each GPU on your workstation). After they finish, you can pool all of the particles that went into at least one of the good classes and use them for the next iteration of heterogeneous refinement. This method is more conservative and will only throw out a particle if it was assigned to a junk class multiple times. However, using multiple GPUs simultaneously tends to significantly slow down each individual job.
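The conservative pooling step in the last bullet is just a set union over the parallel jobs’ good-class outputs. A minimal sketch (the integer particle IDs and job sets below are made-up stand-ins for the particle UIDs CryoSPARC assigns):

```python
# Keep any particle that landed in a "good" class in at least one of the
# parallel heterogeneous refinement clones; a particle is discarded only if
# every clone assigned it to a junk class.
def pool_good_particles(good_sets):
    """Union of the good-class particle sets from parallel jobs."""
    keep = set()
    for s in good_sets:
        keep |= s
    return keep

# Three parallel clones of the same job; particle 4 went to a junk class in
# every clone, so it is the only one discarded.
job_a = {1, 2, 3, 5}
job_b = {1, 2, 5, 6}
job_c = {2, 3, 6}
print(sorted(pool_good_particles([job_a, job_b, job_c])))  # [1, 2, 3, 5, 6]
```

In CryoSPARC itself this corresponds to combining the good-class particle outputs of each clone (e.g. with a Particle Sets tool job) before the next round.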

Edit to add: Are you binning (Fourier cropping) the particle images? Especially for this particle cleanup stage where you don’t need high-resolution data, it can help to bin the images by a factor of two or more. For example, for a box size of 360, I usually bin to 120, which results in a Nyquist frequency of ~6 Å for my micrograph’s pixel size. This significantly speeds up processing time and helps when experimenting with different parameters.
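The Nyquist arithmetic is quick to check: Fourier cropping from box 360 to 120 multiplies the pixel size by 3, and the Nyquist resolution is twice the binned pixel size. A small sketch (the 1.0 Å/px input is an assumed example pixel size, not stated in the post):

```python
# Nyquist resolution after Fourier cropping a particle stack.
def binned_nyquist(pixel_size_A, box_orig, box_new):
    """Return the Nyquist resolution in Angstrom after binning
    from `box_orig` to `box_new` pixels."""
    binned_pixel = pixel_size_A * box_orig / box_new  # pixel size scales up
    return 2.0 * binned_pixel                         # Nyquist = 2 * px size

print(binned_nyquist(1.0, 360, 120))  # 6.0
```

So a ~1 Å/px dataset binned 360 → 120 gives the ~6 Å Nyquist mentioned above, which is plenty for junk classification.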

I hope this helps!
