Live vs "normal" CryoSPARC 2D classes?

Hi @team,

How (if at all) does the algorithm in the “streaming 2D” job differ from the one used in regular 2D classification in CryoSPARC?

In our hands, the "normal" 2D classification gives dramatically better results in certain cases, particularly for small membrane proteins, even when the relevant 2D classification parameters, number of particles, and number of classes are identical.

It also seems to ignore certain custom parameters - for example, when I set the maximum alignment resolution to 20 Å in Live, the results are not what I expect - it looks the same as without, whereas in regular 2D we see much smoother classes, which can be useful in certain cases.

Cheers, Oli

Hey @olibclarke,

When streaming 2D class starts, it takes in a maximum of (batch size per class * number of classes * 10) particles and runs a normal 2D class job with the given params (number of O-EM/full iterations, classification uncertainty factor, noise model annealing, etc.) to initialize the classes. Then, for the rest of the particles, it runs a full iteration with all the particles every time at least (batch size per class * number of classes * 2) new particles have been streamed in.

So if there were only a few particles extracted when streaming 2D class starts, the classes and noise model become really over-fit - normal 2D class would have seen (batch size per class * number of classes * number of O-EM iterations) particles as it annealed the noise model, while streaming 2D class fully anneals the noise model over only the initial particles. Also, instead of running one big full iteration with the rest of the particles, streaming 2D class runs many progressively larger full iterations as particles are streamed in.
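To make those thresholds concrete, here is a rough sketch of the scheduling arithmetic described above; the function names are illustrative only, not the actual implementation:

```python
# Illustrative sketch of the streaming 2D class scheduling described above.
# Function names are hypothetical; only the arithmetic follows the description.

def init_particle_cap(batch_size_per_class: int, num_classes: int) -> int:
    # Before the change mentioned below: initialization uses at most
    # batch size per class * number of classes * 10 particles, regardless of
    # how many online-EM iterations were requested.
    return batch_size_per_class * num_classes * 10

def full_iteration_trigger(batch_size_per_class: int, num_classes: int) -> int:
    # After initialization, a full iteration over all particles runs every time
    # at least this many new particles have been streamed in.
    return batch_size_per_class * num_classes * 2

# Example with a batch size per class of 100 and 50 classes:
print(init_particle_cap(100, 50))       # 50000 particles used for initialization
print(full_iteration_trigger(100, 50))  # full iteration every 10000 new particles
```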

We’ll change the 10 in the max particles for initialization to the number of O-EM iterations, so that if you (re)start streaming 2D class with enough particles it should behave the same way as normal 2D class for the O-EM iterations. Also, streaming 2D class does currently ignore max alignment resolution (as well as the use-white-noise-model and CTF correction parameters), and we’ll implement those params as well :slight_smile:

Thanks,
Kelly

We’ve been having a somewhat opposite problem: we find that Live 2D classification reveals many more alternative views of the particle. I’m reluctant to call them rare views, because they can be quite populous in Live distributions, but they seem rare and get hidden in the non-Live 2D classification distributions.

It seems concerning that the algorithms yield different results, since good reconstructions depend on having more views, and users might miss these views by not running Live for many datasets. For what it’s worth, we see a similar phenomenon now that RELION 4 uses an algorithm that 2D classifies faster, puts out more “fuzzy blobs”, and hides rare views (fast subsets in RELION 3 had a similar, though less severe, effect). The field seems to be going in the direction of fast and dirty 2D classification, and maybe Live gives a slower, more careful sorting, unintentionally? Any advice about workarounds that bring the normal and Live methods in sync and make sure normal mode doesn’t hide views?

For us, when we have had this issue, we sometimes notice that the “rare” views are present in early iterations, but disappear by the end of classification. In this case playing around with the noise model parameters can be helpful, see here: Limiting the number of particles per 2D class - #7 by olibclarke

Live 2D seems to be a bit erratic regarding masking. If I don’t set a mask, sometimes it defaults to a really, really tight one (apparently influenced by the maximum particle diameter used for blob autopicking), but if I stop a run and restart with a different number of classes, it looks like the 2D classes are just windowed rather than masked with a defined diameter. If I attempt to define a mask diameter, it has always appeared to be ignored.

Whether we get better results from Live or “normal” 2D classification seems dataset dependent - for low-symmetry particles Live seems better (at least if starting with >10,000 particles into ~80-100 classes) but for particles which show orientation preference it can be 50/50.

I think the “quick and dirty” classification issue is driven by the desire for high throughput, which is mostly a demand of industry…?

Quick update for future searchers, and thanks @olibclarke for pointing me in this direction. Interestingly, I did not find that turning sigma off improved the prevalence of rare views. But in the course of testing it, I found that increasing the number of iterations from 20 to 40, increasing the final full iterations to 5, and doubling the batch size to 200 helped very much. (Some lab colleagues have even been running batch sizes of 400 and 800, but I haven’t seen it help past 200.)
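To summarize, the settings that helped are written out below as a small snippet; the keys are just descriptive labels for the corresponding 2D classification parameters, not CryoSPARC identifiers.

```python
# Summary of the settings that helped in our hands; keys are descriptive labels,
# not CryoSPARC parameter names. Starting values noted in comments are taken
# from the description above.
helpful_2d_settings = {
    "number_of_online_em_iterations": 40,  # up from 20
    "number_of_final_full_iterations": 5,
    "batch_size_per_class": 200,           # doubled from 100
}
print(helpful_2d_settings)
```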

Another important change was to stop carrying forward all the fuzzy junk classes that I call “Gaussian blobs”. I thought rare views were hiding in these, but I did some tests with multiple rounds of selection and classification to prove otherwise.

Increasing iterations and batch size across multiple rounds of classification has been the key for several projects in our lab.

Funnily enough, we do this too in many cases! :grinning:

Often we use 40 iterations with extra full iterations, and sometimes 80 or more iterations for really small, low-SNR particles.

Hi all, just following up on Kelly’s look into this back in February:

We have made the following fixes to Streaming 2D classification as of CryoSPARC v4.4, in line with what Kelly described above:

  • Streaming 2D classification now behaves the same way as 2D classification in terms of the number of particles seen per online-EM iteration: in total, it sees the same number of particles as regular 2D classification, equal to batch size per class * number of classes * number of online-EM iterations
  • The Maximum alignment res (A) parameter was previously ignored, and the value from the Maximum resolution (A) parameter was used instead. Now, the Maximum alignment res (A) parameter is used as it should be. The alignment resolution defaults to Maximum resolution only if it was set to None or set to a higher (finer) resolution than Maximum resolution (see the sketch after this list)
  • The Use white noise model parameter was previously ignored, and a coloured noise model was always used. Now, it is respected and a white noise model will be used if this is enabled.
  • The “Use clamp-solvent to solve 2D classes” parameter was hidden from Streaming 2D classification, as it was ignored previously (reconstruction was always done via standard backprojection)
  • The “Do CTF correction” parameter was also removed from Streaming 2D classification. In standard 2D classification this can be disabled to turn off CTF correction for negative stain data. However, Streaming 2D classification always does CTF correction.
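For clarity, below is a minimal sketch of the alignment-resolution fallback described in the second bullet above; the function and argument names are illustrative, not CryoSPARC code.

```python
# Illustrative sketch of the Maximum alignment res (A) fallback described above.
# Function and argument names are hypothetical, not CryoSPARC internals.

def effective_alignment_res(max_align_res_A, max_res_A):
    """Return the alignment resolution (in Angstroms) actually used."""
    if max_align_res_A is None:
        # No alignment limit set: fall back to the Maximum resolution value.
        return max_res_A
    if max_align_res_A < max_res_A:
        # Alignment limit requests a finer resolution than the data are
        # processed at: clamp it to the Maximum resolution value.
        return max_res_A
    return max_align_res_A

print(effective_alignment_res(20.0, 6.0))  # 20.0 -> alignment limited to 20 A
print(effective_alignment_res(None, 6.0))  # 6.0  -> falls back to Maximum resolution
print(effective_alignment_res(3.0, 6.0))   # 6.0  -> clamped to Maximum resolution
```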

We apologize for any confusion these parameters may have caused in versions of CryoSPARC before v4.4!

Michael
