Today I would like to ask about your strategies for using 3D Classification to peel many conformations out of a dataset of a highly dynamic protein.
I have been running 3D Classification on 800k particles (selected through many rounds of 2D classification to remove junk particles) and split them into 4 classes of 180k, 250k, 198k, and 203k particles. Then I realized that each class could be split into 2-3 more classes. For one of those subclasses, consisting of 40k particles, when I ran 3D Classification again into 2 classes, I got back one class with about 39k particles and another class of junk particles. I would consider this “exhaustive classification”: the point where a class cannot be split any further.
The reason I have been using 3D Classification more actively is that on our old workstation a 3D Classification job took more than a week to finish. After we migrated to a faster workstation, we were able to run the job in 2 days, and smaller sets of particles in less than 2 hours. So, although I have been building models into the maps from the initial 4-class Classification job, I have kept running more 3D Classification and discovering smaller, more intricate movements of my protein that were averaged out (at lower quality) in the first 4 maps.
My question for our community: is it better to run 3D Classification to exhaustion (until a class cannot be split any further, or splits only into homogeneous classes, as confirmed by NU-Refine and a close look at the maps), or is there a better way to classify the highly dynamic states of a protein captured by cryo-EM? I tried 3D Variability, but it has not really helped me separate the classes.
Moreover, should I be doing a “double-confirm Classification”, where I run the 3D Classification job again with similar parameters to see whether I get the same results? One way I found to make a 3D Classification job give more reproducible results is to decrease the Convergence criterion (%) to a much lower number, like 0.001, and to increase the max F-EM rounds (up to 50-100), which gives very stable classes without many particles shifting between rounds of classification. But this is only practical on very fast workstations.
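To make the convergence idea concrete, here is a minimal Python sketch. It assumes the criterion measures the percentage of particles whose class assignment changes between two consecutive iterations; that is a reading of the parameter name, not something taken from the cryoSPARC source. The function name `percent_switched` and the toy label arrays are hypothetical.

```python
import numpy as np

def percent_switched(prev_classes, curr_classes):
    """Percentage of particles whose class label changed between two iterations."""
    prev = np.asarray(prev_classes)
    curr = np.asarray(curr_classes)
    if prev.shape != curr.shape:
        raise ValueError("both iterations must cover the same particles")
    return 100.0 * np.mean(prev != curr)

# Toy example (hypothetical labels): 10 particles, 2 classes,
# and exactly one particle changes class between iterations.
prev_iter = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]
curr_iter = [0, 0, 0, 1, 1, 1, 0, 1, 0, 0]
switched = percent_switched(prev_iter, curr_iter)
print(f"{switched:.1f}% of particles switched class")
```

Under that assumption, a criterion of 0.001% on a 40k-particle class corresponds to 0.4 particles, so the job would only stop early once no particle switches at all. That may explain why such a low value needs many more iterations.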
That’s an interesting topic. I think the strategy you described could be successful, especially with highly dynamic proteins that adopt many conformations.
I would like some insight, though. Because 3D classification in cryoSPARC requires you to provide as many models as the number of classes you want, could this strategy introduce a bias related to that? RELION, by contrast, produces as many classes as you want from the particles and only 1 reference map.
All my attempts to distinguish conformational states with the 3D classification job were unsuccessful, perhaps because my input models are too close to each other; as a result, I always got classes that looked exactly the same (the input particles could produce a map at 3-3.5 Å with NU refinement). What type of initialization mode do you use for the 3D classification job in cryoSPARC?
On my side, I use ab-initio jobs to separate the different conformational states, especially at low resolution. After that, I progressively unbin my particles until they reach Nyquist, and I sort those particles with heterogeneous refinement jobs. The multi-body refinement algorithm in RELION can produce nice results for assessing dynamic states, but it has some limits, especially regarding the size of your protein/complex.
Thank you for your input and experience! For this type of classification, I usually run 3D Classification without any initial model (Initialization mode: simple), so that the Classification job starts as randomly as possible, hoping to avoid any input bias.
So far, for some groups of particles, I was able to get two distinct classes with particle counts that were consistent across different runs of the same Classification job. It is sort of like running a triplicate: when the number of particles in every class is the same across runs with exactly the same parameters (cloned jobs), I consider the classes to be “true” conformers of each other. For some Classification jobs, the number of particles per class kept shifting by ~1k between runs, so I tried reducing the number of classes. It turned out I had been “forcing” the data into too many classes, when in reality the particles in 2 of the classes belonged together in one class.
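One caveat with comparing particle counts is that two runs can give the same counts per class while placing different particles in each class. A stricter check compares the per-particle assignments directly. The sketch below is a minimal Python example, assuming you have already exported integer class labels for the same particles, in the same order, from two runs (how to export them is not shown). The function name `best_label_agreement` and the toy arrays are hypothetical. Class numbering is arbitrary between runs, so the code tries every relabeling and keeps the best match.

```python
import itertools
import numpy as np

def best_label_agreement(run_a, run_b, n_classes):
    """Fraction of particles assigned consistently in two runs, after
    finding the relabeling of run B's classes that best matches run A."""
    a = np.asarray(run_a)
    b = np.asarray(run_b)
    if a.shape != b.shape:
        raise ValueError("both runs must cover the same particles")
    # contingency[i, j] = particles in class i (run A) and class j (run B)
    contingency = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(contingency, (a, b), 1)
    best = 0
    # Brute force over label permutations; fine for a handful of classes.
    for perm in itertools.permutations(range(n_classes)):
        matched = sum(contingency[i, perm[i]] for i in range(n_classes))
        best = max(best, matched)
    return best / a.size, contingency

# Toy example (hypothetical labels): both runs give 4 + 4 particles,
# but only half of the particles end up grouped the same way.
run_a = [0, 0, 0, 0, 1, 1, 1, 1]
run_b = [0, 0, 1, 1, 0, 0, 1, 1]
agreement, table = best_label_agreement(run_a, run_b, n_classes=2)
print(f"agreement: {agreement:.2f}")
print(table)
```

An agreement close to 1 across cloned jobs would support the classes being reproducible. Identical counts with low agreement would suggest the split is not stable.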
It very much depends on what you are looking for (the scale of the heterogeneity, and the resolution at which it becomes apparent).
But in any case you will want to run a refinement of some sort first, because 3D classification uses the input alignments, even if you downsample the particles prior to classification (which may be advantageous for reasons of speed).
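When downsampling for speed, it helps to check that the new Nyquist limit still covers the resolution at which the heterogeneity is visible. The minimal Python sketch below relies on two standard relations: Nyquist resolution is twice the pixel size, and Fourier cropping a box from `box` to `new_box` pixels scales the pixel size by `box / new_box`. The function names and the example values (1.0 Å/pixel, 256-pixel box, 6 Å target) are hypothetical.

```python
import math

def nyquist_after_downsampling(pixel_size, box, new_box):
    """Nyquist resolution (in Å) after Fourier cropping from box to new_box pixels."""
    new_pixel_size = pixel_size * box / new_box
    return 2.0 * new_pixel_size

def min_box_for_resolution(pixel_size, box, target_resolution):
    """Smallest even box size whose Nyquist limit still reaches target_resolution (Å)."""
    new_box = math.ceil(2.0 * pixel_size * box / target_resolution)
    if new_box % 2:
        new_box += 1  # keep the box size even
    return min(new_box, box)

# Hypothetical example: 1.0 Å/pixel, 256-pixel box, differences visible at ~6 Å.
small_box = min_box_for_resolution(1.0, 256, 6.0)
print(small_box, nyquist_after_downsampling(1.0, 256, small_box))
```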
As often, the answer is “it depends”. It depends primarily on which type of heterogeneity you are facing. If it is purely discrete heterogeneity (compositional heterogeneity is always discrete, conformational can be discrete), and at a scale that allows discriminating classes, then heterogeneous refinement and/or 3D classification should in principle let you completely resolve it. But when faced with continuous conformational heterogeneity, classification approaches won’t work because they would need an infinite number of classes to model the data correctly. This of course breaks down because in such cases, trying to classify exhaustively will only lead to more and more classes, less and less populated every round. Map quality will improve with the first few rounds because the most different conformations start separating into different classes, but eventually map quality will degrade as the number of particles per class decreases below a usable number and there is no longer enough accumulated signal to get a good reconstruction.
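The shrinking class sizes can be illustrated with simple arithmetic. This minimal Python sketch assumes each round splits a class evenly, which real data will not do; the function name `particles_per_class` is hypothetical, and the numbers come from the original question (800k particles split into 4, then 3, then 2 classes).

```python
def particles_per_class(n_particles, splits):
    """Average particles per class after each round, assuming even splits."""
    counts = []
    current = float(n_particles)
    for k in splits:
        current /= k
        counts.append(current)
    return counts

# 800k particles split into 4, then 3, then 2 classes per round.
for round_idx, n in enumerate(particles_per_class(800_000, [4, 3, 2]), start=1):
    print(f"round {round_idx}: ~{n:,.0f} particles per class")
```

After three rounds the average class already holds only a few tens of thousands of particles, which is in line with the 40k subclass mentioned in the question. Whether that is still enough for a good reconstruction depends on the sample.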
Continuous heterogeneity is a difficult problem, and in practice you often encounter all kinds of heterogeneity (discrete and continuous), so you need to address all of them either one by one (for instance, separating different species by heterogeneous refinement and/or 3D classification, then resolving conformations of each single species using 3DVA or FlexRefine) or all at once (cryoDRGN is good at doing this!).
I worked on a case like this a couple years ago, for which exhaustive classification was leading nowhere. What eventually worked was 3DVA and cryoDRGN. It’s here if you’re interested in reading about it: https://doi.org/10.7554/eLife.71420