Today I would like to ask about your strategies for using 3D Classification to peel many conformations out of a dataset of a highly dynamic protein.
I have been using 3D Classification on 800k particles (selected by many rounds of 2D classification to remove junk particles), split into 4 classes of 180k, 250k, 198k, and 203k particles. Then I realized that each class could be split into 2-3 more classes. For one of those subclasses, with 40k particles, when I ran 3D Classification again with 2 classes, I got back one class of ~39k particles and another class of junk particles. This is what I would consider “exhaustive classification”: the class cannot be split any further.
The reason I have been using 3D Classification more actively is that we previously ran the 3D Classification job on our old workstation and it took more than 1 week to finish. When we migrated to a faster workstation, we were able to run the job in 2 days, and smaller sets of particles in less than 2 hours. So, although I have been building models into the maps from the initial 4-class Classification job, I have been actively running more 3D Classification and discovering smaller, more intricate movements of my protein that were averaged out (and gave lower-quality density) in the first 4 maps.
My question for our community is whether it is better to run 3D Classification until it is exhaustive (the classes cannot be split further, or split only into homogeneous classes, confirmed by NU-Refine and close inspection of the map), or whether there is a better way to classify the highly dynamic states of the protein captured by cryo-EM. I tried 3D Variability, but it has not really helped me separate the classes.
Moreover, should I be doing a “double confirm Classification”, where I run the 3D class job again with similar parameters to see if I get the same results? One way I found to make the 3D classification job give easily reproducible results is to decrease the Convergence criterion (%) to much lower values, like 0.001, and increase the max F-EM rounds (up to 50-100), which gives very stable classes without many particles shifting between rounds of classification. But this only works on very fast workstations.
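A minimal sketch of what the convergence criterion implies, assuming it is the percentage of particles whose hard class assignment changes between consecutive F-EM iterations (please check the cryoSPARC guide for the exact definition). The function names are hypothetical. It also shows why 0.001 % is so strict: on a 40k-particle subset it allows fewer than one particle to switch, so the job effectively runs until nothing moves or the F-EM round limit is hit.

```python
def percent_changed(prev_labels, curr_labels):
    """Percentage of particles whose hard class label changed between two iterations."""
    if len(prev_labels) != len(curr_labels):
        raise ValueError("label lists must have the same length")
    changed = sum(1 for a, b in zip(prev_labels, curr_labels) if a != b)
    return 100.0 * changed / len(prev_labels)


def has_converged(prev_labels, curr_labels, criterion_percent):
    """True when the share of switching particles is at or below the criterion (assumed definition)."""
    return percent_changed(prev_labels, curr_labels) <= criterion_percent


def max_switches_allowed(n_particles, criterion_percent):
    """Whole number of particles that may still switch class at a given criterion."""
    return int(n_particles * criterion_percent / 100.0)


# 40k particles at 0.001 % -> 0.4 particles, i.e. no switches allowed at all,
# versus a looser criterion such as 2 %.
print(max_switches_allowed(40_000, 0.001))
print(max_switches_allowed(40_000, 2.0))
```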
That’s an interesting topic. I think the strategy you described could be successful, especially with highly flexible or dynamic proteins.
I need some insight, though: 3D classification in cryoSPARC requires that you provide as many models as the number of classes you want. Could this strategy introduce a bias? This is unlike RELION, which produces as many classes as you want from the particles and only 1 reference map.
All my attempts to distinguish conformational states with the 3D classification job were unsuccessful, perhaps because my input models are too similar to each other; as a result, I always got classes that looked exactly the same (the input particles could produce a map at 3-3.5 Å with NU refinement). What type of initialization mode do you use for the 3D classification job in cryoSPARC?
On my side, I use ab-initio jobs to separate the different conformational states, especially at low resolution. After that, I progressively unbinned my particles until they reached Nyquist, and I sorted those particles with heterogeneous refinement jobs. The multibody 3D refinement algorithm from RELION can produce nice results for assessing dynamic states, but it has some limits, especially regarding the size of your protein/complex.
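For the progressive unbinning described above, the bookkeeping is simple: Fourier-cropping a particle from box B to box b scales the pixel size by B/b, and the Nyquist limit is twice the pixel size. A small sketch, using a hypothetical dataset with 0.83 Å/px and a 384 px box (both numbers are made up for illustration); when a refinement approaches the Nyquist limit of one stage, that is the cue to move to the next, larger box. Real box sizes should also be even and suit your particle; this only shows the arithmetic.

```python
def binned_pixel_size(raw_pixel_a, raw_box, cropped_box):
    """Pixel size (Å) after Fourier-cropping a raw_box particle to cropped_box."""
    return raw_pixel_a * raw_box / cropped_box


def nyquist_a(pixel_a):
    """Nyquist resolution limit (Å): twice the pixel size."""
    return 2.0 * pixel_a


def unbinning_schedule(raw_pixel_a, raw_box, cropped_boxes):
    """(box, pixel size, Nyquist) per stage, from most binned to least binned."""
    stages = []
    for box in sorted(cropped_boxes):
        px = binned_pixel_size(raw_pixel_a, raw_box, box)
        stages.append((box, round(px, 3), round(nyquist_a(px), 3)))
    return stages


# Hypothetical dataset: 0.83 Å/px raw pixel size, 384 px raw box.
for box, px, nyq in unbinning_schedule(0.83, 384, [96, 192, 288, 384]):
    print(f"box {box:3d} px -> {px:.3f} Å/px, Nyquist {nyq:.3f} Å")
```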
3D classification does not require input models - by default models will be generated from reconstructions of subsets of the input particle set.
If you are seeing that all output volumes look identical, I would test switching on force hard classification, and also experiment with using different values for the target resolution.
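A toy illustration of why soft assignment can make output volumes look alike: under soft (EM-style) classification, an ambiguous particle spreads its weight over several classes, pulling their reconstructions toward each other, while hard classification sends each particle only to its most probable class. This is a generic sketch with made-up per-class log-likelihoods, not cryoSPARC's implementation.

```python
import math


def posteriors(log_likelihoods):
    """Softmax over one particle's per-class log-likelihoods -> class probabilities."""
    m = max(log_likelihoods)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in log_likelihoods]
    total = sum(w)
    return [x / total for x in w]


def class_weights(all_log_likelihoods, hard=False):
    """Total weight each class receives across all particles.

    Soft: every particle contributes its probability to every class.
    Hard: every particle contributes 1.0 to its most probable class only.
    """
    n_classes = len(all_log_likelihoods[0])
    totals = [0.0] * n_classes
    for ll in all_log_likelihoods:
        p = posteriors(ll)
        if hard:
            totals[p.index(max(p))] += 1.0
        else:
            for k, pk in enumerate(p):
                totals[k] += pk
    return totals


# Two ambiguous particles and one clear-cut particle (hypothetical values).
toy = [[0.1, 0.0], [0.0, 0.1], [5.0, 0.0]]
print("soft:", class_weights(toy))
print("hard:", class_weights(toy, hard=True))
```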
Thank you for your input and experience! For this type of classification, I usually run 3D Classification without any initial model (Initialization mode: simple), so as to let the Classification job run as randomly as possible, hoping to avoid any input bias.
So far, for some groups of particles, I was able to get two distinct classes with particle counts that were consistent across different runs of the same Classification job. It is sort of like a triplicate: when the number of particles in every class is the same across runs with exactly the same parameters (cloned jobs), I consider them to be “true” conformers of each other. For some Classification jobs, the number of particles per class kept shifting by ~1k between jobs, so I tried reducing the number of classes, and it turned out I was “forcing” the particles into too many classes when in reality the particles from 2 of those classes belonged together in one class.
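Matching particle counts between cloned runs is a useful first check, but two runs can give identical class sizes while placing different particles in each class. A stricter check, sketched here, compares per-particle labels after finding the best relabelling between runs (class numbering is arbitrary from run to run). It assumes you have exported the class labels for the same particles, in the same order, from both jobs; how you export them is up to you, and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def best_label_match(run_a, run_b, n_classes):
    """Relabelling of run_b that maximises per-particle agreement with run_a.

    Returns (perm, agreement), where perm[j] is the run_a label matched to run_b
    label j. Brute force over permutations is fine for the small K used in practice.
    """
    pair_counts = Counter(zip(run_a, run_b))
    best_perm, best_agree = None, -1
    for perm in permutations(range(n_classes)):
        agree = sum(pair_counts[(perm[j], j)] for j in range(n_classes))
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return best_perm, best_agree / len(run_a)


def class_sizes(labels, n_classes):
    """Number of particles in each class."""
    counts = Counter(labels)
    return [counts[k] for k in range(n_classes)]


# Hypothetical labels for the same 6 particles from two cloned jobs:
run_a = [0, 0, 0, 1, 1, 1]
run_b = [1, 1, 0, 0, 0, 1]
print(class_sizes(run_a, 2), class_sizes(run_b, 2))  # identical sizes...
print(best_label_match(run_a, run_b, 2))             # ...but only partial agreement
```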
Thanks Oli and Khoa for your experience, appreciate it!
Do you usually run 3D classification with low-resolution (downsampled) particles (let’s say 3 or 4 Å/pix), or rather with high-resolution particles (after NU-refinement, for instance)?
It very much depends on what you are looking for (the scale of the heterogeneity, and the resolution at which it becomes apparent).
But in any case you will want to run a refinement of some sort first, as 3D classification uses the input alignments, even if you downsample the particles prior to classification (which may be advantageous for reasons of speed).
As is often the case, the answer is “it depends”. It depends primarily on which type of heterogeneity you are facing. If it is purely discrete heterogeneity (compositional heterogeneity is always discrete, conformational can be discrete), and at a scale that allows discriminating classes, then heterogeneous refinement and/or 3D classification should in principle let you completely resolve it. But when faced with continuous conformational heterogeneity, classification approaches won’t work, because they would need an infinite number of classes to model the data correctly. In practice this breaks down: trying to classify exhaustively will only lead to more and more classes, less and less populated every round. Map quality will improve over the first few rounds because the most different conformations start separating into different classes, but eventually map quality will degrade as the number of particles per class drops below a usable number and there is no longer enough accumulated signal to get a good reconstruction.
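A toy illustration of this argument, using plain 1D k-means on a made-up "conformational coordinate" in place of a real classification algorithm (no noise, no projections, no resolution; only the bookkeeping). For a continuum, every doubling of K halves the heterogeneity left in each class and also halves the class size, so there is no natural place to stop. For two discrete states, the within-class spread collapses at K = 2, and further splits only divide up what in a real dataset would be noise rather than signal.

```python
import statistics

N = 2000


def kmeans_1d(values, k, max_iter=50):
    """Lloyd's k-means on 1D data, initialised at evenly spaced quantiles."""
    xs = sorted(values)
    n = len(xs)
    centers = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        new_centers = [statistics.fmean(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return groups


def summarise(values, k):
    """(mean within-class std, smallest class size) for a k-class split."""
    groups = kmeans_1d(values, k)
    spread = statistics.fmean(statistics.pstdev(g) for g in groups if g)
    return spread, min(len(g) for g in groups)


continuous = [i / (N - 1) for i in range(N)]  # a smooth continuum of "conformations"
discrete = [c + 0.001 * (i % 21 - 10) for c in (0.2, 0.8) for i in range(N // 2)]  # two tight states

for name, data in (("continuous", continuous), ("discrete", discrete)):
    for k in (1, 2, 4, 8):
        spread, smallest = summarise(data, k)
        print(f"{name:10s} K={k}: within-class spread {spread:.4f}, smallest class {smallest}")
```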
Continuous heterogeneity is a difficult problem, and in practice you often encounter all kinds of heterogeneity (discrete and continuous), so you need to address all of them either one by one (for instance, separating different species by heterogeneous refinement and/or 3D classification, then resolving conformations of each single species using 3DVA or FlexRefine) or all at once (cryoDRGN is good at doing this!).
I worked on a case like this a couple of years ago, for which exhaustive classification was leading nowhere. What eventually worked was 3DVA and cryoDRGN. It’s here if you’re interested in reading about it: https://doi.org/10.7554/eLife.71420
The Per-particle Class ESS Histogram displayed in 3D Classification (≥v4.0) can be a good indicator if you have particles with high probability of being in more than 1 class.
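As a rough guide to reading that histogram, a sketch assuming the class ESS follows the usual effective-sample-size definition, 1 / Σ p_k² over a particle's class probabilities (check the cryoSPARC guide to confirm): a value of 1 means the particle sits entirely in one class, and a value of K means it is spread evenly over all K classes. The 1.5 cutoff for calling a particle "ambiguous" is a hypothetical choice, not a cryoSPARC default.

```python
def class_ess(probs):
    """Effective number of classes a particle is spread over: 1 / sum(p_k^2)."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    normed = [p / total for p in probs]
    return 1.0 / sum(p * p for p in normed)


def ambiguous_fraction(all_probs, threshold=1.5):
    """Fraction of particles whose class ESS exceeds a hypothetical threshold."""
    flagged = sum(1 for probs in all_probs if class_ess(probs) > threshold)
    return flagged / len(all_probs)


# Hypothetical class probabilities for three particles in a 4-class job.
particles = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]
print([class_ess(p) for p in particles])
print(ambiguous_fraction(particles))
```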
Is there a particular refinement job that works best? I can 3D classify after Ab Initio by dragging and dropping the 3D alignments, but would it be better to refine further with NU-Refine or Heterogeneous Refinement? Would further refinement improve the quality of the 3D alignments used by the 3D classification job?
In the same vein, would making these refinement jobs multi-class, based on prior knowledge of the rough make-up of the heterogeneity present in the dataset, help the alignments?
Hi, and welcome to the forum! I would definitely do some refinements after Ab Initio, because the ab initio map can be really poor. Moreover, ab initio reconstruction is usually one of the earliest steps in the 3D processing pipeline - have you cleaned up junk particles yet? I’d recommend Oli’s decoy classification method using Heterogeneous Refinement. You can find some resources about it here:
Once you’ve sorted out the junk particles, the next step would be to get a higher resolution consensus, typically using jobs like Homogeneous Refinement, Non-Uniform Refinement, or Local Refinement. From there, you can use 3D Classification. Because 3D classification doesn’t do any alignment, it’s usually best to get the best alignments possible so that the different 3D classes represent distinct compositional/conformational states - otherwise, the classification might be dominated by subtle differences between improperly aligned particles.
To answer your last question, if you have prior knowledge of what the distribution of your heterogeneity looks like, you can absolutely use it to decide how many classes to use. For example, if your protein is known to exist in two different states and each state represents half of the particles, then two classes is a good place to start. However, if one of the two states represented only 1/10th of the data, then I might use 10 classes. I also usually run another 3D classification job with the same parameters, but with Force Hard Classification turned on; this often helps with finding rarer classes. Hope this helps!
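The rule of thumb above (roughly one class per "rarest-state fraction" of the data) can be written down as a tiny helper. This only codifies the heuristic from this reply under the assumption that you have an estimate of each state's fraction; it is not a cryoSPARC feature, and the function names are hypothetical. The small tolerance guards against floating-point values like 1/0.1 landing just above an integer.

```python
import math


def suggested_classes(state_fractions):
    """About 1 / (fraction of the rarest expected state) classes, rounded up."""
    rarest = min(state_fractions)
    if not 0 < rarest <= 1:
        raise ValueError("fractions must be in (0, 1]")
    return math.ceil(1 / rarest - 1e-9)


def expected_rare_count(n_particles, state_fractions):
    """Expected number of particles in the rarest state."""
    return round(n_particles * min(state_fractions))


print(suggested_classes([0.5, 0.5]))                # two equally populated states
print(suggested_classes([0.9, 0.1]))                # one state at 1/10th of the data
print(expected_rare_count(800_000, [0.9, 0.1]))     # hypothetical 800k-particle set
```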