How to distinguish true compositional heterogeneity from continuous flexibility during 3D classification?

Hi everyone,

I am working on a large single-particle cryo-EM dataset of a hexameric AAA+ complex with dynamic adaptor occupancy at the N-terminal domain (NTD). The main challenge appears to be strong compositional heterogeneity combined with intrinsic NTD flexibility.

Current workflow:

- Started with ~1 million particles

- After junk removal and heterogeneous refinement: ~800k particles

- Initial 3DVA suggests significant NTD motion even without strong adaptor density

- Proceeded with 3D classification (hard classification) to separate adaptor-associated states

- Used 10 classes with filter resolution set to 10 Å

- Obtained ~6 distinct volumes/classes with varying adaptor-associated densities

I do not expect major conformational changes in the core hexamer, so I am trying to understand what exactly the classification is separating in this case.

My questions are:

1. If 10 classes produce ~6 meaningful volumes, does this likely represent true compositional heterogeneity or could it reflect continuous NTD motion/flexibility?

2. In systems where adaptor occupancy is weak and dynamic, how reliable is hard classification for separating biologically relevant states?

3. Would focused classification/local refinement around the NTD region be more effective than global classification?

4. Are there alternative pipelines people recommend for resolving subtle compositional heterogeneity in highly dynamic datasets (multi-body approaches, particle subtraction, cryoDRGN, iterative heterogeneous refinement, etc.)?

Would really appreciate suggestions or experiences from others working on resolving structural ensembles and compositional heterogeneity in highly dynamic cryo-EM datasets.

Hi,

Good questions indeed, and the answer depends on what you expect or want to say from your data. In my view, 3DVA and classification could give more or less the same answer if the heterogeneity is continuous. But if the NTD movement is large, 3DVA may show artifacts towards the extremes of the motion, and in that case classification might be more useful (?). Or you could combine both to show (1) the extent of the different poses by classification, and (2) a proposed conformational ensemble using 3DVA, where the different components could suggest different routes through conformational space. You can use phenix.varref to refine an ensemble of conformations in your maps and measure the deformations (rotation/translation) in ChimeraX.

Combining both is quite convincing, I think.
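To make "measure the deformations (rotation/translation)" concrete, here is a minimal sketch of how a rigid-body transform between two fitted NTD poses reduces to one rotation angle and one shift. The 3x4 `[R | t]` input layout and the function name `rotation_angle_and_shift` are assumptions for illustration, not the output format of any particular program.

```python
import math

def rotation_angle_and_shift(matrix_3x4):
    """Rotation angle (degrees) and shift magnitude (input units)
    from a 3x4 rigid-body transform [R | t] (assumed layout)."""
    rot = [row[:3] for row in matrix_3x4]
    shift_vec = [row[3] for row in matrix_3x4]
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    # For a proper rotation matrix, trace(R) = 1 + 2*cos(angle)
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    angle_deg = math.degrees(math.acos(cos_angle))
    shift = math.sqrt(sum(c * c for c in shift_vec))
    return angle_deg, shift

# Synthetic example: 20 degree rotation about z plus a (3, 4, 0) shift
theta = math.radians(20.0)
example = [
    [math.cos(theta), -math.sin(theta), 0.0, 3.0],
    [math.sin(theta),  math.cos(theta), 0.0, 4.0],
    [0.0, 0.0, 1.0, 0.0],
]
angle, shift = rotation_angle_and_shift(example)
print(f"rotation {angle:.1f} deg, shift {shift:.1f}")
```

Note that the translation part of such a transform depends on the coordinate origin, so the displacement of the domain centroid is often easier to interpret than the raw translation vector.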

Vincent

I’d strongly suggest masked 3DVA; it always works better for me when I know what I am looking for. But getting any algorithm to separate two kinds of heterogeneity at the same time is challenging, so you can start by clustering out the particles corresponding to the full complex (hopefully masking the ligand region will help with this), then work on the two sets separately.

Hello,

With a hexamer, you should definitely investigate how the symmetry affects what you are seeing.

I suggest doing at least symmetry expansion and masked 3D classification, starting from your best refinement of the hexamer (I assume with symmetry enforced?). You can try to work out the possible number of classes by thinking about all possible combinations of binding factors present and different conformations. But this is rarely straightforward, so if in doubt always request many more classes than you think you need, and maybe run replicate jobs with different numbers of classes (this was a key part of the case study on ALC1, in which 3D classification was run with 40 and 80 classes, even though the particle has only C2 symmetry in the special case of two copies of the binding factor being bound, and C1 otherwise). Too many classes means you have to regroup similar-looking classes, which is not a problem; too few classes means you will force different objects into the same class, which defeats the point of classifying.
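As a rough illustration of "thinking about all possible combinations", the sketch below counts per-protomer state assignments on a six-membered ring, treating patterns related by rotation of the ring as the same object. It assumes at most one adaptor per protomer and an otherwise rotationally equivalent core; neither is stated in the original post, and the function name is made up for this example.

```python
from itertools import product

def distinct_ring_states(n_protomers=6, n_states=2):
    """List per-protomer state assignments on a ring, keeping one
    representative per set of rotation-equivalent assignments."""
    seen = set()
    for pattern in product(range(n_states), repeat=n_protomers):
        # Canonical form: lexicographically smallest rotation of the pattern
        canonical = min(pattern[i:] + pattern[:i] for i in range(n_protomers))
        seen.add(canonical)
    return sorted(seen)

# Two states per site: 0 = empty, 1 = adaptor bound (assumed)
occupancy = distinct_ring_states(6, 2)
print(len(occupancy))    # 14 rotationally distinct patterns out of 2**6 = 64

# Three states per site, e.g. empty / bound pose A / bound pose B (assumed)
three_state = distinct_ring_states(6, 3)
print(len(three_state))  # 130 out of 3**6 = 729
```

Even this toy count exceeds the 10 classes used in the global job, which is one way to see why requesting many more classes, or classifying one symmetry-expanded site at a time under a mask, can be easier to interpret.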

One important step is to figure out which combination of subunits is the true asymmetric unit. It is not necessarily a single protomer, if multiple protomers undergo concerted conformational changes. If in doubt you can always define a single protomer as the asymmetric unit, but if it is small in MW you might not have enough signal to classify it (and expanding per protomer will also require more storage if you later do signal subtraction, which is necessary for local refinement).
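The storage point can be estimated with simple arithmetic. This is only a back-of-the-envelope sketch: the C6 expansion, the 256 px box and 4 bytes per pixel are assumed values, and only the ~800k particle count comes from the original post.

```python
def expanded_stack_size(n_particles, symmetry_order, box_px, bytes_per_px=4):
    """Number of symmetry-expanded particles and the size in bytes of a
    signal-subtracted stack holding one new image per expanded particle."""
    n_expanded = n_particles * symmetry_order
    size_bytes = n_expanded * box_px * box_px * bytes_per_px
    return n_expanded, size_bytes

# 800k particles (from the post); C6, 256 px box, float32 are assumptions
n_expanded, size_bytes = expanded_stack_size(800_000, 6, 256)
print(n_expanded)                      # 4800000
print(f"{size_bytes / 1e12:.2f} TB")   # 1.26 TB
```

Cropping the subtracted particles to a smaller box around the region of interest reduces this quadratically with the box size.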

Here are some useful resources about symmetry and pseudo-symmetry, in case you haven’t read them yet:

Good luck!
