Tips and Tricks for sorting compositional heterogeneity in image stacks

pranav · April 16, 2024, 11:00pm

I am trying to separate subparticles of a virus capsid with and without Fab attached, a classic case of compositional heterogeneity. Ideally, after running 3D classification one should get two classes, one with Fabs bound and another without. I am relatively inexperienced with CryoSPARC’s 3D classification features, and it is not clear to me how the many different options in the job builder affect the outcome. Principally, I have been playing with the following parameters:
class3D_online_em_lr_init
class3D_class_anneal_beta
class3D_init_mode (simple, PCA, init vols)
Additionally, I have tried particle stacks with the capsid density subtracted from the projections (which is the mathematically correct way of dealing with this situation), but I am still struggling to get clean separation.
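The idea behind capsid subtraction can be illustrated with a toy numpy sketch. This is not how CryoSPARC implements particle subtraction: it ignores the CTF and assumes the particle is already aligned, so "projection" is just a sum along z. The helper names (project_z, subtract_capsid) and the synthetic volume are hypothetical.

```python
import numpy as np

def project_z(volume: np.ndarray) -> np.ndarray:
    """Toy projection: sum the map along its first (z) axis. No CTF, identity pose."""
    return volume.sum(axis=0)

def subtract_capsid(image: np.ndarray, volume: np.ndarray, capsid_mask: np.ndarray) -> np.ndarray:
    """Subtract the projection of the masked capsid density from one particle image.

    What remains is the non-capsid (e.g. Fab) signal plus noise.
    """
    return image - project_z(volume * capsid_mask)

# Synthetic 16^3 example: "capsid" fills z < 8, a small "Fab" blob sits above it.
rng = np.random.default_rng(0)
volume = np.zeros((16, 16, 16))
volume[:8] = 1.0
volume[12:14, 6:10, 6:10] = 2.0
capsid_mask = np.zeros_like(volume)
capsid_mask[:8] = 1.0

noise = rng.normal(scale=0.5, size=(16, 16))
image = project_z(volume) + noise
residual = subtract_capsid(image, volume, capsid_mask)
```

Because projection is linear, the residual equals the projection of the unmasked (Fab) density plus noise, which is what the classification then has to sort. In practice the subtracted projection must be CTF-modulated and posed per particle.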
Are there any parameters I should be taking into account to force the classes apart? (I saw a parameter called “force hard classification”, but I am not sure if that is the right option for the job.)
One could vary each parameter to see what works, but given all the options, I could be stuck twiddling knobs until the end of days. This is why I am turning to the collective wisdom of the CryoSPARC community.

Many thanks in advance for your suggestions.

Best,
Pranav

olibclarke · April 16, 2024, 11:40pm

Hi Pranav,

Don’t worry about the initialization mode to start with; simple is fine. Sometimes reducing the number of particles used in each initial reconstruction (from 1000 to, say, 100) can help to get more diversity in the initial classes.
Use two classes, unless the percentage of Fab-bound is expected to be very small - then you may benefit from using many more classes (say 20).
Use as low a target resolution as you can get away with while still separating heterogeneity. It will be faster, and for something like this you should have plenty of signal in the “molecular shape” res range. I would suggest 12 or 15 Å to start.
Try with and without force hard classification on (I suspect in this case you will want it on, but I would usually test both). If you have it switched off, try auto-tune class similarity as a starting point for the class similarity (when force hard classification is on, this parameter has no effect). I wouldn’t worry about mucking around with altering annealing parameters for the class similarity (class3D_class_anneal_beta), at least initially.
Keeping a fixed, high O-EM learning rate sometimes works well for cases like this (set the O-EM learning rate init to 1, and the learning rate half-life to 0). If everything collapses into one class, decrease the learning rate.
Occasionally, we have found it helpful to run classification for many more epochs than the default (e.g. 20 rather than 2). Often in these situations, keeping the learning rate fixed allows the classification to converge more quickly.
Once you’ve identified a Fab-bound and a Fab-free class, you might try another two-class run using just these two classes, setting the initialization mode to input. Alternatively, if you can’t find a Fab-free class, you might try using a map where you have subtracted out the Fab density as one of the inputs, to soak up the Fab-free particles and increase occupancy.
If you are dealing with a large, symmetry-expanded particle set (I am guessing so, given that this is a virus with Fab bound), you might consider downsampling prior to Class3D to a box size such that the target resolution is near Nyquist. It will make things faster, and you can always use the expanded inputs to revert to the unbinned particles later if needed.
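The downsampling suggestion is simple arithmetic: Nyquist is twice the pixel size, so the new pixel size should be at most half the target resolution, and Fourier cropping keeps the field of view fixed. A minimal sketch; the function name and the example numbers (256 px box, 1.0 Å/px, 12 Å target) are hypothetical, and rounding to an even box is a common convention rather than a stated requirement.

```python
import math

def downsampled_box(box_px: int, apix: float, target_res_A: float) -> tuple[int, float]:
    """Smallest even box whose Nyquist limit is at or finer than target_res_A.

    Nyquist = 2 * pixel size, so the new pixel size must be <= target_res_A / 2.
    Fourier cropping keeps the field of view: new_apix = apix * box_px / new_box.
    """
    max_apix = target_res_A / 2.0
    new_box = math.ceil(box_px * apix / max_apix)
    if new_box % 2:                    # keep the box even
        new_box += 1
    new_box = min(new_box, box_px)     # target finer than current Nyquist: no downsampling
    new_apix = apix * box_px / new_box
    return new_box, new_apix

box, apix = downsampled_box(256, 1.0, 12.0)
print(box, round(apix, 2))  # 44 5.82
```

Rounding the box up rather than down means the resulting Nyquist (about 11.6 Å here) is slightly finer than the target, so no signal in the target range is cut off.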

Hope that helps!

Cheers
Oli

pranav · April 16, 2024, 11:44pm

Thanks Oli! Will try it out!

CryoEM2 · April 18, 2024, 1:18pm

Related suggestion: heterogeneous refinement using Fab-bound and apo structures as references, or iterative multi-class ab initio (take each result and run multi-class ab initio again).

olibclarke · April 18, 2024, 1:26pm

In the general (asymmetric) case, absolutely. For a virus with high symmetry, though, you will probably benefit more from symmetry expansion followed by classification without alignments (with a mask around an asymmetric subsection).
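Symmetry expansion itself is just bookkeeping on poses: every particle pose is combined with each operator of the point group (60 for icosahedral), so each subparticle copy can then be classified without realignment. A minimal sketch using SciPy's rotation utilities; the function name and example poses are hypothetical, and whether the operator multiplies on the left or the right depends on the software's pose convention (this sketch applies it on the right).

```python
import numpy as np
from scipy.spatial.transform import Rotation

def symmetry_expand(poses: np.ndarray, group: str = "I") -> tuple[np.ndarray, np.ndarray]:
    """Expand (N, 3, 3) rotation matrices by a point group.

    Returns (expanded, source_index): expanded has shape (N * order, 3, 3) and
    source_index[k] is the original particle that expanded copy k came from.
    """
    ops = Rotation.create_group(group).as_matrix()   # (order, 3, 3); order = 60 for "I"
    # R_n @ S_k for every particle n and operator k
    expanded = np.einsum("nij,kjl->nkil", poses, ops).reshape(-1, 3, 3)
    source_index = np.repeat(np.arange(len(poses)), len(ops))
    return expanded, source_index

poses = Rotation.from_euler("zyz", [[10, 20, 30], [40, 50, 60]], degrees=True).as_matrix()
expanded, src = symmetry_expand(poses)
print(expanded.shape)  # (120, 3, 3)
```

Keeping source_index matters because expanded copies of one particle are not independent observations; it is what lets you map classified subparticles back to their parent particles.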

pranav · April 19, 2024, 9:13am

Indeed, Oli: once you have determined the icosahedral parameters, one should use them until they become infeasible for the question you are trying to answer. In this specific case the data go to very high resolution, so the relative orientations of the subparticles are very accurate for the kind of question I am trying to answer.

csparc_addict · November 20, 2024, 8:14pm

Hi Oli,

Could you comment on how increasing the number of O-EM epochs and the number of full iterations each affects the classification? I came across another post of yours where you mentioned that increasing the number of full iterations dramatically improved the results. I was wondering whether tuning one has a much bigger effect than the other.

My particles barely separate unless “force hard classification” is turned on. Therefore the ESS is always 1, and it’s hard for me to judge whether the results are improving with more iterations. I’d really appreciate any insights you could provide.
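Why the ESS is pinned at 1 can be seen from a common definition of effective sample size for a probability vector, 1/Σp². We assume CryoSPARC's class ESS behaves like this, but its exact formula is not given in this thread. Any such measure returns 1 for a one-hot (hard) assignment. The helper names below are hypothetical; fraction_switched is one generic alternative convergence signal, not a CryoSPARC output.

```python
import numpy as np

def class_ess(posteriors: np.ndarray) -> np.ndarray:
    """Per-particle effective number of classes, 1 / sum_k p_k^2.

    posteriors: (n_particles, n_classes), rows summing to 1.
    A one-hot row gives 1.0; a uniform row over K classes gives K.
    """
    p = np.asarray(posteriors, dtype=float)
    return 1.0 / np.sum(p**2, axis=1)

def harden(posteriors: np.ndarray) -> np.ndarray:
    """Hard assignment: all probability on the most likely class."""
    p = np.asarray(posteriors, dtype=float)
    hard = np.zeros_like(p)
    hard[np.arange(len(p)), p.argmax(axis=1)] = 1.0
    return hard

def fraction_switched(prev_labels, labels) -> float:
    """Fraction of particles whose class label changed between two iterations."""
    return float(np.mean(np.asarray(prev_labels) != np.asarray(labels)))

soft = np.array([[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]])
print(class_ess(soft))          # approx. [2.0, 1.22, 1.0]
print(class_ess(harden(soft)))  # [1. 1. 1.]
```

With hard assignments the ESS carries no information, so tracking how many particles change class between iterations is a more useful way to judge whether further iterations are still doing something.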

olibclarke · November 20, 2024, 10:41pm

That other post was using a very old version of Class3D; it has changed significantly (for the better!) since then. With regard to using more O-EM epochs, what I was trying to say was that we have encountered cases where classification with the default learning rate only converges after many O-EM epochs, but by increasing and fixing the learning rate, we have been able to get comparable results much faster. So if you are having difficulty separating classes even with hard classification on, you might try increasing the learning rate, or providing a little more diversity in the starting volumes.
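The intuition for a high, fixed learning rate can be shown with a toy online-EM running average: each mini-batch pulls the estimate toward that batch's mean by a fraction lr. This is a generic sketch, not CryoSPARC's implementation. lr_init and half_life are hypothetical stand-ins loosely mirroring the O-EM "learning rate init" and "half-life" settings, and here half_life=None means "fixed". As described above, CryoSPARC uses a half-life of 0 for that, so the mapping is not one-to-one.

```python
import numpy as np

def online_em_estimate(batches, lr_init=1.0, half_life=None):
    """Running estimate updated once per mini-batch.

    estimate <- (1 - lr) * estimate + lr * batch_mean
    half_life=None keeps lr fixed at lr_init; otherwise lr halves every
    `half_life` batches. The first batch initialises the estimate.
    """
    estimate = None
    for t, batch in enumerate(batches):
        lr = lr_init if half_life is None else lr_init * 0.5 ** (t / half_life)
        batch_mean = np.mean(batch, axis=0)
        estimate = batch_mean if estimate is None else (1 - lr) * estimate + lr * batch_mean
    return estimate

# Later batches carry different "signal" (1.0) than early ones (0.0).
batches = [np.full((10, 4), v) for v in (0.0, 0.0, 1.0, 1.0)]
fixed = online_em_estimate(batches, lr_init=1.0)
decayed = online_em_estimate(batches, lr_init=0.5, half_life=1)
print(fixed[0], decayed[0])  # 1.0 0.1796875
```

With the fixed rate the estimate follows the new data immediately, while the decaying rate freezes it near its early state. Loosely, this is why a fixed high rate can let classes move apart faster, and why too high a rate can also let everything follow one dominant signal and collapse.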