Tips and Tricks for sorting compositional heterogeneity in image stacks

I am trying to separate subparticles of a virus capsid with and without Fab attached, a classic case of compositional heterogeneity. Ideally, after running 3D classification one should get two classes: one with Fabs bound and another without. I am relatively inexperienced with some of cryoSPARC's 3D classification features, and it is not clear to me how many of the options in the job builder affect the outcome. Principally, I have been playing with the following parameters:
class3D_online_em_lr_init
class3D_class_anneal_beta
class3D_init_mode (simple, PCA, init vols)
Additionally, I have tried particle stacks with the capsid density subtracted from the projections (which is the mathematically right way of dealing with this situation), but I am still struggling to get clean separation.
Are there any parameters I should take into account to force the classes apart? (I saw a parameter called “force hard classification”, but I am not sure if that is the right option for the job.)
Ideally, one would tweak the different parameters to see what works, but given all the options, I could be stuck twiddling knobs until the end of days. This is why I am turning to the collective wisdom of the cryoSPARC community.

Many thanks in advance for your suggestions.

Best,
Pranav

Hi Pranav,

  • Don’t worry about the initialization mode to start with - simple is fine. Sometimes reducing the number of particles to use in each initial reconstruction (from 1000 to say 100) can help to get more diversity in the initial classes.

  • Use two classes, unless the fraction of Fab-bound particles is expected to be very small, in which case you may benefit from using many more classes (say 20).

  • Use as low a target resolution as you can get away with while still separating heterogeneity. It will be faster, and for something like this you should have plenty of signal in the “molecular shape” res range. I would suggest 12 or 15 Å to start.

  • Try with and without force hard classification on (I suspect in this case you will want it on, but I would usually test both). If you have it switched off, try auto-tune class similarity as a starting point for the class similarity (when force hard classification is on, this parameter has no effect). I wouldn’t worry about mucking around with altering annealing parameters for the class similarity (class3D_class_anneal_beta), at least initially.

  • Keeping a fixed, high O-EM learning rate sometimes works well for cases like this (set the O-EM learning rate init to 1, and the learning rate half-life to 0). If everything collapses into one class, decrease the learning rate.

  • Occasionally, we have found it helpful to run classification for many more epochs than the default (e.g. 20 rather than 2). Often in these situations, keeping the learning rate fixed allows the classification to converge more quickly.

  • Once you’ve identified a Fab-bound and a Fab-free class, you might try another two-class run using just these two classes, setting the initialization mode to input. Alternatively, if you can’t find a Fab-free class, you might try using a map where you have subtracted out the Fab density as one of the inputs, to soak up the Fab-free particles and increase occupancy.

  • If you are dealing with a large, symmetry expanded particle set (I am guessing, given the fact that this is a virus with Fab bound), you might consider downsampling prior to Class3D to a box size such that the target res is near Nyquist. It will make things faster, and you can always use the expanded inputs to revert back to the unbinned particles if needed later.
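To make the downsampling tip concrete: Nyquist resolution is twice the pixel size, and Fourier cropping keeps the field of view (box size × pixel size) fixed, so the target box is roughly field of view ÷ (target resolution / 2). The sketch below is only arithmetic, not a cryoSPARC function; the 256 px box at 1.0 Å/px is a hypothetical example, and rounding up to an even box size is an assumption about what the downsampling job will accept.

```python
import math


def downsampled_box(box_px: int, apix: float, target_res: float) -> tuple[int, float, float]:
    """Return (new_box, new_apix, new_nyquist) so Nyquist lands at or just below target_res.

    Hypothetical helper for illustration; not part of any cryoSPARC API.
    """
    fov = box_px * apix                      # field of view in Å, preserved by Fourier cropping
    new_box = math.ceil(fov / (target_res / 2))  # want new_apix <= target_res / 2
    if new_box % 2:                          # assumed: keep the box size even
        new_box += 1
    new_box = min(new_box, box_px)           # never "downsample" to a larger box
    new_apix = fov / new_box
    return new_box, new_apix, 2 * new_apix   # Nyquist = 2 * pixel size


# Hypothetical subparticle stack: 256 px box at 1.0 Å/px, target 12 or 15 Å.
for res in (12.0, 15.0):
    box, apix_new, nyq = downsampled_box(256, 1.0, res)
    print(f"target {res:.0f} Å -> box {box}, {apix_new:.2f} Å/px, Nyquist {nyq:.2f} Å")
# target 12 Å -> box 44, 5.82 Å/px, Nyquist 11.64 Å
# target 15 Å -> box 36, 7.11 Å/px, Nyquist 14.22 Å
```

Rounding up rather than down keeps Nyquist slightly finer than the target, so the chosen target resolution is still reachable after binning.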
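Conceptually, "force hard classification" replaces probabilistic (soft) class weights with a winner-take-all assignment. The toy sketch below shows that general EM idea with made-up per-class log-likelihoods and a uniform class prior (both assumptions); it is not cryoSPARC's implementation, and it does not model the class-similarity parameter.

```python
import numpy as np


def soft_assign(log_lik: np.ndarray) -> np.ndarray:
    """Posterior class weights per particle (rows), assuming a uniform class prior."""
    shifted = log_lik - log_lik.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    w = np.exp(shifted)
    return w / w.sum(axis=1, keepdims=True)


def hard_assign(log_lik: np.ndarray) -> np.ndarray:
    """One-hot weights: each particle contributes only to its single most likely class."""
    n, k = log_lik.shape
    onehot = np.zeros((n, k))
    onehot[np.arange(n), log_lik.argmax(axis=1)] = 1.0
    return onehot


# Made-up log-likelihoods for two particles against two classes (e.g. apo, Fab-bound).
log_lik = np.array([[0.0, 0.2],   # ambiguous particle: both classes nearly equally likely
                    [0.0, 5.0]])  # clear-cut particle
soft = soft_assign(log_lik)
hard = hard_assign(log_lik)
print(np.round(soft, 2))
print(hard)
```

With soft weights the ambiguous particle feeds almost equally into both class reconstructions, which can smear partial Fab density into both; the hard version sends it entirely to one class.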

Hope that helps!

Cheers
Oli

Thanks Oli! Will try it out!

Related suggestion: heterogeneous refinement using Fab-bound and apo structures as references, or iterative multi-class ab initio (take each result and run multi-class ab initio again).

In a general (asymmetric) case, absolutely. For a virus with high symmetry, you will probably benefit more from symmetry expansion followed by classification without alignments (with a mask around an asymmetric subsection).
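As a rough illustration of what symmetry expansion does, each particle's pose is replicated once per symmetry operator, so icosahedral (I) symmetry turns every particle into 60 copies. The numpy sketch below builds the 60 rotations by closure from a 5-fold and a 3-fold generator, using an icosahedron with 2-fold axes along x, y and z. That orientation, and composing poses as R @ S, are assumptions that may not match the convention used by your map or by cryoSPARC; `icosahedral_group` and `symmetry_expand` are hypothetical helper names.

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2  # golden ratio


def axis_angle(axis, angle_deg):
    """Rotation matrix about `axis` by `angle_deg` degrees (Rodrigues' formula)."""
    a = np.asarray(axis, dtype=float)
    a /= np.linalg.norm(a)
    t = np.radians(angle_deg)
    K = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(t) * K + (1 - np.cos(t)) * K @ K


def icosahedral_group():
    """All 60 rotations of I, generated by closure from a 5-fold and a 3-fold axis."""
    gens = [axis_angle((0, 1, PHI), 72),   # 5-fold through vertex (0, 1, phi)
            axis_angle((1, 1, 1), 120)]    # 3-fold: cyclic permutation of x, y, z

    def key(m):
        return tuple(np.round(m, 6).ravel() + 0.0)  # +0.0 normalises -0.0

    group = {key(np.eye(3)): np.eye(3)}
    frontier = [np.eye(3)]
    while frontier:
        new = []
        for m in frontier:
            for g in gens:
                p = g @ m
                if key(p) not in group:
                    group[key(p)] = p
                    new.append(p)
        frontier = new
    return list(group.values())


def symmetry_expand(poses):
    """Replicate each pose R as R @ S for every operator S (composition order assumed)."""
    ops = icosahedral_group()
    return [R @ S for R in poses for S in ops]


expanded = symmetry_expand([np.eye(3), axis_angle((0, 0, 1), 30)])
print(len(expanded))  # 2 particles x 60 operators
```

The 60-fold growth in particle count is also why downsampling before Class3D pays off with an expanded virus dataset.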

Indeed, Oli: once you have determined the icosahedral parameters, one should use them until they become unfeasible for the question you are trying to answer. In this specific case the data go to very high resolution, so the relative orientations of the subparticles are accurate enough for the kind of question I am trying to answer.