Limiting the number of particles per 2D class


(Petya Krasteva) #1

Hi all,

Is there a way to limit the number of particles per 2D class during 2D classification?

We have a protein that has a big, stable part and another big part that moves significantly relative to the first one and, in a subset of the particles, falls off. We have cleaned the dataset of junk particles and are trying to disperse the particles among 2D classes, but the program over-aligns everything on the first part until the floppy part is averaged out. In 3D it then appears as cleaved-off fuzz that cannot be refined locally or brought back by 3D classification.

During 2D reclassifications we end up with more or less the same number of ‘good’ classes, while the rest remain as classes with zero or few particles, even if we double or triple the total number of desired 2D classes (so there is no gain from increasing the number of classes). We tried playing with the initial uncertainty, masks, align resolution, etc., but cannot really keep the particles from converging. If we select a single class, we can break it up into subclasses somewhat more efficiently, and we see high-resolution features in the floppy domain (so the density is clearly there), but we would really like to achieve this kind of dispersal at the scale of the whole dataset.

Bottom line: is there a way to limit the maximum number of particles per 2D class, so that we can force the program to keep things separate and try to align over the entirety of the particles? If not, any other suggestions will be greatly appreciated.
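To make the request concrete, here is a toy sketch of what a per-class cap would mean for a single hard-assignment step. This is not a cryoSPARC option or its internal code; the function name `capped_assignment`, the score matrix (e.g. per-class log-likelihoods) and the greedy best-pair-first strategy are all hypothetical, chosen only to illustrate the idea.

```python
def capped_assignment(scores, max_per_class):
    """Hypothetical illustration: assign each particle to exactly one class
    without any class exceeding max_per_class particles.

    scores[p][c] is a per-particle, per-class match score (higher is better),
    e.g. a log-likelihood. Pairs are visited from best to worst score, and a
    particle takes its best class that still has room.
    """
    n_particles = len(scores)
    n_classes = len(scores[0])
    if n_particles > max_per_class * n_classes:
        raise ValueError("total class capacity is smaller than the particle count")

    assignment = [-1] * n_particles  # -1 means "not yet assigned"
    counts = [0] * n_classes

    # All (score, particle, class) triples, best score first.
    pairs = sorted(
        ((scores[p][c], p, c) for p in range(n_particles) for c in range(n_classes)),
        reverse=True,
    )
    for _, p, c in pairs:
        if assignment[p] == -1 and counts[c] < max_per_class:
            assignment[p] = c
            counts[c] += 1
    return assignment, counts


# Without a cap, all three toy particles would go to class 0.
assignment, counts = capped_assignment([[5.0, 1.0], [4.0, 1.0], [3.0, 2.0]], 2)
print(assignment, counts)
```

Because total capacity is at least the number of particles, every particle ends up assigned; the cap simply pushes the weakest matches of an overfull class into their next-best class. Sorting every particle/class pair is only practical for toy sizes.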

Best,

Petya


(Ali Punjani) #2

Hi @PVK

Thanks for posting.
This is definitely an interesting issue.
I think you’ve already tried some of the parameters that would make sense, as well as sub-classification (which is probably the most straightforward way).
Have you also already tried multiclass ab-initio reconstruction of the entire dataset?

With 2D classification, I would suggest trying something like the following settings:
Initial classification uncertainty factor: 3
Number of online-EM iterations: 40
Batchsize per class: 200
Number of iterations to anneal sigma: 35

This will cause the 2D classification job to be quite a bit slower but will more gradually anneal the “certainty” factor so that it takes more iterations before the classes are “locked in”, hopefully giving more time for the separation of very similar classes. You should do this with 200+ classes to ensure there is enough model capacity to find the views you are looking to separate.
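As a rough illustration of what gradually annealing the "certainty" means, here is a generic deterministic-annealing sketch. It is not cryoSPARC's implementation: the linear schedule, the function names and the softmax over per-class log-likelihoods are assumptions made purely for illustration. A softening factor starts at the initial uncertainty value and decays to 1, so early class probabilities are flatter (particles are shared between classes) and only sharpen, i.e. "lock in", once annealing is finished.

```python
import math


def softening_factor(iteration, initial_factor, anneal_iters):
    """Assumed linear schedule: initial_factor at iteration 0, 1.0 from anneal_iters on."""
    if iteration >= anneal_iters:
        return 1.0
    frac = iteration / anneal_iters
    return initial_factor + (1.0 - initial_factor) * frac


def class_probabilities(log_likelihoods, factor):
    """Softmax of per-class log-likelihoods divided by the softening factor.

    A larger factor gives flatter (less certain) class probabilities.
    """
    scaled = [ll / factor for ll in log_likelihoods]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Toy particle that matches class 0 best, with factor 3 annealed over 35 iterations.
for it in (0, 20, 35):
    f = softening_factor(it, initial_factor=3.0, anneal_iters=35)
    probs = class_probabilities([0.0, -2.0, -4.0], f)
    print(it, round(f, 2), [round(p, 3) for p in probs])
```

The printout shows the best class's probability rising as the factor approaches 1, which is why a longer annealing period leaves more iterations in which similar classes can still exchange particles.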

Please let us know if that works!


(Petya Krasteva) #3

Hi Ali,

Thanks a lot for the suggestion. Just for feedback: after running with the proposed parameters, the problem persists, and it seems that increasing the number of iterations actually hurts rather than helps. Classes that have high-resolution features in the floppy domain appear in the intermediate iterations and get completely smeared out by the end of the classification. Again, in the end there are about 120 ‘empty’ classes (completely empty or with 1-2 particles) out of 200, and the remaining classes are blurred in the flexible region.

Best,

Petya


(Ali Punjani) #4

Hi @PVK,

Just following up with this - if you can get to 3D, could you try the new 3D Variability algorithm?
Some notes about it here: https://cryosparc.com/docs/tutorials/3d-variability-analysis/
It can generally resolve continuous conformational flexibility quite well and may shed some light on your dataset.