Reproducibility of the percentages of the 3D classes in heterogeneous refinement

Hi,

I am looking to obtain a reproducible 3D class distribution for a heterogeneous dataset of a ~250 kDa protein-DNA complex. The particles are distributed roughly as in a normal distribution. Unlike my experience with 3D classification without alignment in RELION, the percentage distribution of the 3D classes changes from run to run in CryoSPARC. I would appreciate any suggestions to resolve this.

Hi @ashokN,

The number of particles eventually assigned to each class depends on a few factors, including the initialization: if the initializations differ between runs, the GMM classification will definitely converge to different class assignments, since expectation maximization is only a local optimizer. The other thing to note is that the closest analogue of RELION Class3D without alignment is CryoSPARC’s 3D Classification job, not Heterogeneous Refinement, which does perform a global pose search. It may be worth retrying with 3D Classification in CryoSPARC using fixed initializations (i.e. providing the initial volumes to the job in the volume input slot).
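
To illustrate the point about initialization, here is a minimal toy sketch in plain Python. It is not CryoSPARC code and does not use any CryoSPARC API: it runs expectation maximization for a 1D Gaussian mixture on synthetic, normally distributed points. The function names (`em_class_fractions`, `make_data`), the fixed `sigma`, and the three-class setup are all assumptions made for the illustration. With the same starting means the class fractions are identical from run to run; with randomly chosen starting means they may differ.

```python
import math
import random


def em_class_fractions(data, init_means, n_iter=50, sigma=0.5):
    """Toy EM for a 1D Gaussian mixture with a fixed, shared sigma.

    Returns the fraction of points hard-assigned to each class.
    This is an illustrative sketch, not CryoSPARC's algorithm.
    """
    means = list(init_means)
    k = len(means)
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each point.
        resp = []
        for x in data:
            p = [w * math.exp(-0.5 * ((x - m) / sigma) ** 2)
                 for w, m in zip(weights, means)]
            total = sum(p) or 1e-300  # guard against underflow
            resp.append([v / total for v in p])
        # M-step: update class means and mixing weights.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj > 0:
                means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            weights[j] = nj / len(data)
    # Hard-assign each point to its most probable class.
    counts = [0] * k
    for r in resp:
        counts[r.index(max(r))] += 1
    return [c / len(data) for c in counts]


def make_data(n=1000, seed=0):
    """Synthetic stand-in for a continuous, normally distributed dataset."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    data = make_data()

    # Fixed initialization: same starting means on every run.
    fixed = [-1.0, 0.0, 1.0]
    print("fixed init, run 1:", em_class_fractions(data, fixed))
    print("fixed init, run 2:", em_class_fractions(data, fixed))

    # Random initialization: starting means drawn from the data.
    # Fractions are sorted so that label order does not matter.
    for seed in (1, 2, 3):
        init = random.Random(seed).sample(data, 3)
        print("random init", seed, sorted(em_class_fractions(data, init)))
```

In this toy setting, passing fixed starting means plays the role of providing the initial volumes to the job.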

Best,
Michael

Thanks Michael. Yes, I am aware of the 3D classification approach in CryoSPARC (classification without alignment), which works the same way as in RELION. I can test this approach with fixed initial reference volumes and see if it gives me reproducible class percentages. Is it also called supervised 3D classification? Just curious.

Hi @ashokN,

All 2D/3D classification jobs in CryoSPARC are considered to be unsupervised, since we rarely (if ever) have “ground truth” labels that indicate which class each particle belongs to. Using fixed initial classes would still be considered unsupervised classification.

Best,
Michael
