3D classification job not converging with large number of particles

Hello,

I am trying to use the 3D Classification job to classify a large dataset (~6 million particles) after a consensus refinement. Working with a randomly selected 10% subset of the data (made with the Particle Sets job), I was able to find 3D Classification parameters that separate the classes well. With this subset, the job starts converging during the O-EM iterations: the classes become more and more distinct from each other as iterations pass, and the mean ESS drops substantially, from 5 (the input number of classes) to ~2.5.
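A quick way to read the ESS numbers: if, as assumed here, the per-particle ESS is the inverse sum of squared class posteriors, ESS = 1 / Σₖ pₖ², then it counts how many classes a particle is effectively spread over. A value equal to the number of classes means no differentiation, and 1 means a confident assignment. The sketch below is illustrative only; the function name is hypothetical and the definition is an assumption, not confirmed cryoSPARC internals.

```python
import numpy as np

def per_particle_ess(posteriors):
    """Effective number of classes per particle (assumed ESS = 1 / sum_k p_k^2).

    posteriors: array of shape (n_particles, n_classes); each row sums to 1.
    """
    p = np.asarray(posteriors, dtype=float)
    return 1.0 / np.sum(p ** 2, axis=1)

K = 5
# Posterior spread evenly over all K classes: ESS equals K (no differentiation)
uniform = np.full((1, K), 1.0 / K)
# All weight on a single class: ESS equals 1 (confident assignment)
confident = np.eye(K)[:1]
# Weight split evenly between two classes: ESS equals 2
split = np.array([[0.5, 0.5, 0.0, 0.0, 0.0]])

for name, post in [("uniform", uniform), ("confident", confident), ("split", split)]:
    print(name, per_particle_ess(post))
```

Under this definition, a mean ESS falling from 5 to ~2.5 means particles went from being smeared across all five classes to being shared by two or three on average, while a mean ESS stuck near 5 means the classes are still indistinguishable to the particles.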

However, when I run exactly the same job with identical parameters on the full dataset, it doesn’t seem to converge. It runs many more O-EM iterations (2420 vs. 240 for the subset), and although more than half of them have finished, the mean ESS has barely changed and the classes look nearly identical to the initial ones, with almost no significant difference from the consensus.
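The ~10× jump in O-EM iterations is roughly what you would expect from a 10× larger dataset if the job processes a fixed-size batch per iteration for a fixed number of passes (epochs) over the data. The formula and the batch/epoch values below are assumptions chosen for illustration, loosely modeled on a per-class batch size and an epoch count; they are not confirmed cryoSPARC defaults, and the function name is hypothetical.

```python
import math

def oem_iterations(n_particles, n_classes, batch_per_class, epochs):
    """Hypothetical O-EM iteration count: epochs * N / (batch_per_class * K)."""
    batch_size = batch_per_class * n_classes  # particles consumed per iteration (assumed)
    return math.ceil(epochs * n_particles / batch_size)

# Hypothetical settings, NOT confirmed defaults
subset = oem_iterations(600_000, n_classes=5, batch_per_class=1000, epochs=2)
full = oem_iterations(6_000_000, n_classes=5, batch_per_class=1000, epochs=2)
print(subset, full, full / subset)
```

Under this model, the larger iteration count on its own is expected and does not signal a problem; the informative symptom is that the mean ESS stays near 5.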

Any idea about what could be causing this behavior?
Some details about the job: I’m using initialization mode “simple” with 5 classes, a target resolution of 4 Å and an initial low-pass filter of 10 Å (all of these parameters worked fine for the smaller dataset).

In parallel, I am also running the classification on the full dataset with initialization mode “input”, using the volumes from the subset classification as input volumes. This job seems to behave the same way as the other full-dataset job, except that the iterations are going much faster (already at O-EM iteration 2000/2400). However, the ESS is still very high, close to 5.

I would appreciate any feedback on this!

Thanks,

J.

Hi @jperez!

It’s great news that 3D Classification of the particle subset was able to pull out some heterogeneity. We’re surprised to hear that the heterogeneity goes away when you increase the number of particles. One thing that may help is increasing the number of classes (perhaps to 25–50) to capture any additional heterogeneity that exists in the 6M-particle set.

Additionally, I wonder how certain you are that the 6M particles are all clean? Have you already achieved a moderate (i.e., better than 5 Å) resolution consensus reconstruction of these particles?