For 2D classification, the general rule of thumb I was taught is to aim for at least 1,000 particles per class on average, but this starts to break down with larger datasets. I routinely use 200 2D classes for 2 million particles and get good results. I don’t think many people go much beyond 200 classes, because the job starts to run extremely slowly.
For 3D classification, I usually shoot for 40-50k particles per class, though some of my colleagues who work with large, rigid proteins have used as few as 10-20k particles per class.
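To make the arithmetic explicit, here is a minimal Python sketch of those rules of thumb. The function name and defaults are just mine (not from any software package), and the numbers are only the heuristics I described above:

```python
# Hypothetical helper codifying the rules of thumb above:
# ~1,000 particles/class for 2D (with a practical cap near 200 classes)
# and ~40-50k particles/class for 3D. Not part of any real package.

def suggest_class_counts(n_particles,
                         per_class_2d=1000, max_2d_classes=200,
                         per_class_3d=45000):
    """Return (n_2d_classes, n_3d_classes) suggested for a dataset."""
    n_2d = min(max(n_particles // per_class_2d, 1), max_2d_classes)
    n_3d = max(n_particles // per_class_3d, 1)
    return n_2d, n_3d

# Example: 2 million particles -> 200 2D classes, ~44 3D classes
print(suggest_class_counts(2_000_000))
```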
For both 2D and 3D classification, note that the “batch size per class” parameter is separate from the number of classes: it controls how many particles are used in the initial iterations, and it can be really important to tune for difficult datasets. The fewer particles you use in those initial iterations, the faster the job runs, but if you don’t use enough, the classification can become unstable, so I’ve never gone below the default batch sizes. For 2D classification, people on this forum have recommended batch sizes from 200-1000; I’ve personally had a lot of success with a batch size of 400 on a dataset with preferred orientation. For 3D classification, a user in this thread once used a batch size of 10000.
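My (possibly imperfect) understanding of what this parameter does in practice: the number of particles seen in each initial iteration is roughly the batch size per class multiplied by the number of classes, capped at the dataset size. A quick sketch of that mental model (names and the cap are my own):

```python
# Rough illustration of how I think about "batch size per class" (hedged;
# the exact behavior may differ from this): particles per initial iteration
# is approximately batch_size_per_class * n_classes, capped at dataset size.

def particles_per_initial_iteration(batch_size_per_class, n_classes, n_particles):
    return min(batch_size_per_class * n_classes, n_particles)

# e.g. batch size 400 with 200 classes -> 80,000 particles per initial iteration
print(particles_per_initial_iteration(400, 200, 2_000_000))  # 80000
```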
As for your other question, yes, the ESS will always be 1 when hard classification is turned on. Hard classification has worked wonders for me in the past, but unfortunately, I can’t really explain why it works so well. @Mark-A-Nakasone, do you have a better understanding of why hard classification can give better results?
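To expand on the ESS point: my understanding (which may not match the exact formula the software uses) is that the per-particle class ESS is computed from the class posterior probabilities as 1 / Σ pᵢ². With hard classification, each particle is assigned entirely to one class (one posterior of 1, the rest 0), so that sum is 1 and the ESS is pinned at 1:

```python
# Tiny sketch of my understanding of the class ESS (hedged; the exact
# definition used internally may differ): ESS = 1 / sum(p_i^2) over the
# per-particle class posteriors.

def class_ess(posteriors):
    return 1.0 / sum(p * p for p in posteriors)

print(class_ess([0.5, 0.3, 0.2]))  # "soft" particle, ESS ~ 2.6
print(class_ess([1.0, 0.0, 0.0]))  # hard-classified particle, ESS = 1.0
```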
Best,
cbeck