More diverse classes with convergence in 3D classification

joonpark · December 10, 2021, 5:28am

Hi,

At the end of O-EM, the batch avg class ESS is 9.635 with 3 main classes. After 5 full iterations, the batch avg class ESS is 3.982, with particles dispersing into more classes. Is this the expected behavior as classification nears convergence? If not, could the presence of continuous heterogeneity contribute to this behavior?

Thank you,
Joon

[Image: at the end of O-EM]

[Image: at the end of 5 full iterations]

vperetroukhin · December 10, 2021, 10:41pm

Hi @joonpark,

The final full-batch EM iterations do tend to reduce the avg class ESS significantly. However, the fact that the batch class ESS (CESS) is ~9.6 with 10 classes might be a sign that you should increase the number of epochs through the data. For reference, in the 100-class example in the 3D Classification tutorial, I had the following:

After O-EM:
Batch avg class ESS: 11.759

After 2 full iterations:
Batch avg class ESS: 2.562
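
For intuition about these numbers, here is a minimal sketch of how a batch average class ESS could be computed. It assumes the per-particle class ESS is the inverse of the sum of squared class posterior probabilities, averaged over the batch. That definition and the function names `class_ess` and `batch_avg_class_ess` are assumptions for illustration, not cryoSPARC's actual code.

```python
import numpy as np

def class_ess(posteriors):
    """Per-particle class ESS, assumed here to be 1 / sum_k p_k^2.

    posteriors: array of shape (n_particles, n_classes) holding each
    particle's class posterior probabilities.
    """
    p = np.asarray(posteriors, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)  # make sure each row sums to 1
    return 1.0 / np.sum(p ** 2, axis=1)

def batch_avg_class_ess(posteriors):
    """Mean of the per-particle class ESS over the batch (hypothetical helper)."""
    return float(np.mean(class_ess(posteriors)))

# Particles spread evenly over 10 classes: ESS is about 10 (the maximum).
uniform = np.full((4, 10), 0.1)
# Particles each assigned confidently to a single class: ESS is 1 (the minimum).
one_hot = np.eye(10)[:4]

print(batch_avg_class_ess(uniform))
print(batch_avg_class_ess(one_hot))
```

Under this assumed definition, a value near the number of classes means particles are spread almost evenly over all classes, while a value near 1 means each particle sits confidently in a single class.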

joonpark · December 10, 2021, 10:53pm

Thank you so much, @vperetroukhin!

I’m glad to know that what I observed was not weird. For this job, I actually kept ‘Number of O-EM Epochs’ at 3 and increased ‘Number of Full Iterations’ to 5 because of the significant batch class ESS drop during full iterations. I will follow your advice and keep ‘Number of O-EM Epochs’ at 5 or more.

Have a great day,
Joon

olibclarke · December 11, 2021, 1:37am

So in cases where ESS is still at ~9 after 2 full iterations, would you recommend increasing the number of full iterations, or the number of O-EM epochs? From looking at the log it seems like the ESS plateaus at ~13 during O-EM, then decreases to 12, then to 9 in the last full iterations. So I guess more full iterations through the data might be the way to go?

EDIT:
It would also be very helpful to be able to continue from a previous classification run. Having to restart the entire thing from scratch to test a different number of final iterations is not ideal.

UPDATE: Changing to 10 full iterations improved matters a lot - both decreasing ESS, and dramatically improving the appearance/diversity of classes. I suspect these defaults could do with some tweaking based on experience so far.

@vperetroukhin It would also be very helpful to have an option to output volume series for every “full” iteration. It seems like more full iterations is better, but only up to a point - too many and classes start to become noisy, presumably from over-refinement. But having to run 5 different 18hr jobs with different numbers of full iterations is a waste - would be good to just run one, and then I can compare the classes over the full iterations and decide where the best point is to stop.

vperetroukhin · December 15, 2021, 6:58pm

@olibclarke wrote:
> It would also be very helpful to have an option to output volume series for every “full” iteration. It seems like more full iterations is better, but only up to a point - too many and classes start to become noisy, presumably from over-refinement. But having to run 5 different 18hr jobs with different numbers of full iterations is a waste - would be good to just run one, and then I can compare the classes over the full iterations and decide where the best point is to stop.

FYI, this is now implemented in the latest patch.