@rcastellsg we talked over this internally! With such a large dataset, the job is most likely running out of memory during the final output stages (the largest dataset we’ve used for one classification job, or any job for that matter, is around 2 million particles). A couple suggestions:
- Perhaps you can use the Particle sets tool job to select a random subset of 1-2 million particles, run the 3D class job on that subset and see if you can identify salient heterogeneity (there's a scripted sketch of this right after this list).
- If you see heterogeneity in that random subset, then perhaps you can chunk your dataset into (contiguous) subsets of 1-2 million particles each, classify each and then combine particle sets by manual inspection.
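If you'd rather script the subsetting than build it in the UI, something along these lines should work with cryosparc-tools. Treat it as a rough sketch rather than copy-paste-ready code: the project/job/workspace UIDs (P3, J42, W1) and login details are placeholders, and it's worth double-checking the `Dataset.take` / `save_external_result` calls against the cryosparc-tools docs for your version.

```python
import numpy as np
from cryosparc.tools import CryoSPARC

# Placeholder credentials/UIDs -- replace with your own instance details
cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    host="localhost",
    base_port=39000,
    email="user@example.com",
    password="password",
)

project = cs.find_project("P3")            # placeholder project UID
job = project.find_job("J42")              # job whose particle output you want to subset
particles = job.load_output("particles")

# Pick a random subset of ~1.5 M particles (or everything, if the stack is smaller)
rng = np.random.default_rng(0)
n_subset = min(1_500_000, len(particles))
idxs = rng.choice(len(particles), size=n_subset, replace=False)
subset = particles.take(idxs)

# For the chunked approach instead, take contiguous slices, e.g.
# chunk_i = particles.slice(i * n_subset, (i + 1) * n_subset)

# Save the subset back into the project so it can be connected to a
# new 3D Classification job in the UI
project.save_external_result(
    workspace_uid="W1",                    # placeholder workspace UID
    dataset=subset,
    type="particle",
    name="random_subset",
    passthrough=(job.uid, "particles"),
    title="Random particle subset for 3D classification",
)
```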
For the 3D class job itself, you can also try:
- reducing O-EM epochs to 2-3 (you likely won't need as many runs through the dataset with so many particles)
- turning on Output data after every full iter so that you can inspect outputs after every full batch EM in case something goes wrong with the final job output
- adjusting the learning rate and the number of full EM iterations so that the average class ESS is close to 1 at the end of the run (see the note after this list for what that target means) – you can try using the settings in this post.
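On that last point, in case it helps: as far as I recall, the per-particle class ESS reported by 3D Classification is the effective number of classes a particle still has appreciable probability in, i.e. the inverse participation ratio of its class posteriors, so an average near 1 means most particles have converged onto a single class (treat the exact definition as my assumption and check the guide). A tiny numpy sketch with made-up posteriors, just to make the target concrete:

```python
import numpy as np

def class_ess(posteriors: np.ndarray) -> np.ndarray:
    """Per-particle effective sample size over classes.

    posteriors: (n_particles, n_classes) array of class posterior
    probabilities (each row sums to 1). Returns 1 / sum_k p_k^2, which is
    1.0 when a particle is fully assigned to one class and n_classes when
    its posteriors are uniform.
    """
    return 1.0 / np.sum(posteriors**2, axis=1)

# Toy example: 3 particles, 4 classes
p = np.array([
    [1.00, 0.00, 0.00, 0.00],  # fully converged -> ESS = 1.0
    [0.70, 0.10, 0.10, 0.10],  # mostly one class -> ESS ~ 1.9
    [0.25, 0.25, 0.25, 0.25],  # undecided -> ESS = 4.0
])
print(class_ess(p))        # [1.   1.92 4.  ] (approximately)
print(class_ess(p).mean()) # average class ESS across the stack
```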
Valentin