I don’t know that I see any problems with your 2D classes (although I have never seen a 2D class from an RNA structure before! Very cool!). It might be worth keeping the 400k particles and trying to weed out any noise in 3D instead.
I have a lot of heterogeneity in my own data. My favorite starting method is to do a one-class ab initio, then a heterogeneous refinement with multiple classes. Sometimes it works to do a multi-class ab initio first. I discard whichever classes look like noise, and continue with the rest of the particles.
Depending on what the classes look like at this point and how similar they are, I might combine all or a subset into another ab initio and heterogeneous refinement, or just continue treating each class separately. After the heterogeneous refinement I do a separate NU-refinement on any class that looks promising and has enough particles.
If the heterogeneous classes are really different I try to pull out different conformations through 3DVA instead. I do an ab initio and homogeneous refinement on all the good particles, and use the mask output from the homogeneous refinement as the mask input for a 3DVA job. Then before running 3DVA I do a NU-refinement on the ab initio output. I feed the NU-refinement particle set together with the homogeneous refinement mask into 3DVA. In my case it’s taken a LOT of tweaking of 3DVA parameters to find the sweet spot for my dataset.
After 3DVA I do a couple of 3DVA analysis jobs. Sometimes sorting by intermediate states gives me enough to pull out similar particles, and sometimes clustering works better. I combine a certain number of intermediates, or the clusters that seem to be most similar, and do another ab initio and NU-refinement. Again though, the workflow and parameters will be specific to the dataset you’re working with.
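In case it helps to picture the "combine a certain number of intermediates" step, here is a rough sketch in plain Python. It is not cryoSPARC code and I'm not claiming this is how cryoSPARC bins things internally: `coords` is made-up data standing in for each particle's coordinate along one 3DVA component, and `bin_by_rank` and `pool_bins` are hypothetical helper names. The idea is just to sort particles along the component, split them into equal-sized intermediate bins, and pool a run of neighboring bins into one particle set.

```python
import random


def bin_by_rank(coords, n_bins):
    """Sort particle indices by coordinate and split them into n_bins near-equal groups."""
    order = sorted(range(len(coords)), key=lambda i: coords[i])
    bins = []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        bins.append(order[start:stop])
    return bins


def pool_bins(bins, first, last):
    """Merge bins first..last (inclusive) into a single list of particle indices."""
    pooled = []
    for group in bins[first:last + 1]:
        pooled.extend(group)
    return pooled


# Made-up stand-in for per-particle coordinates along one 3DVA component.
random.seed(0)
coords = [random.gauss(0.0, 1.0) for _ in range(1000)]

bins = bin_by_rank(coords, n_bins=10)   # 10 intermediates of 100 particles each
low_end = pool_bins(bins, 0, 2)         # pool the three lowest intermediates
print(len(low_end))                     # 300
```

How many bins to pool is the dataset-specific part: too few and the follow-up ab initio is starved of particles, too many and distinct states get mixed back together.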
I had some of these pesky shell-like “flower spike” densities that I had a hard time getting rid of. I believe they’re related to overfitting within the masked region. One thing that worked for me was waiting to start dynamic masking until a higher resolution, i.e. waiting until something like 7 Å instead of the default, which I think is 12 Å. If you’re doing dynamic masking it might also help to play around with the Dynamic mask near and far parameters.
Here’s a good thread that discussed troubleshooting these spikes as well as some other stuff: