Heterogenous dataset?

Hi All,
I am pretty new to the field and now processing my very first dataset.
It is supposed to be a ~140 kDa dimer displaying a C2 symmetry.
After all the pre-processing steps and particle extraction and 2D classification I end up with the following classes - 650k particles in total:


then I run ab-initio with 1, 3 or 6 classes followed by hetero refinement. And what I got is that 2D classes that look alike (highlighted in circles) are sorted together and effectively I end up with 4-5 distinct structures but each with strong preferred orientation.
I am not sure whether this is caused by the poor quality of the sample (cross-contaminations co-purified on the IMAC column, thus many different structures present) or by problems with data processing (in fact the various 2D classes might belong to the same protein/structure but for some reason are separated at the stage of 3D classification).
many thanks,
Peter

green and yellow look like distinct views of 1 complex. red looks monomer of that. blue looks different entirely. as does class 2 and similar. I would guess 3 or 4 structures.

Pool similar 2d, get ab initio of each form using 1 class (won’t necessarily look great, but at this step you’re not asking the program to sort just to use your particles to find a consensus) and/or combined as above (add all yellow and green classes and get 1 initial model). Use these as references in het refinement including ALL particles (not just the particles contributing to that ab initio/2Dsel). With any luck, you should start to get a couple believable structures which are quality enough to contain particles of different views. Then can remove these from the dataset (particle sets) and repeat with what’s left behind.

You should not use any symmetry parameters at this step, just C1 to sort out what shapes are there. Symmetry later to get better reconstructions.