Hi @cbeck! These are some great questions about symmetry! I’ll try to help answer these, and others in the community might have more to add too.

To keep the writing clear, I’ll refer to your enzyme as having subunits “A” and “B” (the two monomers that make up your C2 dimer) and your alignment as having positions “left” and “right” (referring to whether that subunit is aligned to one symmetry-related position or the other; left and right have no biological/physical meaning). A and B are identical, but it can be useful to refer to the identity of a subunit independently of its aligned pose.

On to your questions:

## Is the number of classes for 3D classification related to the degree of the symmetry expansion?

In general, I agree with your reasoning here, but have a few clarifying comments.

### Does classification undo symmetry expansion?

First, if you perform a Cn symmetry expansion and then classify with N classes, it is not *quite* the same as undoing the symmetry expansion. Consider your case of a C2 particle. Your particles are likely in some kind of mix of A/B and B/A in left/right positions.

If you perform symmetry expansion, you will have equal numbers of A/B and B/A particles (since each image is copied and rotated to the other positions). If you then classify into two classes without alignment, a *perfect* classification would result in two classes: one which is 100% A/B and the other which is 100% B/A. Picking one of these two classes would result in a stack which has each particle only once, in the same orientation (A/B or B/A) as the other particles in that stack.

So in some sense, we have undone the symmetry expansion: each particle is again in the particle stack only once. However, some of the particles’ poses have flipped so that all poses are consistent. Again, this assumes that classification is perfect!
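As a toy illustration of the argument above (not CryoSPARC code), here is a minimal Python sketch. The strings `"A/B"` and `"B/A"` stand in for aligned poses, and `symmetry_expand_c2` and `perfect_classify` are hypothetical helpers made up for this example. A real classification job does not get to read the true labels, which is exactly why the "perfect" caveat matters.

```python
import random

def symmetry_expand_c2(particles):
    """Copy each particle once per C2-related pose (toy model)."""
    expanded = []
    for uid, pose in particles:
        flipped = "B/A" if pose == "A/B" else "A/B"
        expanded.append((uid, pose))     # original pose
        expanded.append((uid, flipped))  # symmetry-related copy
    return expanded

def perfect_classify(expanded):
    """Idealised 2-class classification: sort copies by their true pose."""
    classes = {"A/B": [], "B/A": []}
    for uid, pose in expanded:
        classes[pose].append(uid)
    return classes

random.seed(0)
# Unknown mix of A/B and B/A poses before expansion.
particles = [(uid, random.choice(["A/B", "B/A"])) for uid in range(1000)]

expanded = symmetry_expand_c2(particles)
classes = perfect_classify(expanded)

# Each class holds every particle exactly once, all in the same pose.
print(len(expanded), len(classes["A/B"]), len(set(classes["A/B"])))
```

Picking either class gives back a stack the size of the original one, but with every pose made consistent.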

### Should the number of classes always be greater than N?

This depends on what you want to do! In the case I outlined above, essentially using 3D Classification to put all particles in the same orientation, you only want there to be N choices for each particle (it’s either in the A/B class or the B/A class).

If you’re classifying on something more complex you will need more classes because of the symmetry, but this is true even without symmetry expansion.

Consider again your C2 case. Say your subunit A has a 50% chance of being in a different conformation (let’s call it A’) than your subunit B. Without symmetry expansion:

- some unknown fraction of your particles are oriented A/B, and the rest are oriented B/A
- 50% of A is in a different conformation

Therefore, three classes are needed to capture the variation in your particle stack:

- A/B (since A and B are identical, this class also captures B/A)
- A’/B
- B/A’

If you perform C2 symmetry expansion before classifying, you know that all of your particles are in both A/B and B/A orientations (put another way, you know the exact fraction of particles from the first bullet point: 50%). Again, you know that 50% of A are in conformation A’. You can see that the same three classes are needed — you would just expect that more particles are in each of the classes.

For this reason, we often recommend a slightly modified workflow from what you’re suggesting:

- Perform symmetry expansion
- Mask out a single subunit
- Classify just one subunit *position*. Because of symmetry expansion, you are actually classifying all subunits.

Using this technique, you can use only two classes:

- A (since A and B are identical, this class also captures B)
- A’

This is only a moderate savings for the C2 case, but consider a C4 case with A/B/C/D, and again with 50% of A being A’. You would need 5 classes in the un-expanded case:

- A/B/C/D (which captures all four orientations that do not have A’)
- A’/B/C/D
- B/C/D/A’
- C/D/A’/B
- D/A’/B/C

and still only two in the expanded case with a single subunit masked out:

- A (captures B, C, and D as well)
- A’

I know that’s a lot of alphabet soup, but I hope I’ve made it clear that in some cases, symmetry expansion can actually *reduce* the number of classes you need. This is in fact one of its more useful applications. You get to classify on specific features of the protein without having to worry about which of the N equivalent positions that particular subunit got aligned to in the early steps of processing.
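The class counts above can be checked by brute force. This is only a sketch under the same assumptions as the examples (identical subunits, at most one subunit per particle in the A’ conformation); `rotations` and `count_classes` are hypothetical helper names. Each pose is a tuple of booleans, one per aligned position, marking whether the subunit sitting there is A’.

```python
def rotations(state):
    """All cyclic rotations of a tuple: the poses a Cn particle can be aligned in."""
    return {state[i:] + state[:i] for i in range(len(state))}

def count_classes(n):
    """Return (classes for the whole particle, classes for one masked position)."""
    none_primed = (False,) * n                 # every subunit in the default conformation
    one_primed = (True,) + (False,) * (n - 1)  # exactly one subunit is A'
    # Whole-particle classification must distinguish every distinct pose.
    whole_particle = rotations(none_primed) | rotations(one_primed)
    # With one position masked, only the state at that position matters.
    single_position = {pose[0] for pose in whole_particle}
    return len(whole_particle), len(single_position)

for n in (2, 4, 6):
    whole, masked = count_classes(n)
    print(f"C{n}: {whole} classes for the whole particle, {masked} with one subunit masked")
```

Under these assumptions the whole-particle count grows as N + 1, while the masked count stays at two.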

After performing these analyses, you can come back through and count the number of subunits in each class, per particle, if you wish. An example of this workflow is in the `symm_expand_filter.ipynb` notebook in the cryoem-uoft/cryosparc-examples repository on GitHub, if you’re interested.
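To give a rough idea of the counting step, here is a plain-Python sketch. It is not taken from the notebook mentioned above. It assumes you have already exported, for every symmetry-expanded copy, the ID of the original particle it came from and the class it landed in; the `(src_uid, class_label)` layout and the example values are made up for illustration.

```python
from collections import Counter, defaultdict

def subunits_per_class(assignments):
    """Tally, for each original particle, how many of its subunits fell in each class."""
    counts = defaultdict(Counter)
    for src_uid, class_label in assignments:
        counts[src_uid][class_label] += 1
    return counts

# Hypothetical C2 example: two expanded copies (one per subunit position) per particle.
assignments = [
    (101, "A"), (101, "A'"),
    (102, "A"), (102, "A"),
    (103, "A'"), (103, "A"),
]

counts = subunits_per_class(assignments)

# Original particles with exactly one subunit in the A' conformation.
one_primed = [uid for uid, c in counts.items() if c["A'"] == 1]
print(one_primed)
```

From a tally like this you can select, for example, only the particles with exactly one A’ subunit and carry those forward.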

## Is refining just one of the two classes the same as enforcing C2 relaxation?

In essence, yes: refining a class of particles which are all in the same orientation will be the same as relaxation. You’d want to do a Local Refinement here, so that no particles flip back around to the wrong orientation. Also, since classifications are not always perfect, you should probably run a Remove Duplicates job to make sure that no particle ended up in the same class twice.
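If you want a quick sanity check on a selected class before refining, the idea is simply to look for original particles that appear more than once. The sketch below is only an illustration of that check, not a description of how the Remove Duplicates job works; `duplicated_sources` is a hypothetical helper and the IDs are invented.

```python
from collections import Counter

def duplicated_sources(class_src_uids):
    """Return the original-particle IDs that occur more than once in one class."""
    counts = Counter(class_src_uids)
    return sorted(uid for uid, n in counts.items() if n > 1)

# Hypothetical class: original particle 102 contributed both of its expanded copies.
selected_class = [101, 102, 103, 102, 104]

dupes = duplicated_sources(selected_class)
print(dupes)
```

An empty result means every particle is in the class only once, which is what you want before running the Local Refinement.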

## What other workflows benefit from symmetry expansion?

You may already know this, but you should never perform a *global* refinement (such as Homogeneous or Non-Uniform Refinement) with symmetry-expanded particles. The copies of each image can collapse onto the same pose, giving you duplicate particles, and copies of the same image can land in both half-sets, which corrupts your GSFSC.

If you perform a local refinement of symmetry-expanded particles, it is similar to, but not the same as, performing a global refinement enforcing C2 symmetry. This is because when you enforce C2 symmetry, you *force* the map to be symmetric. When you refine without enforcing symmetry, typically one of the two subunits will align better than the other, especially if the subunits are flexible. The particles then all align on that subunit, so you end up with a map with a higher-quality A and a lower-quality B.

A common symmetry expansion workflow is to perform the masked classification I describe above to get a population of images for each possible conformation/composition of the monomer, then perform local refinements of those particles using the same single-subunit mask. This gives the highest-quality map you can get for each of (say) A and A’.

That’s a lot of information, but I hope it is helpful! I’d be happy to discuss any more questions you have!