Split output for symmetry expansion

Would it be possible to add a “split output” selector for the symmetry expansion job?

This idea is related to but distinct from reversing the expansion. For example, I have a flexible dimer, where I can use subtraction and local refinement to get nice structures of either monomer, and I would like to C2 transform one of these refinements in order to do a single, expanded local refinement with all of the subparticles. One can export the particles and use pyem subparticles.py, but I thought a “split output” feature would be pretty easy to implement and potentially useful for certain classification strategies as well.

Hi @DanielAsarnow,

Thanks for the post. To clarify, are you referring to splitting the output by which symmetry operator was applied (e.g. for a C2 expansion, two particle stacks would be output, the first corresponding to the identity operation and the second to a 180° rotation)? This makes sense; however, if you wanted to do a single local refinement with all the sub-particles, wouldn’t you need to merge these two stacks immediately before the refinement? Are there other workflows that would specifically benefit from having each set of transformed particles separated?

Currently, separating particles by symmetry transformation should be possible with the data stored in the particle dataset – the sym_expand/idx field will contain the identity of each rotation applied. (An idx of 0 corresponds to the identity transformation; the others are not in any well-defined order.)
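
As a rough illustration of that split, here is a minimal sketch in plain Python. It assumes the particle dataset has already been loaded into a list of per-particle records; the `uid` values and the helper name `split_by_sym_idx` are placeholders for illustration, and only the `sym_expand/idx` field name comes from the description above.

```python
from collections import defaultdict

# Toy stand-in for a C2-expanded particle dataset: one dict per expanded particle.
# "sym_expand/idx" is the field named above; "uid" is a placeholder identifier.
particles = [
    {"uid": 10, "sym_expand/idx": 0},
    {"uid": 11, "sym_expand/idx": 1},
    {"uid": 12, "sym_expand/idx": 0},
    {"uid": 13, "sym_expand/idx": 1},
    {"uid": 14, "sym_expand/idx": 0},
    {"uid": 15, "sym_expand/idx": 1},
]

def split_by_sym_idx(rows):
    """Group expanded particles by the index of the symmetry operator applied."""
    stacks = defaultdict(list)
    for row in rows:
        stacks[row["sym_expand/idx"]].append(row)
    return dict(stacks)

stacks = split_by_sym_idx(particles)

# Merging the stacks again (e.g. before a single local refinement) is a concatenation.
merged = [row for idx in sorted(stacks) for row in stacks[idx]]

for idx in sorted(stacks):
    print(idx, [row["uid"] for row in stacks[idx]])
```

Each stack can then be transformed or processed separately, and concatenating them recovers the full expanded set.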

Best,
Michael

Related to this - is there a straightforward way to identify how many “copies” of a given particle are present in a particular class?

Let’s say in @DanielAsarnow’s example of a dimer, there is something binding to each monomer with partial occupancy. In that case we have dimers with 0, 1, and 2 monomers in the bound configuration.

Is there an easy way to separate out these states after symmetry expansion and classification?

With remove duplicates I can separate 1 vs 2 - but what about higher order oligomers? E.g. for a hexamer, how would I identify which particles have all six protomers in the bound configuration?

Cheers
Oli

Hi @olibclarke,

Interesting question – not sure if this answers it cleanly, but just some thoughts for discussion!

In the case of a dimer, would you use something like:

  1. Align all particles to a global C2 reference
  2. Symmetry expand in C2
  3. 3D classification with mask over both of the sites where there’s partial occupancy

If it works well, you get 3 classes with 0, 1, and 2 binding partners attached (this is like the 3D classification tutorial for EMPIAR-10425). Then you take particles from each class and use remove duplicates to get rid of symmetry-expanded copies.

In principle this could work with higher-order oligomers, but the number of classes that could exist grows exponentially. For N asymmetric units with binding sites, the binding partner can be either present at or absent from each site. So there are two possible states for each ASU, and the total number of configurations is then 2^N (we have to consider each site as being distinct, since we are using symmetry-expanded particles, so fixed-pose classification can/will find different classes as rotated copies of other classes). E.g. for a dimer you have 2^2 = 4 possibilities (bound-bound, bound-unbound, unbound-bound, or unbound-unbound). For a trimer, you have 2^3 = 8, and so on. So for a hexamer, in principle you could do a 64-class 3D classification on C6 symmetry-expanded particles and hope to see classes corresponding to at least some of those 64 configurations.
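
To make the counting concrete, here is a small sketch that enumerates every bound/unbound assignment over N labelled sites. The function name `site_configurations` is just an illustrative choice, and 1/0 stand for bound/unbound.

```python
from itertools import product

def site_configurations(n_sites):
    """All bound (1) / unbound (0) assignments over n_sites labelled sites."""
    return list(product((0, 1), repeat=n_sites))

# Dimer: the four labelled possibilities listed above.
print(site_configurations(2))

# The count doubles with every extra site: 2, 3 and 6 sites give 4, 8 and 64.
for n in (2, 3, 6):
    print(n, len(site_configurations(n)))
```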

You can instead put a mask on just one monomer’s binding site, but then you reduce the chances of finding classes with >1 binding partner attached, because the classification masks out everything but one monomer – it can’t look at coordination across different monomers.

Does this make sense / is it in line with what you’re thinking?

Best,
Michael

Hi Michael,

I would generally just mask a single monomer after symmetry expansion.

What I would like to be able to do, then, is to identify, for a given particle (i.e. a given location on the micrograph), how many “particles” (original and rotated copies) are retained after classification.

This would avoid the exploding number of classes required for 3D classification in such an instance, and allow a more focused look at the particles that only have, for example, 4/6 sites in a hexamer occupied.

Does that make sense? There is a way to do this with a combination of Particle Sets operations right now, but it is rather convoluted and confusing.
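
The counting described above could be sketched roughly as follows. This is only an illustration in plain Python, assuming each expanded copy records which original particle it came from; `src_uid`, the toy values, and `group_by_occupancy` are hypothetical names, not a confirmed part of the dataset format.

```python
from collections import Counter

# Expanded copies that survived classification (e.g. the "bound" class from a
# single-monomer mask). "src_uid" is a hypothetical name for the id of the
# original, unexpanded particle each copy came from.
retained = [
    {"src_uid": 1, "sym_expand/idx": 0},
    {"src_uid": 1, "sym_expand/idx": 3},
    {"src_uid": 2, "sym_expand/idx": 0},
    {"src_uid": 2, "sym_expand/idx": 1},
    {"src_uid": 2, "sym_expand/idx": 2},
    {"src_uid": 2, "sym_expand/idx": 5},
    {"src_uid": 4, "sym_expand/idx": 2},
]
all_src_uids = [1, 2, 3, 4]  # every original particle, including ones with no retained copy
n_sites = 6                  # hexamer

def group_by_occupancy(retained_rows, src_uids, n_sites):
    """Map k -> original particles with exactly k retained copies (k = 0..n_sites).

    Assumes each (src_uid, sym_expand/idx) pair appears at most once.
    """
    counts = Counter(row["src_uid"] for row in retained_rows)
    groups = {k: [] for k in range(n_sites + 1)}
    for uid in src_uids:
        groups[counts[uid]].append(uid)  # Counter returns 0 for missing uids
    return groups

groups = group_by_occupancy(retained, all_src_uids, n_sites)
for k, uids in groups.items():
    print(k, uids)
```

In this toy input, particle 2 lands in the 4/6 group and particle 3, which has no retained copies, lands in the 0/6 group.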

Cheers
Oli

Ahh this makes sense, thanks @olibclarke for clarifying further.

I imagine this would be possible with the associated metadata in the sym_expand and alignments_class3D_x fields. In this case, an input classification with 2 classes would be needed, one corresponding to the occupied and the other to the unoccupied state. Most likely we could consider adding an example script under the CryoSPARC Tools documentation, since there’s a lot of customizability with this sort of “advanced case” of symmetry-expansion reversal that would make a general implementation in CryoSPARC a bit complex.

Just a few more questions to best understand the utility of such a workflow…

For a symmetry group of order N, would simply identifying & outputting subsets of the symmetry-expanded stack with k/N sites occupied be sufficient? E.g. if you wanted to take a hexamer, and after symmetry expansion & classification break it into 7 disjoint subsets where each subset had either 0, 1, 2, … 6 monomers attached. What would the intended downstream use for these subsets – potentially further classification or refinement?

You can imagine that for 4 of 6 sites occupied, there are 3 different global configurations possible depending on the relative positions of the occupied/unoccupied sites, so even after identifying that subset, you’d still need more classification at the global-mask level to tease out different states. (This isn’t necessary for the 0, 1, 5, or 6 monomer attached cases, since each of these only has one global configuration, arbitrary up to an overall rotation).
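
Those counts can be checked with a short enumeration. The sketch below assumes the sites sit on a ring related by cyclic symmetry and treats two occupancy patterns as the same configuration only if one is a rotation of the other; `distinct_arrangements` is an illustrative name.

```python
from itertools import combinations

def distinct_arrangements(n_sites, k_occupied):
    """Count occupancy patterns on a ring of n_sites that are not rotations of each other."""
    seen = set()
    for occupied in combinations(range(n_sites), k_occupied):
        pattern = tuple(1 if i in occupied else 0 for i in range(n_sites))
        # Canonical form: the lexicographically smallest rotation of the pattern.
        canonical = min(pattern[r:] + pattern[:r] for r in range(n_sites))
        seen.add(canonical)
    return len(seen)

# Hexamer: number of rotation-distinct configurations for each occupancy k.
for k in range(7):
    print(k, distinct_arrangements(6, k))
```

For a hexamer this gives 1, 1, 3, 4, 3, 1, 1 for k = 0 through 6, so the 2/6 and 3/6 subsets would also need further global classification, like the 4/6 case.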

Best, Michael

Hi Michael,

> For a symmetry group of order N, would simply identifying & outputting subsets of the symmetry-expanded stack with k/N sites occupied be sufficient? E.g. if you wanted to take a hexamer, and after symmetry expansion & classification break it into 7 disjoint subsets where each subset had either 0, 1, 2, … 6 monomers attached. What would the intended downstream use for these subsets – potentially further classification or refinement?

Yep, exactly, plus quantification/comparison between different datasets. And yes, that would be great! The idea is to be able to tease out some context: if we can identify (locally, after symmetry expansion) which particles have a given number of subunits occupied, and then perform the global classification without alignments just on that subset, the classification becomes significantly less complex.

> You can imagine that for 4 of 6 sites occupied, there’s 3 different global configurations possible depending on the relative positions of the occupied/unoccupied sites, so even after identifying that subset, you’d still need more classification at the global-mask level to tease out different states. (This isn’t necessary for the 0, 1, 5, or 6 monomer attached cases, since each of these only has one global configuration, arbitrary up to an overall rotation).

Yes, that’s right, that’s the idea! :smile:

Cheers
Oli
