Remove duplicates considering orientation?

Hi,

Suggestion for Remove Duplicates: it would be useful to have an option in which particles are removed if and only if both their coordinates and their orientations match within specified limits.

This would be useful in certain circumstances after classifying symmetry-expanded particles, where I want to rotate one class to match another (using volume alignment tools) prior to local refinement. In doing so, some percentage of the particles now match in both position and orientation, screwing up downstream refinement.

Currently, removing duplicates in this scenario would undo symmetry expansion, removing the particles I want as well as the problem set - so some way to take orientation into account during remove duplicates would be helpful.
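A minimal sketch of the requested criterion, assuming each particle carries a micrograph ID, a position in Å, and a pose stored as a rotation vector (axis × angle, in radians). The dictionary keys (`mic`, `xy_A`, `pose`) and the thresholds are hypothetical placeholders, not CryoSPARC field names. A particle is dropped only if it matches an already-kept particle in both position and orientation, so symmetry-expanded copies (same position, different pose) survive:

```python
import math

def rotvec_to_matrix(rv):
    """Rodrigues' formula: rotation vector (axis * angle, radians) -> 3x3 matrix."""
    theta = math.sqrt(sum(c * c for c in rv))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in rv)
    c, s, v = math.cos(theta), math.sin(theta), 1.0 - math.cos(theta)
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def angle_between_poses(rv_a, rv_b):
    """Geodesic angle in degrees: acos((trace(Ra^T Rb) - 1) / 2)."""
    ra, rb = rotvec_to_matrix(rv_a), rotvec_to_matrix(rv_b)
    # trace(Ra^T Rb) equals the elementwise (Frobenius) inner product of Ra and Rb.
    trace = sum(ra[k][i] * rb[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def remove_pose_duplicates(particles, max_dist_A, max_angle_deg):
    """Greedy filter: drop a particle only if a kept particle on the same micrograph
    is within max_dist_A in position AND within max_angle_deg in orientation."""
    kept = []
    for p in particles:
        is_dup = any(
            q["mic"] == p["mic"]
            and math.dist(p["xy_A"], q["xy_A"]) <= max_dist_A
            and angle_between_poses(p["pose"], q["pose"]) <= max_angle_deg
            for q in kept
        )
        if not is_dup:
            kept.append(p)
    return kept

particles = [
    {"uid": 1, "mic": "m1", "xy_A": (100.0, 200.0), "pose": (0.0, 0.0, 0.0)},
    # C2-expanded copy: same position, rotated 180 degrees about z -> kept
    {"uid": 2, "mic": "m1", "xy_A": (100.0, 200.0), "pose": (0.0, 0.0, math.pi)},
    # ~same position and ~same pose as uid 1 -> removed
    {"uid": 3, "mic": "m1", "xy_A": (101.0, 200.0), "pose": (0.0, 0.0, 0.01)},
]
kept = remove_pose_duplicates(particles, max_dist_A=5.0, max_angle_deg=5.0)
print([p["uid"] for p in kept])  # [1, 2]
```

The geodesic angle is used instead of comparing rotation-vector components directly because it is a true distance between orientations and does not depend on how a given rotation happens to be parameterized.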

Cheers
Oli

Hi @olibclarke,

We have a few questions:

  1. Would you be able to provide more context/motivation for the problem?
  2. What purpose does the rotation step serve in this workflow?
  3. How would you approach picking which particle(s) to keep and which to throw away given only their alignments?
  4. Does this by chance have to deal with teasing out symmetry mismatch in a particle?

Best,
Kye

Hi Kye,

The key here is point 2, I think:

  2. What purpose does the rotation step serve in this workflow?

The rotation step is to match classes (after classifying sym-expanded particles) that are identical except for a rotation around the symmetry axis. Some of the particles in these two matched classes will be ~identical (in terms of both the particle and its orientation), and I would like to remove these while keeping the rest. It is true that if classification were perfect, these two classes would have 100% overlap and this would be unnecessary, but classification is stochastic, so we still derive some benefit from combining these permuted classes.

In general, I wonder if better bookkeeping for sym-expanded particles might be helpful. Right now, telling apart particles that are duplicates as originally picked from particles that are duplicates because they are sym-expanded copies is not as straightforward as it could be.
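To make the failure mode concrete, here is a toy sketch with plain rotation matrices. It assumes expanded copies are generated as R·S_k, where S_k are the C_n operators about z, and that the volume alignment applies its rotation on the same (right) side; CryoSPARC's actual pose convention may differ, but the argument holds either way as long as expansion and realignment compose on the same side. In practice the alignment rotation only approximates a symmetry operator, so you get near-duplicates rather than exact ones:

```python
import math

def rot_z(deg):
    """Rotation of `deg` degrees about the z (symmetry) axis."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(deg):
    """Rotation of `deg` degrees about the x axis."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

n = 2  # C2 symmetry about z
base_pose = matmul(rot_x(30.0), rot_z(47.0))  # an arbitrary particle pose

# Symmetry expansion: one copy per operator (assumed convention R @ S_k).
copies = [matmul(base_pose, rot_z(360.0 * k / n)) for k in range(n)]

# Suppose copy 0 landed in class A and copy 1 in class B, and class A is then
# re-aligned onto class B by a 180-degree rotation about the symmetry axis.
moved = matmul(copies[0], rot_z(360.0 / n))

print(max_abs_diff(moved, copies[1]) < 1e-9)  # True: now a pose duplicate of copy 1
print(max_abs_diff(moved, copies[0]) < 1e-9)  # False
```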

  3. How would you approach picking which particle(s) to keep and which to throw away given only their alignments?

Not sure… but there must be a way (for particles with ~identical coordinates) of comparing the alignments…?

  4. Does this by chance have to deal with teasing out symmetry mismatch in a particle?

Yes, we encounter cases where this might be useful when we have pseudosymmetry and/or symmetry mismatch.

Thanks!

Hi @olibclarke,

Thanks for providing additional information on your workflow. Is it possible to symmetry expand and then classify based on occupancy and sort, similar to this case study on MlaB? Particles can be sorted by their symmetry-expansion identifier using the 'sym_expand/src_uid' field in a particle dataset, and @rposert has put together a nice notebook detailing this workflow.
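As a rough illustration of sorting by source particle, the sketch below tabulates which class each expanded copy of a source particle landed in. Particles are plain dicts here rather than a cryosparc-tools Dataset, and the `class` key is a hypothetical stand-in for wherever the class label is stored; only the 'sym_expand/src_uid' and 'sym_expand/idx' names come from this thread. For C2, source particles with copies in both of the two symmetry-related classes are the ones that could yield pose duplicates once one class is rotated onto the other:

```python
from collections import defaultdict

def classes_per_source(particles):
    """Map each sym_expand/src_uid to {sym_expand/idx: class} for its expanded copies."""
    table = defaultdict(dict)
    for p in particles:
        table[p["sym_expand/src_uid"]][p["sym_expand/idx"]] = p["class"]
    return dict(table)

particles = [
    {"uid": 11, "sym_expand/src_uid": 1, "sym_expand/idx": 0, "class": "A"},
    {"uid": 12, "sym_expand/src_uid": 1, "sym_expand/idx": 1, "class": "B"},
    {"uid": 21, "sym_expand/src_uid": 2, "sym_expand/idx": 0, "class": "A"},
    {"uid": 22, "sym_expand/src_uid": 2, "sym_expand/idx": 1, "class": "A"},
]
table = classes_per_source(particles)

# Source particles with one copy in A and another in B (C2 case).
at_risk = sorted(src for src, by_idx in table.items() if {"A", "B"} <= set(by_idx.values()))
print(at_risk)  # [1]
```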

Cheers,
Kye

I have used this approach in the past, yes, but this is a slightly different case: I have already sym expanded and classified, and am just looking to combine the classes after rotating (in which case I think the src_uid identifier doesn’t really make sense any more?).

In practice, in this particular case it seems to be a non-issue (the FSC looks normal, so I guess there are no direct duplicates), but it may still be useful in other cases.

Hi @olibclarke — this is an interesting problem!

I’ve written up an example that I am fairly sure does what you want, with the caveat that I have only tested it by rotating one of three classes of C2-expanded particles and checking that some of them are caught as now having the same class and pose, so I can’t stand 100% behind its accuracy. I’m sure someone better with pandas could make this faster, too :slight_smile:

What we do here is, for each sym_expand/src_uid, check each combination of expanded particles for the same pose. Optionally, you can ignore particles with the same pose if they’re in different classes (not sure you’d want to do this).

I group by src_uid rather than e.g. micrograph position because we know those particles came from the same place and it saves us an expensive comparison.

Next, if two symmetry-related “particles” (i.e., two particles with different sym_expand/idx but the same sym_expand/src_uid) have the same pose to within your tolerance in radians (atol), we record their UIDs in duplicate_uids. This means we record the UIDs of the symmetry expanded particles which have pose duplicates, not the UIDs of the source particles.
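The notebook itself isn't reproduced here, so the following is only a stdlib sketch of the steps described above, not the original code. It assumes poses are rotation vectors compared component-by-component with an absolute tolerance `atol` (in the spirit of NumPy's `allclose`), and uses plain dicts with a hypothetical `class` key in place of a cryosparc-tools Dataset. `check_all_combs` and `duplicate_uids` borrow the names from the post; `find_pose_duplicates` is a hypothetical wrapper:

```python
import math
from collections import defaultdict
from itertools import combinations

def check_all_combs(group, atol, require_same_class=False):
    """UIDs of expanded copies (sharing one src_uid) whose poses agree within atol radians."""
    dups = set()
    for a, b in combinations(group, 2):
        # Optional: ignore same-pose pairs that sit in different classes.
        if require_same_class and a["class"] != b["class"]:
            continue
        if all(math.isclose(x, y, rel_tol=0.0, abs_tol=atol) for x, y in zip(a["pose"], b["pose"])):
            dups.update((a["uid"], b["uid"]))
    return dups

def find_pose_duplicates(particles, atol=0.05, require_same_class=False):
    # Group by source particle: only copies of the same pick are compared,
    # which avoids an expensive all-against-all position comparison.
    by_src = defaultdict(list)
    for p in particles:
        by_src[p["sym_expand/src_uid"]].append(p)
    duplicate_uids = set()
    for group in by_src.values():
        duplicate_uids |= check_all_combs(group, atol, require_same_class)
    return duplicate_uids

particles = [
    {"uid": 11, "sym_expand/src_uid": 1, "sym_expand/idx": 0, "class": 0, "pose": (0.10, 0.20, 1.50)},
    {"uid": 12, "sym_expand/src_uid": 1, "sym_expand/idx": 1, "class": 0, "pose": (0.11, 0.20, 1.49)},
    # Same pose as uid 11 but a different source particle, so never compared with it.
    {"uid": 21, "sym_expand/src_uid": 2, "sym_expand/idx": 0, "class": 0, "pose": (0.10, 0.20, 1.50)},
    {"uid": 22, "sym_expand/src_uid": 2, "sym_expand/idx": 1, "class": 0, "pose": (0.90, -0.40, 0.30)},
]
print(sorted(find_pose_duplicates(particles, atol=0.05)))  # [11, 12]
```

One caveat with component-wise comparison: a rotation of ~180° can be written as either v or -v, so two nearly identical poses may fail the test. Comparing the geodesic angle between the two rotations avoids this, at some extra cost.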

From there, you could theoretically use CryoSPARC Tools to export those particles and use the standard Remove Duplicates job, or within the check_all_combs function you could pick a UID using some other metric, etc.
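One hypothetical way to "pick a UID using some other metric" is to have the comparison step record matching pairs instead of a flat set, merge pairs that chain together (A≈B and B≈C) with a small union-find, and keep the best-scoring UID in each group. The `score` mapping is an assumed per-particle metric (higher is better); which metric is appropriate is left open:

```python
def keep_one_per_group(pairs, score):
    """Union duplicate pairs into groups; keep the highest-scoring UID in each group.

    pairs: iterable of (uid_a, uid_b) flagged as pose duplicates.
    score: dict uid -> metric, higher is better (hypothetical).
    Returns (keep, drop) as sets of UIDs.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for u in list(parent):
        groups.setdefault(find(u), []).append(u)

    # Ties on score go to the smaller UID.
    keep = {max(members, key=lambda u: (score[u], -u)) for members in groups.values()}
    drop = set(parent) - keep
    return keep, drop

pairs = [(11, 12), (12, 13), (31, 32)]
score = {11: 0.7, 12: 0.9, 13: 0.4, 31: 0.5, 32: 0.5}
keep, drop = keep_one_per_group(pairs, score)
print(sorted(keep), sorted(drop))  # [12, 31] [11, 13, 32]
```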

Is this the type of analysis you’re thinking of?

This sounds like exactly the thing! I’ll test on my end and see how I go, thanks heaps!

Great, please keep me updated on whether it seems to work or not! If it seems to be doing something bad I’ll either fix it or take it down so as not to mislead others :slight_smile: