Reverse symmetry expansion? [Feature request]

olibclarke · December 12, 2019, 3:33pm

Hi,

Very happy that we now have symmetry expansion in cryoSPARC! Would it be possible to add an option to reverse symmetry expansion for a particle set (basically a “remove duplicate particles” option)?

This would be useful when one has identified a subclass of a symmetric system using 3D-VA and then wants to perform subsequent refinement with symmetry enforced. Currently, the only way to do this, to my knowledge, is to convert to a star file and remove duplicates in RELION, or to edit the star file manually using awk, uniq, etc.

(Removing duplicate particles, or particles with centers closer than a certain distance, would be useful anyway - e.g. for removing duplicates after recentering)

Cheers
Oli

apunjani · December 12, 2019, 7:59pm

Hi @olibclarke this is a great idea.

Actually, right now the “easiest” way to do this would be to use cryoSPARC .cs files directly. There is some information about how to modify .cs files (which are actually just Python/NumPy binary tabular data) in our data management tutorial here: https://cryosparc.com/docs/tutorials/data-management/
Essentially, you can open the .cs file in an interactive Python shell, select the subset of rows (particles) that are the original particles (the field sym_expand/idx would be 0), and save those as a new .cs file. Then you can import that .cs file using the Import Result Group job type, and you will have only the first symmetric copy of each particle, with all correspondences (local motion, CTF, etc.) remaining intact within cryoSPARC.

olibclarke · December 27, 2019, 9:18pm

Hi @apunjani - thank you for this - would it be possible to describe in a little more detail how to do this (reading, manipulating and writing .cs files in a Python shell or script)?

For now uniq does the trick - converting to a star file using pyem, and then using e.g. uniq -f11 cluster1.star > dedup_cluster1.star, assuming field 12 is the particle name field (the -f flag ignores the first n fields).
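One caveat with uniq is that it only collapses adjacent lines, and with -f it compares everything after the skipped fields rather than a single column. A minimal Python sketch of the same idea that keys on one column and does not depend on row order is below. The function name dedup_rows and the demo rows are made up for illustration, and treating any line with too few fields as a header is a simplifying assumption, not a real star-file parser.

```python
def dedup_rows(lines, key_index):
    """Keep the first data row for each distinct value in column key_index.

    Lines with too few whitespace-separated fields (star headers such as
    "loop_" or "_rlnImageName #12", blank lines) are passed through as-is.
    This is a heuristic, not a full star-file parser.
    """
    seen = set()
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) <= key_index:
            kept.append(line)  # header, blank, or short line
            continue
        key = fields[key_index]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# Hypothetical three-column table; column index 2 is the particle name.
demo = [
    "loop_",
    "_rlnA #1",
    "_rlnB #2",
    "_rlnImageName #3",
    "1.0 2.0 000001@stack.mrcs",
    "1.5 2.5 000002@stack.mrcs",
    "3.0 4.0 000001@stack.mrcs",  # non-adjacent duplicate of the first row
]
result = dedup_rows(demo, key_index=2)
print("\n".join(result))
```

For the command above, the equivalent would be key_index=11 (field 12, zero-based), reading the lines from cluster1.star and writing the kept lines to a new file.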

Cheers
Oli

DanielAsarnow · December 28, 2019, 7:52pm

@olibclarke You can start with the following:

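import numpy as np

# np.load reads the .cs file as a NumPy structured array (one row per particle).
cs = np.load("file.cs")
# List the available field (column) names.
print(cs.dtype.names)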

I recommend doing this in a Jupyter notebook (or IPython session); then you won’t need the print statement and can play around more interactively. The code in pyem/metadata.py for converting .cs files might also be helpful as a reference for using the NumPy recarray type.
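Building on the snippet above, here is a minimal sketch of the filtering step apunjani described: keep only the rows where sym_expand/idx is 0 and write them to a new .cs file. The field name comes from his post; the function names, the file paths and the demo array are made up for illustration. Whether a file written this way contains everything Import Result Group expects is an assumption I have not verified.

```python
import numpy as np


def keep_first_symmetry_copy(particles, idx_field="sym_expand/idx"):
    """Return only the rows whose symmetry-expansion index is 0."""
    if idx_field not in particles.dtype.names:
        raise KeyError(f"{idx_field!r} not found; check particles.dtype.names")
    return particles[particles[idx_field] == 0]


def dedup_cs_file(in_path, out_path):
    """Load a .cs file, drop the symmetry copies, and save the result."""
    particles = np.load(in_path)
    originals = keep_first_symmetry_copy(particles)
    # Save through an open file handle so NumPy does not append ".npy"
    # to the output name.
    with open(out_path, "wb") as handle:
        np.save(handle, originals)
    return len(particles), len(originals)


# Synthetic stand-in for a particle table (real files have many more fields).
demo = np.array(
    [(1, 0), (2, 1), (3, 0), (4, 2)],
    dtype=[("uid", "<u8"), ("sym_expand/idx", "<u4")],
)
originals = keep_first_symmetry_copy(demo)
print(len(demo), "->", len(originals))
```

With a real file, the call would look like dedup_cs_file("file.cs", "file_dedup.cs"), where both paths are placeholders.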

You can probably also use the alignment parameters from the “best” subunit based on the score/alignment probability, or put all the subunits back in the same sector of SO(3) using the trick in star2bild.py --sym.
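A rough sketch of the “best” subunit idea: for each original particle, keep the symmetry copy with the highest score instead of always keeping copy 0. The field names src_uid, sym_idx and score in the demo are hypothetical placeholders; check cs.dtype.names for the fields that actually identify the source particle and carry the alignment score in your file.

```python
import numpy as np


def keep_best_copy(particles, group_field, score_field):
    """For each value of group_field, keep the row with the highest score."""
    groups = particles[group_field]
    scores = particles[score_field]
    # lexsort uses the last key as the primary key: sort by group first,
    # then by descending score within each group.
    order = np.lexsort((-scores, groups))
    sorted_groups = groups[order]
    # The first row of each group in sorted order is its best-scoring copy.
    is_first = np.ones(len(order), dtype=bool)
    is_first[1:] = sorted_groups[1:] != sorted_groups[:-1]
    keep = np.sort(order[is_first])  # restore the original row order
    return particles[keep]


# Synthetic example: two source particles, two symmetry copies each.
demo = np.array(
    [(10, 0, 0.2), (10, 1, 0.9), (11, 0, 0.7), (11, 1, 0.1)],
    dtype=[("src_uid", "<u8"), ("sym_idx", "<u4"), ("score", "<f4")],
)
best = keep_best_copy(demo, "src_uid", "score")
print(best["src_uid"], best["sym_idx"])
```

The result can be saved the same way as any other modified .cs table, through an open file handle passed to np.save.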

LTP · November 26, 2021, 5:22pm

Hello,

Does anyone have a simple, easy way of doing this for those of us inexperienced in programming?

Cheers.

user123 · November 26, 2021, 5:25pm

CryoSPARC now has “remove duplicates”, which is what you want for reversing symmetry expansion.

LTP · November 26, 2021, 5:32pm

Ah yes, sorry, missed it!

Thank you!