Very happy that we now have symmetry expansion in cryoSPARC! Would it be possible to add an option to reverse symmetry expansion for a particle set (basically a "remove duplicate particles" option)?
This would be useful when one has identified a subclass of a symmetric system using 3D-VA and then wants to perform subsequent refinement with symmetry enforced. As far as I know, the only way to do this currently is to convert to .star and remove duplicates in RELION, or to edit the .star file manually with awk, uniq, etc.
(Removing duplicate particles, or particles whose centers are closer than a certain distance, would be useful in general, e.g. for removing duplicates after recentering.)
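For the distance-based variant, one simple approach is greedy suppression: within each micrograph, keep the best-scoring particle and drop any other particle whose center lies within a cutoff of one already kept. The sketch below is a hypothetical illustration of that technique, not how cryoSPARC or RELION implement it. It assumes you have already extracted per-particle micrograph IDs, center coordinates (in the same units as the cutoff), and a score where higher is better; no cryoSPARC field names are implied.

```python
import numpy as np


def remove_close_particles(mic_ids, xy, scores, min_dist):
    """Greedy duplicate removal by center distance (hypothetical helper).

    mic_ids:  (N,) micrograph identifiers
    xy:       (N, 2) particle centers, same units as min_dist
    scores:   (N,) per-particle score, higher is better
    Returns a boolean mask marking the particles to keep.
    """
    mic_ids = np.asarray(mic_ids)
    xy = np.asarray(xy, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    keep = np.zeros(len(xy), dtype=bool)

    for mic in np.unique(mic_ids):
        # Particles on this micrograph, best score first
        idx = np.flatnonzero(mic_ids == mic)
        idx = idx[np.argsort(-scores[idx], kind="stable")]
        kept = []
        for i in idx:
            if kept:
                # Distance from particle i to every particle kept so far
                d = np.linalg.norm(xy[kept] - xy[i], axis=1)
                if np.any(d < min_dist):
                    continue  # too close to a better particle: drop it
            kept.append(i)
        keep[kept] = True
    return keep
```

The inner loop is quadratic in the number of particles per micrograph, which is usually acceptable; a spatial index would be the next step for very crowded micrographs.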
Actually, right now the "easiest" way to do this is to work with cryoSPARC .cs files directly. There is some information about how to modify .cs files (which are just NumPy arrays of tabular data saved in binary form) in our data management tutorial here: https://cryosparc.com/docs/tutorials/data-management/
Essentially, you can open the .cs file in an interactive Python shell, select the subset of rows (particles) that are the original particles (those where the field sym_expand/idx is 0), and save them as a new .cs file. You can then import that .cs file using the Import Result Group job type, and you will have only the first symmetric copy of each particle, with all correspondences (local motion, CTF, etc.) intact within cryoSPARC.
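A minimal sketch of the filtering step described above, assuming the symmetry-expanded .cs file loads with `np.load` and contains the `sym_expand/idx` field; the function names and file paths are hypothetical. One detail worth knowing: `np.save` appends `.npy` to a filename that does not already end in `.npy`, so the sketch writes through an open file handle to keep the `.cs` extension.

```python
import numpy as np


def keep_first_sym_copy(particles: np.ndarray) -> np.ndarray:
    """Return only rows whose symmetry-copy index is 0.

    Assumes `particles` is a NumPy structured array (as loaded from a
    cryoSPARC .cs file) with a "sym_expand/idx" field.
    """
    mask = particles["sym_expand/idx"] == 0
    return particles[mask]


def dedup_cs_file(in_path: str, out_path: str) -> int:
    """Load a .cs file, keep the first symmetric copy, write a new .cs file.

    Returns the number of particles written. Paths are placeholders.
    """
    particles = np.load(in_path)
    subset = keep_first_sym_copy(particles)
    # Passing a file handle stops np.save from renaming the file to *.cs.npy
    with open(out_path, "wb") as f:
        np.save(f, subset)
    return len(subset)
```

For example, `dedup_cs_file("expanded_particles.cs", "first_copy_particles.cs")` would write the filtered file, which can then be brought back in with Import Result Group as described.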
Hi @apunjani, thank you for this. Would it be possible to describe in a little more detail how to do this (manipulating, reading and writing .cs files in a Python shell or script)?
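For now, uniq does the trick: convert to a .star file using pyem, then run e.g. `uniq -f11 cluster1.star > dedup_cluster1.star`, assuming field 12 is the particle name field (the -f flag skips the first n fields when comparing lines). Note that uniq only collapses adjacent lines and compares everything from field 12 to the end of the line, so this relies on the duplicate copies being adjacent and identical from that field onward.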
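If the symmetry copies are not adjacent, or differ in columns after the particle name, a small script can drop repeats by column label instead. This is a hypothetical sketch for simple STAR files (one or more `data_` blocks with `loop_` tables, one row per line). It assumes copies of the same particle share the same `_rlnImageName` value, and it keeps the first row seen for each name within each block.

```python
def dedup_star_lines(lines, key_label="_rlnImageName"):
    """Drop data rows whose `key_label` value was already seen in the block.

    Lines that are blank, comments, block/loop declarations, or column
    labels are passed through unchanged.
    """
    out = []
    labels = []   # column labels of the current loop_ table
    seen = set()  # key values already kept in the current data_ block
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("data_"):
            labels, seen = [], set()  # new block: reset state
            out.append(line)
            continue
        if stripped.startswith("loop_"):
            labels = []
            out.append(line)
            continue
        if stripped.startswith("_"):
            labels.append(stripped.split()[0])  # e.g. "_rlnImageName #12"
            out.append(line)
            continue
        if not stripped or stripped.startswith("#") or key_label not in labels:
            out.append(line)
            continue
        fields = stripped.split()
        col = labels.index(key_label)
        if col >= len(fields):
            out.append(line)  # malformed row: leave it alone
            continue
        if fields[col] in seen:
            continue  # duplicate particle: skip it
        seen.add(fields[col])
        out.append(line)
    return out
```

To use it, read the .star file with `readlines()`, pass the list in, and write the result back out with `writelines()`.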
```python
import numpy as np

cs = np.load("file.cs")  # a .cs file loads as a NumPy structured array
print(cs.dtype.names)    # list the available fields (columns)
```
I recommend doing this in a Jupyter notebook (or an IPython session); then you won't need the print statement and can explore more interactively. The code in pyem/metadata.py for converting .cs files may also be a helpful reference for working with NumPy structured arrays.
You can probably also use the alignment parameters from the “best” subunit based on the score/alignment probability, or put all the subunits back in the same sector of SO(3) using the trick in star2bild.py --sym.
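For the "best subunit" option, a hypothetical sketch: group the rows by a field that identifies the original (pre-expansion) particle and keep the row with the best score in each group. Both field names are passed in as parameters because they are placeholders here, not confirmed cryoSPARC fields; check `cs.dtype.names` for what your file actually contains, and set `higher_is_better=False` if the chosen score is an error where lower is better.

```python
import numpy as np


def best_copy_per_particle(particles, group_field, score_field, higher_is_better=True):
    """Keep one row per group: the row with the best score.

    particles:   NumPy structured array (e.g. loaded from a .cs file)
    group_field: field identifying the original particle (placeholder name)
    score_field: per-copy alignment score (placeholder name)
    """
    groups = particles[group_field]
    scores = particles[score_field].astype(np.float64)
    if higher_is_better:
        scores = -scores  # sorting is ascending, so negate to put the best first
    # np.lexsort uses the LAST key as the primary key: sort by group,
    # then by score within each group
    order = np.lexsort((scores, groups))
    # After sorting, the first occurrence of each group is its best copy
    _, first = np.unique(groups[order], return_index=True)
    return particles[order[first]]
```

The result is ordered by group value rather than by the original row order.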