Remove duplicate particles [feature request]

Hi,

Would it be possible to add an option to remove duplicate particles during particle extraction? I would suggest adding a duplicate removal threshold, where if two particles are closer than x Å after recentering, only one is kept. This would be useful when re-extracting particles in a smaller box after initial classifications have been performed in a larger box, where there is some risk that two initially mis-centered picks may end up converging on the same particle after recentering.
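A minimal sketch of the rule being proposed (purely illustrative, not CryoSPARC code; the function name and pick representation are made up for the example):

```python
# Sketch of the proposed duplicate-removal threshold: after recentering,
# if two picks on the same micrograph are closer than `threshold`
# (same units as the coordinates), keep only the first.
import math

def remove_duplicates(picks, threshold):
    """picks: list of (micrograph_id, x, y); returns the deduplicated list."""
    kept = []
    for mic, x, y in picks:
        is_dup = any(
            m == mic and math.hypot(x - kx, y - ky) < threshold
            for m, kx, ky in kept
        )
        if not is_dup:
            kept.append((mic, x, y))
    return kept

# Two initially mis-centered picks that converged on the same particle:
picks = [("mic1", 100.0, 100.0), ("mic1", 104.0, 103.0), ("mic1", 300.0, 50.0)]
print(remove_duplicates(picks, threshold=20.0))
# [('mic1', 100.0, 100.0), ('mic1', 300.0, 50.0)]
```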

Cheers
Oli

Hi Oli,
Has this feature become available yet?
Thanks,
Wei

Hi Wei,

No, not yet.

Oli

I do it by converting to star file with csparc2star.py, then use star.py --min-separation.
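A hedged sketch of that two-step workflow (file names and the 100 Å value are placeholders; check `csparc2star.py --help` and `star.py --help` in your pyem version for the exact arguments):

```shell
# Convert the CryoSPARC particle metadata to a .star file
csparc2star.py cryosparc_PX_JY_particles.cs particles.star

# Drop one member of each pair of picks closer than 100 A (example value)
star.py --min-separation 100 particles.star particles_dedup.star
```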

Perhaps it might be possible to add this as an option to the particle sets tool? It would be helpful to facilitate combination of particle sets obtained with different picking strategies within cryoSPARC. One could also add an option to keep only duplicates - that is, to only keep a particle if there is a pick within x Å in both A and B sets. This would allow easy creation of a set of “consensus” picks when testing different methods.

Another option might be to add a “minimum separation” parameter at particle extraction?
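The "consensus picks" idea above could be sketched like this (illustrative only, not the pyem or CryoSPARC API; the function name and pick representation are invented for the example):

```python
# Keep a pick from set A only if set B has a pick within `threshold`
# on the same micrograph - i.e. the intersection of two picking strategies.
import math

def consensus_picks(set_a, set_b, threshold):
    """Each set: list of (micrograph_id, x, y)."""
    return [
        (mic, x, y)
        for mic, x, y in set_a
        if any(m == mic and math.hypot(x - bx, y - by) < threshold
               for m, bx, by in set_b)
    ]

template_picks = [("mic1", 100.0, 100.0), ("mic1", 500.0, 400.0)]
blob_picks     = [("mic1", 102.0, 101.0), ("mic1", 250.0, 250.0)]
print(consensus_picks(template_picks, blob_picks, threshold=20.0))
# [('mic1', 100.0, 100.0)]  - only the pick both methods agree on survives
```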

By the way @DanielAsarnow, when I try to run star.py using the latest commit of pyem, it hangs with the following message:

from: can't read /var/mail/__future__

csparc2star.py still seems to work fine…

Oli

@olibclarke For some reason it’s not being executed by python - you’re seeing an error from the shell’s `from` command, funnily enough. Let me know if it’s my fault (wrong #! line?). Also, if you haven’t used it in a long time: the star.py CLI program is now in pyem/star.py, while the library functions (which used to include the CLI program as well) are in pyem/pyem/star.py.
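The diagnosis above can be demonstrated in isolation: when the shell rather than Python runs a script (a stale copy on PATH, or a missing or wrong `#!` line), it treats `from __future__ import ...` as a command line, and the `from` mail utility reports `from: can't read /var/mail/__future__`. Invoking the interpreter explicitly sidesteps both PATH and shebang issues (demo file name is made up):

```shell
# Minimal script resembling star.py's first lines
printf 'from __future__ import print_function\nprint("ran under python")\n' > demo_star.py

# Running it via the interpreter works regardless of shebang or permissions
python3 demo_star.py   # prints: ran under python
```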

Thanks, that was the issue! I had added pyem/pyem to my PATH because that was where star.py used to be. Works now, thanks!

Unfortunately @DanielAsarnow it doesn’t seem to work, at least when processing is done entirely within cryoSPARC - star.py complains that the rlnMicrographName column is missing. This is from two star files generated from the refined particle .cs files.

Never mind! I didn’t realize that csparc2star.py no longer warns if rlnMicrographName is missing in the input .cs file - it would maybe be better if it warned the user but still completed, so the user knows when a passthrough file or additional star file is needed.

This doesn’t work as expected though - I put in two files with 80k and 90k particles respectively, and ended up with 66k particles in the output. I expected to end up with more particles in the output than in either file alone, because what I am expecting is that star.py will merge the two files, removing just one member of each duplicate pair. Is this not what it does?

Cheers
Oli

Yes, if you just give it a bunch of star files without other arguments, it’s supposed to concatenate them vertically. There must be something up with the code path for --min-separation. I’ll check.
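The behavior Oli is expecting - concatenate, then drop one member of each close pair, so the output is never smaller than the larger input - might look like this (illustrative sketch only, not pyem’s actual code; names and data are invented):

```python
# Vertically concatenate two pick sets, then drop any later pick that falls
# within `threshold` of an already-kept pick on the same micrograph.
import math

def merge_dedup(set_a, set_b, threshold):
    kept = []
    for mic, x, y in list(set_a) + list(set_b):
        if not any(m == mic and math.hypot(x - kx, y - ky) < threshold
                   for m, kx, ky in kept):
            kept.append((mic, x, y))
    return kept

a = [("mic1", 100.0, 100.0), ("mic1", 300.0, 300.0)]
b = [("mic1", 101.0, 101.0), ("mic1", 500.0, 120.0)]
print(len(merge_dedup(a, b, threshold=20.0)))
# 3: the near-duplicate in b is dropped, the unique pick in b is kept
```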

@DanielAsarnow I tried csparc2star.py on a .cs file from NU refinement and it did not show obvious errors:

~/miniconda3/envs/pyem/bin/python ~/pyem/csparc2star.py cryosparc_P5_J126_009_particles.cs cryosparc_P5_J126_009_particles.star --copy-micrograph-coordinates …/J122/join_particles.star --relion2

However, I got more than twice as many particles in the new star file compared to the original join_particles.star that was imported into CryoSPARC for NU refinement. Do you know why the particle numbers are different? Could there be some duplicates? Thanks so much!

Or can CryoSPARC only import a Particle Stack from Relion’s Extract or Select jobs (not from Relion’s JoinStar)?

@DanielAsarnow @olibclarke @nfrasser Hi all, I am having a problem getting the right number of particles when converting from cryoSPARC to Relion using csparc2star.py. I’m not sure whether it is related to cryoSPARC v2.15.0, to csparc2star.py, or to something else. Basically, the star file produced by csparc2star.py has many more particles (some of which are duplicates, I think) than the original. Please help. Thanks in advance!

Hi all,

In v3.0, released today, we have exposed a standalone Remove Duplicate Particles job located under the Utilities section, which can be used to remove duplicate particles from an input particle stack. This can be used to filter out particles that were picked too close together, or in more advanced workflows such as safely combining particle picks from multiple pickers (e.g. combining template picks and blob picks from two different jobs). A few more details about this job are provided in the guide job page.

Best,
Michael

Fantastic! Looking forward to trying it out!