Issue Removing duplicate particles: "Remove Duplicates" job vs "2D Classification"

Hi, I am seeking clarification on the differences in particle removal between the “Remove Duplicates” job and the “2D Classification” job in cryoSPARC.

Following a “Curate Exposures” job, I obtained a set of 289,213 particles. When I then ran the “Remove Duplicates” job with default parameters and a “Minimum Separation Distance (Å)” of 20, all 289,213 particles were accepted and none were rejected.

However, when employing a “2D Classification” job with identical parameters, including “Remove Duplicates” activated and the same “Minimum Separation Distance (Å)” of 20, 116,581 particles were rejected. Disabling the “Remove Duplicates” option during the “2D Classification” job retained the entire set of 289,213 particles.

I was hoping to understand potential discrepancies between the “Remove Duplicates” job and the act of removing duplicates during “2D Classification.” Ideally, one would expect comparable outputs in terms of the number of accepted and rejected particles. Thank you in advance.

In 2D Classification, the distance calculation happens after 2D alignment. Picks that the picking settings force to be at least 0.5 box size apart are more than 20 Å apart at their original pick locations, so the standalone “Remove Duplicates” job will not remove them. Once 2D classification finds that the two picks are one particle, they have a distance of 0. You can avoid this by using an appropriate “min particle size” in picking, by setting an appropriately large distance between picks, or by keeping your current process.
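To put rough numbers on this, here is a small arithmetic sketch. The box size and pixel size are made-up values, not taken from this dataset, and it takes the “0.5 box size” figure above at face value; check the units of the separation parameter in your own picker.

```python
# Hypothetical values for illustration only (not from the original post).
box_size_px = 256                    # extraction box size in pixels (assumed)
pixel_size_A = 1.0                   # pixel size in Å per pixel (assumed)
picker_min_sep_box_fraction = 0.5    # "0.5 box size apart", as stated above
dedup_threshold_A = 20.0             # "Minimum Separation Distance (Å)" from the thread

# Smallest distance the picker allows between two picks, in Å.
picker_min_sep_A = picker_min_sep_box_fraction * box_size_px * pixel_size_A

# If the picker already guarantees more than the duplicate threshold,
# a purely coordinate-based check on the raw picks finds nothing to reject.
raw_picks_can_be_duplicates = picker_min_sep_A < dedup_threshold_A

print(picker_min_sep_A)              # 128.0
print(raw_picks_can_be_duplicates)   # False
```

With numbers like these, every pair of raw picks is already farther apart than 20 Å, which would be consistent with the standalone job accepting all 289,213 particles.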

So the “Remove Duplicates” job focuses on spatial separation, while during “2D classification”, the alignment process considers the similarity of particle projections.

So, if I have understood correctly, there might be a conflict between the default “Minimum Separation Distance (Å)” setting of 20 Å in the “2D Classification” job and the spatial separation criterion used in the “Remove Duplicates” job. The concern arises when two picks are recognized as the same particle during 2D alignment (distance of 0), but their separation is less than 20 Å. In such cases, the default settings might identify them as duplicates during 2D classification, potentially leading to unintended removal during subsequent processing steps. Can this be corrected by adjusting the “min particle size” during picking and ensuring a larger distance between particles, as you suggested?

Thank you for your prompt and insightful response.

If it’s working correctly, you do want 2D to remove them. 2D classification recenters each picked particle on the center of its 2D class. If those classes are accurate, this brings duplicate picks onto one position. Try a “Remove Duplicates” job after a “2D Classification” job run with duplicate removal disabled. It should now get rid of the same number of particles.
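The recentering effect can be illustrated with a toy example. This is only a sketch under stated assumptions, not cryoSPARC's implementation: `remove_duplicates` and `recenter` are hypothetical helpers, the coordinates and shifts are invented, the shift sign convention is assumed, and the filter simply keeps whichever pick it sees first.

```python
import math

def remove_duplicates(coords, min_sep):
    """Greedy filter: keep a pick only if it lies at least min_sep (Å)
    from every pick already kept. Hypothetical helper; the rule cryoSPARC
    uses to choose which duplicate survives is not described in this thread."""
    kept = []
    for x, y in coords:
        if all(math.hypot(x - kx, y - ky) >= min_sep for kx, ky in kept):
            kept.append((x, y))
    return kept

def recenter(coords, shifts):
    """Apply per-particle 2D alignment shifts (Å) to the pick locations.
    The sign convention (adding the shift) is an assumption."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(coords, shifts)]

# Invented micrograph coordinates in Å: the first two picks sit on the same
# particle, 100 Å apart; the third is an unrelated particle.
picks = [(1000.0, 1000.0), (1100.0, 1000.0), (3000.0, 2500.0)]

# Invented alignment shifts that move each pick onto its class center.
shifts = [(48.0, 0.0), (-50.0, 0.0), (3.0, -4.0)]

raw_kept = remove_duplicates(picks, min_sep=20.0)
recentered_kept = remove_duplicates(recenter(picks, shifts), min_sep=20.0)

print(len(raw_kept))         # 3: the raw picks are all more than 20 Å apart
print(len(recentered_kept))  # 2: after recentering, the first two are 2 Å apart
```

The same 20 Å threshold gives different answers depending on whether it is applied to the raw pick locations or to the recentered ones, which is the difference between the two runs described at the top of the thread.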

You can also visually inspect the particle picks before and after 2D classification / duplicate removal using a couple of “Inspect Picks” jobs.

Thank you for the help. You were right: a “Remove Duplicates” job after the “2D Classification” job with duplicate removal disabled got rid of the same number of particles. This was great. Thank you again.

Hi all,

Chiming in on an old thread; we have described the current behaviour, as well as the current workaround, in a related thread here