I have encountered several cases of the Remove Duplicates job flagging independent particles as duplicates (separated by far more than the minimum distance).
The issue may be related to using particle stacks from versions earlier than 4.1, and may affect 2D Classification as well.
This issue might be related to issue 1 and issue 2
Briefly, I was reprocessing old datasets (obtained with version 3.x) and noticed that a Remove Duplicates job would wrongly tag particles as duplicates when using as input:
- a particle stack from an old refinement (pre-v4)
- micrographs from an old CTF estimation (pre-v4)
- a 10 Å minimum distance

Result: ~89,000 particles rejected, ~19,000 kept.
Inspecting the picks clearly shows that the rejected particles are not duplicates.
However, when the same inputs as above were used but the pixel size (0.836 Å) was manually provided:
Result: ~108,000 particles kept, ~200 particles rejected.
Overall, it seems that the micrograph pixel size is being incorrectly propagated.
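In case it helps others debug this, here is how I checked which pixel-size fields the exported particle file actually carries. As far as I can tell, CryoSPARC .cs files are plain NumPy structured arrays; the micrograph pixel-size field name below is my guess based on the warning message further down, so the script falls back to listing all location/* fields if it is absent:

```python
import numpy as np

# A CryoSPARC .cs metadata file is a NumPy structured array on disk.
particles = np.load("particles.cs")  # example path to the exported particle stack

# The particle (blob) pixel size should always be present.
print("particle pixel size (A):", np.unique(particles["blob/psize_A"]))

# The micrograph pixel size stored alongside the pick locations is what
# Remove Duplicates needs; the exact field name here is my assumption.
mic_field = "location/micrograph_psize_A"
if mic_field in particles.dtype.names:
    print("micrograph pixel size (A):", np.unique(particles[mic_field]))
else:
    print("no micrograph pixel size stored; location fields present:",
          [f for f in particles.dtype.names if f.startswith("location/")])
```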
In parallel, some of my users noticed pixel-size issues in 2D Classification, with particles from old picks suddenly being marked as duplicates. A hint about the cause might be in the following warning messages (2D Classification job):
> Dropping duplicate particles with worse pick_stats/ncc_score value
> Could not find micrograph pixel size in particle's locations. Will assume particle image pixel size is equal to micrograph pixel size! Note that if particles were downsampled, this will be false, and remove duplicates will use an incorrect distance scale.
I can send the logs of the Remove Duplicates jobs if required (JSON metadata, event log PDF).
But after I extract the particles with a box size of 512 Fourier-cropped to 256, both Remove Duplicates and 2D Classification regard most of my particles as “duplicates”.
Thanks for the report. When the “remove duplicate particles” option was introduced to 2D Classification (in v4.1), the pixel size of the particles was mistakenly assumed to be equal to that of the micrographs. Thus, for 2D Classification jobs run from v4.1 up until the release of v4.4.1, this pixel size was used to compute the physical distance between particles.
If particles had been extracted without Fourier cropping, this assumption would be correct. If they had been downsampled at any point, it would be incorrect and would result in a reduced number of duplicates found (because the spacing between particles would be computed as larger than it actually is).
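To make the distance-scale error concrete, here is a small illustrative calculation (the numbers are made up for the example, not taken from any job above): with a 0.836 Å/px micrograph and 2× Fourier cropping, the particle pixel size doubles, so two picks that are really 8.36 Å apart would be treated as 16.72 Å apart and missed by a 10 Å threshold.

```python
# Illustrative numbers only.
mic_psize_A = 0.836                   # true micrograph pixel size
particle_psize_A = mic_psize_A * 2    # after 2x Fourier cropping (e.g. 512 -> 256 box)

separation_px = 10                    # two picks 10 micrograph pixels apart
min_separation_A = 10.0               # job's minimum separation distance

true_dist_A = separation_px * mic_psize_A          # 8.36 A  -> genuine duplicates
assumed_dist_A = separation_px * particle_psize_A  # 16.72 A -> looks too far apart

print(true_dist_A <= min_separation_A)     # True: removed when the correct scale is used
print(assumed_dist_A <= min_separation_A)  # False: missed when the particle pixel size
                                           # is assumed to be the micrograph pixel size
```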
For particle datasets picked in or after v4.4.1, we have updated the metadata propagation to resolve this issue: the micrograph pixel size is now stored with the particles upon extraction. Re-picking in v4.4.1 or later should resolve this issue. If you continue to notice any discrepancies or odd behaviour with particle datasets picked in v4.4.1 or later (such as an unexpectedly large number of particles being removed), we would appreciate a report!
For particle datasets picked prior to v4.4.1, if re-picking is not an option, the recommended approach is to disable “remove duplicate particles” within 2D classification, and to run a standalone Remove Duplicate Particles job with the micrograph pixel size manually specified.
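For anyone who wants to sanity-check the outcome, the sketch below shows the idea behind that distance check in simplified form. It is illustrative only, not the actual implementation: it converts the fractional pick coordinates to Ångströms using the correct micrograph pixel size and drops one pick of every pair closer than the minimum separation (the real job breaks ties by pick quality, e.g. ncc_score, rather than keeping the first pick). The location/* field names follow the usual particle metadata layout, and micrograph_shape is assumed to be stored as (height, width).

```python
import numpy as np
from scipy.spatial import cKDTree

def flag_duplicates(particles, mic_psize_A, min_sep_A):
    """Return a boolean mask of picks to keep (illustrative sketch only)."""
    keep = np.ones(len(particles), dtype=bool)
    shapes = particles["location/micrograph_shape"]  # (N, 2) micrograph size in px
    for uid in np.unique(particles["location/micrograph_uid"]):
        idx = np.flatnonzero(particles["location/micrograph_uid"] == uid)
        # Fractional pick coordinates -> Angstroms, using the *micrograph*
        # pixel size (the quantity that must be correct for the distance scale).
        xy_A = np.stack([
            particles["location/center_x_frac"][idx] * shapes[idx, 1],
            particles["location/center_y_frac"][idx] * shapes[idx, 0],
        ], axis=1) * mic_psize_A
        for i, j in cKDTree(xy_A).query_pairs(r=min_sep_A):
            if keep[idx[i]] and keep[idx[j]]:
                keep[idx[j]] = False  # drop one pick of each too-close pair
    return keep

# Example usage with a manually specified micrograph pixel size:
# particles = np.load("particles.cs")
# keep = flag_duplicates(particles, mic_psize_A=0.836, min_sep_A=10.0)
# print(keep.sum(), "kept of", len(keep))
```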
Apologies for the confusion; please let us know if you continue to observe unexpected behaviour.
Best,
Michael
Edit: As of CryoSPARC v4.5, 2D Classification includes an option to override the micrograph pixel size in case it is missing from the input particles. If the pixel size at picking time is known, it can be entered to work around the job using the wrong pixel size.