Particles are erroneously marked as duplicate when old datasets are used

Hi all,

Thanks for the report. When “remove duplicate particles” was introduced to 2D Classification (in v4.1), the pixel size of the particles was mistakenly assumed to be equal to that of the micrographs. Thus, for 2D Classification jobs run from v4.1 up until the release of v4.4.1, the particle pixel size was used to compute the physical distance between particles.

If particles had been extracted without Fourier cropping, this would be correct. If they had been downsampled at any point, this would be incorrect and would result in fewer duplicates being found (because the spacing between particles would be calculated as larger than it actually is).
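To make the effect concrete, here is a minimal sketch of the arithmetic in Python. It assumes pick coordinates are expressed in micrograph pixels; the pixel size, downsampling factor, separation threshold and function name are all made up for illustration, and this is not the actual CryoSPARC code.

```python
import math

def physical_distance(p1, p2, pixel_size_A):
    """Distance in Angstroms between two picks given in micrograph pixels."""
    return math.dist(p1, p2) * pixel_size_A

mic_psize_A = 0.8                            # hypothetical micrograph pixel size (A/px)
downsample = 4                               # hypothetical Fourier-crop factor
particle_psize_A = mic_psize_A * downsample  # pixel size of the downsampled particles

pick_a = (1000.0, 1000.0)
pick_b = (1030.0, 1040.0)                    # 50 micrograph pixels from pick_a

min_separation_A = 50.0                      # hypothetical duplicate threshold

# Correct: scale by the micrograph pixel size.
correct_A = physical_distance(pick_a, pick_b, mic_psize_A)
# Affected behaviour: scale by the (larger) particle pixel size.
wrong_A = physical_distance(pick_a, pick_b, particle_psize_A)

is_duplicate_correct = correct_A < min_separation_A
is_duplicate_wrong = wrong_A < min_separation_A
```

With these example numbers the two picks are closer than the threshold when the micrograph pixel size is used, but appear four times further apart when the particle pixel size is used, so the duplicate is missed.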

For particle datasets picked in or after v4.4.1, we have updated the metadata propagation to resolve this issue, and the micrograph pixel size is now stored with the particles upon extraction. Re-picking in v4.4.1 or later should resolve this issue. If you continue to notice any discrepancies or odd behaviour for particle datasets picked in v4.4.1 or later (such as an unexpectedly large number of particles being removed), we would appreciate a report!

For particle datasets picked prior to v4.4.1, if re-picking is not an option, the recommended approach is to disable “remove duplicate particles” within 2D classification, and to run a standalone Remove Duplicate Particles job with the micrograph pixel size manually specified.
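For anyone who wants to sanity-check what a manually specified micrograph pixel size changes, below is a rough Python sketch of distance-based duplicate removal. It is a simple greedy filter written for illustration only, not the algorithm used by the Remove Duplicate Particles job; the function name, parameter names and example values are hypothetical, and coordinates are assumed to be in micrograph pixels.

```python
import math

def remove_duplicates(picks_px, mic_psize_A, min_separation_A):
    """Greedy filter: keep a pick only if it is at least min_separation_A
    (in Angstroms) away from every pick that has already been kept.

    picks_px is a list of (x, y) positions in micrograph pixels (assumed).
    """
    kept = []
    for pick in picks_px:
        if all(math.dist(pick, k) * mic_psize_A >= min_separation_A for k in kept):
            kept.append(pick)
    return kept

# Hypothetical example: the first two picks are 10 px (8 A) apart.
picks = [(100.0, 100.0), (110.0, 100.0), (400.0, 250.0)]
kept = remove_duplicates(picks, mic_psize_A=0.8, min_separation_A=20.0)
```

Here the second pick is dropped as a duplicate of the first, and the distant third pick is kept. Passing a pixel size that is too large would let the second pick through.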

Apologies for the confusion; please let us know if you continue to observe unexpected behaviour.

Best,
Michael

Edit: As of CryoSPARC v4.5, 2D Classification includes an option to override the micrograph pixel size in case it is missing from the input particles. If the pixel size at picking time is known, it can be entered there to work around the job using the wrong pixel size.
