Remove Duplicates feature

It would be great to be able to remove duplicates while keeping all particles from a specified particle set, rather than the “best” particle according to scoring metrics. Rare views and small angle views have a worse score, but are the more important ones to keep. This would enable us to run one “master” pick job and then challenge it with many different iterations of template picking, iteratively replacing predominant views with those that are more infrequently observed (and accidentally missed) in template picking (assuming we select the “new” rare view as the one to keep). Alternatively, it would let users keep a dataset unchanged while continuing to add rare views, by selecting the “master” as the one to keep and hoping a few new non-duplicates show up from rounds of picking.

Thanks all! Remove duplicates is an awesome tool.

Hey @CryoEM2, thank you for the suggestion! This is a feature we can definitely implement, but I’m curious about the applications you suggested. Since Remove Duplicates operates on particle locations, and rare views would be at different locations than predominant views, how would a different method of selecting between duplicates enable a gradual “replacement” of predominant views? The selection metric only affects which predominant-view particle you get between the master pick job and the other input job, and the rare views should always be added. I think the job in its current state should already be capable of the second application you suggested.

I think it’s worth looking into regardless, since comparing error/correlation values from different classification jobs still feels slightly misleading and it’d be nice to give users more control over their particle dataset.
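
To make the requested behaviour concrete, here is a minimal sketch of location-based duplicate removal in which the user chooses which input set wins, instead of a score deciding. This is not the actual Remove Duplicates implementation: the function name `merge_picks`, the plain `(x, y)` tuples, and the `min_dist` parameter (in pixels, for a single micrograph) are all assumptions made for illustration.

```python
from math import hypot

def merge_picks(preferred, other, min_dist):
    """Hypothetical helper: merge two pick lists from ONE micrograph.

    Every particle in `preferred` is kept unchanged. A particle from
    `other` is added only if it lies at least `min_dist` pixels away
    from every preferred particle. Duplicates inside `other` itself
    are not checked in this sketch.
    """
    merged = list(preferred)
    for (x, y) in other:
        # A pick counts as a duplicate if it is closer than min_dist
        # to any particle in the preferred set.
        is_duplicate = any(
            hypot(x - px, y - py) < min_dist for (px, py) in preferred
        )
        if not is_duplicate:
            merged.append((x, y))
    return merged

# Made-up coordinates: one overlapping pick, one new pick.
master = [(100, 100), (300, 300)]
template = [(104, 102), (500, 500)]

keep_master = merge_picks(master, template, min_dist=20)
keep_template = merge_picks(template, master, min_dist=20)
```

Passing the master picks as `preferred` leaves the master dataset untouched and only appends non-overlapping picks; swapping the arguments makes the newer picks win at every overlapping location. In both cases a pick at a previously unpicked location is added regardless of which set is preferred.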

In principle I agree: removing duplicates should never be a choice between two particles, just a filter that removes the same particle picked twice. But in practice, for reasons I don’t fully understand, a promising strategy for finding rare views in densely packed micrographs is to pick many times and combine the results (that’s just imperfect picking, to be expected). The particles always appear correctly picked by eye, yet they give remarkably different 2D class outcomes with different picking strategies, so I worry that some recentering onto a neighboring particle is happening (which can be controlled, I’m sure), or something else I can’t identify. I could run the experiment and find that the duplicates from many rounds of this are always dominant views, in which case the request is moot. But given the inherent sloppiness/error of all job types in all processing programs, I expect I will accidentally remove at least some of the desired particles I just put in. This would just be another tool in the toolbox that is sometimes useful without an obvious explanation. I also agree with your last statement: control is nice even if theoretically not required.

Thanks for the additional clarification, that makes a lot of sense. Anyways, we’ve recorded the request and will look into it soon.