Tips for removing duplicate particles that have slightly different centers

Hi, I’m processing a dataset of a protein ectodomain on virus-like particles and am using Topaz to pick particles. Since it is difficult to differentiate individual particles when picking manually, I end up over-picking the particles outside of the vesicle (i.e., some of the particles are almost certainly picked twice, albeit with very slight differences in the particle center). My dilemma concerns when to remove duplicates. Specifically, when two particles fall within the distance threshold to be considered duplicates, which one gets removed? I’m curious about this because I want to keep the “best-centered” particle, since I always turn off particle re-centering when doing 2D class averages. With particle re-centering, the membrane, and not the protein ectodomain, seems to dominate the re-centering. Does it make sense to do 2D classification with re-centering turned off, and then remove duplicates from the best 2D classes? How detrimental is it to have duplicate particles when performing 2D classification? Any advice is appreciated, thanks!

Hello,

Having duplicate particles at any stage is not ideal, but you should be able to achieve your goal iteratively. If you do the initial 2D classification with duplicates (I would create more than the standard number of classes and be lax about choosing) and then use the ‘Remove Duplicate Particles’ utility on your selected classes, you will have a decent set of data.

I would then do another 2D classification on those particles to create unbiased classes. At this step you can use a regular number of classes and be as stringent as you want when choosing. From there you should be able to proceed as normal with refinements.

As for running with no re-centering: you could probably just make the box size bigger so you don’t accidentally lose information on the periphery, then Fourier crop accordingly depending on your storage constraints. You may also have to play with the circular mask diameter within the 2D classification job.
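To illustrate the box-size-then-crop idea: Fourier cropping discards the high-frequency corners of the transform, so a larger box can be downsampled for storage without losing the low-resolution signal used in early 2D classification. This is a minimal NumPy sketch of the operation (the function name and even-sized square-image assumption are mine, not from any particular package):

```python
import numpy as np

def fourier_crop(img, new_size):
    """Downsample a square image by cropping the center of its
    (shifted) Fourier transform, keeping only low frequencies."""
    n = img.shape[0]
    f = np.fft.fftshift(np.fft.fft2(img))
    start = (n - new_size) // 2
    cropped = f[start:start + new_size, start:start + new_size]
    # Scale so the mean intensity is preserved after the inverse transform
    out = np.fft.ifft2(np.fft.ifftshift(cropped)) * (new_size / n) ** 2
    return out.real

# e.g. pick in a generous 512 px box, store at 256 px
img = np.random.rand(512, 512)
small = fourier_crop(img, 256)
print(small.shape)  # (256, 256)
```

The pixel size doubles after a 2x crop, so remember to account for that in any downstream jobs that need calibrated Angstroms per pixel.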

There are options for this in Remove Duplicates. You can remove the lower-scoring particle from 2D classification, the one with the worse NCC score from picking, discard one at random, etc. Which is best will depend on your use case.
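The “keep the better-scoring one of any close pair” behavior can be sketched as a greedy pass over particles sorted by score. This is an illustrative reimplementation of the idea, not the actual utility’s code; the function name and score convention (higher is better) are my assumptions:

```python
import numpy as np

def remove_duplicates(coords, scores, min_dist):
    """Greedy duplicate removal: walk particles from best score to worst,
    keeping each one only if it is at least min_dist (pixels) away from
    every particle already kept. Returns indices of survivors."""
    order = np.argsort(scores)[::-1]  # best score first
    kept = []
    for i in order:
        if all(np.linalg.norm(coords[i] - coords[j]) >= min_dist for j in kept):
            kept.append(i)
    return sorted(kept)

# Two near-coincident picks (0 and 1) plus an isolated one (2)
coords = np.array([[10.0, 10.0], [12.0, 11.0], [50.0, 50.0]])
scores = np.array([0.9, 0.8, 0.95])
print(remove_duplicates(coords, scores, min_dist=5.0))  # [0, 2]
```

Particle 1 falls within 5 px of the higher-scoring particle 0, so only 0 survives from that pair; swapping in picking NCC values for `scores` gives the “worst NCC from picking” variant.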

Thank you! I’m not sure how I overlooked this option. It’s not immediately clear to me how duplicates are handled by the Remove Duplicates setting within 2D classification. Perhaps it’s best to leave that option off during 2D classification and instead run a separate Remove Duplicates job afterward, in order to have access to the full range of options?


Yes, I think that’s right if you want more granular control. I believe duplicates are discarded randomly during 2D classification, but I’m not 100% sure.


If the issue with the recentering during 2D persists, playing with the recentering options might help. I’ve had a similar issue, where stronger features with more contrast dominate the recentering, which makes sense because (as far as I understand it) the default is to recenter based on the strength of the signal in each pixel, considering only pixels with at least 20% of the maximum signal in the box. My solution is to set the recentering threshold to 0.1 and then turn on the ‘binary’ recentering option, so that every pixel with at least 10% of the maximum signal is weighted equally. This has worked much more effectively for me than playing with the circular mask parameters. Even if I turn the circular mask off completely, it still centers just fine.
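To make the weighted-versus-binary distinction concrete, here is a toy center-of-mass calculation under my reading of those options (the function and parameter names are illustrative, not the software’s actual API). With intensity weighting, a high-contrast feature like the membrane pulls the center toward itself; with binary weighting, every above-threshold pixel counts the same, so a faint ectodomain contributes equally:

```python
import numpy as np

def recenter_shift(img, threshold=0.2, binary=False):
    """Return the (dy, dx) shift from the box center to the center of
    mass, ignoring pixels below threshold * max. With binary=True,
    all surviving pixels are weighted equally instead of by intensity."""
    mask = img >= threshold * img.max()
    weights = mask.astype(float) if binary else np.where(mask, img, 0.0)
    ys, xs = np.indices(img.shape)
    cy = (weights * ys).sum() / weights.sum()
    cx = (weights * xs).sum() / weights.sum()
    center = (np.array(img.shape) - 1) / 2.0
    return cy - center[0], cx - center[1]

img = np.zeros((9, 9))
img[4, 2] = 1.0  # strong feature, e.g. membrane
img[4, 6] = 0.3  # weaker feature, e.g. ectodomain
print(recenter_shift(img, threshold=0.2))               # pulled toward the strong pixel
print(recenter_shift(img, threshold=0.1, binary=True))  # (0.0, 0.0): midpoint of both
```

In this toy box the weighted center lands about one pixel toward the bright feature, while the binary version centers exactly between the two, which matches the behavior described above.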

(The target particle is about 100 kDa, densely packed, and usually invisible by eye in the micrographs; the data are from a 300 keV microscope, but this would probably also have worked on the data I got from a 200 keV instrument.)