Inspect particle picks rejecting high contrast particles [Bug]

Hi,

Inspect particle picks seems to be rejecting the fraction of particles with high power score and high CC. It does this automatically, before even plotting the picks.

When using Topaz, this means that it looks like Topaz is missing the very best particles, when this is not the case. You can verify this by comparing the same micrograph in Inspect Picks and Manual Picker (see below).

This is very frustrating behavior - I spent ages trying to train a Topaz model that would catch these very obvious particles, when it turns out that Topaz was doing fine, cryoSPARC was just refusing to show me the picks.

Would it be possible to allow users to turn off this autothresholding? I would like to see all the picks, including those with very high contrast. I suspect this is the same issue I have previously encountered when dealing with negative stain data.

I believe Live does the same thing, and it causes some issues there too. It would be great to have an option to just see the raw particle picks - I’m not sure what value this autothresholding adds, given that users usually apply thresholds anyway.

Cheers
Oli

[Screenshot: Inspect Picks]

[Screenshot: Manual Picker]

Very well spotted @olibclarke!

This may explain a lot of what I have experienced as well… I hope the @team will be able to tackle this situation soon.

Wow - great find @olibclarke.

Does this same behavior appear when using the deep picker?

I haven’t tested with Deep Picker but I suspect so. I think it is a bug in Inspect Particle Picks, not the picker.

So a (janky) workaround would be to just extract straight from picking and hope for the best? Maybe using manual picker as a sanity check?

Right - for Topaz, I just run Topaz Extract with different thresholds and extraction radii on a small set of micrographs, compare the results in Manual Picker, and then run Topaz Extract with the optimal parameters on the whole set.

It would be great if this were fixed - I’ve been teaching people to use Inspect Picks for Topaz in cryoSPARC for over a year.

Hi all,

@olibclarke thanks for reporting this and figuring it out. Indeed, in Inspect Picks we have a step where the range of the power scores is computed to set the slider limits, and this range is set to exclude “outliers”. In cases where there are artefacts or strong contaminants in the dataset, a few very bright or dark spots across all the micrographs can produce very large power scores, and the histogram display and slider would become unusable if we simply set the range to the min and max values.
Clearly our outlier step is excluding some good particles in cases where there really are no outliers. This happens with the Topaz wrapper, for example, where the power score column does not hold the cryoSPARC local power score but is instead filled in with the Topaz pick probability.
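To illustrate why a raw min/max range breaks the histogram, here is a minimal sketch with made-up scores: 1,000 ordinary particles plus one bright contaminant. `histogram_counts` is a hypothetical helper for illustration, not cryoSPARC code. When the bins span the raw min/max, every ordinary particle lands in the first bin:

```python
def histogram_counts(values, lo, hi, n_bins=50):
    """Count values into n_bins equal-width bins spanning [lo, hi] (hypothetical helper)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin instead of overflowing.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


# Made-up power scores: 1,000 ordinary particles (100-199) plus one contaminant.
scores = [100 + (i % 100) for i in range(1000)] + [1_000_000]

counts = histogram_counts(scores, min(scores), max(scores))
print(counts[0], sum(counts[1:-1]), counts[-1])  # 1000 0 1
```

All the real structure of the score distribution is squeezed into a single bar, and the remaining 49 bars are empty or hold only the contaminant.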

We have changed this behaviour so that the slider range is now 2x the range from the 1st to 99th percentile of the values. This should ensure we keep all particles while still excluding (very) extreme outliers. The change is for both inspect picks and Live.
This will be out in the next patch, planned for release tomorrow.
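The new slider rule can be sketched as follows. This is a minimal illustration, not cryoSPARC’s code. `percentile` and `slider_range` are hypothetical names, and the percentiles use linear interpolation between sorted values (NumPy’s default convention, assumed here). “2x the range” is read as doubling the 1st-99th percentile range about its midpoint, which is how it is explained later in this thread:

```python
def percentile(values, q):
    """Percentile q (0-100) of values, using linear interpolation between sorted points."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def slider_range(values, factor=2.0):
    """Slider limits spanning `factor` x the 1st-99th percentile range, centred on its midpoint."""
    p1, p99 = percentile(values, 1), percentile(values, 99)
    mid = (p1 + p99) / 2
    half_width = factor * (p99 - p1) / 2
    return mid - half_width, mid + half_width


# Topaz-style probabilities spread evenly over [0, 1]: nothing is excluded.
probs = [i / 1000 for i in range(1001)]
lo, hi = slider_range(probs)
print(round(lo, 2), round(hi, 2))  # -0.48 1.48

# A single extreme contaminant still falls outside the slider range.
lo2, hi2 = slider_range(probs + [50.0])
print(50.0 > hi2)  # True
```

Because the 1st and 99th percentiles of evenly spread values in [0, 1] are 0.01 and 0.99, the limits land just inside [-0.5, 1.5], and every probability in [0, 1] remains selectable.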

Thanks,
Ali

Hi @apunjani - can we please have an option to turn this behavior off entirely? If there are outliers, then the user can exclude them anyway using the sliders, no?

In the case of Topaz, this will still exclude the best 1% of particles, because, as you say, this column is filled with the Topaz pick probability value…

This behavior also causes problems in Live, where I believe the same code is reused - if one starts a template picking job, but only a limited number of clean micrographs with no outliers have been seen, then the best particles are excluded. If one then re-runs it after it has seen more data, then these obvious particles are picked correctly. This behavior is very confusing for users who are trying to use these picks to guide selection of appropriate picking parameters.

It would be great to have an option to completely turn off outlier rejection and allow the user to do this manually.

Alternatively, maybe use a smarter outlier rejection: rather than basing it on a raw percentage, base it on how many SDs a value is from the mean, or add another criterion based on the absolute difference in local power score?
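A minimal sketch of the SD-based idea. `sd_limits` and `mad_limits` are hypothetical helpers, and the cutoff of 5 SDs is an arbitrary assumption. One caveat: the mean and SD are themselves pulled by extreme outliers. The second function therefore swaps in the median absolute deviation (MAD), scaled by 1.4826 so that it estimates the SD for Gaussian data, as a more robust spread estimate:

```python
from statistics import mean, median, stdev


def sd_limits(values, n_sd=5.0):
    """Keep values within n_sd standard deviations of the mean (hypothetical rule)."""
    m, s = mean(values), stdev(values)
    return m - n_sd * s, m + n_sd * s


def mad_limits(values, n_sd=5.0):
    """Same idea, but with median and scaled MAD, which a few outliers barely move."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    s = 1.4826 * mad  # MAD -> SD estimate for Gaussian data
    return med - n_sd * s, med + n_sd * s


# Made-up power scores: 1,000 ordinary particles (100-199) plus one contaminant.
scores = [100 + (i % 100) for i in range(1000)] + [1_000_000]

for rule in (sd_limits, mad_limits):
    lo, hi = rule(scores)
    kept_all = all(lo <= v <= hi for v in scores[:-1])
    print(rule.__name__, kept_all, scores[-1] > hi)
```

Both rules keep every ordinary particle and drop the contaminant here. However, the SD-based limits come out far wider (the single outlier inflates the SD to tens of thousands), so several such contaminants could push them wide enough to let outliers back in.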

So that if the outlier is in the “top 1%” but isn’t dramatically different in terms of absolute value, it isn’t incorrectly rejected…

Never mind, I think I misinterpreted your reply: 2x the 1%-99% range does cut on absolute value, I guess. Sorry, I misunderstood!

So if the values range from 1 to 1000 for the middle 98% of particles, it will now only reject particles with values of 2000+? That should do the trick!

Hi @olibclarke, I believe the approach I’m suggesting is what you’re describing here. “2x the range from the 1st to 99th percentile” means that an outlier has to be twice as far from the midpoint as the 1st/99th percentile particles are. For example, with Topaz results, if the data ranged between 0 and 1 (probability scores), the slider would span [-0.5, 1.5], covering the full set of particles, and none would be rejected. To be excluded, a particle would have to be twice as far from the midpoint as the 99th percentile particle. For Gaussian-distributed data this corresponds to roughly 4.7 standard deviations from the mean (twice the ~2.33 SD at which the 99th percentile sits). These would be “dramatically” different particles, not just the top 1%.
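A quick numeric check of the Gaussian comparison, using only the standard library. It assumes, as described above, that the slider limits sit symmetrically about the midpoint of the 1st-99th percentile range at twice the midpoint-to-percentile distance:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# z-score of the 99th percentile of a Gaussian (about 2.326 SD above the mean).
z99 = std_normal.inv_cdf(0.99)

# Doubling the 1st-99th percentile range about its midpoint puts each
# slider limit at twice the midpoint-to-99th-percentile distance.
cutoff_sd = 2 * z99

# Two-sided fraction of Gaussian-distributed particles outside the slider range.
tail_fraction = 2 * (1 - std_normal.cdf(cutoff_sd))

print(f"cutoff = {cutoff_sd:.2f} SD")  # cutoff = 4.65 SD
print(f"fraction excluded = {tail_fraction:.1e}")
```

The excluded fraction is on the order of a few particles per million, so for well-behaved data essentially nothing is lost.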

The same change is being made in Live for the patch. (Note, though, that since you can’t use Topaz picks in Live, the current behaviour is less problematic there: you are likely to reject particles with power scores well below the current outlier rejection threshold anyway.)

The reason we can’t turn this off entirely in general is that, when there is an outlier, using just the min/max values would blow up the range of the histogram, and all the actual data would fall on just one bar of the plot. Similarly, the slider range would be blown up, and since the slider moves in discrete steps across that range, it would go from accepting everything to accepting nothing in a single step. We built the outlier rejection step because this was a recurring problem in early versions of Inspect Picks (long ago).
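The slider half of this argument can be sketched as follows. The sketch assumes a slider that sets a lower power-score threshold and moves in 100 equal steps between the raw min and max. `accepted_per_step` and the scores are hypothetical:

```python
def accepted_per_step(values, lo, hi, n_steps=100):
    """Number of particles kept (value >= threshold) at each discrete slider position."""
    step = (hi - lo) / n_steps
    return [sum(v >= lo + k * step for v in values) for k in range(n_steps + 1)]


# Made-up power scores: 1,000 ordinary particles (100-199) plus one contaminant.
scores = [100 + (i % 100) for i in range(1000)] + [1_000_000]

kept = accepted_per_step(scores, min(scores), max(scores))
print(kept[:3])  # [1001, 1, 1]
```

With a step size of 9,999, the first move of the slider jumps past every real particle, leaving only the contaminant. No slider position can separate good particles from weaker ones.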

Hope that clarifies the solution!

Heh @olibclarke, looks like we replied at the same time - correct!
In your example, the slider would cover values from about -500 to 1500, so only particles outside that range would be excluded.

Yes gotcha that makes sense and seems like a very reasonable solution, thanks for the clear explanation!

Agreed, it is less of a problem in Live, but it still causes issues at the start of a session, when there may be virtually no “true” outliers, so I think this will be helpful in both cases.

Thanks for fixing this so quickly!

Cheers
Oli

Hi @apunjani - thanks for the quick fix! Tested the new patch and can confirm it is working correctly now, Manual Picker and Inspect Picks are now giving consistent results on Topaz output. Thanks!

Cheers
Oli

Hi @team,

I’m not sure if something changed in v4, but the old (bad) behavior reported here seems to be back.

Maybe the old rejection/thresholding code got ported during the rework rather than the fixed version? I am now seeing particles “missed” on Inspect picks after Topaz which are clearly picked if I look at the same mics in Manual Picker (compare attached screenshots).

Cheers
Oli


Hi @olibclarke thanks for reporting - you are correct. We are fixing this now in v4 and it’ll be out in the next minor release early next week.

Great, thanks @apunjani!