Issue with Topaz picking in unwanted areas

Jasonshengyi · July 16, 2022, 7:41am

Hi, all! I have encountered a problem with the Topaz Extract job.
Some of my micrographs include carbon areas (outside the holes), and the Topaz Extract output shows many particles picked on the carbon (more than in the holes). When I adjusted the threshold, the number of picks decreased in both the carbon and the hole areas. However, I would rather not exclude these micrographs manually, because I have a limited number of micrographs. Also, I used particles from selected good 2D classes to train the model.
BTW, is a Topaz Denoise job necessary before Topaz Train or Topaz Extract?
Do you have any suggestions on parameters or methods that might help? Thanks a lot!

Sincerely,
Jason

amaker · October 12, 2022, 8:09pm

Hi! Bumping this thread because I am having the same issue! Any help is much appreciated. Thanks!

Ablakely · October 17, 2022, 7:48pm

Have you tried manually reviewing the training set of particles to ensure there are no particles on carbon? Just picking the best 2D classes will likely not exclude all such particles. In my experience, manual curation of all of the input particles is sufficient to exclude particles on carbon.

olibclarke · October 17, 2022, 9:46pm

Agreed with @Ablakely - the number one most important thing with Topaz in my experience is manually verifying the quality of the training data. My usual workflow is:

1. First pick with blob picker or template picker, as best as you can. It won’t be perfect - that’s fine.
2. Classify to remove junk - be rigorous. You want to aim for only including the very best 2D/3D classes here.
3. Curate micrographs, with the particle input from step 2; sort micrographs by the number of “good” particles remaining, choosing ~100 (depending on particle size/sparsity) with the highest number of “good” particles while still having acceptable CTF/ice thickness. These micrographs are probably going to be the best ones to use for training Topaz (in my experience).
4. Take the micrographs in step 3, and either manually pick them, or manually curate the particle picks you already have, selecting only particles that you are confident are genuine protein particles. You typically want somewhere between 500-2000 particles for good results, depending (I think) on the heterogeneity of your sample.
5. Run Topaz train, using the curated particles and mics from step 4. The most important parameter here is the expected number of particles, which should roughly correspond to the average number of “true” particles in your training micrographs. This value can be optimized using Topaz Cross Validate.
6. Test the resulting model using Topaz extract, preferably on a small set of good micrographs that you did not use for training.
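As a rough illustration of the micrograph selection in step 3 and the "expected number of particles" estimate in step 5, here is a minimal Python sketch. It assumes you have exported per-micrograph statistics (number of particles surviving classification, CTF fit resolution, relative ice thickness) into a simple table; the `Micrograph` fields, the threshold values, and the function names are hypothetical, not a cryoSPARC or Topaz API, and in practice the selection would usually be made interactively during micrograph curation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Micrograph:
    # Hypothetical per-micrograph record; field names are illustrative only.
    name: str
    n_good: int               # particles remaining after rigorous 2D/3D classification (step 2)
    ctf_fit_res_a: float      # CTF fit resolution in Å (lower is better)
    rel_ice_thickness: float  # relative ice thickness (lower means thinner ice)


def select_training_mics(mics, n=100, max_ctf_res_a=5.0, max_ice=1.1):
    """Drop micrographs with unacceptable CTF/ice, then keep the n with the most good particles.

    The default thresholds are placeholders; choose them for your own dataset.
    """
    acceptable = [
        m for m in mics
        if m.ctf_fit_res_a <= max_ctf_res_a and m.rel_ice_thickness <= max_ice
    ]
    # Highest number of "good" particles first.
    acceptable.sort(key=lambda m: m.n_good, reverse=True)
    return acceptable[:n]


def expected_particles_per_mic(curated_counts):
    """Average number of manually verified particles per training micrograph (step 5 starting value)."""
    return round(mean(curated_counts))


# Toy usage with made-up numbers.
mics = [
    Micrograph("mic_001", 120, 3.5, 1.02),
    Micrograph("mic_002", 300, 8.0, 1.00),  # many picks, but poor CTF fit: excluded
    Micrograph("mic_003", 90, 4.0, 1.05),
    Micrograph("mic_004", 200, 4.2, 1.30),  # ice too thick: excluded
]
selected = select_training_mics(mics, n=2)
print([m.name for m in selected])  # ['mic_001', 'mic_003']

# Hypothetical counts after manually curating the picks on the selected micrographs (step 4).
expected = expected_particles_per_mic([110, 86])
print(expected)  # 98
```

Filtering on CTF/ice quality before ranking mirrors the advice to choose micrographs with many good particles "while still having acceptable CTF/ice thickness"; the resulting average is only a starting value, which Topaz Cross Validate can then refine.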

Hope that helps, and good luck!

Jasonshengyi · October 26, 2022, 10:56am

Thanks a lot, @Ablakely, I will try again!

Jasonshengyi · October 26, 2022, 11:09am

Your detailed suggestions are greatly appreciated! @olibclarke
Recently, I found that Topaz picking can produce many false positives or artifacts, even when I used a well-trained model to pick an unrelated dataset. Have you encountered such problems with Topaz Extract and subsequent 2D classification? Any suggestions for reducing this kind of overfitting?

olibclarke · October 26, 2022, 11:25am

Hi @Jasonshengyi,

We usually train Topaz models on a per-dataset basis - we don’t usually use them to pick unrelated datasets. I have summarized my suggestions re training above - have you tried this?

Cheers
Oli

Jasonshengyi · October 26, 2022, 12:26pm

@olibclarke Thanks for your kind reply! I will work through it carefully. I believe it will help me a lot in using Topaz properly!

Well, I am working on another project. We successfully solved the structure of mouse protein X, and I am now struggling with its homolog; the two share very high sequence identity. However, the sample is not as homogeneous this time: there are very few target particles per micrograph, and no sensible 2D classes appeared when I used template picking. Since the two proteins look nearly the same, I thought I could use the mouse structure as the model to pick on the homolog dataset, in order to pick as many correct particles as possible. I assumed Topaz would be more powerful than template picking. I have tried Topaz, but most 2D classes look like artifacts. I also tried it on an unrelated dataset as a negative control to see what would happen, and those 2D classes also showed artifacts.

I am a novice with Topaz, and I don’t know whether my idea works in this case. Is Topaz more likely than template picking to produce false-positive 2D classes?

olibclarke · October 26, 2022, 12:51pm

Hi @Jasonshengyi - without seeing more data (2D classes, picks, micrographs) - it is difficult to provide further useful advice. If you are not seeing good 2D classes with template picking, and visually you cannot see clear particles, it is possible that the sample is bad - denatured or dissociated - and no picking program can fix this. In this case, you may want to modify sample preparation conditions.

Cheers
Oli

Jasonshengyi · October 26, 2022, 12:57pm

OK, I see. Thanks a lot!