Topaz picking for graphene oxide grid

Hi

We are working with a ~120 kDa elongated complex on graphene oxide grids. We have obtained reasonably good micrographs and can get decent class averages with the Gaussian picker or the template picker. However, we feel that the background from the GO film and the folds/creases in the GO are leading to many false picks. We can of course do multiple rounds of 2D classification to get rid of the false positives, but we are trying to see if we can use the CryoSPARC Topaz wrapper to do a better job with picking. Unfortunately, we are struggling to generate a good Topaz model, and I was wondering if anyone has experience using Topaz on GO grids. Any suggestions on how we might optimise the model would be very helpful. I am trying to provide as much detail as I can below.

I am attaching a representative micrograph and some 2D class averages obtained with template picking (the particles were heavily binned to a pixel size of 4.2 Å/pixel).

I am pasting the Topaz commands from the output log of Topaz train below.

Starting particle pick preprocessing by running command /opt/miniconda/envs/topaz/bin/topaz convert --down-scale 3 --threshold 0 -o /data/Users/Anjali/data_processing/CS-pfaphu/J105/topaz_particles_processed.txt /data/Users/Anjali/data_processing/CS-pfaphu/J105/topaz_particles_raw.txt

Starting dataset splitting by running command /opt/miniconda/envs/topaz/bin/topaz train_test_split --number 11 --seed 1608771774 --image-dir /data/Users/Anjali/data_processing/CS-pfaphu/J105/preprocessed /data/Users/Anjali/data_processing/CS-pfaphu/J105/topaz_particles_processed.txt

Starting training by running command /opt/miniconda/envs/topaz/bin/topaz train --train-images /data/Users/Anjali/data_processing/CS-pfaphu/J105/image_list_train.txt --train-targets /data/Users/Anjali/data_processing/CS-pfaphu/J105/topaz_particles_processed_train.txt --test-images /data/Users/Anjali/data_processing/CS-pfaphu/J105/image_list_test.txt --test-targets /data/Users/Anjali/data_processing/CS-pfaphu/J105/topaz_particles_processed_test.txt --num-particles 100 --learning-rate 0.0002 --minibatch-size 256 --num-epochs 10 --method GE-binomial --slack -1 --autoencoder 0 --l2 0.0 --minibatch-balance 0.0625 --epoch-size 5000 --model resnet8 --units 32 --dropout 0.0 --bn on --unit-scaling 2 --ngf 32 --num-workers 0 --cross-validation-seed 1608771774 --radius 3 --num-particles 100 --device 0 --no-pretrained --save-prefix=/data/Users/Anjali/data_processing/CS-pfaphu/J105/models/model -o /data/Users/Anjali/data_processing/CS-pfaphu/J105/train_test_curve.txt

I am attaching the precision as a function of epoch for the train and test micrographs.

Here is the command from the corresponding Topaz extract job.

Starting extraction by running command /opt/miniconda/envs/topaz/bin/topaz extract --radius 24 --threshold -6 --up-scale 3 --assignment-radius -1 --min-radius 5 --max-radius 100 --step-radius 5 --num-workers 0 --device 0 --model /data/Users/Anjali/data_processing/CS-pfaphu/J105/models/model_epoch03.sav -o /data/Users/Anjali/data_processing/CS-pfaphu/J108/topaz_particles_prediction.txt [58 MICROGRAPH PATHS EXCLUDED FOR LEGIBILITY]

We tested the extraction on a small subset. I am attaching a micrograph with the extracted particles highlighted; as can be seen, while there were very few incorrect picks, many particles were not selected. We also tried the pre-trained Topaz model, and that behaved similarly to our template picking job (i.e. it picked all the particles, along with a lot of false positives).

Hi,

A few queries:

  1. How many particles did you use for training? And where did they come from (manually picked, 2D classification, etc)? An example micrograph with training picks might be helpful.

  2. What threshold are you using for particle extraction, and did you try different thresholds?

  3. What did you set the expected number of particles to? This should be the estimated number of true particles present, on average, in each micrograph used for training.

Cheers
Oli.

Dear Oli

We used 1244 manually picked particles for training. I am not completely sure of the exact number actually used, because the training micrographs were split into training and test sets with an 80:20 split. If the test set is never used for training, then we are effectively training on about 995 particles (1244 × 0.8). I am attaching an example of a manually picked micrograph used for training.

In the Topaz extract job we tried two extraction thresholds, -6 and -10, with exactly the same result. We did not vary the particle threshold; it was always left at the default value of 0.

We set the expected number of particles per micrograph to 100. With the Gaussian picker we were getting about 150 picks per micrograph, and we estimated that perhaps 30% of those were false positives, i.e. roughly 150 × 0.7 ≈ 105 true particles per micrograph.

Best,
Indrajit

Hi Indrajit,

I would suggest varying the particle threshold - try values of -2 and -4 instead of the default of 0 and see how that goes. Looking at this micrograph, I would maybe lower the expected particle number a bit too; guesstimating from the image, 50 might be closer?
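If it is easier to experiment outside the wrapper, a similar score cutoff can be applied directly with the --threshold flag from your extract log. A rough sketch, re-using only the flags already shown above (paths, the output name and the micrograph list are placeholders):

/opt/miniconda/envs/topaz/bin/topaz extract --radius 24 --threshold -2 --up-scale 3 --model /path/to/models/model_epoch03.sav -o picks_threshold_-2.txt <micrograph paths>

And a quick way to see how many picks survive a given cutoff (this assumes the prediction file has a header row and the score in the last column - check your file first):

awk 'NR > 1 && $NF > -3' topaz_particles_prediction.txt | wc -l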

Cheers
Oli

In terms of the quantity of training data, 1000 good picks is a reasonable starting point, although depending on heterogeneity you may want to increase that a bit. I would be careful about centering your picks - Topaz will learn to pick what you tell it to pick, so if you provide mis-centered picks, you will get the same out of the model.

Dear Oli

Many thanks for your suggestion. Changing the particle threshold to -3 did the trick. Somehow I completely missed that some of the manual picks were off-centered 🙂 Thanks for pointing it out. We will re-pick and re-train Topaz.

Best,
Indrajit
