Topaz picking issues with many micrographs

ClaudiaKielkopf · December 29, 2019, 10:41pm

Hej all,

I have a couple of issues using Topaz picking which I hope to get some help with here.

I was able to create a subset of 20 micrographs out of ~2000, template pick, create 2D classes to train a topaz picking model. I can also extract the particle locations using Topaz extract on these exact 20 mics and inspect picks, all looks good.

However, I cannot extract from all micrographs using the trained picking model. The job gets stuck preprocessing the micrographs: for 20 mics this takes 22 sec, but for the 2000 mics it just won’t finish, even after two days. What am I doing wrong? Has anyone else had this issue? Do I need to use smaller subsets of, say, 250 mics for extraction?

I should say that I’m running these jobs on an HPC, so I’ll be taking the issue up with them next week in case there are some underlying issues that I can’t see or solve. Perks of working between the years. If anyone has any suggestions in the meantime, I’d be very grateful, since I work with a challenging, small protein! Let me know if you need more details to narrow the issue down.

Thank you,
Claudia

jyoo · January 6, 2020, 8:26pm

Hi @ClaudiaKielkopf,

The preprocessing for Topaz can get quite slow for a large number of micrographs. Here are some suggestions for speeding up the extraction process.

1. Try splitting the micrographs to be extracted from across multiple Topaz extraction jobs, using the same model for all of the jobs. This will at least allow multiple micrographs to be extracted from concurrently. The extracted particles can then be combined into one particles output using a 2D classification job. If passing the output particles into the 2D classification job fails due to missing fields/components, run the extracted particles and the micrographs the particles came from through an “extract from micrographs” job, then use the particles output from that job.
2. Much of the preprocessing time in the Topaz jobs is spent on downsampling the micrographs. If the micrographs are small enough that they do not need to be downsampled, try setting the downsampling factor to 1. The time saved during preprocessing may be greater than the extra time needed to train on larger micrographs. Keep in mind that this will increase the memory used by the job, and that the training must be repeated with the downsampling factor set to 1.
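
The splitting in the first suggestion can be sketched outside of cryoSPARC as plain list chunking. This is only an illustration of the batching idea, not how cryoSPARC or Topaz does it internally: the function name `split_into_batches`, the file names, and the batch size of 500 are all assumptions made for the example.

```python
from typing import List, Sequence


def split_into_batches(micrographs: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split a sequence of micrograph paths into consecutive batches.

    The last batch may be smaller than batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(micrographs[start:start + batch_size])
        for start in range(0, len(micrographs), batch_size)
    ]


# Hypothetical file names standing in for ~2000 real micrographs.
micrographs = [f"mic_{i:04d}.mrc" for i in range(2000)]

# Each batch would be the input to its own extraction job,
# with every job using the same trained model.
batches = split_into_batches(micrographs, 500)
print([len(batch) for batch in batches])  # [500, 500, 500, 500]
```

Each batch would then feed one extraction job, and the resulting particle sets would be combined afterwards as described in the first suggestion.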

Try each of the suggestions separately or, if necessary, both together. Hopefully, suggestion 1 alone will help alleviate your issues. Best of luck.

Regards,
Jay Yoo

ClaudiaKielkopf · January 6, 2020, 9:10pm

Hej Jay,

Thank you for your help. I ended up doing exactly what you suggested in 1), making subsets of 500 or 750 micrographs, and the jobs finished in around 1-2 hours for denoising (I couldn’t get all mics denoised in one go either) and barely 10 minutes for extraction. I don’t understand why the processing time is not linear in the number of micrographs, but I’m not going to look a gift horse in the mouth.

Re 2): The data is collected on a Titan Krios, so I was downsampling by a factor of 16 for training the picking model. Using 500 images for training, the training job finished within 2h, so that was fine (all jobs run on one GPU, btw, a P100).
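
As a rough illustration of why the downsampling factor matters for memory: downsampling by a factor s in each dimension cuts the pixel count by about s squared. The sketch below assumes a hypothetical 4096 x 4096 micrograph and simple floor division; the actual detector dimensions and the exact rounding Topaz uses are not stated here.

```python
def downsampled_shape(height: int, width: int, factor: int) -> tuple:
    """Return the (height, width) after downsampling by an integer factor.

    Floor division is an assumption made for illustration only.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return height // factor, width // factor


# Hypothetical micrograph size, chosen only for the example.
full_height, full_width = 4096, 4096

small_height, small_width = downsampled_shape(full_height, full_width, 16)
reduction = (full_height * full_width) / (small_height * small_width)

print(small_height, small_width)  # 256 256
print(reduction)                  # 256.0
```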

Cheers,
Claudia