Novice needing help w/ Topaz

Hi,

I’m a grad student who is learning cryo-EM on my own (I am most definitely not an expert - I’ve only been at this for a month). I’ve read a lot about Topaz Denoise and Topaz particle picking - but the guides I’ve read are all fairly disjointed and I’m unsure what inputs I need and where to find them.

I want to train a Topaz Denoise model, apply it to all my micrographs, and use the denoised micrographs downstream for training a Topaz picking model and for Topaz particle picking. I’m using the newest version of CryoSPARC (v5.0.4) with Topaz v0.2.5.

I was following these guides (see below) and I’m confused about the following 4 questions:

  1. Where do I find the “training micrograph” dataset?

I read this in the guide: “If a new Denoising model will be trained (which is the recommended workflow), the input micrographs must also have training data. This data is only generated by Patch Motion Jobs run with CryoSPARC version 4.5 or later.”

  • I ran the Patch Motion Correction & Patch CTF Estimation jobs in CryoSPARC v5.0.4 (Patch Motion Correction was set up to “output denoiser training data” w/ 200 movies).

  • The job doesn’t have any Training data output though??

2. What inputs go into the Topaz Denoising job to train a Topaz denoise model on my dataset?

I read this in the guide: “Select Topaz Denoise (BETA) from the Job Builder. Drag and drop the exposures_success output from the completed CTF estimation job and the imported_movies output from the completed Import Movies job into the micrographs and training_micrographs inputs respectively.”

  • I dragged the 5,796 exposures_success output from Patch CTF Estimation into the Micrographs input

  • Left the Denoising model Input empty

  • I dragged the 5,796 imported_movies output from Import Movies into the Training Micrographs input (since there was no training data output, see question 1 above)

3. What inputs go into the Topaz Train job, and what jobs do they come from?

I’ve dug around and read a bunch of threads - but it’s unclear to me which inputs the job needs and where the micrographs & particles go for:

  • Initially training the model on my data

    • Should this be done on the Topaz Denoised micrographs from step 2 - OR - Non-Denoised micrographs?

    • I’m assuming I should curate a subset of micrographs and not use all of them?

      • If so - Should this be done on ONLY good micrographs? How many and what are the best metrics to follow? (my particles are of a protein >50kDa)
    • I’m assuming the particle stack will come from a Manual Picker Job used on those same micrographs?

  • In the general settings dropdown there is an “absolute path of directory containing preprocessed directory” parameter, which says “path directory containing preprocessed micrographs for cross validation” when I hover:

    • Do I need this? If so, where do I navigate/find this?

4. What inputs go into the Topaz Extract job, and what jobs do they come from?

  • Applying the training output model to my data

    • Should this be done on the Topaz Denoised micrographs from step 2 - OR - Non-Denoised micrographs?

    • Should this be done on ALL 5,796 micrographs from above - OR - should this be done on ONLY good micrographs (i.e. not a training subset, but only micrographs I’ll use for further data processing)?

      • If so - what are the best metrics to follow to select only good micrographs? (my particles are of a protein >50kDa)
    • I understand the model will come from the Topaz Train Output

  • In the general settings dropdown there is an “absolute path of directory containing preprocessed directory” parameter, which says “path directory containing preprocessed micrographs for cross validation” when I hover:

    • Do I need this? If so, where do I navigate/find this?

Thank you so much!

Hello,

Topaz has its own denoiser, but training it requires movies. I only used its pre-trained model and it was fine.

But now CryoSPARC’s own denoiser (more recently introduced) performs much better, and is fast to train on your own data. So I would suggest using this one! This is what the documentation refers to with the training data generated by patch motion correction.

Now, training the topaz picking model isn’t going to work on micrographs denoised by CryoSPARC, but this is fine: you simply use the denoised micrographs to assist your manual picking, then train topaz on raw micrographs, and apply the resulting picking model to raw micrographs too.

That said, several people (myself included!) have found that CryoSPARC’s blob and template pickers applied to micrographs denoised by its denoiser (plus some filtering of the resulting picks with the “Inspect picks” job) perform just as well as Topaz.

Good luck!

Hi Guillaume,

Thank you for the response, I have done the normal workflow you mentioned.

I would still like to try Topaz specifically with Topaz denoised micrographs (since I don’t have the latest version of Topaz that is compatible with the cryosparc denoised micrographs).

Would love some insight on Topaz specifically. :folded_hands:

I would recommend just using topaz without denoising - just use CS denoising for visualization of the training data, & train topaz on the original data. I haven’t found topaz denoise especially helpful for improving the results of topaz training, & as Guillaume says the CS denoiser works better for visualization

I don’t think the latest version of topaz is compatible with CS denoised micrographs, is it?

This is where the disjointed source information is confusing - I’ve read that you need topaz denoised micrographs to pair with the topaz train (otherwise it doesn’t do well because of how it maps back to the original micrographs once you train the topaz model on your denoised data to pick the particles)?

In the new CryoSPARC release notes it specifies that CryoSPARC denoised micrographs work with Topaz v0.3.0 (see “Particle Picking” section) - but I only have the old version of Topaz, which is unsupported.

I would like to try training a Topaz Denoise model on my data and then use that model to process my micrographs for Topaz Train and Extract. If it doesn’t improve things - that’s fine, I would just like to give it a try rather than do workarounds. What I haven’t found clear answers to is what I asked in the first post, so I can’t even start.

This is where the disjointed source information is confusing - I’ve read that you need topaz denoised micrographs to pair with the topaz train (otherwise it doesn’t do well because of how it maps back to the original micrographs once you train the topaz model on your denoised data to pick the particles)?

This is incorrect. Just train with the denoised micrographs as input, but switch off the option in topaz train to use the denoised micrographs (it will use the original mics which are associated with the denoised ones you provided). Can guarantee this works fine :slight_smile:

I see what you mean in the release notes, interesting! Would like more detail on that (@rwaldo?), as it is not mentioned in the release notes for Topaz v0.3.

Hi @ilya.v,

Thanks for posting your questions to the forum! I first would like to echo the sentiments/experiences of @Guillaume and @olibclarke: we have observed (and other users have reported) that the template/blob picker with appropriately selected parameters, used in conjunction with the CryoSPARC denoised micrographs, will perform as well as Topaz with minimal headaches. We would suggest you try to process your dataset in this manner first and, only if there are any issues, try Topaz, as Topaz is more complicated to get right and will likely require some tuning.

Here is a guide to picking in CryoSPARC that is a good example of “how to do it right”.

Additionally, here is a small test we did showing that template picking on CS denoised micrographs performs just as well as Topaz (Table S8): https://www.biorxiv.org/content/10.1101/2025.10.17.682689v1.full#:~:text=8.6%20Supplementary%20Tables

In the event you want to use Topaz further, here are some answers to your questions:

  1. The training data mentioned in your links is specifically for the CryoSPARC denoiser. To obtain denoised micrographs from Topaz, you should use the output micrographs from a Patch CTF job as inputs into a Topaz Denoise job.

  2. The Topaz Denoising job is used to denoise micrographs, Topaz Train is used to train a Topaz picking model on a set of particle picks and micrographs, and Topaz Extract uses the picking model generated from Topaz Train to pick all micrographs in a dataset. Notwithstanding the Topaz Extract name, particles picked by Topaz Extract still need to be extracted in a separate job, such as Extract from Micrographs.

  3. For inputs to Topaz Train, you will want to have a highly curated set of particles and micrographs. We generally see that 3-5k particles from ~100 micrographs will suffice to train a reliable and well performing model. There are a couple of ways to generate training data; the choice will depend on where you are at in your processing journey.

    For particles, you can either A) use the blob picker on CryoSPARC denoised micrographs to generate an initial set of picks, extract them, and use 2D classification to sort those particles, only selecting particles in truly good classes with well defined protein features or B) process the full dataset with 2D and 3D particle curation methods and then select a subset of the best particles from the final reconstruction, using filters such as CTF fit, particles/mic, PPS values, etc. to curate/select the top 3-5k particles.

    For micrographs, you should only provide the best micrographs associated with the particles you selected. These can be either denoised in Topaz, denoised in CS (if you are using version 5.0.X), or the regular, noisy micrographs (output of Patch CTF).

    CLARIFICATION:
    Previously, CS-denoised micrographs were not compatible with Topaz because Topaz could not read their “form”. In v5.0.X, we have updated the way we package CS denoised micrographs such that Topaz can read and use them. We do not have any testing to back up whether this will actually improve the quality and performance of the Topaz picking model. The version of Topaz does not matter for this, only the version of CryoSPARC (which needs to be v5.0.X).

    The Absolute path to preprocessing directory parameter allows you to specify a directory where the micrographs preprocessed by Topaz are stored and can be reused in the event you run multiple trainings and do not need to re-preprocess the mics. You can leave the parameter blank if you have not used Topaz on this dataset, in this project.

  4. You should run Topaz Extract using all micrographs of the dataset that you want to pick and the model output from Topaz Train. You can decide if you want to use denoised micrographs or not. For the preprocessing directory, you can again leave this blank for this job. Within this job, Topaz will preprocess all input mics, so you can specify the resulting directory location (output in this job’s log) for descendant jobs.
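
As a rough illustration of the training-set curation in point 3, the selection could be sketched in Python as below. This is not a CryoSPARC or Topaz API: the `Micrograph` record, the `select_training_set` function, and the 5 Å CTF-fit cutoff are all assumptions made up for this example. Only the targets of roughly 100 micrographs and 3-5k particles come from the advice above, and in practice you would do this filtering with the curation jobs in the CryoSPARC interface.

```python
from dataclasses import dataclass


@dataclass
class Micrograph:
    # Hypothetical record; field names are illustrative, not CryoSPARC's.
    uid: str
    ctf_fit_res_A: float    # CTF fit resolution in Angstrom (lower is better)
    n_good_particles: int   # particles on this micrograph that survived curation


def select_training_set(mics, max_ctf_fit_A=5.0, n_mics=100, max_particles=5000):
    """Take the best micrographs until either budget (mics or particles) is hit."""
    # Keep only micrographs with an acceptable CTF fit and at least one good particle.
    good = [
        m for m in mics
        if m.ctf_fit_res_A <= max_ctf_fit_A and m.n_good_particles > 0
    ]
    # Best CTF fit first; ties are broken in favour of more curated particles.
    good.sort(key=lambda m: (m.ctf_fit_res_A, -m.n_good_particles))

    chosen, total = [], 0
    for m in good:
        if len(chosen) >= n_mics or total >= max_particles:
            break
        chosen.append(m)
        total += m.n_good_particles
    return chosen, total


# Toy data: 300 micrographs with steadily worsening CTF fit, 40 good particles each.
mics = [Micrograph(f"mic{i:04d}", 3.0 + 0.01 * i, 40) for i in range(300)]
chosen, total = select_training_set(mics)
print(len(chosen), total)  # 100 micrographs, 4000 particles
```

The cutoff and the sort order are judgment calls; the point is only that the training micrographs should be the best ones that carry the curated particles, not a random sample of the dataset.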

Best,
Kye

Agree with all this - the only thing I would say is that we still use topaz frequently for picking minor components of heterogeneous mixtures, & for this use case it can’t be replaced with template or blob picker very well

Agreed, there are still useful Topaz applications!
