Understanding topaz output

Hi all,

I'm trying to understand, intuitively, what the two output plots from topaz train represent, and therefore how Topaz training happens.

  1. Training precision vs. epoch
    a. How is this value measured? Is it how well the model picks the test particles? So if it's very high, does the model have the ability to find the particles that you gave it? Here, my dataset gives a very high value.

  2. Test-average precision vs. epoch
    a. How is this value measured? Is the model applied to all the supplied micrographs to look for new particles? What does the number represent? Is it the number of particles selected vs. the number of blobs/other particles present, i.e., a fraction of model picks vs. total particles expected? If so, a homogeneous sample should give approximately 1, while a heterogeneous sample should approximate the percentage of whatever conformation you are training for. Here, my data suggests ~0.5, but the sample is a mix of dimer and monomer and I am picking for monomer.

When you run cross validation, say by changing the number of particles expected, if there is no real change in test-average precision, are all the models sufficient? Does this suggest that even though we are asking for different numbers of particles, the model is still finding the same number (i.e., aiming for X but settling on Y particles)?

I used this Topaz model and it really improved my picking (about 50% more particles and 5 views, compared to 3 views after template picking), so it is working. I am now applying Topaz to another dataset which is very heterogeneous, so the test-average precision is ~0.15, but it could be that only ~10% of all blobs are what I am looking for.


Hi @mjmcleod64! These are great questions about Topaz! I’ll do my best to guide you in the right direction here, but there are some great resources in our guide and in Topaz’s documentation and paper as well!

Precision plots

What is precision?

Precision, in this context, is the number of correct particles your model found divided by the total number of particles your model picked. Some examples:

  • A model which correctly identified 10/10 particles and did not pick any non-particles would have a precision of 1.0
  • A model which correctly identified 7/10 particles and did not pick any non-particles would also have a precision of 1.0
  • A model which correctly identified 7/10 particles but also picked 3 non-particles would have a precision of 0.70

A more precise model is picking less noise, but it is not necessarily picking more good particles. So generally, yes, a higher number is probably better, but a model which picked only 1 particle per micrograph might get a great score without being very useful!
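For concreteness, the calculation behind the examples above can be sketched in a few lines of Python (a toy illustration of the metric, not Topaz's actual code):

```python
def precision(true_positives, false_positives):
    """Precision = correct picks / total picks made by the model."""
    return true_positives / (true_positives + false_positives)

print(precision(7, 0))  # 7 correct picks, no non-particles -> 1.0
print(precision(7, 3))  # 7 correct picks plus 3 non-particles -> 0.7
```

Note that the number of particles the model *missed* never appears in this formula, which is exactly why a model picking one obvious particle per micrograph can score well.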

What is average-precision?

Average-precision compensates for the fact that precision alone does not reward a model for finding more good particles (it ignores recall). Its direct calculation is a bit more involved, but in essence, imagine that the model first ranks every area of the micrograph by how likely it thinks that area is to contain a particle. The average-precision score measures how well the model did at ranking true particle areas above everything else.
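As a toy illustration of that ranking idea (again, not Topaz's actual implementation), average precision over a ranked list of picks can be computed like this:

```python
def average_precision(labels_ranked):
    """AP from a list of 0/1 labels sorted by model confidence, highest first.

    Averages the precision at each rank where a true particle appears.
    """
    hits = 0
    precisions = []
    for rank, is_particle in enumerate(labels_ranked, start=1):
        if is_particle:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

print(average_precision([1, 1, 1, 0, 0]))  # perfect ranking -> 1.0
print(average_precision([0, 0, 1, 1, 1]))  # same picks, worst ranking -> lower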

Training vs. test

If you reserve some of the data for testing, you will see both how well the model performs on the data it was trained on as well as on the test data that it did not use during training. When you reserve a test data set, you should prefer to look at the test precision rather than the train precision.

Precision in Topaz

Remember that Topaz only requires partial labeling. That is, it doesn’t consider things you didn’t label to be definitely not a particle, it just doesn’t know they definitely are a particle. This means it is tolerant of picking particles you didn’t label as long as it’s not picking more particles than you told it to expect per micrograph. For more detail I recommend you check out the Topaz paper, especially the discussion of positive-unlabeled (PU) learning.

So for instance, say a given micrograph in your test set has 100 good particles. However, you only label 50 of those particles. If your Topaz model successfully picks all 100 particles in this micrograph, it would only have a precision of 0.50, but it would actually be performing perfectly.
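Writing that scenario out explicitly (hypothetical numbers, just restating the example above):

```python
# Hypothetical scenario: 100 true particles in a micrograph, only 50 labeled.
n_picks = 100        # the model picks all 100 true particles
n_labeled_hits = 50  # but only the 50 labeled ones count as "correct"

measured_precision = n_labeled_hits / n_picks  # 0.5, what the plot reports
actual_precision = 100 / n_picks               # 1.0, the model is really perfect
print(measured_precision, actual_precision)
```

So with partial labeling, the reported precision is a floor on the true precision, not the true precision itself.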

Your data

So in your first plot, you get to a precision of 0.95. This means the model is picking very few things you didn’t label. In your second plot, your precision ends up around 0.45. This means that for every particle you labeled in the training set, Topaz is picking about two.

If you think you only manually picked about half of the particles, this is probably good. If you think you labeled all of the monomers in your input data, this might be concerning. I would especially be worried if there are approximately equal numbers of monomer and dimer, in which case it would be possible that Topaz is picking both monomers and dimers even though you only labeled monomers.

Highly heterogeneous dataset

You mention that you’ve moved on to applying Topaz to a highly heterogeneous dataset and are getting very low precision values. This may be alright, provided you do not think you picked every good particle in the training dataset.

If you do think you trained on a completely labeled dataset, this means the model is picking particles you don’t want it to. You may want to consider an algorithm that does not use PU learning, such as crYOLO.

Cross Validation

Yes, if there is no clear winner in the cross validation plot it’s likely that the model is not especially sensitive to that parameter within the range you tested.


This was also discussed a bit here: How to interpret metrics from a training run? · tbepler/topaz · Discussion #78 · GitHub

Is there a reason why cryoSPARC is only plotting precision? It would be convenient if it could plot all the metrics in the topaz log file.
The learning curve (loss as a function of epoch or iteration) is useful for assessing training convergence. The AUPRC curve also tells you something about convergence (if I understand correctly: if it starts dropping, additional epochs are actually degrading the model).

Hi @Guillaume, thank you for making those suggestions! We are considering adding these plots in a future release, will post here when we have an update.
