Class similarity definition in 3D classification?

Hi,

How is class similarity defined and applied in 3D classification?

The tooltip for this parameter states:

“Expected similarity of structures from different classes. A number between 0 and 1, where 0 means classes are independent, and 1 means classes are very similar.”

Based on this description, I would expect class similarity = 0 to behave the same as force hard classification, but it does not: even when class similarity is set to 0, there can be a wide spread of per-particle ESS values. So I am wondering how it is actually defined?

I also wonder whether it might be worth adding an option to switch force hard classification on once class similarity has finished annealing to zero. This would offer some additional flexibility, as force hard classification is sometimes essential to get good results, but I am not sure that applying it right from the start is always the best strategy.

Cheers
Oli


Hi @olibclarke!

Class similarity in 3D Classification works in much the same way as it does in ab initio reconstruction.

Briefly, class similarity is a way of accounting for the fact that early on in a classification of any kind, our models are not very good. So if differences between two true classes are small compared to the overall object, the two classes run the risk of being combined during the early, low-quality iterations.

Class similarity is a fudge factor that forces particles to spread their probability mass among the classes (see the sketch after the list below).

  • A class similarity of 1.0 forces all particles into all classes equally, regardless of the probability for each class calculated during the expectation step.
  • A class similarity of 0.0 applies the probabilities calculated during the expectation step (and hard classification, if enabled) directly; in other words, it does not force particles into classes to which they don’t seem to belong.
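For intuition, here is a minimal sketch of one way such a parameter could work: blending the computed posteriors with a uniform distribution. This is only an illustration, not CryoSPARC’s actual formula (the real implementation, including the annealing schedule, may differ):

```python
import numpy as np

def blend_posteriors(p, similarity):
    """Blend a particle's class posteriors with a uniform distribution.

    similarity = 0.0 returns the computed posteriors unchanged;
    similarity = 1.0 forces equal weight in every class.
    """
    k = p.shape[-1]                         # number of classes
    uniform = np.full_like(p, 1.0 / k)      # equal probability per class
    return (1.0 - similarity) * p + similarity * uniform

posteriors = np.array([0.7, 0.2, 0.1])
print(blend_posteriors(posteriors, 0.0))  # [0.7 0.2 0.1]        (unchanged)
print(blend_posteriors(posteriors, 1.0))  # [0.333 0.333 0.333]  (forced equal)
```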

This explains the ESS spread you’re observing with a similarity of 0.0 and force hard classification off: 0.0 simply means “use the calculated class probabilities”, and those probabilities can still be broad.
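For context on the ESS numbers: per-particle ESS is conventionally computed from the class posteriors as the inverse sum of squared probabilities, so a broad posterior yields an ESS near the number of classes even with similarity at 0.0. A sketch, assuming that conventional definition:

```python
import numpy as np

def per_particle_ess(p):
    """Effective sample size of a class posterior: 1.0 for a one-hot
    assignment, up to k for a uniform posterior over k classes."""
    return 1.0 / np.sum(np.square(p), axis=-1)

print(per_particle_ess(np.array([1.0, 0.0, 0.0])))  # 1.0   (hard assignment)
print(per_particle_ess(np.array([0.5, 0.3, 0.2])))  # ~2.63 (broad posterior)
```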


As for your feature request, perhaps you could talk me through where you see it being useful? To my mind, keeping force hard classification off during similarity annealing and turning it on once similarity has annealed is more or less similar to (but certainly not exactly the same as!) using a lower starting class similarity (essentially, more weight to “proper” classes during annealing means less forced weight to “improper” classes), but I could certainly be missing something!


Hi @rposert - thanks, that’s very helpful!

My feature request was based on situations where the mean ESS remains high even when class similarity is 0 from the start, regardless of the number of epochs. In these situations, switching on force hard classification sometimes gives good (or at least interpretable) results, I guess by allowing the initial volumes to diverge where the broad probability distribution of each particle would not otherwise allow it. I was wondering whether, in these situations, using weighted backprojection for the initial iterations and hard classification for the later iterations might give improved (or at least different) results.

One could also argue that the reverse might be helpful: force hard classification for a certain number of initial iterations to allow the volumes to diverge, then switch to weighted backprojection. I’m not sure either strategy would help without testing, though.
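To make the two schedules concrete, something like the following is what I have in mind. This is purely hypothetical pseudocode; `run_iteration`, its `force_hard` argument, and the switch point are stand-ins, not real CryoSPARC options:

```python
def run_iteration(force_hard):
    """Stand-in for one classification iteration (not a real CryoSPARC call)."""
    print(f"iteration with force_hard={force_hard}")

TOTAL_ITERATIONS = 20
N_SWITCH = 10  # assumed switch point

# Schedule A: weighted backprojection first, then hard classification.
for it in range(TOTAL_ITERATIONS):
    run_iteration(force_hard=(it >= N_SWITCH))

# Schedule B (the reverse): hard first to let volumes diverge, then weighted.
for it in range(TOTAL_ITERATIONS):
    run_iteration(force_hard=(it < N_SWITCH))
```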

Cheers
Oli


Interesting! Thanks for elaborating, and I’ve recorded your feature request!


Hi @rposert ,

I just want to follow up on this thread. Does this mean that if I have force hard classification turned on, there’s no need to use a low class similarity, because the class with the highest probability will now be assigned a probability of 1?

Thank you.

Correct - if you have force hard classification switched on, class similarity will have no effect.
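A quick sketch of why (assuming hard classification takes the argmax of the posteriors, and similarity blends them with a uniform distribution, as in the earlier sketch): blending with a uniform never changes which class has the highest probability for any similarity short of exactly 1.0, so the one-hot assignment is identical at every similarity setting.

```python
import numpy as np

def hard_assign(p):
    """One-hot assignment to the most probable class."""
    out = np.zeros_like(p)
    out[np.argmax(p)] = 1.0
    return out

p = np.array([0.6, 0.3, 0.1])
k = p.size
for s in (0.0, 0.5, 0.9):
    blended = (1 - s) * p + s / k     # uniform blend, as sketched above
    print(s, hard_assign(blended))    # same one-hot result each time
```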
