Class similarity definition in 3D classification?

olibclarke · March 6, 2024, 10:02pm

Hi,

How is class similarity defined and applied in 3D classification?

The tooltip for this parameter states:

“Expected similarity of structures from different classes. A number between 0 and 1, where 0 means classes are independent, and 1 means classes are very similar.”

Based on this description, I would expect class similarity=0 to be the same as force hard classification, but this is not the case: even if class similarity is set to 0, there can be a wide spread of per-particle ESS values. So I am wondering: how is it defined?

I also wonder whether it might be worth adding an option to switch force hard classification on once class similarity has completed annealing to zero - this would offer some additional flexibility to the job (as sometimes force hard classification is essential to get good results, but I am not sure that applying it right from the start is always the best strategy).

Cheers
Oli

rposert · March 6, 2024, 10:53pm

Hi @olibclarke!

Class similarity in 3D Classification works in much the same way as it does in ab initio reconstruction.

Briefly, class similarity is a way of accounting for the fact that early on in a classification of any kind, our models are not very good. So if differences between two true classes are small compared to the overall object, the two classes run the risk of being combined during the early, low-quality iterations.

Class similarity is a fudge factor that forces particles to spread the probability mass among the classes.

A class similarity of 1.0 forces all particles into all classes equally, regardless of the probability for each class calculated during the expectation step.
A class similarity of 0.0 uses the class probabilities calculated during the expectation step directly (or the hard assignments, if force hard classification is on). In other words, it does not force particles into classes to which they don’t seem to belong.

This explains the ESS spread you’re observing with similarity of 0.0 but force hard classification off: 0.0 just means “Use the calculated class probabilities”.
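To make the ESS point concrete, here is a minimal sketch under two loud assumptions: class similarity is modelled as a linear blend of each particle’s class probabilities toward uniform (chosen only because it matches the two endpoints described above, not because it is cryoSPARC’s actual formula), and per-particle ESS is taken as 1 / sum(w²), i.e. the effective number of classes a particle is spread over. The function names are hypothetical.

```python
def apply_class_similarity(posteriors, similarity):
    # ASSUMPTION: linear blend toward uniform as a stand-in for class similarity.
    # similarity=0.0 leaves the probabilities unchanged; 1.0 weights all classes equally.
    k = len(posteriors)
    return [(1.0 - similarity) * p + similarity / k for p in posteriors]

def ess(weights):
    # ASSUMPTION: ESS = 1 / sum(w^2); 1 => all weight in one class, K => spread evenly over K
    return 1.0 / sum(w * w for w in weights)

# Hypothetical class probabilities for three particles over K = 3 classes
particles = {
    "confident": [0.98, 0.01, 0.01],
    "ambiguous": [0.50, 0.45, 0.05],
    "no preference": [0.34, 0.33, 0.33],
}
for name, posteriors in particles.items():
    for similarity in (0.0, 0.5, 1.0):
        weights = apply_class_similarity(posteriors, similarity)
        print(f"{name:>13}  similarity={similarity:.1f}  ESS={ess(weights):.2f}")
```

Even at similarity 0.0, the ambiguous and no-preference particles keep an ESS well above 1, because nothing forces their probability mass onto a single class. Only a hard (one-hot) assignment pins every particle to ESS = 1.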

As for your feature request, perhaps you could talk me through where you see it being useful? To my mind, having hard classification off during similarity annealing and turning it on once similarity is annealed is more-or-less similar to (but certainly not exactly the same as!) having a lower starting class similarity (essentially, more weight to “proper” classes during annealing ~ less forced weight to “improper” classes) but I could certainly be missing something!

olibclarke · March 6, 2024, 11:25pm

Hi @rposert - thanks that’s very helpful!

My feature request was based on situations where the mean ESS remains high, even when class similarity is 0 from the start, regardless of the number of epochs. In these situations switching on force hard classification sometimes gives good (or at least interpretable) results (I guess by allowing initial volumes to diverge in situations where the broad probability distribution of each particle would not otherwise allow for it). I was wondering if in these situations, using weighted backprojection for the initial iterations, and hard classification for the later iterations, might give improved (or at least different) results.

One could also argue that the reverse might be helpful - force hard classification for a certain number of initial iterations, to allow volumes to diverge, then switch on weighted backprojection. I’m not really sure if either strategy would help without testing though.
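The two proposed strategies could be expressed as a per-iteration schedule. The sketch below is purely hypothetical: the function names and the `mode` values are illustrative, not existing cryoSPARC parameters, and the linear annealing of class similarity from its start value to 0 is an assumption about how the ramp is shaped.

```python
def class_similarity_at(iteration, start_value, anneal_start, anneal_end):
    # ASSUMPTION: similarity ramps linearly from start_value down to 0
    # between anneal_start and anneal_end.
    if iteration <= anneal_start:
        return start_value
    if iteration >= anneal_end:
        return 0.0
    frac = (iteration - anneal_start) / (anneal_end - anneal_start)
    return start_value * (1.0 - frac)

def use_hard_classification(iteration, switch_iter, mode):
    # Hypothetical switch: "soft_then_hard" = weighted backprojection first,
    # "hard_then_soft" = force hard classification first.
    if mode == "soft_then_hard":
        return iteration >= switch_iter
    if mode == "hard_then_soft":
        return iteration < switch_iter
    raise ValueError(f"unknown mode: {mode!r}")

# Example: anneal similarity 0.5 -> 0 over iterations 10..50,
# then turn hard classification on once annealing has finished.
for it in (0, 10, 30, 50, 60):
    s = class_similarity_at(it, 0.5, 10, 50)
    hard = use_hard_classification(it, 50, "soft_then_hard")
    print(it, round(s, 3), hard)
```

Tying `switch_iter` to the end of annealing gives the “hard on once similarity reaches zero” variant; an earlier or later `switch_iter` decouples the two.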

Cheers
Oli

rposert · March 7, 2024, 3:28pm

Interesting! Thanks for elaborating, and I’ve recorded your feature request!

csparc_addict · March 24, 2024, 4:19pm

Hi @rposert ,

I just want to follow up on this thread. Does this mean if I have force hard classification turned on, there’s no need to use a low similarity because the class with the highest probability will now have a probability of 1?

Thank you.

olibclarke · March 24, 2024, 7:17pm

Correct - if you have force hard classification switched on, class similarity will have no effect.
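One way to see why: if class similarity only rescales probabilities toward uniform without changing their order, then taking the most probable class afterwards gives the same answer for any similarity below 1. The sketch below assumes the same hypothetical linear blend as a stand-in for class similarity (not cryoSPARC’s actual formula); `hard_assign` models force hard classification as a one-hot on the most probable class.

```python
def blend_toward_uniform(posteriors, similarity):
    # ASSUMPTION: hypothetical linear blend standing in for class similarity
    k = len(posteriors)
    return [(1.0 - similarity) * p + similarity / k for p in posteriors]

def hard_assign(weights):
    # Force hard classification: all weight on the most probable class
    best = max(range(len(weights)), key=weights.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(weights))]

posteriors = [0.40, 0.35, 0.25]
for s in (0.0, 0.3, 0.9, 0.99):
    # The one-hot assignment is identical for every similarity value below 1
    print(s, hard_assign(blend_toward_uniform(posteriors, s)))
```

Under this model the only exception is similarity exactly 1.0, where every class ties and the hard assignment becomes arbitrary.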

yoshiokc · July 2, 2024, 9:01pm

I’d like to +1 the idea of having a more intuitive “knob” for tuning how quickly this routine settles into its classes. If such a knob already exists, maybe the docs need some updates (the docs for quite a few parameters recommend not changing the default). I feel like I spend a lot of time in the current implementation of 3D classification tuning on a knife’s edge. Leaving hard classification off and tweaking class similarity seems to kinda have the effect I’m looking for, but even very small changes in this parameter (~0.02) can be the difference between converging on a 50/50 or 99.9/0.1 split of particles.

olibclarke · July 2, 2024, 9:41pm

@yoshiokc one thing I often find helpful is fixing the learning rate - setting the O-EM learning rate half-life to zero - then tweaking the learning rate in increments, starting at 1 and decreasing, with force hard classification on. A fixed learning rate of 1 gives good results in a surprising number of cases.
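For intuition about what a fixed learning rate does, here is a generic online-EM style running-average update. It is a sketch under stated assumptions, not cryoSPARC’s implementation: the learning rate is assumed to decay exponentially with the half-life, a half-life of 0 is treated as “keep the rate fixed” (mirroring the trick described above), and `learning_rate` / `oem_update` are hypothetical names.

```python
def learning_rate(iteration, lr_init, half_life):
    # ASSUMPTION: exponential decay with the given half-life (in iterations);
    # half_life <= 0 is treated as "no decay", i.e. a fixed learning rate.
    if half_life <= 0:
        return lr_init
    return lr_init * 0.5 ** (iteration / half_life)

def oem_update(current, batch_estimate, lr):
    # Generic running average: blend the current model toward the new batch estimate
    return [(1.0 - lr) * c + lr * b for c, b in zip(current, batch_estimate)]

model = [0.0, 0.0]
batches = [[1.0, 2.0], [3.0, 4.0]]  # toy per-batch estimates
for it, batch in enumerate(batches):
    lr = learning_rate(it, lr_init=1.0, half_life=0)
    model = oem_update(model, batch, lr)
print(model)
```

In this toy model a fixed rate of 1 means each update is driven entirely by the current batch, while smaller fixed rates average over more history; stepping the rate down from 1 trades responsiveness for stability.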

yoshiokc · July 2, 2024, 10:29pm

Going to try this right now, thanks @olibclarke . Have not played with the half-life before but it sounds promising.

olibclarke · July 7, 2024, 4:22pm

The other thing - if you see everything collapsing into one class, try increasing the batch size. Sometimes the default of 1000 seems to lead to instability; increasing it a bit often gives more stable results in my experience.

rbs_sci · July 8, 2024, 12:03am

1000 almost always collapses for me. I usually set 5,000-10,000. That seems stable, and hasn’t been dramatically slower in total.

olibclarke · July 8, 2024, 12:11am

Yes - it depends on the problem and target res, but if using a high-ish target res then I definitely increase it. For a low target res, like 10-15 Å, 1000 is OK.
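One generic statistical intuition (an illustration, not a claim about cryoSPARC’s internals) for why larger batches tend to be steadier: if particles were drawn uniformly at random, the fraction of a batch coming from a given class fluctuates with a binomial standard error of sqrt(p(1 - p) / N). The function name below is hypothetical.

```python
import math

def class_fraction_std_error(true_fraction, batch_size):
    # Binomial standard error of the fraction of a batch drawn from one class,
    # assuming uniform random sampling of particles (an idealization).
    return math.sqrt(true_fraction * (1.0 - true_fraction) / batch_size)

for n in (1000, 5000, 10000):
    se = class_fraction_std_error(0.10, n)
    print(n, f"{se:.4f}")
```

For a class holding 10% of the particles, this works out to roughly 0.95 percentage points of batch-to-batch noise at N = 1000 versus 0.3 at N = 10,000, so per-batch updates for a minority class are noticeably less jumpy at the larger batch sizes mentioned above.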