When I try to initialize 3D classification with a mixture of unique and non-unique volumes - e.g. 2 copies of each of 7 classes from heterogeneous refinement - I get the attached error.
Is there any reason for this restriction? When using RELION we routinely seed classification without alignments using identical input classes.
Have you tried a multi-reference input STAR file of the same volume in RELION? I find it populates one of them completely and leaves the other(s) empty for each reference identity. I think K classes is different: it slightly perturbs the input models so they don't have identical probabilities.
Thanks for flagging this! You're right that this probably shouldn't be a restriction; we are looking into it and will adjust in a future release.
Note that the current way we check for "uniqueness" is via the path to the volume's .mrc file. So one potential workaround is to first backproject a number of (identical) volumes by cloning a bunch of Homogeneous Reconstruction Only jobs with the same particle sets, and then to pipe those volumes into 3D class.
It seems this restriction is gone in recent versions, but I think there is a bug, at least in v4.4.
When I use a mixture of unique and non-unique volumes, the duplicate ones misbehave.
For example, let's say I have 10 different classes, and then include 10 copies of the consensus volume to make 20 classes.
The initial volumes at the top of the log look normal, but in the first iteration the plots for all but one of the duplicate classes are blank, and those classes are empty (whereas the non-duplicate classes behave as expected). If I use slightly different volumes (e.g. perturbed using Volume Tools), this does not happen and they behave normally.
Thanks for reporting. Unfortunately, I wasn't able to reproduce this behaviour with a couple of different dataset / class combinations. Any custom parameters in this job?
Ok thanks, I was able to reproduce this! This behaviour happens with identical classes AND hard classification on. In some ways this is "expected" behaviour as currently implemented, because we use an argmax() call on the class probabilities during the E-step: if the probabilities are numerically identical, argmax() always returns the "first" of the tied volumes (in input order). Thus, in the first iteration, all the other identical classes are emptied out and stay empty.
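The tie-breaking effect can be seen in a toy NumPy sketch (illustrative only, not cryoSPARC's actual implementation):

```python
import numpy as np

# Toy hard-classification scenario: 1000 particles, 4 classes, where the
# 4 reference volumes are byte-identical copies, so every particle's
# posterior is exactly uniform across classes.
posteriors = np.full((1000, 4), 0.25)

# Hard assignment takes the argmax over classes. With exact ties,
# np.argmax returns the FIRST maximal index, so every particle lands
# in class 0 and the duplicate classes start (and stay) empty.
assignments = np.argmax(posteriors, axis=1)
counts = np.bincount(assignments, minlength=4)
print(counts)  # [1000    0    0    0]
```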
We have an idea for how to improve this in the future (sample a volume according to the posterior, rather than simply taking the argmax, during the first iteration of hard classification). For now, you'll have to add some noise to the duplicate volumes to avoid this behaviour.