When I try to initialize 3D classification with a mixture of unique and non-unique volumes - e.g. 2 copies of each of 7 classes from heterogeneous refinement - I get the attached error.
Is there any reason for this restriction? In RELION we routinely seed classification without alignments using identical input classes.
Have you tried a multi-reference input STAR of the same volume in RELION? I find it populates one of them completely and the other(s) not at all for each reference identity. K classes is different, I think: it slightly perturbs the input models so they don't have identical probability.
Thanks for flagging this! You’re right that this probably shouldn’t be a restriction – we are looking into it and will adjust in a future release.
Note that the current way we check for ‘uniqueness’ is via the path to the volume’s mrc file. So one potential workaround is to first backproject a number of (identical) volumes by cloning a bunch of Homogeneous Reconstruction Only jobs with the same particle sets, and then to pipe those volumes into 3D class.
It seems this restriction is gone in recent versions, but there is a bug I think, at least in v4.4.
When I use a mixture of unique and non-unique volumes, the duplicate ones misbehave.
For example - let’s say I have 10 different classes, and then include 10 copies of the consensus volume to make 20 classes.
The initial volumes at the top of the log look normal - but in the first iteration, the plots for all but one of the duplicate classes are blank, and those classes are empty (whereas the non-duplicate classes behave as expected). If I use slightly different volumes (e.g. perturbed using Volume Tools), this does not happen and they behave normally.
Thanks for reporting. Unfortunately, I wasn’t able to reproduce this behaviour with a couple of different dataset / class combinations. Any custom parameters in this job?
Ok thanks, was able to reproduce this! This behaviour happens with identical classes AND hard classification on. It is in some ways "expected" behaviour as currently implemented, because we use an argmax() call on the probabilities during the E-step: if the probabilities are numerically identical, argmax always returns the 'first' of the tied volumes (in input order). Thus in the first iteration all other identical classes are emptied out and stay empty.
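To illustrate the mechanism, here is a minimal numpy sketch (a toy posterior matrix, not cryoSPARC's actual E-step code): when two class columns are numerically identical, argmax breaks every tie toward the lower index, so the second copy never receives a single particle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy posterior matrix: 1000 particles x 4 classes.
# Classes 1 and 2 are seeded with identical volumes, so their
# per-particle probabilities are numerically identical.
n_particles, n_classes = 1000, 4
probs = rng.random((n_particles, n_classes))
probs[:, 2] = probs[:, 1]  # duplicate class
probs /= probs.sum(axis=1, keepdims=True)

# Hard classification: argmax breaks ties toward the lower index,
# so class 2 can never "win" against its identical twin, class 1.
hard = np.argmax(probs, axis=1)
counts = np.bincount(hard, minlength=n_classes)
print(counts)  # class index 2 gets exactly 0 particles
```

Once a class is empty after the first iteration, nothing repopulates it, which matches the blank plots reported above.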
We have an idea for how to improve this in the future (sample a volume according to the posterior rather than simply taking argmax during the first iteration of hard classification). For now, you'll need to add some noise to the duplicate volumes to avoid this behaviour.
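For the interim workaround, a small perturbation is enough to break the numerical degeneracy. A minimal sketch, assuming you can load your map as a numpy array (e.g. with the mrcfile package; here a random array stands in for the real volume):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in volume: in practice, load the .mrc map here instead
# (for example with the mrcfile package) and write each copy back out.
box = 64
volume = rng.standard_normal((box, box, box)).astype(np.float32)

# Add a small amount of Gaussian noise (here 1% of the map's standard
# deviation) to each copy so no two seed volumes are numerically identical.
n_copies = 10
sigma = 0.01 * volume.std()
copies = [
    volume + rng.normal(0.0, sigma, volume.shape).astype(np.float32)
    for _ in range(n_copies)
]

# Verify that no two perturbed copies are identical, so the
# argmax tie described above can no longer occur.
assert not any(
    np.array_equal(copies[i], copies[j])
    for i in range(n_copies)
    for j in range(i + 1, n_copies)
)
```

The 1% noise level is just an illustrative choice; anything large enough to perturb the per-particle probabilities, but small relative to the map's features, should do.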