Hi @team,
Sorry if I just didn’t figure out how to do it, but I couldn’t find a way to order 2D class averages by class number. By default the ordering is based on particle count; it would be great to have the option to deselect this in favor of displaying the same order as in the final stack.
Just chiming in to say that oddly enough, we may also have a possible use for just such a feature or an extension of it.
For instance, if at the end of an extensive series of online-EM iterations we already have a good idea, in some deterministic fashion, of what the outcome of the subsequent full iterations is going to be, this could be expanded into facilitating pre-selection and queueing of subsequent job inputs.
There is a parameter called Sort classes by number of particles which is on by default. If you turn this parameter off, you should get the result you’re looking for! If you prefer this option to always be off, you may want to create a Blueprint to this effect.
Thanks for your reply! To be clear: does that need to be set during 2D Classification already? I wouldn’t want to re-run jobs; ideally it would be available in Select 2D Classes.
@tarek, can I ask why you want to sort the class averages by index number in Select 2D?
@leetleyang, in general we would not recommend selecting by class index before 2D Classification is complete. Even if the classes seem stable during the classification, the final iterations can significantly change the class assignments and averages. If you want to build a pipeline before the 2D Classification is complete, we recommend taking a look at Reference Based Auto Select 2D instead.
In this particular case I have quite a tricky dataset of a helical sample, where I wanted to carefully select class averages with an alternative viewer, e.g. EMAN or pyHl, and check power spectra side by side.
I sometimes find the limited scalability of the class average display in CryoSPARC unfortunate.
I am still wondering how the 2D classes in the blob-.mrc files are sorted. They are not in order of decreasing particle count, which is the default in the ‘2D select’ job.
The class average images in the .mrc files are in an arbitrary order. This is because the very first iteration of 2D Classification creates class averages by randomly assigning particle images to a class. There’s no way to know which of these classes will end up having many particles (i.e., a “good” class) and which will end up with very few (a “junk” class). They stay in this order in the .mrc file throughout the entire job.
When Sort classes by number of particles is on (the default), 2D Classification re-orders the classes by the number of particles before making plots, etc. This means that the first class in the class average plot (“class 0”) is the most populous class.
However, the job does not change the order of images in the .mrc file. Therefore, there’s no reason to believe that the 0th class will be the 0th image in the class average .mrc file, since those are ordered randomly. The “0th class” will likely also refer to a different image in the .mrc file in each iteration, since the particles in each class change over time.
If Sort classes by number of particles is off, the 0th class will not be re-ordered by particle count and should therefore be the 0th slice of the .mrc file in each iteration.
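To make the relationship between the two orderings concrete, here is a small NumPy sketch. The particle counts are made up, and the sort shown (descending count, ties broken by slice index) is an assumption for illustration only, not CryoSPARC's actual implementation.

```python
import numpy as np

# Hypothetical particle counts, indexed by slice position in the .mrc file.
counts_by_slice = np.array([120, 950, 15, 430])

# Assumed display order when sorting is on: most populous class first.
# display_to_slice[k] is the .mrc slice shown at position k in the plot.
display_to_slice = np.argsort(-counts_by_slice, kind="stable")

for position, slice_index in enumerate(display_to_slice):
    print(
        f"plot position {position} -> .mrc slice {slice_index} "
        f"({counts_by_slice[slice_index]} particles)"
    )
```

With sorting off, this mapping would simply be the identity: plot position k is .mrc slice k.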
Selecting 2D classes with an external program
If you want to programmatically select particle classes, you can use cryosparc-tools to do this instead of running a Select 2D Classes job. You could feed your external program the .mrc file and have it return the indices of the classes it wants you to keep. Then you can load the particles dataset from your 2D Classification job and keep only the particles whose alignments2D/class value is one of those indices. For example, if your external program told you to keep particles which correspond to the images [0, 5, 12] in your .mrc file:
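A minimal sketch of that filtering step follows. It runs on a made-up NumPy array standing in for the alignments2D/class column, so it works without a CryoSPARC instance; the helper name select_by_class is hypothetical. The commented lines at the end show roughly how this might look with cryosparc-tools, written from memory, so please check the calls and the output name against the cryosparc-tools documentation.

```python
import numpy as np


def select_by_class(class_per_particle, keep_classes):
    """Return a boolean mask that is True for particles in the kept classes."""
    return np.isin(class_per_particle, keep_classes)


# Stand-in for particles["alignments2D/class"]; real values would come from
# the particles output of the 2D Classification job.
class_per_particle = np.array([0, 3, 5, 12, 12, 7, 0, 5])
keep_classes = [0, 5, 12]  # indices returned by the external program

mask = select_by_class(class_per_particle, keep_classes)
print(f"keeping {mask.sum()} of {mask.size} particles")

# With cryosparc-tools, the same idea would look roughly like this
# (the job object and the output name "particles" are placeholders):
#   particles = job.load_output("particles")
#   selected = particles.query({"alignments2D/class": keep_classes})
```

Note that this only matches the .mrc slice indices directly because alignments2D/class refers to the unsorted class numbers.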
@rwaldo
Thank you for your prompt and insightful response.
Is there also a way to extract the numbers per class? If I download the 2D class averages, the file gets automatically converted into a .mrc file, within which I didn’t find any information on the number of particles per class.
Setting ‘Sort classes by number of particles’ to false makes it possible to read the numbers within the ‘2D select’ job. However, this process is very time-consuming. Is there a better way to do it?
Hi @haugm! I would use cryosparc-tools to do this. For example, if you load a 2D Classification job (see this example for one way to do that) you could get the particle count per class by running:
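A minimal sketch of that count, using NumPy on a made-up array standing in for particles["alignments2D/class"]. In practice the array would come from the loaded particles dataset, e.g. something like job.load_output("particles") in cryosparc-tools; treat that call and the output name as assumptions to check against the docs.

```python
import numpy as np

# Stand-in for particles["alignments2D/class"]: one class index per particle.
class_per_particle = np.array([2, 0, 2, 5, 2, 0, 5, 2])

# Unique class indices and the number of particles assigned to each.
class_number, counts = np.unique(class_per_particle, return_counts=True)

for c, n in zip(class_number, counts):
    print(f"class {c}: {n} particles")
```

One caveat: np.unique only reports classes that have at least one particle, so empty classes will not appear in class_number.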
In this case, class_number would be an array of the class indices as they appear in the .mrc file (that is, unsorted class numbers, regardless of the Sort classes by number of particles setting in 2D Classification), and counts would be the number of particles in each of those classes.