Suggestion for workflows - select largest class on the fly?

olibclarke · November 12, 2023, 5:35pm

Hi,

Love the new workflows feature in v4.4!

For the future, would it be possible to add the capacity to choose the largest class on the fly?

What I mean is, if I run a heterogeneous refinement, there is a quick action to (for example) build a NU-refine job for the largest class, but this can only be built once the heterorefine job has completed (and the largest class is therefore established).

Would it be possible to add a “placeholder” slot to heterogeneous refinement, multi-class ab initio reconstruction, and 3D classification, similar to particles_all_classes, perhaps particles_largest_class or similar, that resolves to the largest class at job launch time?

This would allow the creation of more flexible pipelines, where for example the most populated ab initio class could be selected on the fly, without prior knowledge of which class that will end up being.

This would also be useful independent of Workflows, to quickly queue an automated pipeline to process data on the fly.

Cheers
Oli

ccgauvin94 · November 13, 2023, 4:10pm

The ability to programmatically select Ab Initio volumes and heterogeneous classes is definitely a needed component for workflows.

kstachowski · November 13, 2023, 10:14pm

Hi @olibclarke and @ccgauvin94,

Thanks for the feedback, it has been noted! I do have a few follow-up questions:

Why is the largest class important? Are there any cases where a less populated class would be important to continue processing? Do you see this being useful when there is uncertainty about compositional homogeneity or would the intended use be for more routine workflows?

We also encourage any other feedback related to workflows!

Cheers,
Kye

olibclarke · November 13, 2023, 10:31pm

Hi Kye,

There are absolutely cases where minor components are important!

However, for a first/quick run through, assuming we are expecting (e.g. based on 2D classification or inspection of micrographs) one major class with perhaps some minor contaminants (e.g. empty nanodisc, free Fab), a multi-class ab-initio is often a quick way to separate out some initial heterogeneity - for something that I run overnight unattended, picking the largest class and performing further processing on it would often be a reasonable starting point.

Of course, there is no guarantee that the majority conformation will sort into just one class (often the correct and inverted hands will separate out), but it is a reasonable starting point (and one could mitigate this by running with a couple of different random seeds and processing in parallel).

This would also be useful when running a heterogeneous refinement with identical starting models - often we do this with very aggressive initial lowpass filtering (e.g. 60-80Å), and this can be useful as an automated clean up (if we could automatically select the largest class for further processing).

For completeness, being able to queue processing of the largest class and the remainder (all the classes except the largest class) would be helpful (to accommodate a branched workflow where one processes the largest class and reclassifies everything else separately).
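To make the largest-class/remainder split concrete, here is a minimal sketch in plain Python. It is not a CryoSPARC or cryosparc-tools API: the function name `split_largest_class`, the particle IDs, and the class labels are all hypothetical, and it assumes each particle has already been assigned to exactly one class.

```python
from collections import Counter


def split_largest_class(class_of_particle):
    """Split particles into the most populated class and everything else.

    class_of_particle: dict mapping a (hypothetical) particle ID to its class label.
    Returns (largest_label, largest_ids, remainder_ids).
    """
    counts = Counter(class_of_particle.values())
    # Break ties by the lowest class label so the result is deterministic.
    largest = min(counts, key=lambda c: (-counts[c], c))
    largest_ids = sorted(p for p, c in class_of_particle.items() if c == largest)
    remainder_ids = sorted(p for p, c in class_of_particle.items() if c != largest)
    return largest, largest_ids, remainder_ids


# Made-up assignments: particle ID -> class index.
assignments = {101: 0, 102: 2, 103: 2, 104: 1, 105: 2, 106: 0}
largest, keep, rest = split_largest_class(assignments)
print(largest, keep, rest)
```

The first group would feed the refinement branch and the second the reclassification branch; the class that ends up largest is only known once the assignments exist, which is the point of resolving it at launch time.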

Cheers
Oli

olibclarke · November 14, 2023, 8:28pm

For automated workflows, what would actually be really handy would be a way to cluster & combine ab initio classes, taking into account arbitrary orientations and potential inversion of hands. The new job type for combining 3D classes is part way there, would just need combination with an align 3D step and testing the correct and inverted hands.
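The grouping step could be sketched roughly as follows, assuming the hard part (aligning every pair of maps and scoring both the original and the mirrored hand) has already been done elsewhere. This is plain Python, not an existing CryoSPARC job; `cluster_classes`, the threshold of 0.8, and all scores are made up for illustration.

```python
def cluster_classes(names, similarity, threshold=0.8):
    """Group classes whose pairwise similarity meets the threshold (single linkage).

    similarity: dict mapping (name_a, name_b) to a score, where each score is
    assumed to be the better of the two hands after alignment.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical scores: max(score as-is, score with one map mirrored).
pairwise = {
    ("class_0", "class_1"): max(0.31, 0.93),  # same structure, opposite hand
    ("class_0", "class_2"): max(0.22, 0.18),
    ("class_1", "class_2"): max(0.25, 0.20),
}
clusters = cluster_classes(["class_0", "class_1", "class_2"], pairwise)
print(clusters)
```

Here `class_0` and `class_1` end up in one group because the mirrored comparison scores highly, while `class_2` stays on its own.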

kstachowski · November 28, 2023, 8:57pm

Hi @olibclarke,

Thanks for the feedback and the more in-depth explanation. We have noted both of these feature requests! I do agree with you that automated class selection in workflows is a current pain point.

There might be a nice way to do this with cryosparc-tools and the Align 3D Maps job. In theory, you could align all class volumes to a known reference, calculate a cross-correlation (or other similarity metric) against that reference, and move forward with the highest-scoring volume(s). When particles are not attached, the Align 3D Maps job will test shifts and rotations, along with checking handedness. Then, with a known volume (or set of volumes), you should be able to pass the associated particles and volumes on to the next steps.
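The scoring step might look something like the sketch below. It uses plain Python only and deliberately makes no cryosparc-tools calls: loading the aligned maps is left out, each volume is assumed to be already aligned to the reference and flattened to a list of voxel values, and the names `normalized_cross_correlation`, `rank_by_similarity`, and the toy data are hypothetical.

```python
import math


def normalized_cross_correlation(a, b):
    """Pearson-style correlation between two equally sized voxel lists."""
    if len(a) != len(b):
        raise ValueError("volumes must have the same number of voxels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat volume carries no signal to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def rank_by_similarity(reference, volumes):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {
        name: normalized_cross_correlation(reference, voxels)
        for name, voxels in volumes.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy 6-voxel "volumes" standing in for aligned class maps.
reference = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
aligned = {
    "class_0": [0.0, 0.9, 2.1, 1.1, 0.0, 0.1],
    "class_1": [1.0, 0.0, 0.0, 0.0, 1.0, 2.0],
    "class_2": [0.5, 0.5, 0.5, 0.6, 0.5, 0.5],
}
ranking = rank_by_similarity(reference, aligned)
best_name = ranking[0][0]
print(best_name)
```

The top-ranked name would then identify which class's particles and volume to connect to the downstream jobs.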

Cheers,
Kye