Request for "Deep 2D Classification" job

A simple-to-add and very useful feature would be “Deep 2D classification”, inspired by PMID: 32680969. Basically, given a particle set and its 2D class averages, the Deep 2D classification job would run a separate 2D classification for each class average and then aggregate the results for selection.
This can be useful for datasets that contain small, low-SNR particles that tend to average with noise in the initial steps after particle picking, or for finding rare views in regular datasets.
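
To make the idea concrete, here is a minimal sketch of the control flow only, in plain Python. It assumes each particle is a (uid, class index) pair, and `subclassify` is a hypothetical stand-in for "run 2D classification on this subset and keep the good classes"; none of this is CryoSPARC code.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# (particle uid, class index assigned by the first 2D classification)
Particle = Tuple[int, int]


def deep_2d_classification(
    particles: Iterable[Particle],
    subclassify: Callable[[List[int]], List[int]],
) -> List[int]:
    """Re-classify each initial class on its own and pool the kept particle uids."""
    # Group particle uids by the class they landed in during the first round.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for uid, class_idx in particles:
        by_class[class_idx].append(uid)

    kept: List[int] = []
    for class_idx in sorted(by_class):
        # One independent sub-classification per initial class average.
        kept.extend(subclassify(by_class[class_idx]))
    return kept


# Hypothetical placeholder for "classify, then select": keeps even uids only.
def keep_even(uids: List[int]) -> List[int]:
    return [u for u in uids if u % 2 == 0]


particles = [(1, 0), (2, 0), (3, 1), (4, 1), (6, 1)]
selected = deep_2d_classification(particles, keep_even)
```

In a real run the placeholder would be replaced by an actual 2D classification plus a selection step, which is the expensive part.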


Hi @RD_Cryo,

Thanks for the literature link and the feature request! We’ve noted this down as something to look into :slight_smile:

(As a PS, in case it is helpful: it may be worth looking into the cryosparcm cli to see whether parts of this workflow could be done programmatically. For example, one could use create_new_job and enqueue_job to create and queue a new Select 2D job for each class output by a 2D classification, isolating each class and its corresponding particle stack, and then launch a 2D classification on each Select 2D job. The result aggregation would still have to be done manually, though, since Select 2D currently doesn’t support ingesting templates/particles from more than one source 2D classification job.)
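
For the manual aggregation step, a rough sketch of the bookkeeping might look like the following. It assumes the selected particles from each sub-classification have already been exported as lists of records with a `uid` field; the record layout and the job labels are illustrative, not an actual CryoSPARC format.

```python
from typing import Dict, Iterable, List


def merge_selections(selections: Iterable[List[Dict]]) -> List[Dict]:
    """Concatenate particle records from several selections, keeping the first record per uid."""
    seen = set()
    merged = []
    for selection in selections:
        for record in selection:
            if record["uid"] in seen:
                continue  # safeguard: already kept from an earlier selection
            seen.add(record["uid"])
            merged.append(record)
    return merged


# Toy records; the "source" labels are made up for illustration.
from_class_0 = [{"uid": 11, "source": "J301"}, {"uid": 12, "source": "J301"}]
from_class_1 = [{"uid": 12, "source": "J302"}, {"uid": 13, "source": "J302"}]
combined = merge_selections([from_class_0, from_class_1])
```

Keeping the first occurrence is an arbitrary choice here; if the per-class subsets are disjoint, the duplicate check never triggers.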

Best,
Michael


I would be interested in seeing this workflow supported as well, possibly in conjunction with 2D classification without alignments (see “2D classification without alignment?”).

For now, is there a way to implement this workflow with cryosparc-tools? For example:

for Job X:
   if class >1000 particles:
     select class:
       run 2D classification on selection

Is that possible with cs-tools? Perhaps it could optionally use reference-based 2D class selection to pick the classes in the sub-classifications?

Cheers
Oli

EDIT:
An easy way to facilitate a “manual” version of this approach would be to have a “split outputs by class” option in Select 2D, in addition to auto thresholds.

You certainly can do this with cs-tools using the project.create_job() method! Note that the class indices below won’t match what you see in Select 2D Classes. If you wanted the number of classes to depend on how many particles are in each subset, you’d have to move the params definition inside the for loop that sets up the 2D Class jobs.

from cryosparc.tools import CryoSPARC
import json
from pathlib import Path

with open(Path('~/instance-info.json').expanduser(), 'r') as f:
    instance_info = json.load(f)

cs = CryoSPARC(**instance_info)
assert cs.test_connection()

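# UIDs of the project, the workspace, and the source 2D classification job (edit for your instance)
project_uid = "P337"
workspace_uid = "W21"
job_uid = "J258"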

project = cs.find_project(project_uid)
job = project.find_job(job_uid)
particles = job.load_output("particles")

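# Split the particle dataset into one sub-dataset per 2D class
sub_datasets = particles.split_by("alignments2D/class")
threshold = 1000  # minimum number of particles for a class to be re-classified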
ext_job = project.create_external_job(
    workspace_uid,
    f"Classes larger than {threshold} particles"
)
ext_job.add_input(
    type = "particle",
    name = "particles"
)
ext_job.connect(
    target_input = "particles",
    source_job_uid = job_uid,
    source_output = "particles"
)

outputs_to_classify = []
with ext_job.run():
    for class_num, sub_dset in sub_datasets.items():
        if len(sub_dset) < threshold:
            continue
        
        output_name = f"particles_class_{class_num}"
        outputs_to_classify.append(output_name)
        ext_job.add_output(
            type = "particle",
            name = output_name,
            passthrough = "particles",
            slots = ["blob"],
            alloc = sub_dset
        )
        ext_job.save_output(output_name, sub_dset)

    ext_job.log("Note that class numbers here do not match what they would be in Select 2D")


lane = "cryoem9"  # scheduler lane to queue on (edit for your instance)
params = {
    "class2D_K": 25,  # number of classes
}
for output_name in outputs_to_classify:
    class_job = project.create_job(
        workspace_uid,
        "class_2D_new",
        connections = {
            "particles": (ext_job.uid, output_name)
        },
        params = params,
        title = output_name
    )
    class_job.queue(lane)
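
As a rough illustration of the note above about making the number of classes depend on subset size, something like the following could be computed per subset and passed as params inside the loop. The heuristic, its defaults, and the helper name `classes_for_subset` are all hypothetical; only the `class2D_K` parameter name comes from the script above.

```python
def classes_for_subset(n_particles: int, particles_per_class: int = 200,
                       min_classes: int = 2, max_classes: int = 50) -> int:
    """Hypothetical heuristic: aim for roughly `particles_per_class` particles per class."""
    wanted = n_particles // particles_per_class
    # Clamp so tiny subsets still get a few classes and huge ones stay manageable.
    return max(min_classes, min(max_classes, wanted))


# Toy subset sizes standing in for len(sub_dset) of each saved output.
subset_sizes = {
    "particles_class_0": 1200,
    "particles_class_3": 25000,
    "particles_class_7": 150,
}

# One params dict per output, to be used in place of the shared `params`.
params_per_output = {
    name: {"class2D_K": classes_for_subset(n)}
    for name, n in subset_sizes.items()
}
```

To use it with the script, one would need to record each subset's length alongside its output name when saving the outputs, then look up the matching params dict when creating each 2D classification job.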