Simplest way to split imported particles

This is a cryoSPARC UI puzzle. Let’s say you have combined micrographs from two datasets, and then done some 3D classification in Relion. Now you want to import these particles back into cryoSPARC, split them by dataset, assign exposure groups, and refine them. Let’s also suppose there are at least two classes and you want to refine them individually and in combination (so they need non-overlapping exposure groups).

What is the simplest way (fewest number of jobs) to do it?

Would it be easier to split the exposures into exposure groups in cryoSPARC (in several steps if need be) and then let Reassign Particles to Micrographs sort out the particle stacks after import?

For instance, to prepare the exposures, use Exposure Sets Tool to split the two datasets (e.g. by intersecting with the initial Import input), then run two Exposure Group Utilities jobs to further split each set into image-shift positions based on the blob paths.

Unless I’m misunderstanding the statement, shouldn’t particles from the same micrograph, even if they end up in two different 3D classes, share the same exposure group?

Technically, I suppose you could prepare two sets of exposures, each with a unique series of exposure groups, specifically for particle reassignment.

Cheers,
Yang

This is fun, like a chess mate-in-N puzzle.

As @leetleyang said, I’m not sure I understand the difference between “split them by dataset” and “assign exposure group”, unless you want to split both by “microscopy session” (what I’d consider a dataset) and possibly by image shift/hole position (perhaps what you mean by exposure group?).

In either case, I’d think that a single Exposure Group Utilities job should be able to do it all, using a regex for the date (or whatever your dataset filter is) and for the hole/shift position (or whatever defines your exposure groups).

Indeed, the requisite number of hoops depends on how closely the blob template matches the desired assignments. EPU’s typical output, for instance, would necessitate an initial Exposure Sets Tool split, as the filenames do not necessarily encode the dataset identity.

There is also a more general argument for doing it via the exposures and then reassigning the particles to them, though. Assuming one wants particles from the same micrograph, but across two separate imports, to be assigned the same exposure group ID, separate Exposure Group Utilities jobs on the imports would likely produce incoherent assignments/ordering, whereas micrograph reassignment would ensure a common template.
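
To make the incoherent-ordering point concrete, here is a toy Python sketch. It calls no cryoSPARC API and assumes, hypothetically, that an independent grouping job numbers the groups it sees in order of first appearance; the actual numbering used by Exposure Group Utilities may differ. The shift-position labels are made up.

```python
def enumerate_per_subset(labels):
    """Number labels 1..k in order of first appearance within one subset,
    standing in (hypothetically) for an independent grouping job per import."""
    ids = {}
    for label in labels:
        if label not in ids:
            ids[label] = len(ids) + 1
    return ids


def build_template(all_labels):
    """Number every label once, over the union of all subsets, in sorted order.
    This plays the role of the shared exposure template."""
    return {label: i + 1 for i, label in enumerate(sorted(set(all_labels)))}


# Made-up shift-position labels for the particles in each 3D class.
class1 = ["A_hole2", "A_hole1", "B_hole1"]
class2 = ["A_hole1", "B_hole2", "B_hole1"]  # A_hole2 is absent from class 2

per1 = enumerate_per_subset(class1)
per2 = enumerate_per_subset(class2)
print(per1["A_hole1"], per2["A_hole1"])  # 2 1: same position, different IDs

template = build_template(class1 + class2)
print([template[x] for x in class1])  # [2, 1, 3]
print([template[x] for x in class2])  # [1, 4, 3]
```

Any numbering fixed once over all exposures would do; sorting is just an easy way to make it deterministic. Note that the per-subset numbering also breaks when a position is missing from one class.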

Alternatively, I suppose one could run a single Exposure Group Utilities job connecting two Import inputs, but I think that would still require a downstream intersect split in order to work on the separate particle stacks? Whichever is more convenient.

Cheers,
Yang

I only use SerialEM, so I forgot about the non-informative file names from other data collection packages (among their other deficiencies :wink: ). I usually use a pattern like User_Date_GridBox-GridNo_NavItem_HolePos for the movie names, so using a regex to select a certain grid (and to assign shift groups from HolePos) is easy. Let’s say the 3D classes are class 1 and class 2, and the two sets of exposures are A and B.
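
As a sketch of such a regex, assuming a hypothetical movie name like jdoe_20240115_B3-2_145_1.tif that follows the User_Date_GridBox-GridNo_NavItem_HolePos scheme (the field formats are my guesses), Python’s re module can pull out the grid identity (dataset) and the hole position (shift group). This only illustrates the pattern logic; it is not the syntax the Exposure Group Utilities job itself expects.

```python
import re
from pathlib import PurePosixPath

# Hypothetical layout: User_Date_GridBox-GridNo_NavItem_HolePos.ext
# Field formats (8-digit date, numeric NavItem) are assumptions.
MOVIE_RE = re.compile(
    r"(?P<user>[^_]+)_(?P<date>\d{8})_(?P<box>[^_-]+)-(?P<grid>[^_]+)"
    r"_(?P<navitem>\d+)_(?P<holepos>[^_.]+)"
)


def parse_movie(path):
    """Return (dataset, shift_group) for one movie path.

    dataset     = date + grid box + grid number (one grid = one dataset)
    shift_group = hole position within the multi-hole pattern
    re.search (not fullmatch) tolerates extra leading text in the file name.
    """
    m = MOVIE_RE.search(PurePosixPath(path).name)
    if m is None:
        raise ValueError(f"unrecognized movie name: {path}")
    return f"{m['date']}_{m['box']}-{m['grid']}", m["holepos"]


movies = [
    "raw/jdoe_20240115_B3-2_145_1.tif",
    "raw/jdoe_20240115_B3-2_146_2.tif",
    "raw/jdoe_20240302_B7-1_12_1.tif",
]

# Select one grid (dataset A) and list its shift groups.
grid_a = [p for p in movies if parse_movie(p)[0] == "20240115_B3-2"]
print(sorted({parse_movie(p)[1] for p in grid_a}))  # ['1', '2']
```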

What I have done is:
Job 1: Import class 1 & link micrographs (A+B)
Job 2: Import class 2 & link micrographs (A+B)
Job 3: Exposure group to divide class 1 to A and B (Regex for grid/dataset)
Job 4: Exposure group to divide class 2 to A and B
Job 5: Exposure group to assign shift groups to Job3-A + Job4-A (Regex for hole position)
Job 6: Exposure group to assign shift groups to Job3-B + Job4-B
Job 7: A consensus refinement, a Particle Sets job, or something similar to make one set of particles from Jobs 5 and 6
Job 8: Refine Job 1 particles, replacing CTFs with Job 7 CTFs (the exposure group ID is part of the CTF)
Job 9: Refine Job 2 particles, replacing CTFs with Job 7 CTFs
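
Jobs 8 and 9 rely on the exposure group ID travelling with the CTF parameters. The effect of replacing one particle set’s CTF fields with another’s can be sketched as a join on particle UID. This is a plain-Python model, not cryoSPARC code: records are dicts, the field names exp_group_id and df1_A are only loosely modeled on cryoSPARC’s CTF fields, and every particle in the class is assumed to be present in the Job 7 set.

```python
def replace_ctf(particles, donor, fields=("exp_group_id", "df1_A")):
    """Return copies of `particles` with `fields` taken from `donor`.

    Particles are matched on "uid"; a particle missing from `donor`
    raises KeyError, since there would be nothing to copy from.
    """
    by_uid = {rec["uid"]: rec for rec in donor}
    out = []
    for rec in particles:
        new = dict(rec)  # leave the input record untouched
        for f in fields:
            new[f] = by_uid[rec["uid"]][f]
        out.append(new)
    return out


# Made-up Job 7 set: every particle, with globally consistent group IDs.
job7 = [
    {"uid": 101, "exp_group_id": 1, "df1_A": 14500.0},
    {"uid": 102, "exp_group_id": 3, "df1_A": 16200.0},
    {"uid": 103, "exp_group_id": 2, "df1_A": 15100.0},
]
# Made-up class 1 import, carrying stale per-job group IDs.
class1 = [
    {"uid": 101, "exp_group_id": 1, "df1_A": 14000.0},
    {"uid": 103, "exp_group_id": 1, "df1_A": 15000.0},
]

fixed = replace_ctf(class1, job7)
print([r["exp_group_id"] for r in fixed])  # [1, 2]
```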

Another way is to use the shell to separate the datasets in each class first, then run 4 import jobs and 4 exposure group jobs, but that’s still 8 jobs, and there’s no guarantee the shift groups will match up between the two 3D classes (since some may be missing from a class). You also can’t do this directly in cryoSPARC by linking only one set of micrographs at a time during import, because you get an error about micrographs with no particles.

I think @leetleyang is saying that one could instead set shift groups on the exposures right at the beginning and then make sure they are preserved through the export/import cycle. Then there would have been only 2 (or 3) exposure group jobs back then, plus the 2 import jobs for the classes. That would have been wiser, I suppose.

Yes… with AFIS and 60+ shift positions, for instance, that spreads the particles out quite thinly.

Without going the exposure route, I imagine another viable workflow for 2 classes may be:

Job 1: Import class1
Job 2: Import class2
Job 3: J1+J2 → grid group assignment (split outputs A & B)
Job 4: J3A → position group assignment (assign A1,A2,...,An)
Job 5: J3B → position group assignment (assign B1,B2,...,Bn)
Job 6: J4+J5 → consensus refinement
Job 7: J6, J1 → Particle Sets (intersect)

I think the intersect and A_minus_B outputs should then correspond to class1 and class2, each with A(1-n),B(1-n) assignments and consistent CTF parameters. Or, replace the CTFs with J6 as before. Similar number of steps though.
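
The intersect/A_minus_B logic can be sketched with UID sets. This assumes, as the post implies but I have not verified, that Particle Sets returns records from input A (here J6), so both outputs keep J6’s group assignments and CTF values; the particle records are made up.

```python
def split_by_membership(a, b):
    """Partition particle records `a` by whether their "uid" occurs in `b`.

    Returns (intersect, a_minus_b). Both come from `a`, so they keep
    a's fields (here: J6's group IDs and CTF values).
    """
    b_uids = {rec["uid"] for rec in b}
    intersect = [rec for rec in a if rec["uid"] in b_uids]
    a_minus_b = [rec for rec in a if rec["uid"] not in b_uids]
    return intersect, a_minus_b


# Made-up J6 consensus output: class 1 and class 2 particles together.
j6 = [
    {"uid": 1, "group": "A1"},
    {"uid": 2, "group": "B2"},
    {"uid": 3, "group": "A2"},
    {"uid": 4, "group": "B1"},
]
# Made-up J1 import (class 1 only); its own fields are ignored.
j1 = [{"uid": 1}, {"uid": 4}]

class1, class2 = split_by_membership(j6, j1)
print([r["uid"] for r in class1], [r["uid"] for r in class2])  # [1, 4] [2, 3]
```

The A_minus_B output equals class 2 only if J6 holds exactly the union of the two classes, with no particle in both.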

Cheers,
Yang