What is the general workflow (or sequence of jobs) in CryoSPARC?

Hello.

I am new to this area and want to understand the general workflow for solving structures. There are various options in each job builder: for example, 3D refinement can be homogeneous, heterogeneous, or non-uniform, and there are different particle picking approaches (manual/blob vs. deep picking), flexible refinement, post-processing jobs, etc. As I understand it, the jobs you choose determine the outcome, such as the conformational changes you can resolve and the final resolution.

I have been reading about workflow and came across the following:

  1. Get Started with CryoSPARC: Introductory Tutorial (v4.0+) | CryoSPARC Guide

  2. Homogeneous, Heterogeneous and Non-uniform refinement
    where @DanielAsarnow described the basic pipeline:

Basic pipeline: Patch Motion > Patch CTF > Curate exposures > Pick particles > Inspect particles > Extract particles with 4-6x Fourier cropping (“binning”) > 2D classification > Discard bad classes > Ab initio with 1-8 volumes > Het. refine with same particles using ab initio volumes (good and bad) as references > Homo. refine / non-uniform of good classes > Re-extract with 2x cropping > Homo. refine / non-uniform. If resolution is near Nyquist, re-extract without any cropping, repeat homo / non-uniform. Try defocus refinement. If resolution improves use exposure group utilities to split groups and try global CTF refinement.
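For readers who want to script such a pipeline, a minimal sketch using cryosparc-tools (CryoSPARC's Python client) might look like the one below. The job-type strings and input/output names follow the cryosparc-tools v4 examples and may differ between CryoSPARC versions; the project/workspace/job UIDs, lane name, and credentials are placeholders.

```python
# Minimal sketch: chaining the first two steps of the basic pipeline with
# cryosparc-tools. Job-type strings and input/output names follow the
# CryoSPARC v4 examples but may differ in your version; "P1", "W1", "J1"
# and the "default" lane are placeholders.
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # your license ID
    host="localhost",
    base_port=39000,
    email="user@example.com",
    password="password",
)
project = cs.find_project("P1")

# Patch motion correction, fed by an existing Import Movies job (J1).
motion = project.create_job("W1", "patch_motion_correction_multi")
motion.connect("movies", "J1", "imported_movies")
motion.queue("default")
motion.wait_for_done()

# Patch CTF estimation on the motion-corrected micrographs.
ctf = project.create_job("W1", "patch_ctf_estimation_multi")
ctf.connect("exposures", motion.uid, "micrographs")
ctf.queue("default")
ctf.wait_for_done()
```

The same create/connect/queue pattern extends to picking, extraction (where you would set the Fourier-crop box size), 2D classification, and the refinement jobs.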

@Guillaume followed a slightly different, alternative pipeline that led to different outcomes in the final structure.

Patch Motion > Patch CTF > Curate exposures > Pick particles > Inspect particles > Extract particles with Fourier cropping to 128 pixels > Run cryodrgn abinit_het 25 > Explore the results, discover what you might have missed from the several rounds of excluding particles in the basic pipeline
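If you want to try this cryoDRGN branch, a typical invocation looks roughly like the sketch below (ordinary CLI calls, wrapped in Python for consistency with the other examples here). The flags follow the cryoDRGN documentation but should be checked against your installed version; file names, the latent dimension, and the epoch count are illustrative, not values from the quoted post.

```python
# Sketch of the cryoDRGN branch. Each subprocess.run is an ordinary CLI call;
# verify flags against your installed cryoDRGN version. File paths, --zdim 8
# and -n 30 are illustrative placeholders.
import subprocess

# Parse CTF parameters from the cryoSPARC .cs file exported with the particles.
subprocess.run([
    "cryodrgn", "parse_ctf_csparc", "particles_exported.cs",
    "-o", "ctf.pkl",
], check=True)

# Downsample to 128 px, matching the Fourier cropping in the pipeline above.
subprocess.run([
    "cryodrgn", "downsample", "particles_exported.cs",
    "-D", "128", "-o", "particles.128.mrcs", "--datadir", "path/to/extract",
], check=True)

# Ab initio heterogeneous reconstruction.
subprocess.run([
    "cryodrgn", "abinit_het", "particles.128.mrcs",
    "--ctf", "ctf.pkl", "--zdim", "8", "-n", "30", "-o", "abinit_het_01/",
], check=True)
```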

  3. The data processing tutorial from @olibclarke is indeed helpful.
    Data processing tutorial

I am working on a hexameric protein, to see ligand/inhibitor binding and the associated conformational changes. Most of the earlier papers in this field do not mention the workflow.

Would you mind sharing your workflow knowledge and, if possible, your reasons for setting up particular jobs? I think it would help community members expand their knowledge.

Also, if you know any links, papers, or discussion threads about the workflow, please share them here. I appreciate any comments or help from the @team. Thanks!

Hello,

The short answer is that there is no ideal workflow: you have to find the one that works best for each new dataset. This is inconvenient because there is no foolproof protocol you can follow with any guarantee of reaching optimal results. But this is also what makes this type of data analysis fun to do. :grinning:

You must have noticed that some steps are common to all workflows:

  • the early preprocessing steps: motion correction, CTF estimation, particle extraction with Fourier cropping to speed up the initial classification jobs;
  • the late refinement stages: re-extraction of the final set of particles with less or no Fourier cropping, and homogeneous refinement with a combination of options adequate for the resolution range the dataset is reaching (see the short Nyquist arithmetic sketch after this list).
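To make that resolution point concrete, here is a small arithmetic sketch (the pixel size below is purely illustrative): Fourier cropping multiplies the effective pixel size by the binning factor, and the Nyquist limit, the best resolution the images can support, is twice the effective pixel size.

```python
# Nyquist limit for Fourier-cropped ("binned") particles: the effective pixel
# size is the physical pixel size times the binning factor, and the best
# attainable resolution is twice that. The 0.85 A/px value is illustrative.
def nyquist_resolution(pixel_size_A: float, bin_factor: float = 1.0) -> float:
    """Best attainable resolution (in Angstrom) at a given binning."""
    return 2.0 * pixel_size_A * bin_factor

pixel = 0.85  # Angstrom/pixel
for binning in (4, 2, 1):
    print(f"bin {binning}x -> Nyquist limit {nyquist_resolution(pixel, binning):.2f} A")
# bin 4x -> 6.80 A, bin 2x -> 3.40 A, bin 1x -> 1.70 A
```

This is why heavily cropped particles are fine for early classification, while the final refinements need re-extraction at (or near) the full pixel size.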

For everything in between, it is very difficult to come up with a standard workflow, and what you need to do depends a lot on the type and amount of heterogeneity present in the dataset. As emphasized in @olibclarke’s tutorial, you need to approach single-particle analysis with the mindset of understanding which different species are in the dataset. Getting good reconstructions naturally follows from properly sorting the heterogeneity.

Regarding this:

@Guillaume followed a slightly different, alternative pipeline.

I want to emphasize that I suggested this as a starting point, which is why the last step is deliberately phrased in a vague and open-ended way.

Some more general advice:

The particle picking step is absolutely critical: avoiding false positives in the first place will make the downstream classifications much easier than trying to classify these bad particles away later. Everybody has their favorite particle picking strategy, and you will get different suggestions if you ask around. My opinion is that Topaz produces excellent picks, if you are willing to invest the time to prepare a good training set by manually picking about 1,000 particles. This forces you to look at the micrographs, which is good because you then get a much better feel for the overall quality of the dataset.

Doing the picking as well as you can deserves a large fraction of your time budget when analyzing a new dataset, because the initial set of particles will influence all downstream jobs. I feel that picking is often overlooked, but it is actually one of the steps that is most difficult to automate and that benefits most from human expertise (especially when that expertise is used to train a neural-network picker like Topaz).
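As a small illustration of that point, if you use cryosparc-tools you can quickly check that your manual picks are numerous enough and well spread before training Topaz. The project/job UIDs and the "particles" output name below are placeholders, and the "location/micrograph_uid" field name is an assumption to verify on your dataset.

```python
# Sanity-check (sketch): count manual picks and their spread over micrographs
# before training Topaz. "P1"/"J10" and the "particles" output name are
# placeholders; verify the "location/micrograph_uid" field name against your
# dataset (e.g. with picks.fields()).
from collections import Counter
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(license="...", host="localhost", base_port=39000,
               email="user@example.com", password="password")
project = cs.find_project("P1")
picks = project.find_job("J10").load_output("particles")  # J10 = Manual Picker job

per_mic = Counter(picks["location/micrograph_uid"])
print(f"{len(picks)} picks over {len(per_mic)} micrographs "
      f"(~{len(picks) / len(per_mic):.0f} picks per micrograph)")
```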

Sorting the heterogeneity is the next most important part of the whole process. Like picking, there is no single strategy that will work consistently across datasets. To get inspiration, it is often helpful to read the workflow figures (often found in the supplementary material) of articles reporting cryoEM structures of particles similar to the ones you are working on. This gives you a feel for the types of heterogeneity others have encountered with similar samples, and for how they sorted it.

In this task, the most commonly used tools are 2D classification (class assignment and in-plane alignment of equivalent projections), heterogeneous refinement (class assignment and 3D alignment), and 3D classification (class assignment in 3D without angular refinement). All of these methods rely on aligning and comparing particle images, so if, say, you are trying to find a complex that is a minor population among an excess of the free form of the larger protein in the complex, they might not work well, because the free protein and the complex align to each other. In one such case, I could detect the minor population with cryoDRGN (which does not work by alignment). In another case, with a minor population of particles that looked very different and did not align to the major species at all, 2D classification and heterogeneous refinement were very effective at isolating this population, while cryoDRGN could not detect it because it was too small a fraction of the total number of particles (neural networks are sensitive to large imbalances in their training set).

These two examples illustrate that you might need to try different things. Make hypotheses about the heterogeneity of your particle set (based on visual inspection of the micrographs after some manual picking, on initial 2D class averages, or on initial 3D reconstructions), then see which method can test these hypotheses. This requires reading a bit about each method, so that you have at least a big-picture understanding of what each one is designed to address.

Once you have one or several sets of particles that are homogeneous in composition, they might still contain conformational heterogeneity. This is where 3DVA and 3D Flex refinement come into play. CryoDRGN also deals with conformational heterogeneity (jointly with compositional heterogeneity, so it can sometimes be a faster way to figure everything out).
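If you script cryoSPARC with cryosparc-tools, queuing a 3DVA job on such a cleaned-up particle stack might look like the sketch below. The "var_3D" job-type string, the input names, and the UIDs are assumptions to verify against the job builder in your CryoSPARC version; the mode count and filter resolution are left to set in the job parameters.

```python
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(license="...", host="localhost", base_port=39000,
               email="user@example.com", password="password")
project = cs.find_project("P1")

# "var_3D" type string, input names, and UIDs are assumptions -- check the
# job builder in your CryoSPARC version. Set the number of modes and the
# filter resolution in the job parameters.
tdva = project.create_job("W1", "var_3D")
tdva.connect("particles", "J42", "particles")  # J42 = consensus refinement (placeholder)
tdva.connect("mask", "J43", "mask")            # J43 = soft solvent mask (placeholder)
tdva.queue("default")
```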

I hope this kind of general advice is helpful.

Also, if you know any links, papers, or discussion threads about the workflow, please share them here.

For anything related to cryoSPARC, read the documentation. You will find concise descriptions of all job types (useful to learn in which situation to use which job), long-form tutorials and explanations of how to interpret job results, and links to the relevant papers.

For other programs, I think it is always beneficial to read the paper that introduced the method. The authors often describe the typical use cases, explain the limitations, and discuss which options to vary when applying their tools to different datasets.
Papers for the tools I mentioned in this message:

  • Topaz: Bepler et al., "Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs," Nature Methods (2019)
  • cryoDRGN: Zhong et al., "CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks," Nature Methods (2021)
  • 3DVA: Punjani & Fleet, "3D variability analysis: Resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM," Journal of Structural Biology (2021)
  • 3D Flex: Punjani & Fleet, "3DFlex: determining structure and motion of flexible proteins from cryo-EM," Nature Methods (2023)

Good luck with your data analysis!

Thank you so much @Guillaume for the detailed explanation. Grateful for your insights.

100% co-signed on all of @Guillaume’s comments! The only thing I would add is that if your complex is apparently symmetric (e.g. in your case perhaps C6 or D3, depending on the arrangement), you should always explicitly consider deviations from the nominal symmetry, meaning:

  1. Calculate a reconstruction in C1. Inspect it. Does it show the symmetry you expect? If not, why not? Can you explain it?

  2. After calculating a consensus with the apparent symmetry enforced, try symmetry expansion and classification without alignments (see the scripting sketch after this list). This can help untangle heterogeneous symmetry/pseudosymmetry mixtures, if present. Initially just use a global mask, but later try a mask around perhaps one subunit, or two adjacent subunits.

  3. In parallel to 2, try symmetry-relaxed refinement, which can help resolve pseudosymmetry (if the consensus is in fact pseudosymmetric).
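For those who script cryoSPARC with cryosparc-tools, step 2 might look roughly like the sketch below. The "sym_expand" and "class_3D" job-type strings, the input names, and the UIDs are assumptions to verify against the job builder in your CryoSPARC version.

```python
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(license="...", host="localhost", base_port=39000,
               email="user@example.com", password="password")
project = cs.find_project("P1")

# Symmetry-expand the consensus particles. Set the symmetry (e.g. C6) in the
# job parameters -- the exact parameter name varies by CryoSPARC version.
expand = project.create_job("W1", "sym_expand")
expand.connect("particles", "J50", "particles")  # J50 = consensus refinement (placeholder)
expand.queue("default")
expand.wait_for_done()

# Classify the expanded particles in 3D without alignment, under a mask.
classify = project.create_job("W1", "class_3D")
classify.connect("particles", expand.uid, "particles")
classify.connect("mask", "J51", "mask")  # J51 = global or per-subunit mask (placeholder)
classify.queue("default")
```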

Cheers
Oli

Hello @olibclarke
Many thanks for your valuable suggestions and questions. I am currently processing the data and trying to understand the aspects mentioned above. I hope to get back soon with more answers and questions.
