Hello,
The short answer is that there is no ideal workflow: you have to find the one that works best for each new dataset. This is inconvenient because there is no foolproof protocol you can follow with a guarantee of reaching optimal results. But it is also what makes this type of data analysis fun.
You must have noticed that some steps are common to all workflows:
- the early preprocessing steps: motion correction, CTF estimation, and particle extraction with Fourier cropping to speed up the initial classification jobs (see the sketch after this list);
- the late refinement stages: re-extraction of the final set of particles with less or no Fourier cropping, then homogeneous refinement with options appropriate for the resolution range the dataset is reaching.
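To make the Fourier-cropping trade-off concrete, here is a minimal sketch of the arithmetic (the function names and numbers are mine, purely illustrative). Cropping in Fourier space is equivalent to binning in real space: the pixel size grows and the Nyquist limit drops accordingly.

```python
def binned_pixel_size(pixel_size_A: float, box_px: int, cropped_box_px: int) -> float:
    """Pixel size after Fourier-cropping a box from box_px down to cropped_box_px."""
    return pixel_size_A * box_px / cropped_box_px

def nyquist_A(pixel_size_A: float) -> float:
    """Nyquist resolution limit (in Angstroms) for a given pixel size."""
    return 2.0 * pixel_size_A

# Hypothetical example: extract at 0.8 A/px in a 400 px box,
# then Fourier-crop to 100 px for the early classification jobs.
apix = binned_pixel_size(0.8, 400, 100)                       # 3.2 A/px
print(f"binned: {apix} A/px, Nyquist: {nyquist_A(apix)} A")   # Nyquist: 6.4 A
```

Early classification rarely benefits from information beyond roughly 8-10 A, so binning like this speeds up the alignments considerably without hurting the sorting.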
For everything in between, it is very difficult to come up with a standard workflow, and what you need to do depends a lot on the type and amount of heterogeneity present in the dataset. As emphasized in @olibclarke’s tutorial, you need to approach single-particle analysis with the mindset of understanding which different species are in the dataset. Getting good reconstructions naturally follows from properly sorting the heterogeneity.
Regarding this:
> @Guillaume followed a slightly different / alternative pipeline.
I want to emphasize that I suggested this only as a starting point, which is why the last step is deliberately vague and open-ended.
Some more general advice:
The particle picking step is absolutely critical: avoiding false positives in the first place makes the downstream classifications much easier than trying to classify bad particles away later. Everybody has their favorite picking strategy, and you will get different suggestions if you ask around. My opinion is that topaz produces excellent picks, if you are willing to invest the time to prepare a good training set by manually picking about 1,000 particles. This forces you to look at the micrographs, which is good: you get a much better feel for the overall quality of the dataset.
Picking as well as you can deserves a large fraction of your time budget when analyzing a new dataset, because the initial set of particles influences every downstream job. Picking is often overlooked, yet it is one of the steps that is hardest to automate and that benefits most from human expertise (especially when that expertise is used to train a neural-net picker like topaz).
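As a rough way to budget that effort, you can estimate how many micrographs you need to annotate to reach a training set of about 1,000 particles. A back-of-the-envelope sketch (the particle density is a hypothetical number; use what you see on your own micrographs):

```python
import math

def micrographs_to_pick(target_particles: int, particles_per_micrograph: float) -> int:
    """Number of micrographs to annotate manually to reach a target training-set size."""
    return math.ceil(target_particles / particles_per_micrograph)

# Hypothetical example: ~50 clearly visible particles per micrograph
# means about 20 micrographs to annotate for 1,000 training picks.
print(micrographs_to_pick(1000, 50))  # 20
```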
Sorting the heterogeneity is the next most important part of the whole process. Like picking, there is no single strategy that will work consistently across datasets. For inspiration, it is often helpful to read the workflow figures (usually found in the supplementary material) of articles reporting cryoEM structures of particles similar to the ones you are working on. This gives you a feel for what types of heterogeneity others have encountered with similar samples, and how they sorted it.
In this task, the most commonly used tools are 2D classification (class assignment and in-plane alignment of equivalent projections), heterogeneous refinement (class assignment and 3D alignment), and 3D classification (class assignment in 3D without angular refinement). All of these methods rely on aligning and comparing particle images. So if, say, you are trying to find a complex that is a minor population among an excess of the free form of its larger component, they might not work well, because the free protein and the complex align to each other. In such a case, I was able to detect the minor population with cryoDRGN (which does not work by alignment). In another case, with a minor population that looked very different and did not align to the major species at all, 2D classification and heterogeneous refinement were very effective at isolating this population, while cryoDRGN could not detect it because it was too small a fraction of the total number of particles (neural networks are sensitive to large imbalances in their training set).
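To get a feel for how imbalanced your populations are before choosing a method, it can help to compute the class fractions from a classification job's output. A minimal sketch, assuming you have already extracted the per-particle class labels (e.g. from a .star or .cs file) into a Python list; the labels and counts below are hypothetical:

```python
from collections import Counter

def class_fractions(labels):
    """Fraction of the particle set assigned to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical labels from a heterogeneous refinement job:
labels = ["complex"] * 800 + ["free_protein"] * 19200
print(class_fractions(labels))  # {'complex': 0.04, 'free_protein': 0.96}
```

A class holding only a few percent of the particles is exactly the kind of minor population that an imbalance-sensitive method might miss.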
These are two examples to illustrate that you may need to try different things. Make hypotheses about the heterogeneity of your particle set (based on visual inspection of the micrographs after some manual picking, on initial 2D class averages, or on initial 3D reconstructions), then see which method can test these hypotheses. This requires reading a bit about each method, so that you have at least a big-picture understanding of what each is designed to address.
Once you have one or several sets of particles that are homogeneous in composition, they may still contain conformational heterogeneity. This is where 3DVA and FlexRefine come into play. CryoDRGN also handles conformational heterogeneity (jointly with compositional heterogeneity, so it can sometimes be a faster way to sort out everything at once).
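For example, after a cryoDRGN training run you can cluster the latent embeddings to look for distinct populations. A minimal sketch using scikit-learn (the filename and the number of mixture components are assumptions to adapt to your run; cryoDRGN's own `cryodrgn analyze` command produces similar clustering and plots out of the box):

```python
import pickle
import numpy as np
from sklearn.mixture import GaussianMixture

# Load the per-particle latent coordinates written by cryoDRGN
# (z.<epoch>.pkl in the training output directory; z.24.pkl is an example).
with open("z.24.pkl", "rb") as f:
    z = pickle.load(f)  # array of shape (n_particles, zdim)

# Fit a small Gaussian mixture and report the population of each component;
# a low-weight component can flag a minor species worth inspecting.
gmm = GaussianMixture(n_components=5, random_state=0).fit(z)
labels = gmm.predict(z)
for k in range(gmm.n_components):
    print(f"component {k}: {np.mean(labels == k):.1%} of particles")
```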
I hope this kind of general advice is helpful.
> Also, if you know any links, papers, or discussion threads about the workflow, please share them here.
For anything related to cryoSPARC, read the documentation. You will find concise descriptions of all job types (useful for learning which job to use in which situation), long-form tutorials, explanations of how to interpret job results, and links to the relevant papers.
For other programs, I think it is always beneficial to read the paper that introduced the method. The authors often describe the typical use cases, explain the limitations, and discuss which options to vary when applying their tools to different datasets.
Papers for the tools I mentioned in this message:
- topaz: Bepler et al., Nature Methods (2019), "Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs"
- cryoDRGN: Zhong et al., Nature Methods (2021), "CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks"
- 3DVA: Punjani & Fleet, Journal of Structural Biology (2021), "3D variability analysis: Resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM"
- FlexRefine (3DFlex): Punjani & Fleet, Nature Methods (2023), "3DFlex: determining structure and motion of flexible proteins from cryo-EM"
- cryoSPARC: Punjani et al., Nature Methods (2017), "cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination"
Good luck with your data analysis!