I am a cryoEM beginner. I am processing a dataset from a Krios (165 kV, 0.735 Å/px, total exposure ~40 e⁻/Å²). During motion correction I had to omit the last 5 frames (37 down to 32) because of a problem that created dark lines along the edges. I have about ~11k exposures.
My protein has two similarly sized domains, one mostly α-helical and one mostly β-sheet. What we put on the grids was a tetramer (~250 kDa).
This is my pipeline:
import volume (reconstructed from a dataset collected at Glacios from the same grid)
create template
template picker (5.4 mil particles)
inspect picks (1.2 mil particles)
extract mics (256 px → 96px)
2D class (200 classes, 40 EM iter, 400 batchsize, max over poses off)
select 2D (~230k particles)
2D class (50 classes, 80 EM iter, 400 batchsize, max over poses off)
Select 2D (~145k particles)
Ab initio (1class)
NU refine (~100k particles, 4.17A)
extract mics (no downsampling)
NU refine (~100k particles, 4.12A)
Global CTF ref (tilt, trefoil)
NU refine (~100k particles, 6.1A)
Hetero refine (2 volumes: 1 high resolution map from NU refine and 1 lowpass filtered 20A)
NU refine (~70k particles, 6.33A)
Local CTF ref (flat histogram with high tails)
NU refine (~70k particles, 6.6A)
I also tried heterogeneous refinement as an alternative (1.2 mil particles into 1 good and 2 bad volumes) instead of the 2D class jobs, then NU, CTF, Hetero, and NU refine. I ended up with ~300k particles and ~6 Å.
I also tried 3D classification instead of hetero refine, but the ESS between the 2 classes was ~1.2, which I believe is fairly low to indicate heterogeneity?
Then I did 2D class with the 1.2 mil particles with 80 EM iterations instead of 40. After 2 rounds I ended up with ~210k selected particles. Then NU, CTF, NU refine to get ~5A.
Maybe I was too strict with Inspect Picks (?), but if I had kept all ~4 mil particles, 2D classification would take days, and I already have a lot of junk in the 2D classes as it is.
What would you suggest I missed, or what should I revisit to try to get better resolution? Or do we need to collect more data? We collected on gold grids with a carbon support; the ice was a bit thick, and there was something that looked like drops in the middle of the squares, maybe because we waited too long between plasma cleaning and vitrification.
The problem also lies in the fact that the reconstructed map looks nothing like the AlphaFold-predicted model (neither the tetramer nor the monomer), so I can’t just fit it into the 6 Å map and be done. Also, de novo model building in a 6 Å map won’t do wonders (attaching a pic of the volume).
It is difficult to say exactly what to do based on what you’ve shared so far, but here are a few general suggestions:
Work on a subset of your data first - do your picking, classification, ab initio, and initial refinements on 500-1000 micrographs before expanding to millions of particles. Make sure to discard bad micrographs in Manual Curation first.
Use the Micrograph Denoiser before picking and the Junk Detector afterwards (before Inspect Picks).
Try using the Blob Picker instead of your imported template and see if you can generate templates from that.
Try a larger box size. 256 px × 0.735 Å/px ≈ 188 Å, which seems small for a 250 kDa particle. If you have an expected particle diameter, try starting at twice that - if not, start at 384+ px (Fourier-crop to e.g. ~3 Å/px).
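A quick back-of-the-envelope check on those box-size numbers (a sketch - the particle diameter below is a hypothetical value for a ~250 kDa tetramer, not something from this thread):

```python
# Box-size arithmetic for the collection parameters above.
apix = 0.735                 # Å/px (Krios collection)
box_px = 256
box_angstrom = box_px * apix
print(f"current box edge: {box_angstrom:.0f} Å")   # ~188 Å

# Rule of thumb: box edge ~2x the particle diameter.
diameter = 130               # Å - hypothetical guess for a ~250 kDa tetramer
suggested_px = 2 * diameter / apix
print(f"suggested box: ~{suggested_px:.0f} px")    # ~354 px -> round up to 384
```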
If you didn’t already, try 2D classification settings closer to default before the modifications you tried here (fine to increase the number of classes to 200). For some data the fast approach works just as well. Also try with a circular mask set to the expected diameter + ~10%.
If a step reduces your resolution significantly (like the drop after Global CTF refinement in your list above), take a step back and try something else.
If you want better feedback, consider sharing a bit more data - at least a representative micrograph, some 2D classes and perhaps info from Manual Curation such as Relative Ice Thickness, Defocus and CTF fit resolution plots.
Yes, I think an expert could achieve a high-resolution structure from these data.
Are you sure it was 165 kV? Next time, use a lower magnification / larger pixel size, closer to 1 Å/px if possible - you will get more particles in each image and likely still not hit Nyquist.
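The particles-per-image gain is easy to estimate: for a fixed detector, the imaged area scales with the square of the pixel size (a quick sketch; the 1.0 Å/px value is just an example target, not a recommendation from this thread):

```python
# For a fixed camera, imaged area scales as (pixel size)^2,
# so a larger pixel size means more particles per exposure.
apix_now, apix_new = 0.735, 1.0   # Å/px; 1.0 is an example target
area_gain = (apix_new / apix_now) ** 2
print(f"~{area_gain:.1f}x more area per image")        # ~1.9x
print(f"Nyquist at {apix_new} Å/px: {2 * apix_new} Å") # 2.0 Å
```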
I would not use an imported volume as a template for the template picker. But now that you have used that method to get 2D classes, I would redo picking with 3 or 4 of those classes, distinct views if possible.
4-fold bin (256→64) should work well and be even faster for all jobs prior to NU-refine.
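For reference, the pixel size and Nyquist limit implied by that binning (a quick sketch covering both the 4-fold bin suggested here and the 256→96 extraction from the original pipeline):

```python
# Binned pixel size and Nyquist limit after Fourier cropping.
apix = 0.735                          # Å/px, unbinned
for box, crop in [(256, 64), (256, 96)]:
    binned = apix * box / crop
    print(f"{box}->{crop} px: {binned:.2f} Å/px, Nyquist {2 * binned:.2f} Å")
# 256->64 gives ~2.94 Å/px (Nyquist ~5.88 Å) - plenty for picking/2D/ab-initio;
# re-extract unbinned before the final NU-refine.
```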
Your NU-refine after CTF refinement is killing resolution, so the CTF refinement is clearly degrading the parameters instead of refining them. Skip it.
This is clearly symmetrical - why do you think there aren’t dimers/tetramers? That being said, do NOT use symmetry in your jobs until you’ve sorted out the basics. It will inflate your resolution, give pretty maps, and still be totally wrong.
Great picking and quality 2D are key (increasing batchsize matters more than increasing O-EM iterations). Do a liberal 2D selection of everything that looks like a decent particle, then redo 2D selection keeping ONLY the ~100k best particles from a few different views and use those to run a 1-class ab initio. Also redo the ab initio (1 class) with a 10-particle limit in the parameters - it will finish in a minute. Use the good ab initio once and the bad ab initio 4 times in het refine. Extract the good class unbinned and NU-refine. Then run 3D classification with class similarity 0, resolution limit 12 Å, 10 classes, and the convergence filter turned off. NU-refine the single best class(es), then run local refinement with a mask around the region of interest. I would highly suspect that this workflow, maybe with some tweaking, will put you in a really good place for modeling.