I am a cryoEM beginner. I am processing a dataset from a Krios (165kV, 0.735 Å/px, total exposure ~40). I had issues during motion correction where I had to omit the last 5 frames (37 down to 32) due to a problem that created dark lines along the edges. I have about ~11k exposures.
My protein has two similarly sized domains, one mostly helices and one mostly beta-sheets. What we put on grids was a tetramer (~250kDa).
This is my pipeline:
import volume (reconstructed from a dataset collected at Glacios from the same grid)
create template
template picker (5.4 mil particles)
inspect picks (1.2 mil particles)
extract mics (256 px → 96px)
2D class (200 classes, 40 EM iter, 400 batchsize, max over poses off)
select 2D (~230k particles)
2D class (50 classes, 80 EM iter, 400 batchsize, max over poses off)
Select 2D (~145k particles)
Ab initio (1class)
NU refine (~100k particles, 4.17A)
extract mics (no downsampling)
NU refine (~100k particles, 4.12A)
Global CTF ref (tilt, trefoil)
NU refine (~100k particles, 6.1A)
Hetero refine (2 volumes: 1 high resolution map from NU refine and 1 lowpass filtered 20A)
NU refine (~70k particles, 6.33A)
Local CTF ref (flat histogram with high tails)
NU refine (~70k particles, 6.6A)
I also tried Hetero refinement alternative (1.2 mil particles into 1 good and 2 bad volumes) instead of 2D class jobs and then NU, CTF, Hetero and NU refine. I ended up with ~300k particles and ~6A.
I also tried 3D classification instead of hetero refine, but the ESS between the 2 classes was ~1.2, which I believe is fairly low to indicate heterogeneity?
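For reference, the ESS here is the effective sample size over the per-particle class posteriors - a quick sketch of how I read that number (my own illustration, not CryoSPARC code):

```python
# Effective sample size (ESS) over per-particle class posteriors.
# ESS = 1 / sum(p_i^2): 1.0 means the particle sits firmly in one class,
# K means it is spread evenly across all K classes.
def ess(posteriors):
    return 1.0 / sum(p * p for p in posteriors)

# A particle assigned 90/10 between two classes:
print(ess([0.9, 0.1]))   # ~1.22 - close to 1, i.e. little class ambiguity
# A particle split evenly between two classes:
print(ess([0.5, 0.5]))   # 2.0 - maximal ambiguity between two classes
```

So an average ESS of ~1.2 between 2 classes does suggest most particles are assigned fairly confidently to one class.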
Then I did 2D class with the 1.2 mil particles with 80 EM iterations instead of 40. After 2 rounds I ended up with ~210k selected particles. Then NU, CTF, NU refine to get ~5A.
Maybe I was too strict with inspecting picks (?), but if I wasn’t and went through with ~4 mil particles, then the 2D would take days because I have a lot of junk in the 2Ds already.
What would you suggest I missed, or what should I revisit to try and get a better resolution? Or do we need to collect more data? We collected on carbon-on-gold grids and had a bit thicker ice, and something that looked like drops in the middle of squares, maybe due to waiting too long between plasma cleaning and plunging the grids in the Vitrobot?
The problem also lies in the fact that the reconstructed map looks nothing like the model AlphaFold predicted (neither the tetramer nor the monomer), so I can’t just fit it into the 6 Å map and be done. Also, de novo model building in a 6 Å map won’t do wonders (attaching a pic of the volume).
It is difficult to say exactly what to do based on what you’ve shared so far, but here are a few general suggestions:
Work on a subset of your data first - do your picking, classification, ab initio and initial refinements on 500-1000 micrographs before expanding to millions of particles. Make sure to discard bad micrographs in Manual Curation first.
Use the Micrograph Denoiser before picking and use the Junk Detector afterwards (before Inspect Picks).
Try using the Blob Picker instead of your imported template and see if you can generate templates from that.
Try a larger box size. 256 px * 0.735 Å/px seems small for a 250 kDa particle. If you have an expectation of the particle diameter, try starting at twice that - if not, start at 384+ px (F-crop to e.g. ~3 Å/px).
If you didn’t already, try 2D classification settings closer to default before the modifications you tried here (fine to increase the number of classes to 200). For some data the fast approach works just as well. Also try with a circular mask set to the expected diameter + ~10%.
If a step reduces your resolution significantly (step 13 in your list above), take a step back and try something else.
If you want better feedback, consider sharing a bit more data - at least a representative micrograph, some 2D classes and perhaps info from Manual Curation such as Relative Ice Thickness, Defocus and CTF fit resolution plots.
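To make the box-size suggestion above concrete, here is the arithmetic (the ~130 Å particle diameter is my assumption for illustration - plug in your own estimate):

```python
# Box-size arithmetic for particle extraction (illustrative numbers;
# the ~130 A particle diameter is an assumption, not measured).
apix = 0.735                          # A/px at collection magnification
current_box_A = 256 * apix            # ~188 A - tight for a 250 kDa particle
diameter_A = 130                      # assumed particle diameter in A
target_box_A = 2 * diameter_A         # rule of thumb: box ~2x particle diameter
target_box_px = target_box_A / apix   # ~354 px -> round up to FFT-friendly 384
binned_apix = 384 * apix / 96         # F-cropping 384 -> 96 px gives ~2.94 A/px
print(current_box_A, target_box_px, binned_apix)
```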
yes, I think an expert could achieve a high resolution structure from these data.
Are you sure 165kV? Next time use a lower magnification/larger pixel size, closer to 1 Å if possible - you will get more particles in each image and still likely not hit Nyquist.
I would not use an imported volume as a template for the template picker. But now that you have used that method to get 2D classes, I would redo picking with 3 or 4 of those classes, distinct views if possible.
4-fold bin (256→64) should work well and be even faster for all jobs prior to NU-refine.
Your NU-refine after CTF refinement is killing resolution, so the CTF refinement is clearly degrading parameters instead of refining them. Skip it.
This is clearly symmetrical - why do you think there aren’t dimers/tetramers? That being said, do NOT use symmetry in your jobs until you’ve sorted out the basics: it will inflate your resolution, give pretty maps, and still be totally wrong.
Great picking and quality 2D (increasing batchsize is more important than increasing O-EM iterations), then liberal 2D selection of everything that looks like a decent particle. Redo 2D selection keeping ONLY the ~100k best particles from a few different views and use these to run a 1-class ab initio. Redo the 1-class ab initio with the particle limit set to 10 in the parameters - it will finish in 1 minute. Use the good ab initio once and the bad ab initio 4 times in het refine. Extract the good class unbinned, NU-refine. 3D class with class similarity 0, resolution limit 12, number of classes 10, and the convergence filter turned off. NU-refine the single best class(es) and then run local refine with a mask around the region of interest. I would highly suspect that this workflow, maybe with tweaking, will put you in a really good place for modeling.
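On the 4-fold binning point above, a quick sanity check on what it costs in attainable resolution (just Nyquist arithmetic, assuming the 0.735 Å/px pixel size from this dataset):

```python
# Nyquist limit after 4-fold binning (256 -> 64 px extraction).
apix = 0.735                  # A/px, unbinned
binned_apix = apix * 4        # 2.94 A/px after 4-fold binning
nyquist = 2 * binned_apix     # 5.88 A - plenty for picking/2D/ab initio,
                              # but re-extract unbinned before final NU-refine
print(binned_apix, nyquist)
```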
Hi @boggild
Thank you for the suggestions. I used the denoiser before picking. I will now try the bigger box. 2D with settings closer to default couldn’t separate the junk, and I tried 200 classes. Even though CTF ref looks like it reduced resolution, the map from step 13 didn’t look like 4 Å, and after CTF ref the FSC curve was smoother (before, it was a bit bumpy/wavy).
yes, I am sure we used 165kV, it was recommended by our core facility.
I will try repicking with 3 classes and then extracting with a bigger box. I tried the blob picker and almost the same amount of particles was picked.
As said above, it looks like CTF ref is killing resolution, but there really isn’t much difference in how the maps look, meaning the 4 Å map doesn’t look like 4 Å but more like 6 Å or worse.
Yes, I see it is symmetrical and I strongly believe it is C2. I am positive it is big enough to fit my 4 protomers. I just meant that the model AlphaFold predicted does not fit in the map.
OK, people will start thinking I am a flexibility-detecting bot installed in the forum. I’d like to see a 3DVA job done on the largest clean set you can get (remove only the clear junk from the 1.2 million set). Start with a homogeneous refinement job on that set downsampled to apix 1.5, not expecting good resolution, then run 3DVA without a mask, limited to 8-10 Å max resolution. Ask for 10-20 clusters and intermediates. Look at them moving in Chimera(X), then decide what to do next. You might mask one side for a local refinement, then go again with 3DVA, broadly masking the opposite side. Decide what to do to get to higher resolution after that, if possible. Of course 3DFlex is also a good idea just to visualize the thing and take decisions. Maps from 3DFlex are not buildable, but they can be used as baits in a hetero refinement or biased 3D classification job.
Hi @Mrs.Smith, what is the distribution of defocus values?
165kV (the voltage) sounds surprising as a value, especially since 165kx (the magnification) is a common value on a Titan Krios. My guess: it is 300kV, mag 165kx. Maybe you can double-check?
Thank you for catching that @adesfosses. Indeed the value of 165 was mag in kx. Do I need to start the processing from scratch with importing images and setting voltage to 300?
Yes, indeed you need to restart with 300kV, but you might be able to avoid losing all your particle-selection work by re-assigning your previously selected particles to the newly pre-processed micrographs.
I imported the data correctly. Did thorough 2D classes, ab initio, hetero refine, homo refine, 3D class and homo refine, and ended up with only around 30k particles and a resolution of around 6 Å with preferential orientations. It seems we have flexibility + preferential orientation, and maybe a new dataset on graphene grids with the particles somehow stabilized might be necessary to get to higher resolution.
Thanks for the update. Very surprising - I would expect this to be straightforward given your 2D classes. You certainly don’t have preferential orientation, so make sure you are not selecting for a preferred view at the ab initio stage. Use it for models, but don’t use it to classify and omit particles. Agreed though, sometimes a better dataset is all you need, and it’s easier than spending months processing data that can’t achieve high resolution.
Graphene will severely concentrate your protein on the grid FYI, maybe 10x. If it’s GO it will add contrast noise, whereas monolayer graphene (which has special glow-discharge needs) won’t.