Small protein, 3D reconstruction does not work

Hi CS users,

I have seen a lot of great advice on this forum, but I could not find an exact solution to my problem, so I am asking for help. This is a long story, but I thought it would be better to share the details.
I am allowed to upload only one media file, so I have attached one very long picture with labels. (You may have to open the picture in a separate window, sorry for the inconvenience.)

I am trying to get a structural model of a relatively small complex of about 90 kDa.
The protein itself is even smaller (<50 kDa), so a Fab is bound to it to bring the complex up to ~90 kDa.

The data were collected on a 300 kV Krios with a Gatan K3 detector.
The pixel size is 0.664 Å/pix, and 24,000 movies were collected.
From this data, the processing procedure was as follows:
(Parameters were left at default unless otherwise stated)

  1. Full-frame motion correction and patch CTF estimation
  2. Blob picker
  3. 2D classification to get templates
  4. Template-based particle picking, giving an initial 11.7 M particles
  5. Several rounds of 2D classification, leaving 3.4 M particles (F1 shows only the classes with distinct views)

I am aware of the orientation bias: the side view is not exactly a side view, just the front view rotated by a small angle. To solve this, I prepared grids in a different way and found a condition that gives the views in F2. However, that is only screening data, and I am planning to collect a full dataset for high-resolution processing later.

However, another problem shows up later with the current data.
Continuing with my procedure:

  1. From the 3.4 M particles I made a subset of 1 M particles and ran ab-initio reconstruction, which gave the models in F3.

Class 1 (891,000 particles) clearly looks like my complex, F4.

So I tried the next steps:

  1. Heterogeneous/Homogeneous/Non-uniform Refinements
    NU-Refinement gave the best result, but I could see streaks (which may come from the missing views), and the map is nowhere near a real structural model. See F5 and F6.

Apart from the problem probably caused by the missing views, when I checked which particles were used to reconstruct the map above, I got F7.

Out of 891 K particles, only about 10 K are top views. Also, these top-view-like particles look much worse than the top views I saw in 2D.
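
To put the numbers above in perspective, here is a small Python sketch that computes the fraction of particles kept at each step and the share of top views in the final class. The counts are the approximate ones quoted in this post; the stage names and the `retention` helper are only illustrative.

```python
# Particle counts quoted in the post (approximate, as reported).
stages = [
    ("template picks", 11_700_000),
    ("after 2D classification", 3_400_000),
    ("subset for ab-initio", 1_000_000),  # a deliberate subsample, not cleaning
    ("ab-initio class 1", 891_000),
]
top_views_in_class1 = 10_000  # "about 10 K" top-view particles (F7)


def retention(counts):
    """Return the fraction kept at each step relative to the previous one."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


counts = [n for _, n in stages]
for (name, n), kept in zip(stages[1:], retention(counts)):
    print(f"{name}: {n:,} particles ({kept:.1%} of previous step)")

top_fraction = top_views_in_class1 / counts[-1]
print(f"top views in class 1: {top_fraction:.1%}")
```

With these counts, only about 1% of the particles in the reconstructed class are top views.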

Then I could think of three possible explanations:

  1. The top view I identified is not actually the top view. I think this is very unlikely, because the secondary-structure features I saw in 2D are real.
  2. The top views could not align because the side view is missing. This also seems unlikely: with the screening data, where I did get the new side view, ab-initio reconstruction gave a similar outcome, and checking the particles after 3D reconstruction gave very similar results, with very few top views.
  3. Maybe, and hopefully, it is a processing issue, and there are parameters I could change so that the top views fit in with the rest of the views.
    (In addition, I ran many more rounds of 2D classification to really clean up the particles and ended up with about 850 K particles, which gave the same result as above.)
    (I also tried manually fixing the ratio of views to 1:1, but it gave a very weird model. When I run a one-class ab-initio reconstruction on these ratio-adjusted particles, half of the particles go to “unused particles”, which are probably the top views.)
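
For reference, fixing the view ratio amounts to subsampling the over-represented view. Below is a minimal Python sketch, assuming each particle already carries a view label (for example from the 2D class it was selected from). The `balance_views` helper and the labels are hypothetical, and this is not a cryoSPARC API, just the idea written out.

```python
import random
from collections import defaultdict


def balance_views(labels, seed=0):
    """Subsample particle indices so every view label is equally represented.

    labels: non-empty list with one view label per particle (hypothetical input).
    Returns a sorted list of kept particle indices.
    """
    by_view = defaultdict(list)
    for index, label in enumerate(labels):
        by_view[label].append(index)

    # The rarest view sets how many particles each view may contribute.
    n_keep = min(len(indices) for indices in by_view.values())

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = []
    for indices in by_view.values():
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


# Toy example: 9 side views and 3 top views become 3 + 3.
labels = ["side"] * 9 + ["top"] * 3
kept = balance_views(labels)
print(len(kept), [labels[i] for i in kept].count("top"))
```

Note that a strict 1:1 ratio is capped by the rarest view, so it can discard a large share of the data.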

I would really appreciate any tips or answers to this problem!

Thank you

Seong

Looks pretty amazing so far. Select all 2D classes which are NOT side views, then re-run 2D classification with 100 classes. You can get rid of “bad” top views this way, but I suspect these will be mostly good top views that are bright white here because all top views were forced into only 3 classes. THEN select 2D classes from these ~5000 particles worth of classes which represent several distinct views (no side views selected). You could really benefit from tilted top views as well, which there is evidence of.

Topaz (or template) pick all micrographs again with these 5000 particles representing top and top-tilt views. You will find 100,000 more of them. Keep cleaning up in 2D (this will also remove duplicates) and try reconstructions with increasing percentages of non-side views (10% top? 15% top? 20%? It is OK to omit some side-view classes).

DeepEMhancer. You will be surprised.
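
The percentage sweep suggested above can be planned with a little arithmetic: for a fixed number of non-side-view particles, the number of side-view particles to keep follows from the target percentage. Here is a small Python sketch. The count of 110,000 is only a placeholder (the ~10 K already found plus the hoped-for 100,000), not a measured result, and `side_views_to_keep` is a hypothetical helper.

```python
def side_views_to_keep(n_top, top_percent):
    """Number of side-view particles that makes top views `top_percent` % of the set.

    From n_top / (n_top + n_side) = top_percent / 100, solved for n_side
    (integer division, so the result is rounded down).
    """
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    return n_top * (100 - top_percent) // top_percent


n_top = 110_000  # placeholder count of top and tilted-top particles
for top_percent in (10, 15, 20):
    n_side = side_views_to_keep(n_top, top_percent)
    print(f"{top_percent}% top: keep {n_side:,} side views, {n_top + n_side:,} total")
```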


Thanks a lot for the advice!
I immediately started on your suggestions and will share the results later.
However, I have a question regarding DeepEMhancer.
I am not familiar with it, but from what I have read, it seems to be more of a sharpening tool. My map does not really show any real structural features (secondary structure etc., see F5 of the picture in the first post). Would DeepEMhancer help make the map look real?

It will make F5 look much more like a real structure. For better or worse, it masks a lot of issues and gives a pretty structure. This can help with modeling and final figures but hampers observing and attending to issues such as flexibility or anisotropy, so we use it all the time but focus on improving regularly sharpened maps.

I ran DeepEMhancer, and this is what the maps look like.
I have not tested the parameters; they were all left at default.
Although they look better than before (F5), unfortunately the maps still do not look very real… If I fit an AlphaFold-predicted model (which I know is quite accurate) into the map, I really cannot see a fit.
I am still working on the other suggestion you gave me.
Do you have any tips on using DeepEMhancer? But as you mentioned, I will have to improve other aspects first. Thank you!

Hey,

I am also working on rather small proteins (60-150 kDa) with preferred orientation. One thing I noticed is that streaking can also be caused by insufficient classification, and thus a lot of heterogeneity in the dataset. I was usually more successful when I properly classified the dataset using 3D classification. However, 3D classification on small particles can be quite challenging and usually required me to play around a lot with the parameters. Sometimes cryoSPARC 3D classification was also not “good” enough, but lately I have been successful using RELION 5 3D classification with Blush. Usually the reconstructions from a small, well-classified particle set (20,000-30,000 particles) looked much better.


No tricks for DeepEMhancer, looks like you did it right. They look better. Sure it’s not z-flipped? The Fab is obvious here. Volume Tools, flip hand.

Hi!
Thanks for the suggestions. Could you maybe share which parameters you changed? They probably affect each protein differently, but it would give me some ideas. I first thought that the problem with the previous data was really due to the sample itself, but I am starting to think that processing could solve it.

Yes! They actually look better than the previous maps, thank you for letting me know about this great tool :). I tried the z-flip, but I cannot really tell whether it improves the model or not. It is quite frustrating: with the 2D results, I thought it would not be so hard to get the map. One thing I noticed is that both the Fab and my protein are conformationally quite flexible. For example, as you know, the Fab’s constant region is not held rigidly to the Fv region. So I am also trying local refinements to focus on the rigid parts (Fv + top of my protein).

Oh… I am beginning to think it was actually z-flipped, as you @CryoEM2 mentioned.
If you look at a real Fab (below) in this view, you see that in the constant region the right arm is in front of the left arm (towards you). This feature can be used to check the correct handedness.
In F5 this feature is not so clear, but the ab-initio map, F4, clearly shows that it is flipped the wrong way.
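
For anyone wondering what the hand flip does to the map itself: it mirrors the voxel grid along one axis, which turns the density into its mirror image. Below is a minimal numpy sketch on a toy array. It is an illustration only and not cryoSPARC's implementation; a real map would be read from an MRC file instead.

```python
import numpy as np

# Toy 3D "map" indexed as [z, y, x]; a real map would come from an MRC file.
volume = np.zeros((4, 4, 4), dtype=np.float32)
volume[0, 1, 2] = 1.0  # one marked voxel in the first z-slice

# Mirror along the z axis: the voxel at z moves to (nz - 1 - z).
flipped = np.flip(volume, axis=0)

print(np.argwhere(volume == 1.0))   # position before the flip
print(np.argwhere(flipped == 1.0))  # same y and x, mirrored z

# Flipping twice restores the original map, so the operation is easy to undo.
assert np.array_equal(np.flip(flipped, axis=0), volume)
```

Because a mirror is not a rotation, no rigid-body fit can make a wrong-hand map match the model, which is why a handedness check on a known feature such as the Fab arms is useful.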

Hey,
Yes sure. Usually I run multiple rounds of heterogeneous refinement against a trash class to initially sort out the “trash” particles that were not identified in 2D classification. I noticed that hetero refinement works better in my hands when hard classification is turned on and the batch size is increased to 6000-10000 particles. You might also have to run multiple hetero refinements, each using the good particles from the previous job, until you notice that not many particles move into the trash class anymore. Usually I next do a NU-refinement or homogeneous reconstruction and, based on this, 3D classification. I found that forcing hard classification works better for me. Besides this, I recommend initially playing around with the number of classes (this really depends on your data; I usually start with 5 and 10 classes) and the resolution (for small proteins, surprisingly, lower resolutions around 3-4 Å worked better for me). As initialization mode I use simple. If this does not work, you can start to play around with the O-EM batch size and learning rate.
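
The rule of repeating hetero refinement until few particles move to the trash class can be written down as a simple stopping check. Here is a sketch in Python. The 5% threshold and the particle counts are made-up illustration values, not recommendations or results, and the counts would be read off the job outputs by hand.

```python
def rounds_until_stable(kept_counts, max_trash_fraction=0.05):
    """Return the 1-based round after which cleaning can stop, or None.

    kept_counts: particles in the good class(es) before round 1, after
    round 1, after round 2, ... (hypothetical input).
    A round counts as stable when it sends at most `max_trash_fraction`
    of its input particles to the trash class.
    """
    for round_number, (before, after) in enumerate(
        zip(kept_counts, kept_counts[1:]), start=1
    ):
        trash_fraction = (before - after) / before
        if trash_fraction <= max_trash_fraction:
            return round_number
    return None


# Made-up counts for illustration only.
history = [850_000, 600_000, 500_000, 480_000]
print(rounds_until_stable(history))
```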

Yes, the alpha helix at the top is an obvious feature in map and model. One of two possible orientations about Z will fit, and then the POI should fit as well.

Take a large NU-refine of many particles and run 3D classification: resolution 8, 30 classes (assuming 1 mil particles), class similarity 0, output results each F-EM, don’t re-sort based on size. This should give you at least a few nice models to refine and use as references for future jobs like het refine or Create Templates for picking rare views.

Wow thanks! This is a quick refinement with the z-corrected model as reference, and it looks much better than F5. It seems that, as you @CryoEM2 and @OleUns mentioned, I still need to get rid of more particles. Thank you very much for the great tips and for sharing detailed parameters!! It will take a while to get results, and I still think some orientations need to be added too (F2), so I am collecting more data with the new grid this week. Please stick with me for some time! I will continue sharing the results!
