Feature lost during refinement

Hi,
I’m working with a transmembrane protein in CryoSPARC. With 0.8 million particles I could finally see clear transmembrane features in the ab-initio model. However, when I then run heterogeneous refinement with default parameters, I lose almost all of the transmembrane density, and the map looks featureless compared to the nice-looking model.
I had to set the maximum alignment resolution to 4 Å (0.8 Å/pix) to recover some transmembrane density; with the default value the refinement gave very little. With the maximum alignment resolution set to 4 Å, the transmembrane region became really clear in the initial model.
Can you please suggest what mistake I’m making during refinement that leads to this complete loss of features?

Thank you.

Hi @diffracteD,

Did you run Ab-initio Reconstruction with multiple classes, or a single class? In either case, you could try taking each ab-initio class and its associated particle stack, and passing each into a separate Non-Uniform Refinement instead of using heterogeneous refinement. If the differences between the good ab-initio classes are relatively small, you could also try running a single Non-Uniform Refinement with all of the particles together, which should ideally give you an average over whatever conformations are present in the data. We generally recommend Non-Uniform Refinement for cases like these, as it is specifically designed to improve results with membrane proteins and to prevent the disorder present in the specimen from hindering the alignment process.

Best,
Michael

Hi @mmclean,
Thank you for the suggestion. I’ll definitely try it.
To answer your question, I ran multiple rounds of ab-initio reconstruction to sort out a map with the maximum amount of features. Each time I used more than 3 classes, continuing until I started to get consistently similar initial models (that way I could be sure I had removed most of the bad particles).
I got the feature-loss problem with both heterogeneous and homogeneous refinement at default parameters. Right now I’m running a heterogeneous refinement between 2 maps with the maximum alignment resolution set to 3 Å. Setting the maximum (alignment) resolution helped me get a good model, so I’m hoping to see a similar effect in refinement too.
As per your suggestion I’m also trying a Non-Uniform Refinement. Could you also suggest which parameters need manual adjustment for better alignment of the membrane regions?
Thank you.

Hi @diffracteD,

Thanks for the info. Would you say that the protein is on the smaller side? Sometimes with small proteins and very low-SNR data, refinements may have difficulty aligning particles regardless. For Non-Uniform Refinement, you can try decreasing the initial lowpass resolution value to something like 15–20 Å, rather than the default of 30 Å. If you’ve already tried Non-Uniform Refinement with default parameters and want to continue experimenting, you can also try changing the Adaptive Window Factor (AWF), the non-uniform filter order, and enabling Adaptive Marginalization. All parameters are described a bit more in the guide page on the job, and in the corresponding Nature Methods publication that the guide page links out to.

In short, the adaptive window factor controls the tradeoff between the accuracy of the local cross-validation procedure and how rapidly the resolution is expected to vary over space. The non-uniform filter order gives the order of the lowpass filter used in the regularization, with lower orders giving a smoother filter in frequency space. Adaptive marginalization helps to decrease overfitting when the SNR of the data is low.
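For intuition on the filter-order point, here is a minimal NumPy sketch of a Butterworth-style lowpass transfer function. This is an illustration of the general principle only (not CryoSPARC's actual filter); the pixel size and cutoff values are just examples. Lowering the order makes the roll-off around the cutoff smoother; raising it makes the cutoff sharper and more step-like:

```python
import numpy as np

def butterworth_lowpass(freqs, cutoff, order):
    """Butterworth-style lowpass transfer function.
    Lower `order` -> gentler (smoother) roll-off around the cutoff;
    higher `order` -> sharper, more step-like cutoff."""
    return 1.0 / np.sqrt(1.0 + (freqs / cutoff) ** (2 * order))

# Spatial frequencies up to Nyquist for a 0.8 A/pix dataset (Nyquist = 1.6 A).
freqs = np.linspace(1e-6, 1.0 / 1.6, 200)   # in 1/A
cutoff = 1.0 / 8.0                          # e.g. an 8 A lowpass

low_order = butterworth_lowpass(freqs, cutoff, order=2)
high_order = butterworth_lowpass(freqs, cutoff, order=8)

# Both filters pass 1/sqrt(2) of the amplitude exactly at the cutoff;
# past the cutoff, the high-order filter attenuates much more aggressively.
past = freqs > 1.5 * cutoff
print(low_order[past].mean() > high_order[past].mean())  # True
```

Either way the filter preserves low-frequency structure, but the low-order curve leaves a longer tail of partially attenuated high frequencies, which is the "smoother in frequency space" behaviour described above.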

Best,
Michael

Hi @mmclean,
Thank you for the valuable suggestions. I’m working with a 300 kDa protein that has a very large transmembrane part, so it is very flexible.
Using 3 Å as the maximum alignment resolution I could still obtain a well-featured map, albeit with lots of scattered densities from the micelle. However, NU refinement still makes most of the features disappear and ends up blunting the map (features are compromised, and sometimes streaky densities are generated).
NU refinement is definitely better than homogeneous refinement, as it can still refine some very rigid alpha helices at the centre of the transmembrane region; however, the peripheral helices vanish even after NU refinement.
I tried AM and the other parameter changes, but the paper (https://www.nature.com/articles/s41592-020-00990-8) also showed that these parameters do not affect NU refinement that much.
Please let me know what you think.
Thanks.

Hi @mmclean,
So far, I have obtained a 4.6 Å map (with smooth FSC convergence) using the workflow: NU refinement (with the ab-initio model lowpass-filtered to 5 Å) > particle subtraction (using the NU refinement outputs) > NU refinement > symmetry expansion to C4 > local refinement (with the default fulcrum).
In Chimera, at lower thresholds the transmembrane regions look complete, but I can’t seem to get rid of the very strong micellar densities surrounding the region (image link: https://drive.google.com/drive/folders/1AjDgAThnn6mpBXe2wGf_AHbOR2FCoHqs?usp=sharing).
Is there any way to subtract or mask out densities like these?
Please suggest.

Hi @diffracteD,

Those streaky densities most often occur when the mask used during refinement is too tight, and they definitely indicate overfitting. Sometimes it can happen because of poor subtraction (see below). Do you only see streaking with local refinements, or do you see it with the Non-Uniform refinement (NEW) too? Also, did your non-uniform refinements run with a dynamic mask? If so, you could try increasing the “Dynamic mask far” parameter beyond the default of 14 Å, maybe even all the way to 20 Å or more, to make the mask very soft.
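To make "soft" concrete, here is a toy NumPy sketch (not CryoSPARC's mask-generation code; box size, radius, and padding values are made up) of a spherical mask with a raised-cosine edge. The `soft_pad` parameter plays the same role as the padding / "Dynamic mask far" distance: it sets how many voxels the mask takes to fall from 1 to 0.

```python
import numpy as np

def radial_soft_mask(box, center, radius, soft_pad):
    """Spherical mask that is 1 inside `radius`, falls to 0 with a
    raised-cosine edge over `soft_pad` voxels, and is 0 beyond."""
    grid = np.indices((box, box, box)).astype(np.float32)
    r = np.sqrt(((grid - np.reshape(center, (3, 1, 1, 1))) ** 2).sum(axis=0))
    mask = np.zeros((box, box, box), dtype=np.float32)
    mask[r <= radius] = 1.0
    edge = (r > radius) & (r < radius + soft_pad)
    mask[edge] = 0.5 * (1.0 + np.cos(np.pi * (r[edge] - radius) / soft_pad))
    return mask

# A mask with a generous 10-voxel soft edge vs. a tight 2-voxel edge.
soft = radial_soft_mask(64, center=(32, 32, 32), radius=12, soft_pad=10)
tight = radial_soft_mask(64, center=(32, 32, 32), radius=12, soft_pad=2)
print(soft[32, 32, 49], tight[32, 32, 49])  # inside the soft edge, outside the tight one
```

The tight mask cuts density off abruptly (sharp edges create ringing in Fourier space and bias the FSC), whereas the soft mask attenuates it gradually, which is why a larger "Dynamic mask far" behaves better here.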

Local refinement and particle subtraction are not optimally suited to membrane proteins, since the micelle is actually different in each image due to disorder. This means that subtracting it away will almost certainly leave residual noise in each image, and, depending on the tightness of the mask used for subtraction, this noise can be correlated from image to image, breaking the image formation model. This is likely why the streaking happens: the FSCs become inflated and the refinement can no longer discern reliable signal from noise. To improve subtraction results, the mask used for subtraction should be very soft (with a large padding of at least 10 pixels or so), and the mask used for the local refinement should be soft as well. You should also set the Mask parameter to static rather than dynamic (if you are using the Local Refinement (NEW) job in v3.1.0, static is the default, so you won’t need to change it). Finally, if you are on v3.1.0, you might find improved results using the Local Refinement (NEW) job instead of the Local Refinement (Legacy) job – the new job incorporates non-uniform regularization and marginalization just like the standard Non-Uniform Refinement job. I’ve included a link to the guide page on the job here.
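As a side note on why inflated FSCs matter: the FSC measures the correlation between two half-maps shell by shell in Fourier space, so any *correlated* noise (e.g. residual micelle signal shared between half-sets after imperfect subtraction) inflates it just like real signal. A minimal NumPy sketch of the shell-wise computation (an illustration, not any package's exact implementation):

```python
import numpy as np

def fsc(map1, map2, n_shells=20):
    """Fourier Shell Correlation between two cubic half-maps."""
    n = map1.shape[0]
    f1, f2 = np.fft.fftn(map1), np.fft.fftn(map2)
    freq = np.fft.fftfreq(n)
    fx, fy, fz = np.meshgrid(freq, freq, freq, indexing="ij")
    fmag = np.sqrt(fx ** 2 + fy ** 2 + fz ** 2)
    # Bin Fourier voxels into resolution shells out to Nyquist (0.5 cycles/pix).
    shells = np.minimum((fmag / 0.5 * n_shells).astype(int), n_shells - 1)
    out = np.zeros(n_shells)
    for s in range(n_shells):
        sel = shells == s
        num = np.real(np.sum(f1[sel] * np.conj(f2[sel])))
        den = np.sqrt(np.sum(np.abs(f1[sel]) ** 2) * np.sum(np.abs(f2[sel]) ** 2))
        out[s] = num / den if den > 0 else 0.0
    return out

rng = np.random.default_rng(1)
signal = rng.standard_normal((32, 32, 32))
# Two half-maps = shared "signal" + independent noise; the shared component
# correlates across half-sets, so the per-shell FSC sits well above zero.
half1 = signal + 0.5 * rng.standard_normal((32, 32, 32))
half2 = signal + 0.5 * rng.standard_normal((32, 32, 32))
curve = fsc(half1, half2)
print(np.round(curve[:5], 2))
```

If the noise in the two half-sets were correlated instead of independent, it would enter the numerator the same way the shared signal does, pushing the curve (and hence the reported resolution) artificially high: exactly the overfitting mechanism described above.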

Good luck in processing,
Michael

Hi @mmclean,

Up to the ab-initio model, the map was good: free of these streaky densities, with nice, clear transmembranes. After ab-initio, once the best model was subjected to NU refinement, these densities started to appear, even at a really high density threshold. I’m not even considering particle subtraction or local refinement with these kinds of artifactual densities at the moment. I previously worked with a different membrane protein in RELION and never saw this kind of problem appear after the first refinement attempt.
I’m getting those overfitting artifacts with default parameters, with symmetry applied (C1/C4), with a relaxed mask (dynamic mask far = 30 Å), and with every other approach suggested for NU refinement. I never use a static mask in CryoSPARC; I really like its dynamic masking feature. If it helps I can share some map snapshots, but they look similar to the images I shared previously, no matter what parameter I change.
I also tried both the CryoSPARC v3.1 NU Refinement (NEW) and Legacy jobs. Both gave me similar results.
Please suggest if I’m missing something here.
Thanks.

Hi @diffracteD,

Based on the issues you are encountering, we think it would be very helpful to use this data as a test case. I will reach out to you via email to ask about possibly confidentially sharing data with us.

Best,
Michael

Hi @mmclean,
I got some improvement in the overfitting artifacts after running a couple of rounds of heterogeneous refinement with the model lowpass-filtered to 20 Å. I also ran the heterogeneous refinements in C1 and only moved to C4 once the dataset was really clean.
I believe that lowpass-filtering the model to only 6 Å and going directly into NU refinement with C4 was forcing the micelle noise to align faster and more strongly, which resulted in a falsely inflated resolution and hence the artifacts.
With the current approach I have reached 7 Å so far, with no overfitting artifacts like before. I am now trying particle subtraction with the mask generated during NU refinement, and ran another NU refinement using those signal-subtracted particles to reach a 6 Å map. There are still some micellar densities, but it is much better than before (not streaky). I have not tried local refinement yet.
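For readers following along: lowpass filtering a reference to a given resolution amounts to attenuating Fourier components beyond the cutoff frequency, which is why a 20 Å filter gives refinement so much less high-frequency (micelle-dominated) detail to latch onto than a 6 Å one. A minimal NumPy sketch with a sharp cutoff for illustration (real packages use softer edges; the pixel size and box size are made-up values):

```python
import numpy as np

def lowpass_volume(vol, pixel_size_A, cutoff_A):
    """Sharp Fourier-space lowpass of a cubic 3D map at `cutoff_A` resolution."""
    n = vol.shape[0]
    f = np.fft.fftfreq(n, d=pixel_size_A)           # spatial frequency in 1/A
    fx, fy, fz = np.meshgrid(f, f, f, indexing="ij")
    fmag = np.sqrt(fx ** 2 + fy ** 2 + fz ** 2)
    ft = np.fft.fftn(vol)
    ft[fmag > 1.0 / cutoff_A] = 0.0                 # zero everything past the cutoff
    return np.real(np.fft.ifftn(ft))

# Toy map: pure noise at 0.8 A/pix, filtered to 20 A resolution.
rng = np.random.default_rng(0)
vol = rng.standard_normal((48, 48, 48))
filtered = lowpass_volume(vol, pixel_size_A=0.8, cutoff_A=20.0)
print(filtered.std() < vol.std())  # True: high-frequency power is removed
```

With a 20 Å cutoff, only a tiny sphere of low-frequency Fourier voxels survives, so almost all of the noise power disappears; a 6 Å cutoff would keep far more of it.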

Thank you so much for your valuable suggestions. They really helped me understand things better.

Hi,
I am working on a small protein and am encountering similar issues with feature loss during refinement. My ab-initio model looks pretty decent, but when I start to refine it (homogeneous or heterogeneous), I start seeing streaky maps. Only NU refinement generated a relatively better map. Can you please let me know what approach helped you overcome this issue @diffracteD @mmclean ?
Thanks.

Hi @kaurg5,
I still have the streaky map issue, but in my case using a 4 Å initial resolution during ab-initio helped me get a decent map.
I believe the streakiness comes from noise being over-fitted. My guess is that running more rounds of ab-initio to separate out the bad particles might help, which is also my current plan.
Best.

Hi @kaurg5,

What you’re noticing aligns a lot with what we’ve noticed when processing smaller proteins. Non-uniform regularization together with the adaptive marginalization done by the job helps account for uncertainty when there is limited signal in the dataset. You may want to experiment with the regularization settings; for example, since the protein is small, a larger “adaptive window factor” might help increase the accuracy of the local cross-validation by incorporating more spatial information, and thus reduce the chances of overfitting.

Of course, there are all the other standard tricks that may help. If you suspect there are still broken/junk particles in the stack (these contribute significantly to overfitting since they don’t contain structure), further particle curation via 3D classification or multi-class ab-initio reconstruction can help remove them. For achieving high resolutions on a small protein, global and local CTF refinement as well as local motion correction may both be worth a shot, although the former ideally requires fairly high resolutions to begin with (at least 4–3.5 Å).

Best,
Michael
