Can a map with overfitting artifacts be used as a reference?

cbeck · December 12, 2024, 10:31pm

Hi everyone,

I’m working on a protein complex made up of domains A and B. Both domains are internally rigid, but they’re quite flexible relative to each other. After running NU-Refinement with a mask around the whole complex, domain B refined to ~2.6 Å, but domain A ended up poorly resolved: it just looks like a cloud of dense dust.

To improve domain A, I tried a Local Refinement with a mask focused on that domain. The challenge was the poor starting reference for domain A. Even with a narrow search range for rotations and shifts (I tried both with and without Gaussian priors), the resulting maps consistently showed overfitting artifacts and a spuriously high GSFSC resolution, and I struggled to fix these issues in downstream refinement jobs. That said, the density for domain A was still much stronger, and I could start to make out some rough secondary structure features.
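
For context on what a "spuriously high" GSFSC means: the gold-standard FSC correlates two independently refined half-maps shell by shell in Fourier space, and the reported resolution is where the curve drops below 0.143. Below is a minimal numpy sketch, assuming two cubic half-maps already loaded as arrays with a known voxel size in Å. The helper names `fsc_curve` and `resolution_at` are hypothetical, and the sketch leaves out the masking and noise-substitution corrections that refinement programs typically apply.

```python
import numpy as np


def fsc_curve(half1, half2, voxel_size):
    """Fourier shell correlation between two cubic half-maps.

    Returns (spatial frequency in 1/Å, FSC) for shells 0 .. N/2 (Nyquist).
    """
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)
    # Integer shell index of every Fourier voxel (shell width is 1/(N * voxel_size) Å^-1)
    k = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()
    nshells = n // 2 + 1
    # Per-shell sums of the cross term and of each half-map's power
    cross = np.bincount(shell, weights=np.real(f1 * np.conj(f2)).ravel(), minlength=nshells)
    p1 = np.bincount(shell, weights=(np.abs(f1) ** 2).ravel(), minlength=nshells)
    p2 = np.bincount(shell, weights=(np.abs(f2) ** 2).ravel(), minlength=nshells)
    fsc = cross[:nshells] / np.sqrt(p1[:nshells] * p2[:nshells])
    freqs = np.arange(nshells) / (n * voxel_size)
    return freqs, fsc


def resolution_at(freqs, fsc, threshold=0.143):
    """Resolution (Å) of the last shell before the FSC first drops below threshold."""
    below = np.nonzero(fsc[1:] < threshold)[0] + 1  # indices into fsc, DC shell skipped
    last_good = len(fsc) - 1 if below.size == 0 else below[0] - 1
    return np.inf if last_good == 0 else 1.0 / freqs[last_good]
```

With truly independent half-maps the curve falls toward zero where the signal runs out; when the halves are not really independent (for example, correlated noise or an overly tight mask), it can stay artificially high and inflate the reported resolution.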

I then wondered if I could redo the initial Local Refinement, starting from the original NU-Refinement alignments but this time using the improved map as the starting reference. Before doing so, I applied a lowpass filter to the map to suppress its overfitting artifacts. To my surprise, this approach worked really well! The features for domain A now match the nominal GSFSC resolution of 2.9 Å.
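
The lowpass step amounts to zeroing Fourier components finer than a chosen resolution. Below is a minimal numpy sketch, assuming the map is already loaded as a cubic array (for example with the `mrcfile` package) with a known voxel size in Å. `lowpass_map` is a hypothetical helper, not cryoSPARC's implementation, and the default edge width is an arbitrary illustrative choice.

```python
import numpy as np


def lowpass_map(volume, voxel_size, cutoff_res, edge_width=0.01):
    """Low-pass filter a cubic 3D map in Fourier space.

    volume     : 3D numpy array (cubic box)
    voxel_size : Å per voxel
    cutoff_res : resolution in Å; frequencies finer than this are removed
    edge_width : width (1/Å) of the raised-cosine fall-off past the cutoff
    """
    n = volume.shape[0]
    s_1d = np.fft.fftfreq(n, d=voxel_size)  # spatial frequency, 1/Å
    sx, sy, sz = np.meshgrid(s_1d, s_1d, s_1d, indexing="ij")
    s = np.sqrt(sx**2 + sy**2 + sz**2)
    # t is 0 inside the pass band and rises to 1 across the soft edge
    t = np.clip((s - 1.0 / cutoff_res) / edge_width, 0.0, 1.0)
    filt = 0.5 * (1.0 + np.cos(np.pi * t))  # raised cosine: 1 -> 0
    # The filter is real and radially symmetric, so the result is real up to roundoff
    return np.real(np.fft.ifftn(np.fft.fftn(volume) * filt))
```

The soft edge is there because a hard cutoff in Fourier space introduces ringing in the filtered map. A call such as `lowpass_map(vol, voxel_size=1.0, cutoff_res=8.0)` would keep only features coarser than about 8 Å.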

That said, I’m not sure if this workflow is entirely valid. Is this an acceptable way to approach the problem? And if so, is there a better way to run a Local Refinement from a poor starting reference without having to use an overfit map? For example, would it be possible to limit the resolution of the Local Refinement to ~5–6 Å to avoid overfitting artifacts, since subsequent iterations tend to degrade the map quality?
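
Limiting the resolution used for alignment means scoring each candidate pose only on Fourier components coarser than a cutoff, so high-frequency noise in a poor reference can't drive the alignment. Below is a minimal 2D numpy sketch of such a band-limited correlation score, assuming the particle image and reference projection are same-size square arrays. `bandlimited_score` is a hypothetical helper that illustrates the idea only. It is not how any particular refinement program computes its score, and whether a given package exposes such a cutoff should be checked in that package's documentation.

```python
import numpy as np


def bandlimited_score(image, reference, pixel_size, max_res):
    """Normalized correlation of two square 2D images using only Fourier
    components coarser than max_res (Å); the DC term is excluded."""
    n = image.shape[0]
    s_1d = np.fft.fftfreq(n, d=pixel_size)  # spatial frequency, 1/Å
    sx, sy = np.meshgrid(s_1d, s_1d, indexing="ij")
    keep = np.sqrt(sx**2 + sy**2) <= 1.0 / max_res
    keep[0, 0] = False  # ignore the mean (DC term)
    fi = np.fft.fft2(image)[keep]
    fr = np.fft.fft2(reference)[keep]
    num = np.real(np.sum(fi * np.conj(fr)))
    return num / np.sqrt(np.sum(np.abs(fi) ** 2) * np.sum(np.abs(fr) ** 2))
```

With a 6 Å cutoff, adding a 3.2 Å-period component to the image leaves the score unchanged. With a 2.5 Å cutoff, the same component is included and lowers the score.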

I’d appreciate any advice or suggestions—thanks!
cbeck

rbs_sci · December 13, 2024, 12:21am

Does your initial model show both domains more clearly? Could you mask out domain B in the initial model, recenter on A, and run heterogeneous or NU refinement from there?
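
The "mask out domain B, recenter on A" step comes down to finding how far the centre of domain A's density sits from the box centre. Below is a minimal numpy sketch, assuming the map and a domain-A mask are same-shape numpy arrays. `recenter_shift` is a hypothetical helper. The N/2 box-centre convention and the choice to ignore negative density are both assumptions, and how the shift is then applied (re-extraction, or a volume or particle shift in your processing package) depends on the software.

```python
import numpy as np


def recenter_shift(volume, mask, voxel_size):
    """Shift (Å, in array axis order) from the box centre to the centre of
    mass of the density inside `mask` (e.g. a mask over domain A only)."""
    weights = np.clip(volume, 0.0, None) * mask  # ignore negative density
    total = weights.sum()
    if total == 0:
        raise ValueError("no positive density inside the mask")
    grids = np.indices(volume.shape)  # one index grid per array axis
    com = np.array([(g * weights).sum() / total for g in grids])
    box_center = np.array(volume.shape) // 2  # assumed convention: centre at N/2
    return (com - box_center) * voxel_size
```

Note that maps read from MRC files are often stored in (z, y, x) order, so the returned shift follows whatever axis order the array uses.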

Another idea (although a bit more involved) might be to pick the individual domains, run 2D classification with a tight mask, and classify them as if they were separate proteins in the same prep. I’ve been experimenting with this idea myself recently… I can’t say it’s been hugely successful but you might have better luck with your sample!

cbeck · December 13, 2024, 4:42pm

Thank you for your response! The initial model only shows Domain B clearly. Unfortunately, any global refinement with a mask on Domain A performs very poorly. I’m not really sure why, since the mask on Domain A contains about 180 kDa of mass, but I would guess that it has to do with how poor the starting reference is. I suppose I could subtract out Domain B and rerun ab initio to get a better map for Domain A…
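
As a rough sanity check on mask size, ~180 kDa of protein corresponds to a volume of roughly 2.2 × 10^5 Å^3, which can be compared with the volume the Domain A mask actually encloses. Below is a minimal sketch, assuming an average partial specific volume of 0.73 cm^3/g and a mask loaded as a numpy array. Both helper names are hypothetical, and the 0.5 threshold for a soft mask is an arbitrary choice.

```python
import numpy as np

AVOGADRO = 6.02214076e23  # 1/mol
CM3_TO_A3 = 1e24          # 1 cm^3 = 1e24 Å^3


def protein_volume_a3(mass_da, partial_specific_volume=0.73):
    """Approximate protein volume (Å^3) from mass (Da).

    0.73 cm^3/g is a commonly used average partial specific volume."""
    return partial_specific_volume * mass_da / AVOGADRO * CM3_TO_A3


def mask_volume_a3(mask, voxel_size, threshold=0.5):
    """Volume (Å^3) enclosed by a (possibly soft) mask at the given threshold."""
    return np.count_nonzero(mask >= threshold) * voxel_size**3


print(f"{protein_volume_a3(180_000):.3g}")  # 2.18e+05
```

A mask enclosing many times this volume mostly adds solvent noise to the alignment target, while one enclosing noticeably less may be clipping density.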

I also might’ve overstated the degree of flexibility between the two domains. As long as Domain B is aligned well, Domain A is also roughly aligned, to the point where its overall envelope is visible, but it still just looks like dense dust. I figured that in this scenario, a local refinement would be sufficient. This approach has definitely worked; it’s just that I have to perform multiple iterations of local refinement in which I discard the alignments and carry forward only the map. It works even if the map shows some overfitting artifacts, because the initial lowpass filter suppresses them. I’ve just never seen this type of workflow in the papers I’ve read, so I wondered if there’s some critical flaw in my approach that I’m missing.

Your idea to pick each domain is really interesting though, and this is the second time I’ve heard someone suggest this approach! I’d really like to try it with another difficult dataset I’ve been working on in which the two domains are really flexible relative to each other.

Cheers,
cbeck