De novo Model Building

Hi everyone,

I’m currently working on solving a cryo-EM structure of a protein with no close homologous structures available. The models predicted by AlphaFold do not fit the map. I tried giving the map and sequence as input to a few tools, such as DeepMainMast and DiffModeler, but they are not really helping. The protein is around 164 kDa as a monomer and 328 kDa as a dimer. From the resolved map, we can say with confidence that it is a dimer. I was wondering if anyone could suggest the best workflow, software/tools, or available resources/tutorials for tackling an unknown structure in this scenario. Any advice on map interpretation, sequence assignment, model building strategies, or refinement approaches would be greatly appreciated.

Thank you!

what is the resolution of the map? Can you see helices, sidechains…

From the sequence analysis, we know for sure that the protein consists of beta-sheet-rich EGF-like domains. The resolution at this point is close to 4.5 Å, but I have to refine more to remove preferred conformation. Here is a snapshot of a high-resolution region. I can build the backbone with confidence, but for side chains I would be a little skeptical beyond C-beta.


when you say models predicted by alphafold are not similar to the map, have you tried fitting to both the current & z-flipped maps? At this resolution the hand may not be obvious, particularly if you mostly have beta sheets

(you should at least be able to fit a model domain-wise if the hand is correct & EGF-like domains are strongly predicted from the sequence)
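For anyone who wants to generate the z-flipped map outside their usual processing package, here is a minimal sketch. It assumes the third-party mrcfile and numpy packages are installed, and the file names are hypothetical placeholders. Mirroring the volume along any single axis inverts the hand; the first array axis is used here.

```python
def flip_hand(volume):
    """Mirror a 3D volume along its first axis (sections), inverting the hand.

    Works on anything that supports slicing, e.g. a nested list or a numpy array.
    """
    return volume[::-1]


def write_flipped_map(src_path, dst_path):
    """Read an MRC map, mirror it, and write the result to a new file.

    Assumption: mrcfile and numpy are installed. The origin/start header
    fields are NOT copied, so any model fitted to the original map has to be
    refitted to the flipped one.
    """
    import numpy as np   # third-party, imported lazily
    import mrcfile       # third-party, imported lazily

    with mrcfile.open(src_path, permissive=True) as src:
        # Copy into a contiguous array so the mirrored view can be written out.
        flipped = np.ascontiguousarray(flip_hand(src.data))
        vs = src.voxel_size
        voxel_size = (float(vs.x), float(vs.y), float(vs.z))

    with mrcfile.new(dst_path, overwrite=True) as dst:
        dst.set_data(flipped)
        dst.voxel_size = voxel_size


# Hypothetical usage:
# write_flipped_map("map.mrc", "map_zflip.mrc")
```

Fit the predicted domains to both maps and compare; at this resolution only one of the two should give a convincing fit for well-predicted domains.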


ModelAngelo is usually my first port of call. Run build_no_seq (so it doesn’t need a FASTA file) and try both hands (handednesses?).

Then BLAST or HMMer the output as a sanity check, even if you know the sequence.
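As a rough sketch of running build_no_seq on both hands, the helper below only assembles the command lines; it does not run anything itself. The map file names and output directory names are hypothetical, and it assumes model_angelo is on your PATH and takes the map via -v and the output directory via -o, as in its README.

```python
import shlex


def build_no_seq_commands(maps):
    """Return one `model_angelo build_no_seq` command per map.

    `maps` is a dict mapping a label (e.g. "original") to a map path.
    Output directory names are made up here for illustration.
    """
    commands = []
    for label, map_path in maps.items():
        commands.append([
            "model_angelo", "build_no_seq",
            "-v", map_path,                   # input map
            "-o", f"build_no_seq_{label}",    # output directory (hypothetical name)
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical file names for the two hands of the same reconstruction.
    hands = {"original": "map.mrc", "zflipped": "map_zflip.mrc"}
    for cmd in build_no_seq_commands(hands):
        # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Keeping the two runs in separate output directories makes it easy to compare the traces side by side afterwards.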

We’ve had two cases in the last year of researchers bringing us samples which they say are protein “X” when actually they’re a stress protein the same size as their target. And you wouldn’t believe how hard it is to convince them they’ve got the wrong protein.


I generally agree, although I’m not sure how well modelangelo will do at 4.5Å… have you tried it in this res range? At lower res I would tend towards domain-wise fitting/rebuilding AF models


I have used it around 4.4Ang. I wouldn’t trust residue assignments, but it does pretty well for backbone tracing, which is a quick way of checking handedness - even at >4Ang, the wrong handedness tends to fragment, at least with the three things I tried it on. With two of those, though, I was deliberately filtering down the map to decide how much I could/should trust the other build_no_seq prediction.
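One way to put a number on "fragmented" when comparing the two hands is to summarise the fragment lengths of each trace. This is only a sketch: it assumes each traced fragment is written as its own chain, and reading the model uses the third-party gemmi package, which is an assumption on my part and not part of the workflow described above. The file names are hypothetical.

```python
from statistics import median


def fragmentation_summary(chain_lengths):
    """Summarise how fragmented a trace is from its per-chain residue counts."""
    lengths = sorted(chain_lengths, reverse=True)
    total = sum(lengths)
    # N50: length of the fragment at which the cumulative residue count
    # (longest fragments first) reaches half of all residues built.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "fragments": len(lengths),
        "residues": total,
        "longest": lengths[0] if lengths else 0,
        "median": median(lengths) if lengths else 0,
        "n50": n50,
    }


def chain_lengths(model_path):
    """Residues per chain in the first model of a PDB/mmCIF file (needs gemmi)."""
    import gemmi  # third-party, imported lazily
    structure = gemmi.read_structure(model_path)
    return [len(chain) for chain in structure[0]]


# Hypothetical usage:
# print(fragmentation_summary(chain_lengths("build_no_seq_original/model.cif")))
# print(fragmentation_summary(chain_lengths("build_no_seq_zflipped/model.cif")))
```

The hand with fewer, longer fragments (higher N50 for a similar number of residues) is the more plausible one, but it is still worth inspecting both traces in the map by eye.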

That map doesn’t really look 4.5Ang to me, it looks a bit better than that… although from the text that’s “the best bit”?

If in silico models are completely failing to fit, then I would do as many checks as possible, hence the build_no_seq with both hands, etc. Even if residue assignments aren’t great, if it’s an HSP (for example) it’ll probably be a close enough hit via BLAST…?


If you have the backbone already, have you tried FoldSeek to search?

Or you can try DomainFit with the entire proteome. https://www.cell.com/structure/fulltext/S0969-2126(24)00143-6


Another new one is CryoZeta, from the Kihara lab at Purdue.


I would be very interested in hearing if there are automatic building tools that work in this resolution range. We typically have little success with ModelAngelo at resolutions between 3.5 and 4 Å, though all the proteins we are working with are membrane proteins. Once we get closer to 3 Å it works nicely.

I usually stick with AF fitting at this resolution; there should be some well-structured domain that AF predicts well that you could use to at least determine the handedness. The map looks quite good.


This is my approach at this res too. These days I would maybe try Rocket or phenix.predict_and_build if there is a fit but large conformational changes are evident

Might be a bit outdated now, but before AlphaFold I had success using Buccaneer in CCP-EM (https://journals.iucr.org/d/issues/2020/06/00/id5008/index.html). However, it’s mostly tested with structures at 4 Å or better. But also, a quick mass spec of the protein sample would be helpful to check it’s definitely what you think it is 😉


I’ve had good experiences with backbone generation using CryoAtom (“CryoAtom improves model building for cryo-EM”, Nature Structural & Molecular Biology) in the resolution range >4 Å. In my specific use cases, I got more complete backbone building with CryoAtom than with ModelAngelo. Make sure to run on the z-flipped map as well, because assignment of map handedness can be tricky. Sidechain assignment will be hit and miss for sure, so you’ll need orthogonal approaches to identify the correct sequence.


Another vote for DomainFit at this resolution, but this would really only be if the protein is not what you think it is and does have a good model out there somewhere. We recently used it to identify a contaminant and were delighted at how successful it was. Otherwise, agree with Modelangelo

Thanks everyone for all the suggestions and ideas. We are confident about the protein identity and have characterized it extensively by SDS-PAGE, mass spec, Western blotting, and antibody staining against the protein.

I’ll go through the recommended tools/workflows. I also haven’t tried flipping the volume yet, so I’ll include that as one of the permutations to test as well. I’ll get back to you all with how we eventually solved it!
