Dear all,
is it possible to use an AlphaFold-predicted structure as a template for picking?
Best
Dario
Hi Dario,
You could use the molmap command in ChimeraX to generate a volume from the AlphaFold structure (e.g. at 6 Å resolution), save it as an MRC file, import it into cryoSPARC and then use the Create Templates job.
https://www.cgl.ucsf.edu/chimerax/docs/user/commands/molmap.html
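A minimal sketch of the ChimeraX side of that workflow, written as a small Python helper that builds a ChimeraX command script (.cxc). It assumes the model is the only one open, so it becomes #1 and the molmap output becomes #2; the file names are placeholders. Paste the printed lines into the ChimeraX command line, or save them as a .cxc file and run it with open.

```python
def molmap_script(model_path: str, resolution: float, grid_spacing: float, out_map: str) -> str:
    """Build ChimeraX commands: open a model, simulate a map with molmap, save it as MRC.

    Assumes the model is the only one open, so it becomes #1 and the molmap
    output becomes #2; check the model IDs in your own session.
    """
    commands = [
        f"open {model_path}",
        f"molmap #1 {resolution:g} gridSpacing {grid_spacing:g}",
        f"save {out_map} models #2",
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Placeholder file names; 6 Å and 1 Å grid spacing follow the example above.
    print(molmap_script("alphafold_model.pdb", 6, 1, "alphafold_6A.mrc"), end="")
```

The saved MRC file is what would then be imported into cryoSPARC for Create Templates.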
Cheers,
Will
You technically could do this, but I would be very wary of template bias. Is there a reason why you can’t see your particles well enough to pick them either manually or with Blob Picker, and use the resulting picks to either template pick or train a neural network picker like Topaz?
After blob and template picking and 2D classification, my protein seems strongly orientation-biased. I thought that with proper templates of other orientations I could push the picking jobs to pick more of the other orientations (if there are any present).
It’s possible, but I would approach cautiously, and be aware of the possibility of template bias
Got it. But if there actually is a bias, I would see that in the following 2D classification
Not necessarily. (One of) the problem(s) with template bias is that it is insidious. Not everything is as obvious as going hunting for pictures of Einstein in a micrograph. If you’re going to go hunting with PDB model projections, I would suggest looking at every single micrograph by eye to sanity-check the picks, then continuing to be extremely cautious by running a second (or even more) round(s) of 2D classification, then re-picking with clearer classes from that and comparing picks.
Even do a second round of picking with a blob picker and compare pick locations between the two. If the only way to get good picks is to use PDB templates, I get worried that what it’s picking isn’t real.
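One way to put a number on "compare pick locations between the two" is sketched below. It assumes the picks from each job have already been exported as (x, y) pixel coordinates per micrograph (how to export them is outside this sketch), and max_dist is a tolerance you would choose, e.g. a fraction of the particle radius. Pairs are matched greedily, nearest first, and each pick is used at most once.

```python
import math
from typing import List, Tuple

Pick = Tuple[float, float]  # particle centre (x, y) in pixels on one micrograph


def match_picks(picks_a: List[Pick], picks_b: List[Pick], max_dist: float) -> int:
    """Count one-to-one matches between two pick sets from the same micrograph.

    Candidate pairs within max_dist are accepted greedily, nearest first,
    and each pick can be used only once.
    """
    candidates = []
    for i, (xa, ya) in enumerate(picks_a):
        for j, (xb, yb) in enumerate(picks_b):
            d = math.hypot(xa - xb, ya - yb)
            if d <= max_dist:
                candidates.append((d, i, j))
    candidates.sort()
    used_a, used_b = set(), set()
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
    return len(used_a)


def agreement(picks_a: List[Pick], picks_b: List[Pick], max_dist: float) -> float:
    """Matched picks divided by all distinct picks from both pickers (1.0 = identical)."""
    matched = match_picks(picks_a, picks_b, max_dist)
    union = len(picks_a) + len(picks_b) - matched
    return matched / union if union else 1.0


if __name__ == "__main__":
    template_picks = [(100, 100), (300, 310), (500, 500)]
    blob_picks = [(104, 98), (305, 300), (800, 120)]
    print(agreement(template_picks, blob_picks, max_dist=20))  # 0.5
```

Picks that only the template picker finds are the ones worth inspecting by eye; a low agreement is a prompt to look, not proof of bias on its own.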
I tried that but got stuck at the create template job. The error I am getting is
“could not broadcast input array from shape (62,81,61) into shape (60,60,60)”
The volume that you are projecting needs to be a cube - if it isn’t, you’ll need to resample it in Chimera on a cubic grid first.
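The error means the map has different dimensions along each axis. A small sketch of the bookkeeping for an enclosing cube, assuming you keep the original voxel spacing, take the largest dimension as the cube edge (so nothing is cropped) and centre the cube on the original map. The resampling itself would still be done in Chimera; the origin of 0 and 1 Å voxels in the usage line are assumptions.

```python
from typing import Sequence, Tuple


def cubic_box(shape: Sequence[int], origin: Sequence[float], spacing: float) -> Tuple[int, Tuple[float, ...]]:
    """Edge length and origin of a cubic grid that encloses a non-cubic map.

    The cube keeps the original voxel spacing, takes the largest dimension as
    its edge and is centred on the original map centre.
    shape and origin must list the axes in the same order.
    """
    edge = max(shape)
    new_origin = tuple(
        o + (n - 1) / 2 * spacing - (edge - 1) / 2 * spacing
        for n, o in zip(shape, origin)
    )
    return edge, new_origin


if __name__ == "__main__":
    # Shape taken from the error message; origin and spacing are assumed.
    print(cubic_box((62, 81, 61), (0.0, 0.0, 0.0), 1.0))  # (81, (-9.5, 0.0, -10.0))
```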
100% agree with the risk of model bias, and I’d like to add that Alphafold can produce really bad models sometimes. Check for instance Myosin VI (probably any myosin) from Alphafold and you’ll understand what I mean.
Agreed, anything that comes after the CaM-stabilized neck is anybody’s guess. But the converter and neck alone are quite conserved and accurately predicted.
Well, those are crystal structures that it simply put together… anyway, I think this is a good example because AlphaFold did not include the two light chains. So if you make a map out of that model, it will have a much thinner “neck” than the real protein… not to mention all the mess AlphaFold made of the helices, simply unreal.
Can only reply to one person in one post and I’m not going to triple post. Sorry, Oli, you drew the short straw.
Check out EMD-6617; non-cubic box deposition…
Alphafold is pretty good for a lot of things. Early on, it had a terrible model for an apoferritin monomer on the EBI, which has since disappeared (“updated”? I don’t know.) Sadly I didn’t obsessively screenshot it while I was browsing. But seeing that always made me cautious with Alphafold models.
I was under the impression that Alphafold2 wasn’t strictly a homology modeller like SWISS-MODEL, I-TASSER, etc, but a “first principles” model which wasn’t as biased from x-ray crystal structures…? At least, that is the implication from the article.
I’ve spent quite some time debating with collaborators who play with Alphafold and think it is the bee’s knees… but for some proteins (in my experience, often viral, although I’ll admit I do study some unusual viruses) it is worse than useless for most proteins in complex virus capsids. On the other hand, I fed it the sequence of another protein I’ve recently been working on which has a deposited crystal structure from a (non-related) organism (but the protein has the same function and high similarity) and the model it predicted fit pretty well into the bias-free (blob picked, self-consistent ab initio, etc, etc) cryo-EM map I reconstructed. The deposited crystal structure didn’t fit well at all, as it had a significantly different structure in the complex interface and for several outlying regions (monomer core was reasonable, though).
Basically, if anyone starts relying on Alphafold2, you need to remember the maxim: “caveat emptor”.
edit: Clarified a point which could be ambiguous.
for viral proteins it is often (but not always) pretty terrible, agreed - not enough sequence depth in the MSAs
Exceptions being when viral proteins have conserved prokaryotic homologs (e.g. viral ion channels do pretty well)
Re non-cubic boxes: nothing wrong with them in principle, but cryoSPARC will not handle them.
Oh, it is more powerful than SWISS-MODEL in the sense that it builds stuff even when there is no structural information available, for sure. But that is exactly why we should be careful with AlphaFold: it will always give an answer (a bit like ChatGPT). Not a problem for experienced structural biologists, but it can be a pitfall for beginners, that is all I am saying.
Sorry, probably the wrong forum, but could you tell me how to do that?
My initial idea was simply to get an idea of what my protein could look like in other orientations, as I am struggling with a strong orientation bias. I have just one orientation at OK resolution, but nothing from the rest. I went through several picking and classification runs (getting close to job 300… and no ab initio) but can’t get any other orientation resolved. Either there are truly no other orientations present, or I just can’t pick them. So I thought I would use a prediction, which fits nicely onto the one orientation I have, and create templates to get an idea of what the other views should look like.
Making templates to get an idea of how other orientations look is a great idea - I do this all the time. Where I would be a bit more careful is actually using them for picking.
To resample on a cubic grid, look up the documentation for the vop command in Chimera - you will need to use it to create a new (cubic) map, then resample your old map on the grid of that one.
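A sketch of the two Chimera steps as command strings, generated from a cube edge, voxel spacing and origin you have worked out for your map (the numbers in the usage line are illustrative). The grid name cubic_grid and the model IDs are hypothetical: it assumes the original map is open as #0 (Chimera numbers models from #0) and the new grid opens as #1. The keywords (size, gridSpacing, origin, onGrid) should be double-checked against the vop documentation.

```python
from typing import List, Sequence


def vop_cubic_commands(edge: int, spacing: float, origin: Sequence[float],
                       source: str = "#0", target: str = "#1") -> List[str]:
    """Chimera commands to create an empty cubic grid and resample a map onto it.

    Assumes the original map is open as `source` and the new grid will be
    opened as `target`; adjust both to match your session.
    """
    ox, oy, oz = origin
    return [
        # Empty cubic map with the same voxel spacing as the original.
        f"vop new cubic_grid size {edge} gridSpacing {spacing:g} origin {ox:g},{oy:g},{oz:g}",
        # Interpolate the original map onto the new cubic grid.
        f"vop resample {source} onGrid {target}",
    ]


if __name__ == "__main__":
    for command in vop_cubic_commands(81, 1.0, (-9.5, 0.0, -10.0)):
        print(command)
```

The resampled map (a new model) is the one to save and import into cryoSPARC.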
Cheers
Oli
100% agreed - always look at the pLDDT values (and also consider different oligomerization states etc, and interdomain interactions). I would say that regions predicted with high confidence by alphafold are generally very accurate; but interdomain interactions can be a bit iffy, especially for proteins that are part of an obligate complex or higher order oligomer
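AlphaFold PDB files store the per-residue pLDDT in the B-factor column, so low-confidence regions are easy to list before turning a model into a template. A minimal sketch, assuming a PDB-format file (not mmCIF) and reading the CA atom of each residue; the cut-off of 70 follows the AlphaFold DB convention that scores below 70 are "low" or "very low" confidence, but it is a choice, not a rule.

```python
from typing import List, Tuple


def low_confidence_residues(pdb_text: str, threshold: float = 70.0) -> List[Tuple[str, int, float]]:
    """Return (chain, residue number, pLDDT) for CA atoms with pLDDT below threshold.

    Uses fixed PDB columns: atom name 13-16, chain 22, residue number 23-26,
    B-factor (pLDDT in AlphaFold files) 61-66.
    """
    flagged = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]
        resseq = int(line[22:26])
        plddt = float(line[60:66])
        if plddt < threshold:
            flagged.append((chain, resseq, plddt))
    return flagged


if __name__ == "__main__":
    with open("alphafold_model.pdb") as fh:  # placeholder file name
        for chain, resseq, plddt in low_confidence_residues(fh.read()):
            print(chain, resseq, plddt)
```

A per-residue score says nothing about whether the relative placement of domains is right, so interdomain arrangements still need the caution described above.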
@DarioSB Are you just trying 2D classification over and over again? You should move on and try ab initio with different numbers of classes (many, e.g. 1-12) and see if anything turns up. For template-picking purposes, even a very distorted map can actually be fine after appropriate filtering. Such distortions usually result from over-refinement along the poorly sampled directions, e.g. you have signal out to 4 Å in one direction and only out to 12 Å in the bad directions. If you filter a distorted volume to 20 Å or manually limit alignment resolutions to > 12 Å, you can probably get a reasonable initial model.
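To see how much a 20 Å filter or a 12 Å alignment limit actually throws away, it can help to convert a resolution into a radius in Fourier space. For a box of N pixels with pixel size p (Å), resolution d corresponds to a radius of N·p/d Fourier pixels, and Nyquist (2p) sits at N/2. A small sketch; the box size and pixel size in the usage line are hypothetical.

```python
def fourier_cutoff_radius(box_size: int, pixel_size: float, resolution: float) -> float:
    """Radius in Fourier pixels corresponding to `resolution` (Å) for an N-pixel box."""
    nyquist = 2.0 * pixel_size
    if resolution < nyquist:
        raise ValueError(f"{resolution} Å is beyond Nyquist ({nyquist} Å) for this pixel size")
    return box_size * pixel_size / resolution


if __name__ == "__main__":
    # Hypothetical 256-pixel box at 1.0 Å/pixel; Nyquist is at radius 128.
    for res in (4.0, 12.0, 20.0):
        print(f"{res:>4} Å -> radius {fourier_cutoff_radius(256, 1.0, res):.1f}")
```

With those numbers, a 20 Å low-pass keeps only about the innermost 13 of 128 Fourier pixels, which is why direction-dependent over-refinement at higher resolution largely disappears after filtering.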
Re: AlphaFold, it has the model complexity required to memorize the (non-redundant) PDB, so indeed for known structures it would be expected to recapitulate what is in the PDB. IMO it does pretty well in ~30 minutes what a good structural bioinformatics student would do in 30 days - but not particularly more than that.
I’ve experimented with using templates or initial models from the wrong protein, e.g. Kv1.2 for a TRP channel, and it’s really not that easy to get the bogus structure back. However, it is critical to use a reasonably low resolution, which is much lower than you might think for molmap. For example, with “open 5irx; molmap #1 12 gridspacing 1” you can still see helices and loops. A “resolution” of 20-40 Å is more appropriate for template picking.
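A rough way to see why 12 Å still shows helices but 20-40 Å does not: molmap’s Gaussian width scales with the requested resolution, assumed here to be sigma = resolution / (π√2) ≈ 0.225 × resolution (check the molmap documentation for the sigmaFactor default). Two equal Gaussians a distance d apart show a dip between them only when d > 2σ. The 10 Å helix spacing below is an assumed, illustrative axis-to-axis distance.

```python
import math

# Assumed molmap default: sigma = resolution / (pi * sqrt(2)), about 0.225 x resolution.
SIGMA_FACTOR = 1.0 / (math.pi * math.sqrt(2.0))


def molmap_sigma(resolution: float, sigma_factor: float = SIGMA_FACTOR) -> float:
    """Gaussian width (standard deviation, Å) for a given molmap resolution."""
    return sigma_factor * resolution


def blobs_resolved(separation: float, sigma: float) -> bool:
    """True if two equal Gaussians `separation` Å apart still show a dip between them.

    For f(x) = g(x - a) + g(x + a) with a = separation / 2, the midpoint x = 0
    is a local minimum exactly when a > sigma, i.e. separation > 2 * sigma.
    """
    return separation > 2.0 * sigma


if __name__ == "__main__":
    helix_spacing = 10.0  # assumed, illustrative axis-to-axis distance in Å
    for res in (6, 12, 20, 30, 40):
        s = molmap_sigma(res)
        print(f"{res:>2} Å: sigma = {s:.2f} Å, separate peaks: {blobs_resolved(helix_spacing, s)}")
```

Treating a whole helix as one Gaussian is a crude simplification, so the exact crossover should not be over-read; the point is only that features merge quickly as the molmap resolution is lowered.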
It’s really best not to think of molmap as an EM map simulator at all, as it just places equal-size Gaussian blobs on each atom. The box-size problem you are having can be fixed by opening a map of the right size and adding onGrid #X at the end of the molmap command. You can also create a new custom grid to use with volume new. The documentation is clear, and it’s linked from the log, so you can easily open it on the right page.
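To make the “Gaussian blobs on each atom” point concrete, here is a simplified stand-in for what molmap computes, evaluated on a cubic grid chosen up front (edge, spacing, origin), which is the role a cubic target grid plays with onGrid. This is not ChimeraX’s implementation: ChimeraX weights each Gaussian by atomic number and truncates distant tails, whereas this sketch takes explicit per-atom weights and sums every atom at every voxel, so it is only practical for tiny examples.

```python
import math
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float, float]  # x, y, z in Å, plus a weight


def gaussian_map(atoms: Sequence[Atom], edge: int, spacing: float,
                 origin: Tuple[float, float, float], sigma: float) -> List[List[List[float]]]:
    """Sum one isotropic Gaussian of width sigma per atom on a cubic grid.

    Returns a nested list indexed [k][j][i] for (z, y, x); voxel (i, j, k)
    sits at origin + (i, j, k) * spacing.
    """
    ox, oy, oz = origin
    two_s2 = 2.0 * sigma * sigma
    grid = [[[0.0] * edge for _ in range(edge)] for _ in range(edge)]
    for k in range(edge):
        z = oz + k * spacing
        for j in range(edge):
            y = oy + j * spacing
            for i in range(edge):
                x = ox + i * spacing
                grid[k][j][i] = sum(
                    w * math.exp(-((x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2) / two_s2)
                    for ax, ay, az, w in atoms
                )
    return grid


if __name__ == "__main__":
    # One unit-weight atom at the origin on a 5 x 5 x 5 grid centred on it.
    g = gaussian_map([(0.0, 0.0, 0.0, 1.0)], edge=5, spacing=1.0, origin=(-2.0, -2.0, -2.0), sigma=1.0)
    print(g[2][2][2])  # 1.0 at the atom position
```

Because every atom gets the same width, nothing about solvent, B-factors or missing components (such as absent light chains) is modelled, which is why a molmap volume is better thought of as a smooth shape envelope than as a simulated EM map.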
PS I bet if you carefully tune the Blob Picker you can get excellent results. Just set it to process only 10 micrographs (which is nearly instantaneous), and then explore all the parameters. In my experience people frequently give up after just a few attempts, but I can almost always find good parameters in < 30 minutes over 10-50 runs. “All” means “all”, including using different kinds of blob shapes together and the number of local maxima to consider.
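A sketch of laying out such a sweep before queueing jobs, so each combination becomes one quick Blob Picker run on ~10 micrographs. The parameter names and values are hypothetical placeholders, not cryoSPARC’s field names; map them onto the actual Blob Picker options in your version, and drop combinations that make no sense (here, a minimum diameter that is not below the maximum).

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical parameter names and values, for illustration only.
search_space: Dict[str, List] = {
    "min_diameter_A": [80, 100, 120],
    "max_diameter_A": [140, 180],
    "blob_shape": ["circle", "ellipse", "circle+ellipse"],
    "max_local_maxima": [2000, 4000],
}


def parameter_grid(space: Dict[str, List]) -> Iterator[Dict]:
    """Yield one dict per combination of the listed parameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


runs = [p for p in parameter_grid(search_space) if p["min_diameter_A"] < p["max_diameter_A"]]

if __name__ == "__main__":
    print(len(runs), "runs")
    for p in runs[:3]:
        print(p)
```

Keeping the list explicit makes it easy to record which settings produced which picks when comparing them by eye.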