Map fragmentation in Homogeneous Refinement

Hi all,

I have several datasets collected on Krioses (how do you spell the plural of Krios?) to study a protein dimer in complex with another much smaller and more flexible protein (functionally, it can be thought of as a peptide ligand). The main protein dimer is ~110 kDa and much longer than it is wide: ~150 A along the long axis and maybe ~40 A across the short axis.

Despite there being a model already available of the protein in the apo state and the data being collected on advanced hardware (Falcon 4i + energy filter), the data have been challenging to process. Some elements of the context:

  1. There is a very strongly preferred orientation (thankfully with the long axis facing up, so the larger, more distinctive and more pickable shape faces the viewer). We have attempted to overcome this by collecting data at stage tilts of 0, 15, 30, and 40 degrees.

  2. There are two distinct arrangements of the dimer in the dataset (one much less common than the other), but both are basically invisible by eye in the micrographs, i.e. impossible for me to pick reliably by hand, especially in any of the rare non-preferred views. So, low SNR.

  3. The goal I’m shooting for at the moment is moderate resolution in the ~4A range to be able to distinguish the binding site of the peptide ligand, which is an alpha-helical region of a flexible protein. The region of the ligand in contact with my protein is probably ~4-6kDa.

  4. The particles I'm working with at the moment are generally binned by a factor of 4, for a pixel size of ~2.9 A and a Nyquist limit of just under 6 A. This is to keep the computational requirements down for as long as possible, since I'm running everything locally on a machine with two NVIDIA RTX 2080 GPUs and am on a bit of a time crunch. I'm also running CryoSPARC version 4.3.1; I know it would probably be better to update, but I can't currently risk the program breaking due to some unforeseen incompatibility.
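
(As a sanity check on those numbers, the Nyquist limit is just twice the pixel size; a quick back-of-the-envelope in Python, where the unbinned pixel size is my own back-calculation from the binning factor rather than a measured value:)

```python
# Sanity check of the binned pixel size and Nyquist limit quoted above.
# The unbinned pixel size is back-calculated from the binning factor (assumption).
unbinned_apix = 0.73        # A/px, approximate
bin_factor = 4

binned_apix = unbinned_apix * bin_factor   # ~2.92 A/px
nyquist = 2 * binned_apix                  # ~5.84 A, i.e. "just under 6 A"
print(f"binned pixel size ~ {binned_apix:.2f} A/px, Nyquist ~ {nyquist:.2f} A")
```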

I've used a combination of template picking and Topaz to get particles and address some of the map anisotropy, and I sort the particles with a mix of 2D classification, ab initio, and heterogeneous refinement. Now that I have what I think are quite decent 2D classes with visible secondary structure, plus ab initio volumes, I've run Homogeneous Refinement to get a better sense of the resolution and which features are visible.

The result, for both arrangements of the dimer and both with and without a static mask (so four different jobs), is a very badly fragmented volume that essentially looks like noise and artefacts in the general shape of my complex rather than continuous protein features. Also, the density I suspected was the bound ligand is not really there anymore; it has been replaced by a rosette feature that I understand to be an artefact. The attached volume was made with over 220k particles as the input, no static mask, and all job parameters at their defaults.

How can I improve this map/what did I do incorrectly? I’d be grateful for any advice.

Hi akw,

Κριοί (Krioi) according to Copilot. Greek-speaking people feel free to correct.

To me it really looks like a flexibility issue. Alignments fail because different parts of the protein require different translations and rotations of the entire particle image, so it's never going to work like this. You can try 3DVA and 3DFlex, but your current ~200k particle set might be too small. Actually, flexibility has probably already sabotaged your selection steps, especially if you relied on 2D classification. You can find entire workflows for flexible proteins ("dealing with conformational heterogeneity" or similar keywords) in recent YouTube videos by the CryoSPARC team; they are really well done.

The 3D looks terribly overfitted. As you say yourself, noise and artefacts. :frowning: Not knowing what the 2D (and mics) look like makes a lot of recommendations either more general or guesswork.

The first problem to fix is the preferred orientation. What have you tried to overcome that? A 30° stage tilt is the quickest option if grid prep/biochemical optimisation proves too resistant… I do understand when you have [dataset] and there is zero chance for another go at collection (been through it myself), but if you can at least minimise the preferred orientation you'll likely find that (a) processing is easier and (b) you get a better result at the end of it all. :wink: Even just an extra thousand mics with a variety of views can help.

I’d give skipping 2D entirely a go - with a well-thresholded pick set (so as little ice as possible) move straight to ab initio with a larger number of classes, larger number of iterations and a high class similarity. It will take quite some time, but might tease out some promising structures. Then take the lot into heterogeneous refinement (greatly increase number of iterations here) and see.

Your point 4 makes a lot of options troublesome, as they will take time, and ā€œtime crunchā€ on a difficult dataset is a recipe for disaster. :scream:

Thanks rbs, I'm attaching a picture of the 2D classification I ran to check what I had before running this job. It's 100 classes, so the image might be comically large. The classes near the top seem decent to me, and again everything's binned, so the pixel size is about 3 A. Also attached is the ab initio volume that I fed into the homogeneous refinement. Just as a sanity check: looking at these, I see good 2D and a kinda mediocre ab initio? Since posting I've been trying a bunch of other things and reading case studies. I think homogeneous refinement was likely not the right choice of job, but I still don't understand how it came out THAT bad when clearly a lot of particles have that general shape.

(rotated 180° about the vertical axis; one side is lower resolution than the other)

Thanks carlos, I am trying (or working toward trying) these suggestions now: getting more particles, and focusing more on jobs that are designed to sort through heterogeneity. I was thinking the heterogeneity in my dataset would be more along the lines of ligand bound/not bound, but I think there's more going on than that (clearly, given the two different dimer arrangements, one of which isn't even shown here).

I do generally sort out particles in 3D, with het refine and 5 or 6 different starting maps that represent a spread of previously obtained volumes and junk classes. A colleague here also suggested running ab initio 3+ times in parallel (same particles, not iteratively), each time with 5 classes, allowing the random seed to change each time, then picking the most promising ab initios from those and moving into het refine. Those sets of ab initio jobs are running now.
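
In case it helps anyone scripting the same thing, this is roughly how those parallel runs could be queued with cryosparc-tools; the project/workspace/job UIDs, the particle output name, and the abinit_K parameter name below are placeholders/assumptions from my setup, so check them against your own instance and the cryosparc-tools docs rather than taking this verbatim.

```python
# Rough sketch: queue several independent ab initio runs on the same particle
# stack (each run draws a fresh random seed). All UIDs, the source output
# group, and parameter names below are assumptions for illustration only.
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    host="localhost", base_port=39000,
    email="user@example.com", password="password",
)
project = cs.find_project("P1")                       # hypothetical project UID

for _ in range(3):                                    # three parallel 5-class runs
    job = project.create_job(
        "W1", "homo_abinit",                          # workspace UID, ab initio job type
        connections={"particles": ("J42", "particles")},  # hypothetical source job/output
        params={"abinit_K": 5},                       # 5 classes per run
    )
    job.queue("default")                              # lane name: adjust to your instance
```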

Some people don't agree with using ab initio to sort particles, so I'll be trying this and other methods. I'm not sure I'll have time, but I am curious about the outcome of 3×5-class ab initio jobs vs 15×1-class ab initio jobs. I guess it would only make sense to try the latter if you were sure you had basically no junk in the particle stack.

The 2D is a lot nicer than I expected from your description. :smiley: Obviously a lot of heterogeneity in there - looks like very densely packed particles too.

The 3D model you show looks OK for an ab initio. I’d dump out the bottom three rows of classes (select all the others) and push with heterogeneous refinement (maybe six copies of the map as starting refs?) with a higher iteration count (maybe 15-20 and 5 full?)… at least for now.

Ab initio → hetero might indeed work. I just find 3DVA and 3DFlex more informative, provided the initial alignment is good enough. In some cases they can even tell you something about the functioning of the protein itself. Also, keep in mind that ab initio gives each particle full freedom to rotate and translate, so a lot of your top views might get lost, as they won't have much signal. But you know… whatever works.

Given the reported RTX 2080s… I have doubts 3DFlex or 3DVA will succeed, as they're both memory hogs. :frowning:

I've got 3080s in one of our servers and I haven't had overflow issues with 3DVA or 3DFlex yet (we usually work with ~1-5 million downsampled particles). Ours are standalone servers, so we don't run more than one job of this kind at a time, because of the RAM. We run them at ~8 or ~12 angstroms or so, and running times are acceptable.

Fair enough. For heavily binned particles at low res, 10GB might be OK. But the 8GB RTX2080 box I still tinker with intermittently will run out of memory on the oddest jobs at times, so it was just an info bite from personal experience. :wink:

Did you unbin prior to refine? Is it possible the volume is incorrectly sized relative to the box for the refine?

I did not unbin prior to the homogeneous refinement job, but the input particles were binned at the same level as the particles that went into the ab initio initial volume (4X, for a pixel size of 2.92 A). Does that seem appropriate? No mask and all default parameters, since I'm not really familiar with how to adjust the options for this job.

My suggestion would be to re-extract any particles of interest at the full unbinned pixel size, even if the binned pixel size should be sufficient to 'cover' the expected Nyquist resolution. Re-extraction also recenters the particles based on their medium-resolution alignments, fixing potentially off-center picks that were previously corrected with x/y/z adjustments; you want the particle centered in the box for high-res. Switch to NU-refine instead of homogeneous refinement (100% of the time for me, but at least in cases where you run into issues).

None of these suggestions directly explains your strange refinement, which should very easily reach high resolution (albeit with anisotropy) from the particles you show in 2D and the great ab initio. There's definitely a strange technical issue, and I suspect it's an incorrectly sized reference volume, though that's hard to do given CryoSPARC's ability to rescale volumes to the right pixel size. If you download the bad high-res map and open it in Chimera, does the expected model fit ~correctly to that map? Does that map ~fit the ab initio maps? And the same for the reference that was used as input to the refinement? Surely using the ab initio shown and the particles shown in an unbinned NU-refine would look great.
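
If you'd rather not eyeball the scaling, a small mrcfile/numpy check like the one below (the file names are placeholders) will print the box and pixel sizes of the reference and the refined map so you can confirm they describe the same physical extent:

```python
# Minimal check that two maps (e.g. the ab initio reference and the refined
# volume) cover the same physical box. File names are placeholders.
import mrcfile

def describe(path):
    with mrcfile.open(path, permissive=True) as mrc:
        nz, ny, nx = mrc.data.shape          # MRC data is stored z, y, x
        apix = float(mrc.voxel_size.x)       # A per voxel
        print(f"{path}: box {nx}x{ny}x{nz} px, {apix:.3f} A/px, "
              f"extent ~{nx * apix:.0f} A")

describe("abinitio_reference.mrc")
describe("homo_refine_map.mrc")
# If the physical extents differ, the reference was probably imported or
# resampled with the wrong pixel size.
```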

I will definitely keep the centering/extraction coupling in mind going forward; I hadn't been checking this, and I think it's really possible several of my jobs have gone poorly because of centering/box issues. In general I tend to see a sharp drop in successful outcomes when I try to 'unbin' the particles.

I have 4 tilt angles from three collections, all collected at an external facility, and since I'm not as experienced working with EER files there was a lot of confusion in the initial import step (now weeks ago) that had me concerned there might be a CTF estimation issue, or something else I did weirdly, far upstream. If that's an issue at this point, I will have to deal with it later, post-deadline.

But per your suggestion, I downloaded the unsharpened map from that Homogeneous Refinement job and saw that it was much less awful than what I posted above. I keep forgetting that the display in Chimera always looks different, and usually better, than what I see in the job's volume viewer, even taking sharpening into account.

After posting I proceeded by trying to get more particles (including pulling some from a collection at another tilt angle), since any 3DVA/classification-type job will probably benefit from more particles, and I've had a moderate amount of success with NU-refine + a static mask. The resolution improved, but the job is definitely overselling it at 3.5 A:

Inputs for above, for reference:

  • Particles: 223k particles (448 px box, binned to 224 px, pixel size ~1.5 A) that came directly out of a heterogeneous refinement that didn't end up looking so good (I queued this job before the het refine finished). I don't think the particles and the initial volume were centered in the same spot in the box when I started this job.
  • Initial Volume: a volume from a heterogeneous refinement that I did not apply symmetry to during computation, but had aligned with the Volume Align Tool (my particles are C2 but I suspect pseudosymmetry, so I do refinements in C1 if the particles haven't been symmetry expanded). I think what the alignment job did is just reorient the dimer so the C2 axis it detected in the volume lies on the z-axis; I guess it is somehow able to find that axis given the C2 prediction even though the volume was refined in C1.
  • Static Mask: made from the initial volume, which I filtered to 10 A resolution, resampled in a box of the same size as the particles I'm working with (448 box, 224 F-crop), and reimported. The mask dilation was 3 px and the padding was 16 px, which I think was on the tight side considering the heterogeneity I'm still working with? (Rough sketch of what dilation/padding mean just below.)
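
For my own reference, dilation and soft padding on a binary mask amount to roughly the following (a generic numpy/scipy sketch of the idea, not CryoSPARC's actual mask-generation code):

```python
# Generic sketch of making a soft mask from a low-pass filtered volume:
# threshold -> binary dilation (the "dilation" radius) -> soft falloff
# (the "padding" width). Illustration only, not CryoSPARC's implementation.
import numpy as np
from scipy.ndimage import binary_dilation, distance_transform_edt

def soft_mask(volume, threshold, dilation_px=3, pad_px=16):
    hard = volume > threshold                               # binary core mask
    hard = binary_dilation(hard, iterations=dilation_px)    # grow by N voxels
    dist = distance_transform_edt(~hard)                    # distance from mask edge
    soft = np.clip(1.0 - dist / pad_px, 0.0, 1.0)           # linear falloff over pad_px
    # Cosine-shape the falloff so it goes smoothly from 1 to 0.
    return 0.5 * (1.0 + np.cos(np.pi * (1.0 - soft)))
```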

Plots: (FSC….)

How would you recommend checking that particles are centered correctly relative to initial reference volumes/masks for refinement? Volumes I can open in Chimera and resample, but for particles all I can think to do is run ab initio, i.e. turn them into a volume, to be able to check. Is there a faster way? I suppose I could run ab initio and ask for lower resolution.

Chimera and the volume viewer should look exactly the same.

You can do 2D classification without recentering to see where the particles are.

Having particles not centered in the box shouldn't be that big a deal. It's just nice for high-res because of signal delocalization: the high-res signal could be elsewhere in the box. But unless it's egregious, the refinements should easily find and center particles (from a variety of off-center positions) appropriately.

I never run any refinement using a mask except Local. NU-refine with a mask is always garbage for me/for small particles.

My suggestion is: take the ab initio result, extract fully unbinned, and use those particles and the ab initio volume to run NU-refine. You can do 2D with no recentering on the side to make sure the particles are well centered. You could trim your box size down 20%, but it's not necessary. Definitely stay in C1 until the problem is resolved; you can try to benefit from symmetry later. You will have exacerbated anisotropy, but you can work on that later too.

Your FSC curves are trying to tell you something that I would not ignore; they don't look like you would expect. An ideal FSC curve should go down to zero with a smooth S shape, and yours don't. The first part of your curve has this shape, and then there is a large bump where it comes back up. My guess is that, for some reason, CryoSPARC is overestimating the resolution, and this is also causing it to over-refine, which is why the curves and volume look the way they do.
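
For reference, the FSC is just the normalised cross-correlation between the two half-maps, computed shell by shell in Fourier space:

$$\mathrm{FSC}(k) \;=\; \frac{\sum_{|\mathbf{k}|\in k} F_1(\mathbf{k})\,F_2^{*}(\mathbf{k})}{\sqrt{\sum_{|\mathbf{k}|\in k}\left|F_1(\mathbf{k})\right|^{2}\;\sum_{|\mathbf{k}|\in k}\left|F_2(\mathbf{k})\right|^{2}}}$$

So a bump back up at high spatial frequencies means the two half-maps have become correlated again out there, which beyond the real signal usually points to correlated noise (over-fitting or mask effects) rather than genuine resolution.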

The first thing to do would be to try to remove as much heterogeneity as possible using hetero refinement and 2D classification. This should help substantially; I suspect your refinements will then behave more the way you expect.

Yes, I know for sure they are not supposed to look like that (jagged and with a large dip). I have heard that FSC pathologies kind of like this can be caused by masking. I think I am going to try addressing it by running a few NU-refine jobs without the static mask, as CryoEM2 also suggested.

This didn't work, so I've been looking into it, and I have plausible explanations for both the jaggedness and the dip. The jagged appearance improves when I perform symmetry expansion and refine only one of the monomers in a local refinement, so I'm guessing it's somehow related to the heterogeneity of the full dimeric particle. This was a test run with a mask I'm not sure was the best, so I think I could get this looking better, but suggestions are always appreciated (I'm still working on implementing the previous advice). Opening the volume in Chimera shows me that the resolution estimate of 3.19 A is a big overestimate.
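
(For anyone following along, symmetry expansion here just means duplicating every particle with its pose composed with the C2 operator before the local refinement. A rough numpy/scipy illustration of the idea; the axis-angle pose convention and the composition order are assumptions on my part, and in practice CryoSPARC's own Symmetry Expansion job handles this:)

```python
# Rough illustration of C2 symmetry expansion: each particle appears twice,
# once with its original pose and once composed with a 180° rotation about
# the symmetry (z) axis. Pose convention and composition order are assumptions.
import numpy as np
from scipy.spatial.transform import Rotation as R

def c2_expand(pose_rotvecs):
    """pose_rotvecs: (N, 3) axis-angle rotation vectors, one per particle."""
    orig = R.from_rotvec(pose_rotvecs)
    c2 = R.from_rotvec([0.0, 0.0, np.pi])        # 180° about z
    expanded = R.concatenate([orig, orig * c2])  # order depends on pose convention
    return expanded.as_rotvec()                  # (2N, 3) expanded poses
```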

The large dip is perhaps due to another aspect of my dataset I've not shared so far, which is that it was all collected on phospholipid-coated grids and my targets are peripheral membrane proteins. According to this article by the CryoSPARC team, such a dip is a common artefact when the protein is surrounded by lipids or has regions of disorder.

Makes sense, I think you are on the right track related to the heterogeneity present in the full particle.

The situation you are pointing out is something often seen for integral membrane proteins, where there is a lot of low-resolution signal around the protein from detergent or lipid. In your case, since your protein is only peripherally bound to the lipids on your grid, I would not expect that alone to have the same effect on the FSC curve.

That said, if you have heterogeneity across the full complex, it can have a similar effect on the FSC curve, since you have many particles that only correlate well at low resolution. Hope this helps.

I do hope that’s the case, though the dip is still quite pronounced in the (definitely imperfect) local refinement of the monomer. In this case, the ligand that binds my target is expected to be embedded in the phospholipid surface at least some if not most or all of the time, and the primary protein dimer is expected to attach and similarly embed at least partially. As you mentioned, it isn’t the same as being encased in a bilayer, but do you think that it might have a similar effect on the FSC if the association is fairly tight?

Thanks for your insights