I have several datasets collected on Krioses (how do you spell the plural of Krios?) to study a protein dimer in complex with another, much smaller and more flexible protein (functionally, it can be thought of as a peptide ligand). The main protein dimer is ~110 kDa and much longer than it is wide: ~150 A along the long axis and maybe ~40 A along the short one.
Despite an existing model of the protein in the apo state and data collected on advanced hardware (Falcon 4i + energy filter), the data have been challenging to process. Some elements of the context:
There is a very strongly preferred orientation (thankfully, with the larger diameter pointed upward, presenting the more distinct, larger, and more pickable shape in the viewer's direction). We have attempted to overcome this issue by collecting data at stage tilt angles of 0, 15, 30, and 40 degrees.
There are two distinct arrangements of the dimer in the dataset (one much less common than the other), but both are basically invisible by eye when looking at the micrographs (i.e. impossible for me to pick reliably by hand, especially in any of the rare non-preferred views). So, low SNR.
The goal I'm shooting for at the moment is moderate resolution in the ~4 A range, enough to distinguish the binding site of the peptide ligand, which is an alpha-helical region of a flexible protein. The region of the ligand in contact with my protein is probably ~4-6 kDa.
The particles I'm working with at the moment are generally binned by a factor of 4, for a pixel size of ~2.9 A and a Nyquist limit of just under ~6 A. This is to reduce computation requirements for as long as possible, as I'm running everything locally on a machine with two NVIDIA RTX 2080 GPUs and I'm on a bit of a time crunch. Also, I'm running cryosparc version 4.3.1. I know it would probably be better to update cryosparc, but I can't currently risk the program not working due to some unforeseen incompatibility.
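For clarity, the binning arithmetic works out like this (plain Python; the unbinned pixel size is back-calculated from the numbers above, so treat it as approximate):

```python
# Sanity check of the binning / Nyquist arithmetic.
# Raw pixel size is back-calculated from ~2.9 A at 4x binning (approximate).
raw_pixel = 2.92 / 4      # ~0.73 A/px, unbinned
bin_factor = 4

binned_pixel = raw_pixel * bin_factor  # ~2.92 A/px after 4x binning
nyquist = 2 * binned_pixel             # Nyquist = 2 x pixel size, ~5.84 A

print(f"binned pixel size: {binned_pixel:.2f} A")
print(f"Nyquist limit:     {nyquist:.2f} A")   # "just under ~6 A"
```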
I've used a combination of template picking and Topaz to get particles and address some of the map anisotropy, and I sort the particles with a mix of 2D classification, ab initio, and heterogeneous refinement. Now I have what I think are quite decent 2D classes with visible secondary structure, plus ab initio volumes, so I've just run homogeneous refinement to get a better sense of what the resolution is and which features are visible. The result for both arrangements of the dimer, both with and without a static mask (so 4 different jobs), is a very badly fragmented volume that essentially looks like noise and artefacts in the general shape of my complex, not actual continuous protein features. Also, the density that I suspected was the bound ligand is not really there anymore and has been replaced by a rosette feature that I understand to be an artefact. The attached volume was made with over 220k particles as input and no static mask, and all parameters for the job were kept at the defaults.
Κριοί (Krioi) according to Copilot. Greek-speaking people feel free to correct.
To me it really looks like a flexibility issue. Alignments fail because different parts of the protein require different translations and rotations of the entire particle image, so it'll never work like this. You can try 3DVA and 3DFlex, but your current ~200k particle set might be too small. Actually, flexibility probably already sabotaged your selection steps, especially if you relied on 2D classification. You can find entire workflows for flexible proteins ("dealing with conformational heterogeneity" or similar keywords) in recent YouTube videos by the CryoSPARC crew; they are really well done.
The 3D looks terribly overfitted. As you say yourself, noise and artefacts. Not knowing what the 2D (and mics) look like means a lot of recommendations are either general or guesswork.
The first problem to fix is the preferred orientation. What have you tried to overcome that? A 30° stage tilt is the quickest option if grid prep/biochemical optimisation is meeting too much resistance… I do understand when you have [dataset] and there is zero chance for another go at collection (been through it myself), but if you can at least minimise the preferred orientation, you'll likely find that (a) processing is easier and (b) you get a better result at the end of it all. Even just an extra thousand mics with a variety of views can help.
I'd give skipping 2D entirely a go: with a well-thresholded pick set (so as little ice as possible), move straight to ab initio with a larger number of classes, a larger number of iterations, and a high class similarity. It will take quite some time, but it might tease out some promising structures. Then take the lot into heterogeneous refinement (greatly increase the number of iterations here) and see.
Your point about the time crunch makes a lot of options troublesome, as they will take time, and a "time crunch" on a difficult dataset is a recipe for disaster.
Thanks rbs, I'm attaching a pic of the 2D classification I ran to check what I had before running this job. It's 100 classes, so the image might be comically large. But the ones near the top look decent to me, and again everything's binned, so the pixel size is about 3 A. Also attached is the ab initio that I fed into the homogeneous refinement. Just as a sanity check: looking at these, I see good 2D and kind of mediocre ab initio? Since posting I've been trying a bunch of other things and reading case studies. I think homogeneous refinement was likely not the right choice of job, but I still don't understand how it came out THAT bad when clearly a lot of particles are that general shape.
Thanks carlos, I am trying these/working toward trying these suggestions now. I'm getting more particles and will focus more on jobs that are made to sort through heterogeneity. I was thinking the heterogeneity in my dataset would be more along the lines of ligand bound/not bound, but I think there's more going on than that (clearly, at least the two different dimer arrangements, one of which is not even shown here).
I do generally sort particles in 3D, with het refine and 5 or 6 different starting maps that represent a spread of previously obtained volumes plus junk classes. A colleague here also suggested running ab initio 3+ times in parallel (same particles, not iteratively), each time with 5 classes, letting the random seed change each run, then picking the most promising ab initio volumes from those and moving into het refine. Those ab initio jobs are running now.
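In case it's useful to anyone scripting this, here's roughly what those parallel runs look like via cryosparc-tools (a sketch only; all UIDs are placeholders, and the "abinit_K" parameter name should be double-checked against your version):

```python
# Sketch: queue 3 independent 5-class ab initio jobs on the same particle
# stack using cryosparc-tools. Project/workspace/job UIDs and credentials
# are placeholders; "abinit_K" (number of classes) is the param name I
# believe is correct, but verify it for your cryosparc version.
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(
    license="xxxx",            # your license ID
    host="localhost",
    base_port=39000,
    email="user@example.com",
    password="xxxx",
)

project = cs.find_project("P1")  # placeholder project UID

for i in range(3):
    job = project.create_job(
        "W1",                  # placeholder workspace UID
        "homo_abinit",         # Ab-Initio Reconstruction
        connections={"particles": ("J10", "particles")},  # placeholder source
        params={"abinit_K": 5},  # 5 classes per run
    )
    # The seed is left at its default so each run starts from a different
    # random initialization, as described above.
    job.queue("default")       # lane name; adjust to your setup
```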
Some people don't agree with using ab initio to sort particles, so I'll be trying this and other methods. I'm not sure I'll have time, but I am curious about the outcome of 3×5-class ab initio jobs vs. 15×1-class ab initio jobs. I guess the latter would only make sense if you were sure you had basically no junk in the particle stack.
The 2D is a lot nicer than I expected from your description. Obviously a lot of heterogeneity in there - looks like very densely packed particles too.
The 3D model you show looks OK for an ab initio. I'd dump the bottom three rows of classes (select all the others) and push on with heterogeneous refinement (maybe six copies of the map as starting refs?) with a higher iteration count (maybe 15-20, and 5 full?)… at least for now.
Ab initio → hetero might work indeed. I just find 3DVA and 3DFlex more informative, provided the initial alignment is good enough. In some cases they can even tell you about the functioning of the protein itself. Also, keep in mind that ab initio gives each particle full freedom to rotate and translate, so a lot of your top views might get lost, as they won't have much signal. But you know… whatever works.
I've got 3080s in one of our servers, and I haven't had overflow issues with 3DVA and 3DFlex yet (we usually work with ~1-5 million downsampled ptcls). Ours are standalone servers, so we don't run more than one job of this kind at a time because of the RAM. We run them at ~8 or ~12 angstroms or so, and the running times are acceptable.
Fair enough. For heavily binned particles at low res, 10GB might be OK. But the 8GB RTX2080 box I still tinker with intermittently will run out of memory on the oddest jobs at times, so it was just an info bite from personal experience.
I did not unbin prior to the homogeneous refine job, but the input particles were binned at the same level as the particles that went into the ab initio initial volume (4x, for px size 2.92 A). Does that seem appropriate? No mask and all default parameters, since I'm not really familiar with how to adjust the options of this job.
my suggestion would be to re-extract any particles of interest at the full unbinned pixel size, even if the binned pixel size should be sufficient to "cover" the expected Nyquist resolution. re-extraction also recenters the particles based on their medium-resolution alignments, fixing potentially off-center picks that were corrected with x/y/z adjustments. you want the particle centered in the box for high-res. switch to NU-refine instead of homo refine (100% of the time for me, but at least in cases where you run into issues). none of these suggestions directly addresses your strange refinement, which should very easily reach high resolution (albeit with anisotropy) from the particles you show in 2D and the great ab initio. there's definitely a strange technical issue, and I suspect it's an incorrectly sized reference volume, though that's hard to do given cryosparc's ability to rescale volumes to the right pixel size. if you download the bad high-res map and open it in chimera, does the expected model fit ~correctly to that map? does that map ~fit the ab initio maps? and same for the reference that was used as input to the refine? surely using the ab initio shown and the particles shown in an unbinned NU-refine would look great.
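a quick way to compare the header geometry of the reference and the refined map without opening chimera (a sketch using the mrcfile package; paths are placeholders):

```python
# Sketch: compare box size and pixel size of the ab initio reference vs. the
# refined map, to catch a mis-sized reference volume. Paths are placeholders;
# requires the mrcfile package (pip install mrcfile).
import mrcfile

for label, path in [
    ("ab initio ref", "J10_volume.mrc"),
    ("homo refine map", "J42_map.mrc"),
]:
    with mrcfile.open(path, permissive=True) as mrc:
        nz, ny, nx = mrc.data.shape      # mrcfile orders axes z, y, x
        vx = float(mrc.voxel_size.x)     # A/px
        print(f"{label}: box {nx}x{ny}x{nz} px, {vx:.2f} A/px, "
              f"extent {nx * vx:.0f} A")
```

if the two physical extents (box × pixel size) don't match, that's your mis-sized reference.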
I will definitely keep the centering/extraction coupling in mind going forward. I hadn't been checking this, and I think it's really possible several of my jobs have gone poorly because of centering/box issues. In general I tend to see a sharp drop in successful outcomes when I try to "unbin" the particles.
I have 4 tilt angles from three collections, all collected at an external facility, and since I'm not as experienced working with EER files, there was a lot of confusion in the initial import step (now weeks ago) that had me concerned there might be a CTF estimation issue or something else I did weirdly, far upstream. If that's an issue at this point, I will have to deal with it later, post-deadline.
But per your suggestion, I downloaded the unsharpened map from that HomoRefine job and saw that it was much less awful than what I posted above. I keep forgetting that the display in Chimera is always different, and usually better, than what I see in the job volume viewer, even taking sharpening into account.
After posting, I ended up proceeding by trying to get more particles, since any 3D Var/Class-type job will probably benefit from more of them (including pulling some from a collection at another tilt angle), and I've had a moderate amount of success with NU-refine + a static mask. The resolution improved, but the job is definitely overselling it at 3.5 A:
Particles: 223k particles (448 px box, binned to 224 px, pixel size ~1.5 A) that came directly out of a heterogeneous refinement that didn't end up looking so good (I queued this job before the het refine finished...). I don't think the particles and the initial volume were in the same spot in the box when I started this job.
Initial Volume: from a heterogeneous refinement to which I did not apply symmetry during computation, but which I had aligned with the Volume Alignment Tools (my particles are C2, but I suspect pseudosymmetry, so I do refinements in C1 unless the particles have been symmetry-expanded). I think what this job did is simply reorient the dimer so that the C2 axis it detected in the volume lies on the z-axis; I guess it can somehow find that axis given the C2 prediction even though the refinement was in C1.
Static Mask: made from the initial volume, which I filtered to 10 A, resampled in a box of the same size as the particles I'm working with (448 box, 224 F-crop), and reimported. The mask dilation was 3 px and the padding 16 px, which I think was on the tight side considering the heterogeneity I'm still working with?
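For reference, here's roughly what I understand those mask-generation steps to amount to in numpy/scipy terms (a sketch, not cryosparc's actual implementation; the threshold value is a placeholder):

```python
# Sketch of what the mask steps amount to (NOT cryosparc's actual code):
# low-pass the volume to ~10 A, binarize, dilate 3 px, then add a ~16 px
# soft falloff. Path and threshold are placeholders.
import numpy as np
import mrcfile
from scipy.ndimage import binary_dilation, gaussian_filter

with mrcfile.open("J50_volume.mrc", permissive=True) as mrc:  # placeholder
    vol = mrc.data.astype(np.float32)
    apix = float(mrc.voxel_size.x)

# Low-pass to 10 A with a hard Fourier cutoff (cryosparc uses a smoother
# filter, but the idea is the same).
f = np.fft.fftn(vol)
freqs = [np.fft.fftfreq(n, d=apix) for n in vol.shape]
kz, ky, kx = np.meshgrid(*freqs, indexing="ij")
k = np.sqrt(kx**2 + ky**2 + kz**2)
f[k > 1.0 / 10.0] = 0
lowpass = np.fft.ifftn(f).real

# Binarize at a placeholder threshold, dilate 3 px, soft-pad ~16 px.
binary = lowpass > 0.05 * lowpass.max()
binary = binary_dilation(binary, iterations=3)
soft = gaussian_filter(binary.astype(np.float32), sigma=16 / 3)
mask = np.clip(soft / soft.max(), 0, 1)
```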
How would you recommend checking that particles are centered correctly relative to the initial reference volumes/masks for a refinement? Volumes I can open in Chimera and resample, but for particles, all I can think of is to run ab initio, i.e. turn them into a volume in order to check. Is there a faster way? I suppose I could run ab initio and ask for lower resolution.
chimera and volume viewer should look exactly the same.
can do 2D without recentering to see where the particles are.
having particles not centered in the box shouldn't be that big of a deal. it's just nice for high-res because of signal delocalization: the high-res signal could be elsewhere. but unless it's egregious, the refinements should easily find and center particles (from a variety of off-centers) appropriately.
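if you want numbers rather than eyeballing 2D, the per-particle shifts live in the refinement's particle metadata; a rough sketch with cryosparc-tools' Dataset (path is a placeholder, and the field name and units, pixels vs angstroms, are worth double-checking for your version):

```python
# Rough check of how far a refinement had to shift particles off their
# extraction centers. Path is a placeholder; "alignments3D/shift" is the
# field I'd expect, but verify name and units for your version.
import numpy as np
from cryosparc.dataset import Dataset  # from cryosparc-tools

d = Dataset.load("J60_particles.cs")             # exported particle metadata
shifts = np.asarray(d["alignments3D/shift"])     # (N, 2) per-particle shifts

r = np.linalg.norm(shifts, axis=1)
print(f"median shift: {np.median(r):.1f}, 95th pct: {np.percentile(r, 95):.1f}")
# If the 95th percentile is a large fraction of the box radius,
# re-extracting to recenter (as suggested above) is worth doing.
```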
I never run any refinement using a mask except Local. NU-refine with a mask always comes out garbage for me / for small particles.
my suggestion is: take the ab initio result, extract fully unbinned, and use these particles and the ab initio volume to run NU-refine. can do 2D with no recentering on the side to make sure particles are well-centered. you could trim your box size down 20%, but it's not necessary. definitely stay C1 until the problem is resolved; then you can try to benefit from symmetry later. you will have exacerbated anisotropy, but you can work on that later too.
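if you script your jobs, that NU-refine step would look roughly like this with cryosparc-tools (a sketch; UIDs are placeholders, the job-type string is what I'd expect for recent v4.x but should be verified against 4.3.1, and the symmetry param name is an assumption):

```python
# Sketch: queue an unbinned NU-refine in C1 from the ab initio result.
# Same cryosparc-tools setup as the earlier sketch in this thread; all UIDs
# are placeholders, and "nonuniform_refine_new" should be verified for 4.3.1.
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(license="xxxx", host="localhost", base_port=39000,
               email="user@example.com", password="xxxx")
project = cs.find_project("P1")

job = project.create_job(
    "W1",
    "nonuniform_refine_new",                  # job-type string assumed
    connections={
        "particles": ("J61", "particles"),    # re-extracted, fully unbinned
        "volume": ("J55", "volume_class_0"),  # the ab initio output
    },
    params={"refine_symmetry": "C1"},  # param name assumed; C1 is the default anyway
)
job.queue("default")
```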
Your FSC curves are trying to tell you something, and I would not ignore it. They don't look the way you would expect: an ideal FSC curve falls to zero in a smooth S shape, and yours don't. The first part of your curve has this shape, and then there is a large bump where it comes back up. My guess is that cryosparc is overestimating the resolution for some reason, which is also causing it to over-refine, and that is why the curves and volume look the way they do.
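If you want to sanity-check the curve outside of cryosparc, a bare-bones unmasked half-map FSC is easy to compute (a sketch with numpy + mrcfile; the half-map paths are placeholders):

```python
# Bare-bones unmasked half-map FSC, to compare against cryosparc's curve.
# Paths are placeholders; requires numpy + mrcfile.
import numpy as np
import mrcfile

def load(path):
    with mrcfile.open(path, permissive=True) as mrc:
        return mrc.data.astype(np.float32), float(mrc.voxel_size.x)

half1, apix = load("half_map_A.mrc")
half2, _ = load("half_map_B.mrc")

f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
freqs = [np.fft.fftfreq(n, d=apix) for n in half1.shape]
kz, ky, kx = np.meshgrid(*freqs, indexing="ij")
k = np.sqrt(kx**2 + ky**2 + kz**2)

# Bin voxels into resolution shells and correlate the half-map transforms.
bins = np.linspace(0, k.max(), half1.shape[0] // 2)
idx = np.digitize(k.ravel(), bins)
num = np.bincount(idx, (f1 * np.conj(f2)).real.ravel())
d1 = np.bincount(idx, (np.abs(f1) ** 2).ravel())
d2 = np.bincount(idx, (np.abs(f2) ** 2).ravel())
fsc = num / np.sqrt(d1 * d2 + 1e-12)

# A healthy curve falls smoothly from ~1 toward ~0; a bump back up at high
# frequency is the pathology described above.
for shell, val in zip(bins[1:], fsc[1:]):
    print(f"{1.0 / shell:6.1f} A   FSC = {val:.3f}")
```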
The first thing to do would be to remove as much heterogeneity as possible using het refine and 2D classification. This should help substantially, and I suspect your refinements will then behave more the way you expect.
Yes, I know for sure they are not supposed to look like that (jagged and with a large dip). I have heard that FSC pathologies kind of like this can be caused by masking. I think I'm going to try addressing it by running a few NU-refine jobs without the static mask, as CryoEM2 also suggested.
This didn't work, so I've been looking into it, and I now have plausible explanations for both the jaggedness and the dip. The jagged appearance improves when I perform symmetry expansion and refine only one of the monomers in local refinement, so I'm guessing it's somehow related to the heterogeneity of the full dimeric particle. This was a test run with a mask I'm not sure was the best, so I think I could get it looking better, but suggestions are always appreciated (I'm still working on implementing the previous advice). Opening the volume in Chimera shows me that the 3.19 A resolution estimate is a big overestimate.
The large dip is perhaps due to another aspect of my dataset I've not shared so far: it was all collected on phospholipid-coated grids, and my targets are peripheral membrane proteins. According to this article by the CS team, the dip is a common artefact when the protein is surrounded by lipids or has regions of disorder.
Makes sense; I think you are on the right track regarding the heterogeneity present in the full particle.
The situation you are pointing out is something often seen for integral membrane proteins, where there is a lot of low-resolution signal around the protein from detergent or lipid. In your case, since your protein is only peripherally bound, I would not expect that effect on the FSC curve merely because it is bound to lipids on your grid.
That said, if you have heterogeneity in the full complex, it will have a similar effect on the FSC curve, since many particles will only be well correlated at low resolution. Hope this helps.
I do hope that's the case, though the dip is still quite pronounced in the (definitely imperfect) local refinement of the monomer. In this case, the ligand that binds my target is expected to be embedded in the phospholipid surface at least some, if not most or all, of the time, and the primary protein dimer is expected to attach and similarly embed at least partially. As you mentioned, it isn't the same as being encased in a bilayer, but do you think it might have a similar effect on the FSC if the association is fairly tight?