Cannot align small protein complex particles

Hi All,

I’ve collected a dataset of a small protein complex (~50 kDa) with surprisingly (to me at least) good contrast, but cannot get them to align in either 2D or 3D.

The particles are mostly beta propellers, ~45 Å in diameter. I’m attaching a 2.5 µm defocus micrograph for reference. A bit dense, yes, but not too dissimilar to some published micrographs that have yielded high-resolution structures.

example

The images were collected on a Titan Krios with a K3 and an energy filter. The collection strategy used beam tilt to cover multiple shots per hole on a 3x3 grid, with coma-free compensation in SerialEM. Defocus was sampled between 0.8 and 3 µm. We used a super-resolution pixel size of 0.326 Å, then binned 2x during motion correction. The dose is 60 e/Å² over 40 frames. I selected micrographs with CTF fits better than 3.5 Å and could extract ~12 million particles.

Picking was easy with the LoG picker, and could be improved with “fuzzy” 2D templates that effectively match the size and shape of the particle (see below). I am convinced the particles are picked and centered properly.

I have tried to run a 2D classification and this is an example subset of classes:

I’ve explored the parameter space:

  • Increased online-EM iterations to 40
  • Increased batch size to 400 or 1000
  • Increased the number of classes to 150, 200, 250
  • Varied the particle box size between 1.5x and 2x the particle diameter of 45 Å
  • Tightened the inner circular mask to 55, 60, 65 Å
  • Tried enforcing non-negativity and clamp-solvent
  • Tried limiting the resolution to 8, 10, 12, 15 Å

I’ve then tried skipping 2D classification and going straight to ab initio reconstruction with 1, 3, or 6 classes. I tried increasing the initial and final minibatch sizes to 1500, as well as limiting the initial resolution to 20 Å and the final resolution to 9 or 12 Å. The initial models do not seem right, and many show streaky density (over-refined noise?).

With the particles downsampled to a pixel size of 1.1 Å, I have also tried homogeneous and heterogeneous refinement, either against these poor initial models or against a volume calculated from a crystal structure with molmap. I have a crystal structure of the beta-propeller core of the particle and am really only interested in its interaction with a peptide that supposedly binds on the surface. I’ve dared go as low as 7 Å lowpass filtering on the initial model volume, hoping the particle alignments might “latch” onto secondary-structure features. I’ve tried extracting good particles through heterogeneous refinement with one of the 5/6 initial models having phases randomized to 25 Å. Nothing indicates a successful reconstruction or discrimination of good particles.

I’ve tried binning the particles to a 2.6 Å pixel size, in an attempt to improve contrast, and repeating some of the refinement steps above, without any success.
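One thing worth keeping in mind when binning: it imposes a hard Nyquist limit of twice the pixel size on anything downstream. A quick sanity check in Python (the function name is mine), using the pixel sizes mentioned above:

```python
def nyquist(pixel_size_angstrom):
    """Nyquist resolution limit in Å for a given pixel size (Å/pixel).
    Information finer than twice the pixel size cannot be represented."""
    return 2 * pixel_size_angstrom

# Pixel sizes from this dataset: 2x-binned super-res (0.652 Å),
# downsampled (1.1 Å), and heavily binned (2.6 Å).
for apix in (0.652, 1.1, 2.6):
    print(f"{apix} Å/pix -> Nyquist {nyquist(apix)} Å")
```

So at 2.6 Å/pix nothing beyond ~5.2 Å survives, which is fine for coarse alignment but rules out seeing secondary structure at that stage.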

I understand that the particles are small and not very feature-rich, but I was excited by the amount of contrast I could see, even at low defocus.

Any ideas or feedback would be helpful and appreciated!

Cheers,
Stefan

Hi Stefan,

I would suggest trying a larger box size - even at 2x the particle size, you will be cutting off a lot of information for a particle this small.

The box size to use (in order not to lose information delocalized due to the CTF) is:

B = D + 2·L·(dF/d) (from Rosenthal & Henderson JMB 2003)

where D is the diameter of the particle, L is the wavelength (~0.02 Å at 300 kV), dF is the defocus, and d is the target resolution.

So at, let’s say, 1.5 µm defocus with a target resolution of 5 Å, you would need a box size of 165 Å in this case - and a box size of 90 Å would mean you are losing quite a lot of information beyond 12 Å, if I have it right.
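If it’s useful, the relation is trivial to script so you can tabulate it for your own defocus range (function and variable names are mine, following the symbols above):

```python
def min_box(diameter, defocus, target_res, wavelength=0.02):
    """Minimum box size in Å to retain CTF-delocalized signal,
    B = D + 2*L*(dF/d), from Rosenthal & Henderson (2003).

    All lengths in Å: diameter D, defocus dF (1 um = 10,000 Å),
    target resolution d; wavelength L is ~0.02 Å at 300 kV.
    """
    return diameter + 2 * wavelength * defocus / target_res

# 45 Å particle, 1.5 um defocus, 5 Å target: 165 Å box
print(min_box(45, 15_000, 5))
# Same particle at 1 um defocus, 3 Å target: ~178 Å
print(min_box(45, 10_000, 3))
```

Divide by your pixel size (and round up to an even, FFT-friendly number) to get the box in pixels.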

Cheers
Oli

Hi Oli,

Thank you for the immediate response. I should really spend more time reading the EM literature!

I will let you know if your suggestion fixes the problem, but it certainly makes a lot of sense.

Cheers,
Stefan

No worries! The other thing I’d try for small particles, if you haven’t already, is setting “Force/max over poses/shifts” to “OFF” in combination with an increased number of iterations (40 is usually OK) and an increased batch size (e.g. 400).

Cheers
Oli

Hi Oli,

I’m wondering… when does this relation B = D + 2·L·(dF/d) break down in practical terms?

If I wanted a 3A target resolution, with a maximum 3um defocus, I’m looking at a 845A box. That contains almost 20 of my particles!

Best,
Stefan

Well - hopefully your mean defocus is not 3 µm… I would collect as close to focus as practical with reasonable contrast. If you were at 1 µm defocus, that value would be 180 Å (and actually even at 3 µm I “only” get 445).

Also, I’d need to re-read that paper, but I think that only refers to how far one can expect signal to delocalize, based on the distance from the edge of your particle to the edge of your box - but boxes also have corners, and your particle has a center. In other words, it is a fairly conservative estimate, erring on the side of bigness.

Understood – thanks! You’re right; somehow I slipped an extra factor of 2 into the calculation.
I’ve collected at a fairly wide range of defocus values, but I also have a ton of data.

Would you recommend discarding particles imaged at a large defocus or does their presence not affect the reconstructions?

But as you just realized, don’t collect data beyond 2um defocus. For small particles, you might even want to stay very close to 1um.

Best,

Amedee


Hi All,

thank you for all the good advice! Several rounds of 2D classification to “purify” the particle set and extensive experimentation with the 2D classification parameters have led to some pretty recognizable views of a beta propeller. I found that marginalizing over poses and shifts was key, as was increasing the number of iterations and the batch size (40/400) and finding the right threshold for pixel values used in recentering the particles (as high as 75%). This last parameter is important in dealing with contrast from neighboring particles causing the alignments to drift.

As you can see, there are “shadows” from neighbors appearing at consistent locations in random orientations. Nevertheless, I could get an ab initio model that looks relatively reasonable (or does it?):

It appears that allowing the particles to sort into 3 ab initio classes and increasing the minibatch sizes is helpful. However, I cannot get a 3D heterogeneous or homogeneous refinement to work. I’ve tried defining masks of various sizes to limit the influence of neighboring particles on the alignment, and reduced the lowpass filtering of the initial ab initio models, but with little effect on the outcome.

I know I’m not working with the easiest data (dense micrographs, small 50 kDa particles), but the fact that the particles can be aligned into a variety of views in 2D raises the question of why the same shouldn’t be possible in 3D.

I’d be grateful for any observations and suggestions to put me on the right track!

Cheers,
Stefan

Hi Stefan - what parameters are you using for ab initio? The defaults (initial resolution 35, final 12) are often not great for small particles with few features at low resolution. Often starting at 9Å and going to 7Å, or even starting at 7 and going to 5, gives better results. Increasing the batch size can also help. E.g. for C1 ab initio of apoferritin, I find that increasing the batch size to 1000 helps with obtaining an isotropic reconstruction from ab initio.

Also for 2D, you might try a tighter mask to exclude some of the neighboring particles.

Cheers
Oli


Hi Oli,

thank you for your (as always) fast and informative response. As far as ab initio goes, I have been testing different resolution parameters. The results shown above come from a 35 Å/7 Å initial/final combination. I will keep reducing the initial resolution; thanks for the tip!

My initial/final minibatch size has been 400/1200. Should I increase the initial batch size further? Is there a benefit to starting with fewer particles per batch other than speed?

With regards to masking 2D classes, I have tried it, and a tighter mask does make the classes look cleaner, but there is no apparent benefit to the quality of the averages or the appearance of new views. The number of particles in good-looking classes is comparable between 2D classifications with and without masking. My understanding is that the particles themselves are not altered by 2D masking – would it help with downstream steps?

Cheers,
Stefan

Hi all,

I am dealing with a similar problem and this thread has been very useful so far (thanks @olibclarke!).
I am stuck at the same step as @stefan. My 2D classes look very similar to the latest ones shown here, but I can’t get any reasonable ab initio or homogeneous/heterogeneous refinement to work properly.
Are there any updates on this dataset?

Best,
twg

Hi @twg,

Unfortunately, the short answer is: no. I’ve tried 3D classification in RELION to identify heterogeneity, but without success. It seems to me that there is not sufficient signal to align the particle views robustly. I’d love to hear more ideas for troubleshooting, or if you manage to make progress.

Cheers,
Stefan

Hi :slight_smile:,

I’m facing more or less the same problem. I’ve collected a dataset of a small, elongated protein complex (~75 kDa), also with good contrast. The 2D aligns, but we have issues with the 3D.

I’m attaching a pair of micrographs for reference:

They were collected on a Talos Arctica with a K2 at 36k magnification (1.2 Å nominal, 1.13 Å corrected pixel size). One shot per hole on 1.2/1.3 Au grids. Defocus was set between 0.5 and 1.7 µm. The dose is 50 e/Å² over 65 frames (0.77 e/Å²/frame). I curated the micrographs to select those with CTF fits better than 5 Å and extracted around 2 million particles.

I used particles from previous data for picking, and extracted them with a 320-pixel box size.

I ran a 2D classification and got good 2D classes (image after several rounds of 2D classification to clean the dataset):

Even so, I have been trying the parameters described here to maybe improve my 2D classes (I’ll explain that below).

But then, when I move to ab initio and try the 3D reconstruction (1, 2, and 3 classes), I don’t get good models.
1class

Nor with NU-refinement.


Screenshot 2020-07-31 at 12.16.54

Even if the particles are small, I think the 2D classes are kind of promising, and it should be possible to improve the 3D model into a good one. So I tried the advice proposed here, but for both 2D and 3D I get the same error:

I understand it’s a GPU memory problem, so maybe my job settings are too demanding, or they don’t make sense at some point, or maybe both.

My 2D parameters are:

  • Number of 2D classes: 80
  • Maximum resolution: 3
  • Initial classification uncertainty factor: 4
  • Re-center mask threshold: 0.1
  • Re-center mask binary: ON
  • Force max over poses/shifts: OFF
  • Number of online-EM iterations: 40
  • Batchsize per class: 300
  • Number of iterations to anneal sigma: 25
  • Cache particle images on SSD: OFF

And using my previous 2D classes I was trying to run an Ab-initio job with the following parameters:

  • Maximum resolution (Angstroms): 9 (before I tried 7)
  • Initial resolution (Angstroms): 12 (before I tried 9)
  • Initial minibatch size: 400
  • Final minibatch size: 1200
  • Cache particle images on SSD: OFF
  • (For 1, 2 and 3 classes).

If anyone has any suggestions or ideas, I’d be really happy to try them :slight_smile:.

Lu.

Hi Lu,

Classes look nice! :slight_smile:

What GPU are you using? I wouldn’t expect you to run out of memory using those settings on a 1080Ti or 2080Ti…

Also how many particles do you have remaining in the NU-refine you show here? Maybe need a bit more cleanup in 3D?

Oli

Hi Oli,

Sorry for the delay; I was trying to figure out which GPUs we have. Apparently the GPUs used on the server are Tesla K80 cards.

In that NU refinement I have only 173,329 particles, since I selected only a few classes to avoid my preferential-orientation problem.

Lu.

They should definitely not be running out of memory then! That card has 24 GB, which should be more than enough for such small particles.

Might be a long shot, but depending on your server setup, another user or process could have been using the GPU memory when you tried to run cryoSPARC. Try “nvidia-smi” in a console to see the memory usage. If your cryoSPARC instance is deployed in a node-based configuration (rather than on a single server), then this suggestion won’t apply to your case.