FSC cut-off in heterogeneous refinement

Hi, I am new to cryoEM and cryoSPARC and am attempting to process my first dataset. I am running into some problems that I am hoping someone can help with. I’m working with a relatively large dataset of a non-symmetrical complex with approximately 200-120 A dimensions. The complex oligomerizes, forming dimers and trimers, with monomers and trimers being the most common species. Additionally, the dataset is a bit dense, with frequent particle overlap that must be weeded out from pure monomers or actual oligomers. The trimer appears to exhibit preferential orientation within the ice, so I am now focusing on resolving the monomer instead.

Initially, I ran into two main issues with the monomer: 1) overlapping particles getting sorted into good-looking monomer classes but with a soft cloud appearing in the area of the overlap, and 2) 2D junk classes getting averaged into good classes by the final iterations. I seem to have resolved these problems by turning off re-centering of 2D classes, masking, and the annealing of sigma.

However, following ab initio reconstruction, I am having issues with heterogeneous refinement FSC curves getting cut off. Per the forum, this can occur with downsampling, but I have not downsampled (because I have yet to figure out how to successfully do this :crazy_face:). Here is my process so far:

Un-tilted data was collected on Krios with the following parameters:
accelerating voltage (kV): 300
spherical aberration (mm): 2.7
total exposure dose: 50.87
pixel size (A): 0.656

A Patch CTF job was run on a total of 15,202 micrographs using standard parameters. An initial quick pipeline was used to generate 2D templates from a volume map, and a subsequent template picker job picked 6.1 million particles using standard parameters except for particle diameter (A), which was set to 220. A curate exposures job was used to pare the micrographs down to 12,514, and a subsequent extract from micrographs job with an extraction box size of 600 pix extracted 3.8 million particles.

Seven rounds of 2D classification generated 463,000 good particles. Standard parameters were used except: use circular mask on 2D classes = OFF, re-center 2D classes = OFF, iteration to start annealing sigma = 200 (essentially OFF), number of final full iterations = 5, number of classes = varying from 200 to 80, number of online-EM iterations = varying from 40 to 30.

Single-class, 3-class, and 6-class ab initio reconstructions were performed on just 100,000 particles to speed up processing, and a heterogeneous refinement job was run on the best classes but with the full 463,000-particle dataset. The volumes from these heterogeneous refinements look great, but the FSC curves approach the Nyquist limit and are cut off. I started a homogeneous refinement job on the single-class ab initio and so far the FSC curves look normal.

I am wondering what I can do to improve my process and troubleshoot the wonky FSC curve. Thanks!

2D classes used for Ab Initio:

FSC cut off during hetero refinement

FSC looks normal at iteration 000 from homogeneous refinement:

Just a quick note - if you set a target resolution for heterogeneous refinement, CryoSPARC will downsample your particles on the fly - that would account for the “wonky” FSC curve.

I’d take that Iteration 081 Class 000 output and do a local refinement with it - I suspect you’ll see a “normal” FSC as it won’t rescale on the fly.

Ahh, thank you so much for this explanation. I didn’t know about the on-the-fly downsampling. Just to clarify, if I take the heterogeneous refinement output and do a homogeneous refinement on the best class, will I need to make any adjustments to rescale, or will the FSC curves automatically normalize during the homogeneous refinement process? I don’t plan to do a local refinement until after a homogeneous refinement. Does this make sense? Thank you!

It should use the original particle stack (non-downsampled). One (of the many!) nice features of cryoSPARC is the on-the-fly rescaling of various things (references, etc).

Why not try both - homogeneous refinement from the heterogeneous class and direct to a local?

Excellent, thank you again!

Hi @Inko !

Just to expand a bit on @rbs_sci’s points, heterogeneous refinement does not use half-sets, so it is much more sensitive to overfitting than homogeneous or non-uniform refinement are. That’s one of several reasons it (by default) downsamples your particles. The FSC is cut off because the downsampled particles have a much lower Nyquist resolution than the full-size particle images.
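To put rough numbers on that, here is a minimal sketch of the arithmetic using the pixel size (0.656 A) and extraction box (600 pix) from the original post. The refinement box size of 128 voxels is an assumption for illustration, so check the value actually set in your heterogeneous refinement job; the function name is made up for this example.

```python
def nyquist_after_downsampling(pixel_size_A, box_px, refine_box_px):
    """Return (effective pixel size, Nyquist resolution) in Angstroms
    after a particle box is downsampled from box_px to refine_box_px."""
    # Same physical box edge, fewer pixels across it, so each pixel is larger.
    effective_pixel_A = pixel_size_A * box_px / refine_box_px
    # Nyquist resolution is two pixels.
    return effective_pixel_A, 2.0 * effective_pixel_A


PIXEL_SIZE_A = 0.656   # from the original post
BOX_PX = 600           # extraction box size from the original post
REFINE_BOX_PX = 128    # ASSUMPTION: refinement box size used by the job

full_nyquist = 2.0 * PIXEL_SIZE_A
eff_px, ds_nyquist = nyquist_after_downsampling(PIXEL_SIZE_A, BOX_PX, REFINE_BOX_PX)

print(f"Full-size Nyquist:     {full_nyquist:.3f} A")
print(f"Downsampled pixel:     {eff_px:.3f} A")
print(f"Downsampled Nyquist:   {ds_nyquist:.3f} A")
```

Under that assumption the FSC curve cannot extend past roughly 6 A no matter how good the data are, whereas the full-size particles have a Nyquist limit near 1.3 A.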

I hope that helps!

Hi rposert,
Thank you! So, in other words, do you mean that because heterogeneous refinement is more sensitive to overfitting, it essentially needs to downsample to avoid this inclination? (I am still learning what half-sets are and their purpose. At this point, my crude understanding is that they provide the model with training input that is independent of the model input?)

Thanks for your insight!

Half sets are a tricky but important concept! There is a nice summary in cryoEM 101 and I wrote a short summary in another post, but to answer your direct question: they are a little bit like a train/test split, but instead of creating a single model and testing it against data it didn’t see, we create two models and compare them to each other. This gives us a good estimate of the resolution we can trust.
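If it helps to see the "two models compared to each other" idea concretely, here is a rough NumPy sketch of a Fourier shell correlation between two half-maps. It is a toy illustration, not how CryoSPARC computes its FSC: it assumes cubic maps of equal, even size, uses no masking, and the `fsc` function and the synthetic blob-plus-noise "half-maps" are made up for the example.

```python
import numpy as np


def fsc(half_a, half_b):
    """Fourier shell correlation between two cubic maps of equal size.

    Returns one correlation value per integer-radius Fourier shell,
    from shell 0 (DC) up to just below Nyquist.
    """
    n = half_a.shape[0]
    fa = np.fft.fftshift(np.fft.fftn(half_a))
    fb = np.fft.fftshift(np.fft.fftn(half_b))

    # Integer shell index (distance from the Fourier origin) for every voxel.
    grid = np.indices(half_a.shape) - n // 2
    radius = np.rint(np.sqrt((grid ** 2).sum(axis=0))).astype(int)

    n_shells = n // 2
    keep = radius < n_shells          # ignore the corners beyond Nyquist
    shells = radius[keep]

    # Sum the cross term and the two power terms within each shell.
    cross = np.bincount(shells, weights=(fa * np.conj(fb)).real[keep], minlength=n_shells)
    power_a = np.bincount(shells, weights=(np.abs(fa) ** 2)[keep], minlength=n_shells)
    power_b = np.bincount(shells, weights=(np.abs(fb) ** 2)[keep], minlength=n_shells)
    return cross / np.sqrt(power_a * power_b)


# Toy data: the same smooth blob plus independent noise in each "half".
rng = np.random.default_rng(0)
n = 32
z, y, x = np.indices((n, n, n)) - n // 2
signal = np.exp(-(x ** 2 + y ** 2 + z ** 2) / (2 * 3.0 ** 2))
half_a = signal + rng.normal(scale=0.5, size=signal.shape)
half_b = signal + rng.normal(scale=0.5, size=signal.shape)

curve = fsc(half_a, half_b)
print(np.round(curve, 2))
```

In this toy case the low-resolution shells, where the shared blob dominates, correlate strongly, while the high-resolution shells, where each half only has its own independent noise, drop toward zero. The resolution where the curve falls below a chosen threshold is the "resolution we can trust".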

So yes, since we don’t know what resolution we can trust in heterogeneous refinement, it is good practice to keep the particles downsampled so that only lower resolution information is available to the alignments. It also helps with memory and speed :slight_smile:.
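As a rough illustration of why downsampling removes the high-resolution information, here is a sketch that downsamples a 2D image by Fourier cropping, i.e. keeping only the central low-frequency terms of its Fourier transform. Treat this as an assumption about the general technique rather than a description of CryoSPARC's internals; `fourier_crop` is a made-up helper, and the 600 and 128 pix sizes are just the extraction box from the original post and an assumed refinement box.

```python
import numpy as np


def fourier_crop(image, new_size):
    """Downsample a square 2D image (even sizes assumed) by keeping only
    the central new_size x new_size block of its centered Fourier transform."""
    n = image.shape[0]
    ft = np.fft.fftshift(np.fft.fft2(image))
    start = n // 2 - new_size // 2
    cropped = ft[start:start + new_size, start:start + new_size]
    small = np.fft.ifft2(np.fft.ifftshift(cropped)).real
    # Rescale so the mean intensity matches the original image.
    return small * (new_size / n) ** 2


rng = np.random.default_rng(0)
image = rng.normal(size=(600, 600))   # stand-in for one 600 pix particle image
small = fourier_crop(image, 128)      # ASSUMPTION: 128 pix refinement box

pixel_ratio = image.size / small.size
print(small.shape, f"{pixel_ratio:.1f}x fewer pixels per particle")
```

Everything outside the cropped block (the higher spatial frequencies) is simply discarded, so the alignments never see it, and each particle also carries roughly 22 times fewer pixels, which is where the memory and speed savings come from.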

Notably, RELION solves the problem of overfitting in 3D classification/heterogeneous refinement in another way by giving the user a tunable parameter (“T”, the regularization parameter), but we’re all trying to solve the same problem.

Thank you @rposert ! This makes a whole lot of sense. There is always so much to learn!

I’m glad it was helpful! Whenever you run into a concept you’re not quite getting, please do make a post here asking for more clarification. I’m sure you’ll get many people excited to help explain it!
