Minimum size for local refinement

ahmadkhalifa · November 30, 2020, 6:27pm

Hello,

Is there a lower limit on the molecular weight that can be refined via local refinement in cryoSPARC, similar to Relion’s criterion of at least 100-150 kDa inside the mask? Starting from a consensus refinement, I subtracted the signal in Relion for everything except a domain of less than 50 kDa, then ran 3D refinement with local angular searches. Relion could not converge on a high-resolution structure, but cryoSPARC surprisingly did, so I want to know why.

Best regards.

mmclean · November 30, 2020, 7:38pm

Hi @ahmadkhalifa,

This is interesting – we don’t really stipulate a minimum mass within the mask, as it usually depends on the dataset. For some datasets, 100-150 kDa definitely seems like the limit, but for others smaller may actually work, probably depending on the SNR in the images and how rigid the structure is.

The parameters that probably influence the quality of the alignments are:

Rotation and shift search extents – were they quite small? Large search extents will almost certainly result in poor alignments for a very small region.
Mask softness – using a softer mask (and softer dynamic mask parameters) helps reduce the mask edge artefacts that we’ve seen with local refinement before, allowing the structure to go to higher resolution without overfitting.

May I ask, which parameters did you run the local refinement with?

Best,
Michael

DanielAsarnow · December 2, 2020, 2:27am

I bet there is a parameter choice that explains the difference. If you post screenshots of the cryoSPARC job inputs and the Relion tabs, we can probably figure it out.

ahmadkhalifa · December 2, 2020, 2:09pm

Thanks a lot for the responses, I ran the local refinement job with default parameters, i.e.

Local shift search extent (pix): 3
The maximum extent of local shifts that will be searched over, in pixels

Local rotation search extent (degrees): 10
The maximum magnitude of the change in rotations to search over, in degrees

Alignment resolution (degrees): 0.5
Smallest search distance between angles, in degrees

The Relion auto-sampling tab has similar search options, but they are defined rather differently from the ones above. The local shift search extent in cryoSPARC seems to be an upper bound, while in Relion the “initial offset step” is a lower bound. I’m assuming, though, that if “initial offset step” is set to 1 there will be an initial overlap between the two algorithms, no? The other two settings can be matched by an initial angular sampling of 7.5 (~10) and local searches from 0.5, but then again that is not going to perform local searches in Relion by definition. Could you please correct me if my understanding is incorrect, and also kindly explain more about how the cryoSPARC algorithm works? Thanks a lot.

mmclean · December 2, 2020, 3:41pm

Hi @ahmadkhalifa,

With regard to the rotation/shift search extents: yes, they are (approximate) upper bounds on the changes in the rotation/shifts from the original consensus alignments, rather than per-iteration bounds. The way local refinement currently works is by searching over a coarse grid within the specified local search ranges, and then using a modified version of branch-and-bound alignment to subsample the grid until it reaches the alignment resolution (here, 0.5 degrees). At the end of alignment, it selects the best orientation within the fine search grid, and does reconstruction without marginalizing over orientations. More information on the general principle behind branch and bound is available in the original cryoSPARC publication.
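
The coarse-then-fine search described above can be illustrated with a toy one-dimensional sketch. This is not cryoSPARC's implementation and it is not a true branch-and-bound: it computes no bounds and simply halves the grid step around the current best angle. The function name `coarse_to_fine_search`, the `score` callback, and `n_coarse` are hypothetical, chosen only for this example.

```python
def coarse_to_fine_search(score, center, extent, resolution, n_coarse=9):
    """Toy 1D search: maximise score(angle) within center +/- extent.

    Assumption: a greedy grid refinement stands in for the real
    branch-and-bound scheme, which this sketch does not reproduce.
    """
    lo, hi = center - extent, center + extent
    # Coarse pass: evaluate a fixed grid over the whole allowed range.
    step = (hi - lo) / (n_coarse - 1)
    best = max((lo + i * step for i in range(n_coarse)), key=score)
    # Fine passes: halve the step around the current best candidate
    # until the grid spacing reaches the requested resolution.
    while step > resolution:
        step /= 2.0
        candidates = [c for c in (best - step, best, best + step) if lo <= c <= hi]
        best = max(candidates, key=score)
    return best, step


if __name__ == "__main__":
    # Hypothetical score with a single peak at 3.2 degrees.
    peak = 3.2
    angle, final_step = coarse_to_fine_search(
        lambda a: -(a - peak) ** 2, center=0.0, extent=10.0, resolution=0.5
    )
    print(f"best angle {angle:.3f} deg, final grid step {final_step:.4f} deg")
```

With a single-peaked score like the one above, the result stays inside the +/- 10 degree extent and ends on a grid no coarser than the 0.5 degree resolution. A greedy refinement like this can miss the true optimum when the score has several peaks, which is the situation that bounding is meant to handle.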

Unfortunately, I am not too familiar with the RELION implementation so I cannot advise on how their approach differs relative to cryoSPARC’s.

Best,
Michael

DanielAsarnow · December 2, 2020, 6:12pm

Relion’s translation parameters are a search range (per iteration) and a step size. The default is 1 pixel steps within 5 pixel radius. Each iteration a particle is moved to the best origin within 5 pixel radius of the current origin, at 1 pixel increments. Over multiple iterations it can move farther and farther. In local refinements I often start out with a finer translational search, like 0.5 / 3. Making the translation search narrower makes refinement a lot faster, so if I think the initial parameters are pretty good (previous resolution < 3 A), I might make the range a little smaller.
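
As a rough illustration of why the translation range affects run time, the sketch below counts trial origins on a regular grid inside a circular search region. The circular region, the function name `trial_offsets`, and the lack of any oversampling are assumptions made for this example, not a description of Relion's internals.

```python
def trial_offsets(step, search_range):
    """List (dx, dy) offsets on a grid of spacing `step` (pixels)
    that lie within `search_range` pixels of the current origin.

    Assumption: a circular search region with no oversampling.
    """
    radius_in_steps = search_range / step
    n = int(radius_in_steps + 1e-9)
    offsets = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            # Keep only grid points inside the circle.
            if i * i + j * j <= radius_in_steps ** 2 + 1e-9:
                offsets.append((i * step, j * step))
    return offsets


if __name__ == "__main__":
    for step, rng in [(1.0, 5.0), (1.0, 3.0), (0.5, 3.0)]:
        count = len(trial_offsets(step, rng))
        print(f"step {step} px, range {rng} px: {count} trial offsets")
```

In this toy count the number of trial offsets grows roughly with the square of range/step, so shrinking the range at a fixed step reduces the work per particle per iteration.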

The angle parameters are roughly the average angle between test orientations. The values actually correspond to different HealPix grid orders. Relion helpfully prints the sampling needed for a given resolution given your particle diameter (mask diameter in Optimisation tab). Note the smaller the particle, the coarser the sampling for a given resolution. For a local refinement of most particles at reasonably high resolution around ~3 A, a value of 1.8 or 0.9 is appropriate. In auto-refine there is no angular search range parameter. The range is always +/- 6 * sampling. E.g. +/- 5.4 deg for 0.9 sampling.
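
The +/- 6 * sampling rule of thumb quoted above is easy to tabulate. The helper name `relion_local_range` is hypothetical, the factor of 6 is taken from this post rather than from Relion's source, and the sampling values are the nominal ones mentioned in this thread.

```python
def relion_local_range(sampling_deg, factor=6.0):
    """Approximate local angular search range (+/- degrees) for a
    given angular sampling, using the rule of thumb from this thread."""
    return factor * sampling_deg


if __name__ == "__main__":
    for sampling in (3.7, 1.8, 0.9, 0.5):
        rng = relion_local_range(sampling)
        print(f"sampling {sampling:>4} deg -> search range +/- {rng:.1f} deg")
```

Under this rule, a sampling of 0.5 degrees gives a range of about +/- 3 degrees, which is much narrower than a 10 degree rotation search extent.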

And as you say, “initial sampling” must be at least as fine as “local searches from” in order to get local searches. One is the initial sampling, which will be advanced automatically as resolution improves across iterations. Local searches begin once the selected local search sampling is reached. Global searches are essentially guaranteed to fail here, so you need to use your good starting alignments for a local search. It’s important to preserve the random half-map assignments from the global refinement throughout your process. By default, csparc2star.py should be doing that. I believe cryoSPARC will also do that for imported particles with angles.

Assuming you actually ran with the 0.5/0.5 setting in Relion, the difference is that in cryoSPARC you used a much wider angular search range - 10 deg instead of 3. Probably Relion will work if you do local refinement in 2-3 stages, first using 3.7, then 1.8, etc. (or 1.8, then 0.9 if needed). The search range will be wider in the first step and become narrower as you make sampling more fine.

Also, in both programs, you should set the initial low-pass resolution somewhat lower than achieved in the global refinement, like 6A if you got to 4ish. In Relion I also recommend you use “ignore CTF until first peak,” “solvent flattened FSC,” “mask with zeros,” “CTF corrected reference.” Only use “ref map on absolute greyscale” if the reference was reconstructed by Relion.

If you exhaustively try these sampling strategies and Relion still fails to converge where cryoSPARC succeeds, then I would conclude that blurring due to marginalization is responsible. However, in all my experience, both programs give very similar local refinement results with the right parameter choices. (And naturally, cryoSPARC is much faster.)

apunjani · December 2, 2020, 9:17pm

@DanielAsarnow’s explanation is great and right on.
Some additional notes: convergence in refinement (local or otherwise) is theoretically guaranteed for expectation-maximization algorithms. However, in practice the rate of convergence (in almost all cases) and the ability to converge (in some extreme cases) are modulated by discretization, search, and regularization choices.

In cryoSPARC, because we use branch and bound alignment, poses are optimized to a fine discretization even with a large initial search extent, and regardless of current resolution. This speeds convergence because the algorithm doesn’t need to wait for the angular spacing to be (manually?) decreased over iterations in order for the 3D reference to contain detail from better alignments that are possible already at a given iteration. In extreme cases it’s possible that a brute-force pose alignment method (e.g. Relion) will get trapped: the reference is at medium resolution and the angular sampling is coarse, so the alignments contain discretization error, so the resolution remains medium, and the angular sampling never gets finer and the reference map never improves enough for the alignments to improve. Hard to say if that’s what’s happening in your case, but possible.

Marginalization is also sometimes useful and sometimes detrimental. Noise model estimates play a large role in whether marginalization is going to be accurate and actually perform its regularizing function, or whether it will destroy information that would be useful for alignment in the next iteration. In cryoSPARC local refinement, marginalization is not used currently, and this ensures that even if the noise model is wrong (very likely, due to denatured protein layers, crowding, flexibility, disorder, etc.) this does not ruin local refinement results. It is true that in some cases marginalization can help, and this is something we are currently exploring.
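
To make the marginalization point concrete, here is a toy sketch assuming a Gaussian noise model with a single scalar `sigma`. The function `pose_weights` and the squared-error values are made up for illustration and are not taken from cryoSPARC or Relion. It compares the weights a marginalizing reconstruction would give to a few candidate poses against simply picking the single best pose.

```python
import math


def pose_weights(sq_errors, sigma):
    """Normalised weights over candidate poses, assuming each pose's
    likelihood is proportional to exp(-sq_error / (2 * sigma**2))."""
    log_p = [-e / (2.0 * sigma ** 2) for e in sq_errors]
    m = max(log_p)  # subtract the max for numerical stability
    w = [math.exp(lp - m) for lp in log_p]
    total = sum(w)
    return [x / total for x in w]


if __name__ == "__main__":
    # Hypothetical squared image-to-reference errors for three poses.
    sq_errors = [1.0, 1.5, 3.0]
    best_pose = min(range(len(sq_errors)), key=sq_errors.__getitem__)
    for sigma in (0.5, 5.0):
        weights = pose_weights(sq_errors, sigma)
        pretty = ", ".join(f"{w:.2f}" for w in weights)
        print(f"sigma={sigma}: weights [{pretty}], best pose index {best_pose}")
```

In this toy, an overestimated `sigma` spreads the weight almost evenly over the candidate poses, which would blur a weighted reconstruction, while the best-pose choice does not depend on `sigma` at all.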