Does reference-based motion correction break the gold-standard assumption?

The gold-standard workflow assumes that two sets of particles are reconstructed independently, so that the FSC between the resulting half-maps gives an unbiased resolution estimate. Accordingly, RBMC seems to operate on each half-set separately (I cannot find any documentation about this, but I get an error if I provide only a plain 3D volume without half-maps to RBMC). My concern is that the default behavior of Homogeneous (or NU) Refinement jobs is to re-generate the randomized half-sets, which could potentially “leak” information between the two half-sets and artificially make them more similar to each other.
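To make the assumption concrete, here is a minimal numpy sketch of a gold-standard FSC curve computed between two half-maps. The function, shell binning, and random test volumes are illustrative, not CryoSPARC's implementation:

```python
import numpy as np

def fsc_curve(half1: np.ndarray, half2: np.ndarray) -> np.ndarray:
    """FSC between two cubic real-space maps, one value per Fourier shell."""
    assert half1.shape == half2.shape
    n = half1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(half1))
    f2 = np.fft.fftshift(np.fft.fftn(half2))
    # Distance of each Fourier voxel from the (shifted) origin.
    zi, yi, xi = np.indices(half1.shape)
    c = n // 2
    radius = np.sqrt((zi - c) ** 2 + (yi - c) ** 2 + (xi - c) ** 2)
    curve = np.zeros(n // 2)
    for r in range(n // 2):
        shell = (radius >= r - 0.5) & (radius < r + 0.5)
        num = np.sum(f1[shell] * np.conj(f2[shell])).real
        den = np.sqrt(np.sum(np.abs(f1[shell]) ** 2) * np.sum(np.abs(f2[shell]) ** 2))
        curve[r] = num / den if den > 0 else 0.0
    return curve

# Two independent noise volumes should give FSC ~ 0 at all shells;
# any leaked correlation between half-sets pushes the curve (and the
# reported resolution) upward.
rng = np.random.default_rng(0)
print(fsc_curve(rng.normal(size=(64,) * 3), rng.normal(size=(64,) * 3))[:5])
```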

When I tried to disable force_gs_resplit in a homogeneous refinement job after RBMC, the job failed because one half-set now contained four more particles than the other (out of ~700,000 particles), and the error message suggested enabling re-splitting (I assume the imbalance arises from particle trajectories drifting too close to the edge or to a neighbouring particle). Is there a way to trim one half-set to match the size of the other instead of re-splitting?
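Something along the lines of this hedged numpy sketch is what I have in mind. The 0/1 label array is a stand-in for the actual particle metadata, and as far as I know this is not an existing CryoSPARC option:

```python
import numpy as np

rng = np.random.default_rng(0)

def balance_halfsets(split: np.ndarray) -> np.ndarray:
    """Boolean keep-mask that randomly drops particles from the larger
    half-set until both halves have equal size; assignments are untouched."""
    keep = np.ones(split.size, dtype=bool)
    n0, n1 = (split == 0).sum(), (split == 1).sum()
    larger = 0 if n0 > n1 else 1
    candidates = np.flatnonzero(split == larger)
    drop = rng.choice(candidates, size=abs(int(n0) - int(n1)), replace=False)
    keep[drop] = False
    return keep

# Example: a ~700,000-particle split where one half has four extra members.
split = np.concatenate([np.zeros(350_002, dtype=int), np.ones(349_998, dtype=int)])
keep = balance_halfsets(split)
print((split[keep] == 0).sum(), (split[keep] == 1).sum())  # 349998 349998
```

With the numbers above, this would discard only the four excess particles while leaving every other assignment exactly as RBMC produced it.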

From reading the log, RBMC includes Fourier components out to approximately twice the FSC resolution of the reference reconstruction when aligning particle trajectories (perhaps the same frequency used for the phase-randomization test?), and the cross-validation is done to the FSC resolution. Perhaps this is deemed good enough not to compromise the independence of the half-sets at the highest resolutions? But FSC values are often overly optimistic in cases of preferred orientation or auto-masking issues. With multiple references, the highest-resolution reference also seems to be used for estimating this cutoff.
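To make the numbers concrete, here is a small sketch of that frequency bookkeeping, under my assumption that "twice the FSC resolution" means a real-space cutoff at twice the resolution value (i.e., half the spatial frequency). The box size, pixel size, and resolution below are made-up values:

```python
def cutoff_shell(resolution_A: float, box: int, apix: float) -> int:
    """Fourier shell index corresponding to a real-space resolution in Angstrom."""
    return round(box * apix / resolution_A)

fsc_res = 3.0          # hypothetical FSC resolution of the reference (Angstrom)
box, apix = 360, 0.83  # hypothetical box size (px) and pixel size (Angstrom/px)

print(cutoff_shell(2 * fsc_res, box, apix))  # trajectory alignment cutoff: shell ~50
print(cutoff_shell(fsc_res, box, apix))      # cross-validation cutoff: shell ~100
```

If that reading is right, the alignment only "sees" the lower half of the frequency range the FSC is measured over, which may be the rationale for treating half-set independence as mostly preserved.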

In my opinion, the safer default behavior would be to automatically disable re-splitting after RBMC and either ignore a small difference in size between the half-sets or trim one set (or even move a few particles over) so the two half-sets match in size.

Is this not a problem with strong orientation bias in general, not just in CryoSPARC?

The correlation leaking across when re-splitting half-sets is interesting. Could be fun to play with out of curiosity. Hm.

Just to add - the default in local refinement is not to re-split (presumably because this would break things in cases of symmetry expansion) - what is the rationale for CS redoing the split if one already exists?

(out of interest @daniel.s.d.larsson have you tried this particle set in local refinement? I don’t recall it being so stringent with the half-set sizes but maybe I just never noticed…?)

I can confirm this problem after Bayesian polishing in RELION and going back to CryoSPARC with a membrane protein at low resolution (~6 Å). Allowing GS re-splitting results in inflated resolution estimates; disabling it gives more realistic ones. One more thing to worry about…

Why try Bayesian polishing at 6 Å? How big an increase do you see with re-splitting vs. not? Is it obviously fantasy territory (e.g. the map looks like 5 Å “lumpy sausage” alpha-helices but is reported as 2 Å?) or are we talking 0.1 Å here? Have you checked what the reassignments do: which particles get reassigned between half-sets, and do particles stay with their micrograph siblings or are they split across randomly?
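If it helps, here is the kind of quick check I mean, as a hedged pandas sketch. The column names (uid, micrograph, half_before, half_after) are placeholders for whatever your exported particle metadata actually contains:

```python
import pandas as pd

def resplit_report(df: pd.DataFrame) -> None:
    moved = (df["half_before"] != df["half_after"]).mean()
    print(f"{moved:.1%} of particles changed half-set")
    # Micrographs whose particles span both halves after the re-split
    # indicate that siblings were separated rather than moved as a block.
    per_mic = df.groupby("micrograph")["half_after"].nunique()
    print(f"{(per_mic > 1).mean():.1%} of micrographs span both half-sets")

# Tiny made-up example: particle 2 switched halves, splitting micrograph m1.
df = pd.DataFrame({
    "uid": [1, 2, 3, 4],
    "micrograph": ["m1", "m1", "m2", "m2"],
    "half_before": [0, 0, 1, 1],
    "half_after": [0, 1, 1, 1],
})
resplit_report(df)  # 25.0% of particles ... 50.0% of micrographs ...
```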

Depending on the initial refinement parameters (RELION Blush vs. NU refinement, masking, etc.) I do see an improvement after polishing. The best result was 5.3 Å to 4.8 Å (without re-splitting the subsets) or to 4.4 Å (when re-splitting is allowed). This is accompanied by an obvious improvement in the map: the helical pitch becomes clearly visible through most TMs, etc. I take it as a win, since at this point we do not have a good idea how to stabilize the conformation and collect better data.

TBH, I might not have realised there was a problem if I did not have a second set of particles (a different conformation, from the same dataset) which “improved” from 6 Å to 3.4 Å. So the issue is variable. I suspect membrane proteins may be more affected due to masking of the irregular micelle component.

Interesting, thanks for the info. 🙂 I don’t have enough experience with the Blush regulariser yet to really know how it behaves across a range of data; I need to play with it more. My early experiments on subsets of some of our trickier datasets show that sometimes it helps, sometimes it hurts, and sometimes it does basically nothing.

Looking at the NU Refine paper, that shows a map going from a 6 Å sausage to 3.x Å with NU refinement…

Hi all, thanks for posting on this topic.

In CryoSPARC v4.5.2, we have changed default parameters so that refinement jobs (including Homogeneous and Non-uniform Refinement) will not re-split particles by default, and will no longer raise an error (but instead issue a warning) if the two input half-sets are substantially different in size. Refinement jobs will still re-split if the input particles do not have any half-sets assigned (e.g., the particles are coming directly from 2D classification or Ab-initio reconstruction) or if the “Force re-do GS split” parameter is turned on.
The Particle Sets Tool job has also been updated to provide a mode where it can re-balance half-set splits in case they become unequal (by dropping particles from the larger half-set).
Repeated rounds of Reference Based Motion Correction and refinements will therefore now by default retain the particle split throughout a processing chain.

Thank you!
