Does reference-based motion correction break the gold-standard assumption?

The gold-standard workflow assumes independent reconstruction of two sets of particles to estimate the FSC resolution of a reconstruction. Accordingly, RBMC seems to operate on each half-set separately (I cannot find any documentation about this, but I get an error if I only provide a pure 3D volume without half-maps to RBMC). My concern is that the default behavior of homogeneous (or NU) refinement jobs is to re-generate the randomized half-sets. This could potentially “leak” information between the two half-sets and artificially make them more similar to each other.
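For context, here is a minimal sketch of how an "FSC resolution" is typically read off a half-map FSC curve, assuming the commonly used 0.143 threshold and linear interpolation between shells. The function name and inputs are hypothetical and this is not CryoSPARC's own code; it only illustrates why any artificial similarity between the half-sets (which raises the curve) translates directly into a more optimistic number.

```python
def fsc_resolution(shell_freqs, fsc, threshold=0.143):
    """Return the resolution (in Angstrom) where the FSC first crosses below `threshold`.

    shell_freqs: spatial frequencies of the shells in 1/Angstrom, ascending.
    fsc:         FSC value per shell, same length as shell_freqs.
    Returns None if the curve never crosses from >= threshold to < threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i] < threshold <= fsc[i - 1]:
            f0, f1 = shell_freqs[i - 1], shell_freqs[i]
            c0, c1 = fsc[i - 1], fsc[i]
            # Linear interpolation between the two shells bracketing the threshold.
            t = (c0 - threshold) / (c0 - c1)
            return 1.0 / (f0 + t * (f1 - f0))
    return None


# Toy curve: crosses 0.143 halfway between the 0.3 and 0.4 1/Angstrom shells.
resolution = fsc_resolution([0.1, 0.2, 0.3, 0.4], [1.0, 0.9, 0.243, 0.043])
```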

When I tried to disable force_gs_resplit in a homogeneous refinement job after RBMC, the job failed because there were now four more particles in one half-set than in the other (out of ~700,000 particles), and the error message suggests enabling re-splitting (I assume the mismatch is due to particle trajectories drifting too close to the edge or to a neighbouring particle). Is there a way to trim one half-set to match the size of the other instead of re-splitting?
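To make the trimming idea concrete, here is a minimal sketch, assuming the half-set assignment is available as a per-particle 0/1 label. I assume this corresponds to a particle field such as `alignments3D/split`, but that mapping, the function name and the way the labels are exported are all assumptions rather than documented behaviour. Particles are only dropped from the larger half; nothing is moved between halves, so the original split is preserved.

```python
import random


def trim_half_sets(split, seed=0):
    """Return the sorted indices of particles to keep so both half-sets match in size.

    split: sequence of 0/1 half-set labels, one per particle (hypothetical input).
    Particles are removed at random from the larger half only.
    """
    idx = {0: [], 1: []}
    for i, label in enumerate(split):
        idx[label].append(i)

    excess = len(idx[0]) - len(idx[1])
    larger = idx[0] if excess > 0 else idx[1]

    # Seeded RNG so the same particles are dropped on every run.
    rng = random.Random(seed)
    drop = set(rng.sample(larger, abs(excess)))

    return [i for i in range(len(split)) if i not in drop]


# Toy example: half-set 0 has four more particles than half-set 1.
labels = [0] * 10 + [1] * 6
kept = trim_half_sets(labels)
```

The kept indices could then be used to filter the particle set before it is passed to a refinement with re-splitting disabled; how to do that filtering inside CryoSPARC is not covered here.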

From reading the log, RBMC includes Fourier components to approximately twice the FSC resolution of the reference reconstruction when aligning particle trajectories (perhaps the same frequency as the one used for the phase-randomization test?) and the cross-validation is done to the FSC resolution. Perhaps this is deemed good enough not to tarnish the independence of the half-sets at the highest resolution? But FSC values are often overly optimistic in cases where there is preferred orientation or there are auto-masking issues. When multiple references are provided, the highest-resolution reference also seems to be the one used for estimating this cutoff.

In my opinion, the safer default behavior would be to automatically disable re-splitting after RBMC and either ignore a small difference in size between the half-sets or trim one set (or even move a few particles over) to make the two half-sets match in size.

Is this not a problem with strong orientation bias in general, not just in CryoSPARC?

The correlation leaking across when re-splitting half-sets is interesting. Could be fun to play with out of curiosity. Hm.

Just to add - the default in local refinement is not to re-split (presumably because this would break things in cases of symmetry expansion) - what is the rationale for CS redoing the split if one already exists?

(out of interest @daniel.s.d.larsson have you tried this particle set in local refinement? I don’t recall it being so stringent with the half-set sizes but maybe I just never noticed…?)