Experimental support for Relion's Bayesian polishing in csparc2star.py

This map vs particle perspective explanation makes sense, although I’m only able to follow in abstract terms. :slight_smile:

Sure. I git clone-d the updated version, but csparc2star.py --flipy is returning the following error:

Traceback (most recent call last):
  File "<redacted>/pyem/csparc2star.py", line 177, in <module>
    sys.exit(main(parser.parse_args()))
  File "/<redacted>/pyem/csparc2star.py", line 97, in main
    df = star.transform_star(df, np.array([[1, 0, 0], [0, -1, 0], [0, 0, -1]]), leftmult=True)
  File "/<redacted>/pyem/pyem/star.py", line 586, in transform_star
    angles = np.rad2deg(rot2euler(newrots))
  File "/<redacted>/software/miniconda3/envs/pyem/lib/python3.9/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/<redacted>/software/miniconda3/envs/pyem/lib/python3.9/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
- Resolution failure for literal arguments:
reshape() supports contiguous array only
- Resolution failure for non-literal arguments:
reshape() supports contiguous array only

During: resolving callee type: BoundFunction(array.reshape for array(float64, 3d, A))
During: typing of call at <redacted>/pyem/pyem/geom/convert_numba.py (29)


File "<redacted>/pyem/pyem/geom/convert_numba.py", line 29:
def rot2euler(r, out=None):
    <source elided>
    epsilon = 1e-16
    r = r.reshape(-1, 9).reshape(-1, 3, 3)
    ^

Does the conda environment/dependencies require an update as well?

Cheers,
Yang

My double transpose is causing the array to not be byte contiguous, which is interfering with some optimized numpy code. I need to add a np.require call I think. I’ll have time in a couple of hours.

Thanks again for testing so gamely, I really appreciate it!

No problem. Happy to help test it. We picked it up hoping to avoid having to redo motion correction for an active dataset, but then got stuck in and have since redone it six ways to Sunday.

Cheers,
Yang

Sorry for the delay. Should work now.

Hi @DanielAsarnow

pyem newly git clone-d today.

Examplar particle data
# Reference particle (AngleRot, AngleTilt, AnglePsi, OriginX, OriginY)
-69.794205 65.145355 171.596817 -22.796064 12.710671

# Original csparc2star.py --flipy
69.794205 65.145354 171.596810 -22.796064 -12.710671

# Manual awk modification from S3c
110.206 114.855 -171.597 -22.796064 -12.7107

# Updated csparc2star.py --flipy 
110.205795 114.854646 -171.596810 -22.796064 -12.710671

# Updated csparc2star.py --flipy (# line 95)
110.205795 114.854646 -171.596810 -22.796064 12.710671

bfactor.eps from BPP looks good now. With --flipy as coded, refinement of shiny particles is working as well as in S3a/c. Relative to that, commenting out line 95 led to a slightly worse outcome–still better than unpolished.

I think the inversion of OriginY is the correct transformation. The fact that BPP didn’t blow up completely with the wrong Y-shift may be down to the translation errors being interpreted as, and subsequently absorbed into particle motion to a degree during tracking.

For reference, below is the rlnOriginYAngst distribution in this particular input particle data (pre-shiny).

Y-shift distribution

There still remains a disrepancy between PM- vs RC/MC2-derived data, with the latter producing better results. This could be down to the first-frame issue. It’s probably worth comparing with different datasets in order to corroborate.

Cheers,
Yang

Have you tested if recentering the coordinates has any impact? With star.py --recenter for example…BPP itself doesn’t do any recentering.

I think you’re right about some shift errors being correctable during tracking. If you imagine bringing each frame of a particle into register with a reference projection, then it could be that this represents a physical motion trajectory from frame to frame, or random shift errors on each frame. I think the extent to which it is feasible to align individual frames of a given particle determines the importance of a good initial trajectory. For example a ribosome could be alignable regardless of the initial trajectory while a smaller particle might require the restraints from other nearby particles or a very good initial trajectory.

Last question - if you run polishing iteratively, does the 2nd run based on PM catch up to the RC/MC2 results?

Thanks again for testing, it’s very satisfying to figure out what was wrong with my transformation before.

@leetleyang Also to clarify something, the PM trajectories are recentered to the first frame when they are exported. If you plot the MC2 and PM trajectories you’ll see they are the almost the same. In MC2 as well, the initial reference is actually the unaligned sum. It’s really just that the final results are written out relative to the first frame.

When you say the results are still worse than scenario 1 - how close are they? Is it attributable to random variation between refinements (if you repeat a refinement there can be ~0.1Å variations in the nominal resolution)? Do you do a 2D cleaning after polishing? It doesn’t mask hot pixels like MC2/RC/PM do.

Hi @DanielAsarnow

Indeed, I saw from the code that the PM trajectories are readjusted to the first frame for reference.

Also, I have since done some of the reading that I should have done before thinking out loud(!), and found out that in the absence of the -FmRef flag, MC2 uses the central frame as its reference–not the first frame. I’m not entirely sure what RC does but I assume the same given their shared provenance? Also, that BPP takes this into account when interpreting particle coordinates?

From the MotionCor2 manual:
The output and log files list the shifts relative to the first frame. However, the correction is
relative to the central frame by default. “-FmRef” is a switch that allows to choose the
reference either the central frame by giving it a non-zero value or the first frame by setting
it zero.

And for completeness, from this post, I got the impression that in PM all frames get shifted to some degree, i.e. there is no strict reference frame.

To clarify my thought process at the time, for others reading. Due to different frames of reference, the averaged micrographs from one will not line up perfectly with the other’s. So coordinates (x,y) from one may coincide with (x+a,y+b) from another. This amounts to errors in OriginX and OriginY going between the two. This is independent of the trajectories. Not sure if this rationalisation is how it actually works.

I take your point. gsFSC=0.143 is within 0.1Å. The caveat being that this particle set is limited by conformational heterogeneity so it’s been difficult to gauge. I would normally chalk it down to random variation and think nothing more of it, save that RC/MC2 has consistently (n=4 separate BPP+refinement runs) given slightly better FSC (across the resolution range, not just at 0.143) vs PM (n=3), so I thought to make a note of it. Internally, the RC/MC2 refinements have been rather consistent. As have the PM refinements. The ±line95 test was also within 0.1Å.

Another dataset where BPP has had more potential to improve the map may be a more suitable test case.

No, I haven’t in this case. The particle stack was imported directly into cryoSPARC for refinement. I thought less manual manipulation would be a fairer comparison.

EDIT: It’s probably worth mentioning that as a matter of routine, I perform preprocessing without a defect file.

I can run these tests. But to answer your question, I haven’t experimented with recentering. Also, a 2nd bootstrapping run would be using an improved reference. It may give a better result based on that alone. I should reiterate again that the choice of this particular dataset is a bit unfortunate. It is what we had available. I’ll check to see if one of our other EMPIAR cases gave a better return from BPP.

Cheers,
Yang

I’ve run this test following your improved flipy transformation.

In short, with this EMPIAR data, bootstrapping while maintaining the original coordinates doesn’t seem to narrow the gap to the RC/MC2 workflow.

Scenario 3: CS to RELION
  1. d) tif movies Patch Motion-corrected in CS.
    → pre-shiny particle coordinates from S1 Y-inverted and used to extract from csMicrographs (Recenter using aligned shifts disabled).
    → Particles refined against original reference (3.1-3.2Å).
    csparc2star.py --movie to generate csMicrograph and motion star files.
    cpsarc2star.py --flipy particle conversion for polishing.
    half_maps and mask_fsc_auto used as is for postprocessing.
    shiny_1 particles yield better results than unpolished, but worse than RC/MC2 workflow.
    csparc2star.py particle conversion for second polishing.
    star.py micrograph/coordinate concatenation with shiny_1.
    → improved half_maps and mask_fsc_auto used as is for postprocessing.
    shiny_2 particles yield very similar results to shiny_1.

At the outset, I assumed the gap to the RC/MC2 workflow would narrow on the second run through. However, I wonder if the purported shift errors could disproportionately affect the alignment of the earlier frames, which also carry more of the information.

Cheers,
Yang