Experimental support for Relion's Bayesian polishing in csparc2star.py

This map vs particle perspective explanation makes sense, although I’m only able to follow in abstract terms. :slight_smile:

Sure. I git clone-d the updated version, but csparc2star.py --flipy is returning the following error:

Traceback (most recent call last):
  File "<redacted>/pyem/csparc2star.py", line 177, in <module>
    sys.exit(main(parser.parse_args()))
  File "/<redacted>/pyem/csparc2star.py", line 97, in main
    df = star.transform_star(df, np.array([[1, 0, 0], [0, -1, 0], [0, 0, -1]]), leftmult=True)
  File "/<redacted>/pyem/pyem/star.py", line 586, in transform_star
    angles = np.rad2deg(rot2euler(newrots))
  File "/<redacted>/software/miniconda3/envs/pyem/lib/python3.9/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/<redacted>/software/miniconda3/envs/pyem/lib/python3.9/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
- Resolution failure for literal arguments:
reshape() supports contiguous array only
- Resolution failure for non-literal arguments:
reshape() supports contiguous array only

During: resolving callee type: BoundFunction(array.reshape for array(float64, 3d, A))
During: typing of call at <redacted>/pyem/pyem/geom/convert_numba.py (29)


File "<redacted>/pyem/pyem/geom/convert_numba.py", line 29:
def rot2euler(r, out=None):
    <source elided>
    epsilon = 1e-16
    r = r.reshape(-1, 9).reshape(-1, 3, 3)
    ^

Does the conda environment/dependencies require an update as well?

Cheers,
Yang

My double transpose is causing the array to not be byte contiguous, which is interfering with some optimized numpy code. I need to add a np.require call I think. I’ll have time in a couple of hours.
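In case it helps anyone debugging similar numba errors, here is a minimal illustration (not pyem's actual code) of the problem and the np.require fix:

```python
import numpy as np

# A transposed view shares memory with the original array and is not
# C-contiguous, which numba's nopython-mode reshape cannot handle.
rots = np.arange(2 * 3 * 3, dtype=np.float64).reshape(2, 3, 3)
noncontig = rots.transpose(0, 2, 1)
assert not noncontig.flags["C_CONTIGUOUS"]

# np.require (or np.ascontiguousarray) makes a C-contiguous copy,
# after which the reshape chain from rot2euler works as expected.
fixed = np.require(noncontig, requirements=["C"])
assert fixed.flags["C_CONTIGUOUS"]
assert fixed.reshape(-1, 9).reshape(-1, 3, 3).shape == (2, 3, 3)
```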

Thanks again for testing so gamely, I really appreciate it!

No problem. Happy to help test it. We picked it up hoping to avoid having to redo motion correction for an active dataset, but then got stuck in and have since redone it six ways to Sunday.

Cheers,
Yang

Sorry for the delay. Should work now.

Hi @DanielAsarnow

pyem newly git clone-d today.

Exemplar particle data
# Reference particle (AngleRot, AngleTilt, AnglePsi, OriginX, OriginY)
-69.794205 65.145355 171.596817 -22.796064 12.710671

# Original csparc2star.py --flipy
69.794205 65.145354 171.596810 -22.796064 -12.710671

# Manual awk modification from S3c
110.206 114.855 -171.597 -22.796064 -12.7107

# Updated csparc2star.py --flipy 
110.205795 114.854646 -171.596810 -22.796064 -12.710671

# Updated csparc2star.py --flipy (line 95 commented out)
110.205795 114.854646 -171.596810 -22.796064 12.710671

bfactor.eps from BPP looks good now. With --flipy as coded, refinement of shiny particles is working as well as in S3a/c. Relative to that, commenting out line 95 led to a slightly worse outcome, though still better than unpolished.

I think the inversion of OriginY is the correct transformation. The fact that BPP didn’t blow up completely with the wrong Y-shift may be down to the translation errors being interpreted as, and subsequently absorbed into, particle motion to a degree during tracking.
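For completeness, the angle/shift mapping implied by the numbers above (left-multiplying the rotation by diag(1, -1, -1)) can be written out as a small sketch. This is derived from the worked example, not pyem's implementation:

```python
def flipy(rot, tilt, psi, origin_y):
    """Y-flip mapping of Relion Euler angles and Y-shift, consistent
    with the worked example above: (rot, tilt, psi) -> (rot + 180,
    180 - tilt, -psi), with OriginY negated. Sketch only."""
    rot2 = (rot + 180.0) % 360.0
    if rot2 > 180.0:  # wrap back into (-180, 180]
        rot2 -= 360.0
    return rot2, 180.0 - tilt, -psi, -origin_y

# Reference particle from the table above
rot, tilt, psi, y = flipy(-69.794205, 65.145355, 171.596817, 12.710671)
# -> approximately (110.205795, 114.854645, -171.596817, -12.710671),
#    matching the updated csparc2star.py --flipy row
```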

For reference, below is the rlnOriginYAngst distribution in this particular input particle data (pre-shiny).

Y-shift distribution

There still remains a discrepancy between PM- vs RC/MC2-derived data, with the latter producing better results. This could be down to the first-frame issue. It’s probably worth comparing with different datasets in order to corroborate.

Cheers,
Yang

Have you tested whether recentering the coordinates has any impact? With star.py --recenter, for example. BPP itself doesn’t do any recentering.
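For anyone following along, a rough sketch of what recentering of this kind does, assuming the usual Relion convention that the projected position is coordinate minus origin. This is not star.py's actual code:

```python
import numpy as np

def recenter(coords, origins):
    """Fold the integer pixel part of each refined origin shift into
    the picked coordinate, leaving only a sub-pixel residual origin.
    The projected position (coords - origins) is unchanged."""
    shift = np.round(origins)
    return coords - shift, origins - shift

coords = np.array([1000.0, 512.0])               # illustrative pick
origins = np.array([-22.796064, 12.710671])      # shifts from the example above
new_coords, residual = recenter(coords, origins)
# new_coords = [1023, 499]; residual stays within half a pixel
```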

I think you’re right about some shift errors being correctable during tracking. If you imagine bringing each frame of a particle into register with a reference projection, then it could be that this represents a physical motion trajectory from frame to frame, or random shift errors on each frame. I think the extent to which it is feasible to align individual frames of a given particle determines the importance of a good initial trajectory. For example, a ribosome could be alignable regardless of the initial trajectory, while a smaller particle might require the restraints from other nearby particles or a very good initial trajectory.

Last question - if you run polishing iteratively, does the 2nd run based on PM catch up to the RC/MC2 results?

Thanks again for testing, it’s very satisfying to figure out what was wrong with my transformation before.

@leetleyang Also to clarify something, the PM trajectories are recentered to the first frame when they are exported. If you plot the MC2 and PM trajectories you’ll see they are almost the same. In MC2 as well, the initial reference is actually the unaligned sum. It’s really just that the final results are written out relative to the first frame.
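An illustrative numpy sketch of that first-frame convention (not the exporter's code):

```python
import numpy as np

# Expressing a per-frame shift trajectory relative to frame 0, as the
# exported PM trajectories are. The frame-to-frame motion, which is the
# physically meaningful part, is unchanged by this re-referencing.
traj = np.array([[0.5, -0.2],
                 [1.1,  0.3],
                 [1.8,  0.9]])      # illustrative shifts in pixels
rel = traj - traj[0]
# rel[0] is now [0, 0]; the frame-to-frame differences are identical
```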

When you say the results are still worse than scenario 1 - how close are they? Is it attributable to random variation between refinements (if you repeat a refinement there can be ~0.1Å variations in the nominal resolution)? Do you do a 2D cleaning after polishing? It doesn’t mask hot pixels like MC2/RC/PM do.

Hi @DanielAsarnow

Indeed, I saw from the code that the PM trajectories are readjusted to the first frame for reference.

Also, I have since done some of the reading that I should have done before thinking out loud(!), and found out that in the absence of the -FmRef flag, MC2 uses the central frame as its reference, not the first frame. I’m not entirely sure what RC does, but I assume the same given their shared provenance? Also, that BPP takes this into account when interpreting particle coordinates?

From the MotionCor2 manual:
The output and log files list the shifts relative to the first frame. However, the correction is relative to the central frame by default. “-FmRef” is a switch that allows to choose the reference either the central frame by giving it a non-zero value or the first frame by setting it zero.

And for completeness, from this post, I got the impression that in PM all frames get shifted to some degree, i.e. there is no strict reference frame.

To clarify my thought process at the time, for others reading. Due to different frames of reference, the averaged micrographs from one will not line up perfectly with the other’s. So coordinates (x,y) from one may coincide with (x+a,y+b) from another. This amounts to errors in OriginX and OriginY going between the two. This is independent of the trajectories. Not sure if this rationalisation is how it actually works.

I take your point. gsFSC=0.143 is within 0.1Å. The caveat is that this particle set is limited by conformational heterogeneity, so it’s been difficult to gauge. I would normally chalk it up to random variation and think nothing more of it, save that RC/MC2 has consistently (n=4 separate BPP+refinement runs) given slightly better FSC (across the resolution range, not just at 0.143) vs PM (n=3), so I thought to make a note of it. Internally, the RC/MC2 refinements have been rather consistent. As have the PM refinements. The ±line95 test was also within 0.1Å.

Another dataset where BPP has had more potential to improve the map may be a more suitable test case.

No, I haven’t in this case. The particle stack was imported directly into cryoSPARC for refinement. I thought less manual manipulation would be a fairer comparison.

EDIT: It’s probably worth mentioning that as a matter of routine, I perform preprocessing without a defect file.

I can run these tests. But to answer your question, I haven’t experimented with recentering. Also, a 2nd bootstrapping run would be using an improved reference. It may give a better result based on that alone. I should reiterate again that the choice of this particular dataset is a bit unfortunate. It is what we had available. I’ll check to see if one of our other EMPIAR cases gave a better return from BPP.

Cheers,
Yang

I’ve run this test following your improved flipy transformation.

In short, with this EMPIAR data, bootstrapping while maintaining the original coordinates doesn’t seem to narrow the gap to the RC/MC2 workflow.

Scenario 3: CS to RELION
  1. d) tif movies Patch Motion-corrected in CS.
    → pre-shiny particle coordinates from S1 Y-inverted and used to extract from csMicrographs (Recenter using aligned shifts disabled).
    → Particles refined against original reference (3.1-3.2Å).
    csparc2star.py --movie to generate csMicrograph and motion star files.
    csparc2star.py --flipy particle conversion for polishing.
    half_maps and mask_fsc_auto used as is for postprocessing.
    shiny_1 particles yield better results than unpolished, but worse than RC/MC2 workflow.
    csparc2star.py particle conversion for second polishing.
    star.py micrograph/coordinate concatenation with shiny_1.
    → improved half_maps and mask_fsc_auto used as is for postprocessing.
    shiny_2 particles yield very similar results to shiny_1.

At the outset, I assumed the gap to the RC/MC2 workflow would narrow on the second run through. However, I wonder if the purported shift errors could disproportionately affect the alignment of the earlier frames, which also carry more of the information.

EDIT:
In contrast, a second polishing round in the RC/MC2 workflow further improves the FSC (~2.7Å).

EDIT2:

I’ve run the recenter test outlined below. It gave almost identical results to S3d (above).

Scenario 3: CS to RELION
  1. e) tif movies Patch Motion-corrected in CS.
    → pre-shiny particle coordinates from S1 Y-inverted and used to extract from csMicrographs (Recenter using aligned shifts disabled).
    → Particles refined against original reference (3.1-3.2Å).
    csparc2star.py --movie to generate csMicrograph and motion star files.
    csparc2star.py --flipy particle conversion for polishing.
    half_maps and mask_fsc_auto used as is for postprocessing.
    shiny_1 particles yield better results than unpolished, but worse than RC/MC2 workflow.
    csparc2star.py particle conversion for second polishing.
    star.py micrograph/coordinate concatenation with shiny_1.
    star.py --recenter on particles star file prior to polishing.
    → improved half_maps and mask_fsc_auto used as is for postprocessing.
    shiny_2 particles yield very similar results to shiny_1.

Cheers,
Yang

Representative FSC curves from a single round of polishing, validated using a common mask.

Cheers,
Yang

@DanielAsarnow, @jcoleman I worked around this issue by adding the line:

data_general = data_general.loc[:,~data_general.columns.duplicated()]

immediately before line 71 of csparc2star.py, as suggested in this Stack Overflow post. I’ve found @DanielAsarnow’s issue on GitHub so will suggest this fix there too.
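For context, here is a minimal standalone reproduction of what that one-liner does (the column names are illustrative):

```python
import pandas as pd

# A header with a duplicated column, as in the problematic star file.
df = pd.DataFrame([[300.0, 2.7, 300.0]],
                  columns=["rlnVoltage", "rlnSphericalAberration", "rlnVoltage"])

# columns.duplicated() marks every repeat after the first occurrence,
# so the negated mask keeps only the first copy of each column.
deduped = df.loc[:, ~df.columns.duplicated()]
# deduped has columns ["rlnVoltage", "rlnSphericalAberration"]
```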

@DanielAsarnow, our relion_motion_refine runs kept hanging and we tracked the issue down to the UCSFID tag in the micrograph star files, which caused a seg fault that manifests as a hanging job when using the GUI. Relion should intercept tags it isn’t familiar with and gracefully ignore them, so I’ll be submitting a bug report to the Relion GitHub, but thought you might want to know as well. I can share commit versions of pyem and Relion if you’re interested. Deleting these tags from the star files made everything run smoothly.

I accidentally changed a default value that caused those extra fields to be emitted. It should be fixed now, sorry about that.

No worries. I left a bug report on the Relion GitHub and Takanori has come back saying that the underlying problem is the duplicate entries in the star file header (e.g. there are two rlnVoltage tags). This is not properly handled in Relion (Takanori has bigger fish to fry), and it causes a SegFault on reading the UCSFID tag.

Ah, OK. Fix incoming.

(I mean, your fix. Thanks!)

Hi Daniel,

Firstly, thanks for this great tool. I’m trying to follow the instructions here, using Method 1:

I’m just curious what the job types are in your example. J640 seems like a refinement, which is straightforward enough, but I can’t quite figure out what J616 is. There need to be both a passthrough exposures and an exposures file. Import Movies doesn’t give those files; Patch Motion Correction gives passthrough_micrographs.cs and micrographs.csg (I don’t think I can use .csg though?). Patch CTF also gives those two, as does Curate Exposures.

Is this something that changed in CryoSPARC 4? Should I just use any job that has the passthrough_exposures.cs and exposures.csg files that I want (i.e. the motion- and CTF-corrected ones)?

Thanks

That job was an exposure group utilities job, but it doesn’t (or shouldn’t) matter what kind of job it is, or what files are there specifically, so long as they contain metadata for motion-corrected movies. They don’t have to be from the same job, either, as long as they are for the same movies.

Also, only .cs files are relevant for data export.


Dear Daniel and Yang,

sorry for lame questions, I’m trying to navigate this thread/pyem wiki to actually polish my CS particles and I get very confused. So far I tried “CS to RELION” path (1):

->I have csMicrographs from patch motion and CS refinement (starting from .tiff K3 movies imported to CS);
->I generated a .STAR file with csparc2star.py with --flipy (using --inverty as described in the wiki gave the wrong coordinates), using the CS patch motion and CS patch ctf outputs
->I ran relion preprocess according to the wiki page using original CS map and half-maps
->ran Polishing, which seemingly worked OK, giving no obvious errors. However, after import to CS & refinement I observed no improvement in resolution/FSC (at the same time, local motion correction in CS gave a consistent ~0.1-0.15 Å improvement). Training polishing using the same particles/maps did not improve the results. At the same time, it did not degrade the resolution either.

A couple of questions for this path:

-Does it indicate that polishing simply is not beneficial for my data? Should I optimise the polishing parameters, or run another round of polishing using the same procedure?

-Do you suggest I need to re-run polishing using relion_reconstruct-derived map rather than CS map?

Alternatively, I thought that I should simply run everything in Relion. I ran Relion MC on original movies to get micrographs_corrected.star. Do I now need to re-extract particles in Relion, reconstruct (or refine?) in Relion and run Polishing? What option should I use for csparc2star.py to generate star file for particle extraction i.e. flipy or inverty? Also, currently I manually edit the star files with sed to remove the UIDs/change optics groups and I wonder if there is a better way of doing it?

Many thanks,
Dmitry

Hi Dmitry,

I’m an end-user like yourself, so I’m sure @DanielAsarnow will be able to explain better. However, to add my two cents.

As of a few commits ago, inverty is enabled by default, so an explicit --inverty now undoes that default inversion. As you’ve found in this scenario, running csparc2star.py without the explicit flag is appropriate.

Admittedly, this thread gets a tad confusing.

To summarise where I had gotten to, my suspicion is that there is a difference in particle location between averaged micrographs generated using CS Patch Motion vs MC2/RELION. This manifests as variable errors in reference shifts when transferring from one source to another. If motion correction was performed in CS but using the MC2 wrapper, then this issue doesn’t seem to apply.

I think polishing with appropriate reference shifts is worth trying in your case. Given your input sources, one possible route forward is to:

  1. Repeat motion correction in RELION using MC2 or RELION’s implementation (which you’ve now done).
  2. Re-extract particle images in RELION using your existing particle star file.
  3. Import particle stacks into CS and repeat 3D-refinement.
  4. Run csparc2star.py, this time with an explicit --inverty flag but without --flipy.
  5. Use the half-maps and mask from CS as-is for post-processing.
  6. Perform polishing.

The reason for repeating 3D-refinement after re-extraction in step 3 is to absorb the initial discrepancies in X/Y-coordinates into the reference X/Y-shifts.

Assuming a static dataset, i.e. collection is finalised, my regular routine usually starts with motion correction in RELION before importing the micrographs into CS for curation. This seems to require the least amount of manipulation downstream. Although, some have noted differences in the robustness of CTF estimation when working with e.g. dose-weighted vs non-dose-weighted, or float16 vs float32 inputs. I guess it’ll be dataset-dependent.

That’s what I do as well, albeit in vim.

I hope that helps.

Cheers,
Yang

Hi Yang,

many thanks for explanation & help, I was able to follow that and do some more testing. In short:

  • repeated rounds of polishing using cryoSPARC patch MC did not yield any improvement (~2.7 A)
  • refinement with Relion MotionCor initially gave lower resolution than CS PMC (~3 Å); I used this job for polishing following your instructions and that gave the best resolution (2.47 Å)
  • second round of polishing using 2.47 A refinement and shiny particles made things WORSE, as did any attempts to further improve local CTF
    -this is very close to my result with the cryoSPARC-only pipeline (Patch Motion Corr followed by Local Motion Corr) - 2.55 Å. So not so much gain after all, but the FSC looks better and I got some good experience converting files back and forth! :crazy_face:

Best,
Dmitry