Experimental support for Relion's Bayesian polishing in csparc2star.py

Moritz · January 17, 2023, 1:46pm

Has there been a fix for the duplicate axis error? I am running into the same now:

raise ValueError("cannot reindex on an axis with duplicate labels")

ValueError: cannot reindex on an axis with duplicate labels

Using pyem_20221207, input file is an ‘accepted exposures’ from cryoSPARC live.

Happy to send .cs file or full error logs if needed

Adrian · January 20, 2023, 10:56am

Hello, I did the transformation between the cs file to star file in order to do Bayesian polishing in Relion, however, the global CTF refinement results: Beam tilt, beam trefoil and tetrafoil is not present in the output star file, among those CTF refinement parameters the only one passed to the star file is the refined spherical aberration.
How can I solve this? Is the only solution to repeat the global CTF refinement after bayesian polishing?

YoungDay · February 24, 2023, 8:56pm

Hi Max,

I am encountering the same issues as you. Have you figured out a solution? Thanks.

Best,
Young

leetleyang · March 2, 2023, 5:20pm

Hi @DanielAsarnow

Dabbling with this at the moment.

Just to check that we’re generating the appropriate motion data files. Our <individual_movie>.star files contain data_general and data_global_shift tables and nothing else. Is there meant to be more information, e.g. data_local_motion_model or data_local_shift, or are the global shifts all that is pertinent to Polishing anyway?

EDIT: source data comes from Patch Motion-corrected movies.

Cheers,
Yang

DanielAsarnow · March 5, 2023, 2:23am

@leetleyang Relion can use either whole frame or patch trajectories to initialize the search in BPP. We’re doing the former, so motion model is 0 and we only need data_global_shift. In principle we could also set motion model to 1 and export patch trajectories to data_local_shift.

If you use Relion to run MotionCor2 then only global shifts are used as well. Local shifts are only available when using Relion’s own implementation of the MotionCor2 algorithm. Takanori has said multiple times he believes the initialization is not important (though personally I believe it does matter for some datasets). If you are interested you can compare values from a Relion job and cryoSPARC patch motion local trajectories and we may be able to add the local motion initialization.

leetleyang · March 5, 2023, 3:27pm

Hi @DanielAsarnow

Thanks for the clarification. That’s clear.

We’re attempting to compare several workflows now. Admittedly, our test dataset, with 3-10 coordinates to track per micrograph, isn’t ideal. I understand per particle tracking is more robust the more particles there are to track.

Scenario 1: CS to RELION to CS to RELION

tif movies Patch Motion-corrected, (blob)-picked and processed in CS to obtain particle stack (3.1Å); tif movies motion-corrected with RelionCor in parallel.
→ csparc2star.py particle conversion for re-extraction from rlnMicrographs.
→ rlnParticles re-refined in CS against original reference to regenerate ref->image relationship (3-3.2Å).
→ csparc2star.py --inverty conversion for polishing with RelionCor outputs.
→ half_maps and mask_fsc_auto used as is for postprocessing.
→ shiny particles yield comparable/better results (~2.9Å; better FSC at medium resolution).

Scenario 2: CS to RELION

tif movies Patch Motion-corrected and processed in CS to obtain particle stack (3.1Å).
→ csparc2star.py --movie to generate csMicrograph list and motion star files.
→ csparc2star.py --flipy particle conversion for polishing using csMicrograph and motion files.
→ half_maps and mask_fsc_auto flipped in Z then in Y using relion_image_handler for postprocessing.*
→ shiny particles yield worse results (4-5Å).
(*although probably incorrect, also attempted polish against maps/masks that hadn’t been Z+Y-flipped with similar outcome.)

Scenario 3: CS to RELION (+rlnMotion)

tif movies Patch Motion-corrected, (blob)-picked and processed in CS to obtain particle stack; tif movies motion-corrected with RelionCor in parallel.
→ csparc2star.py --flipy particle conversion for polishing using RelionCor outputs.
→ half_maps and mask_fsc_auto flipped in Z then in Y using relion_image_handler for postprocessing.*
→ shiny particles yield worse results (4-5Å).
(*although probably incorrect, also attempted to polish against maps/masks that hadn’t been Z+Y-flipped with similar outcome.)

(Caveats)

Scenario 1 (a.k.a. the safe way) was performed to establish a baseline. In general, the trajectories therein appear reasonable for the first half of the frames and then occasionally become a bit suspect in the latter half. Worth reiterating that this is not the best test case. The shape of the overall polishing B-factor plot looks reasonable, however. Smooth decline from 0 to -150. Polishing was initiated with the default parameters rather than a trained set. With EMPIAR and other data, we’ve rarely found the parameters to make a significant difference to the outcome, although this cannot be fully discounted as a factor in this case.

Question. Scenario 3 was attempted as a sanity check and also to test ±local motion initialization (compared to Scenario 2). From a workflow perspective, is there a reason why this might have tripped up? Comparing the data_global_shift tables from selected motion star files (RelionCor vs csparc2star), these seem to reflect similar shifts in magnitude and direction.

Also, after wrapping my head around the conventions in play, --inverty (or rather, UNinverty?) is indeed unfortunate, as you say.

def cryosparc_2_cs_particle_locations(cs, df=None, swapxy=True, invertx=False, inverty=True):
...
  if inverty:
    # cryoSPARC coordinates have origin in "bottom left" so inverting Y is default for Relion correctness
    # (and therefore also for Import Particles). However, cryoSPARC Patch Motion flips images physically
    # vs. SerialEM, Motioncor2 doesn't, so "inverting twice" (not inverting) is required if switching.
    df[star.Relion.COORDY] = 1 - df[star.Relion.COORDY]
...

EDIT: I shall attempt the comparisons again on an EMPIAR dataset I have lying around.

Cheers,
Yang

DanielAsarnow · March 6, 2023, 12:01am

My code comment is wrong actually - it is cryoSPARC that doesn’t flip the TIFs.

I think that differences in local motion patch interpolation between different motion correction is somehow scrambling the relationship with the refined particle shifts. So far I’ve only got a resolution enhancement from BPP if I re-extract and re-refine the particles based on the Relion or MotionCor2 motion correction. Going directly from csparc should work in principle but it seems something isn’t right. I’m reasonably sure the conversion convention I’m using is right, but none of the other options worked either.

Edit to clarify:
I’ve done Scenario 1 and 3 successfully, 3 also works if you add a re-extract and re-refine from the Relion MotionCorr job.

leetleyang · March 6, 2023, 12:36am

This makes sense. Insightful point. Thanks for clarifying your own experience as well.

Cheers,
Yang

leetleyang · March 6, 2023, 9:05pm

Hi @DanielAsarnow

Wondering if there’s a quick way of getting around an apparent bug in the CS MC2 wrapper.

Looking in the .cs file, while 'movie_blob/psize_A' reflects 0.83, 'rigid_motion/psize_A' appears to be truncated to 0.. At least I think that’s why csparc2star.py --movie is returning divide_by_zero errors. micrograph_blob/vmin and micrograph_blob/vmax seem to be similarly truncated, so I think it’s a general parsing issue.

/home/ylee/software/pyem/pyem/metadata/cryosparc2.py:263: RuntimeWarning: divide by zero encountered in true_divide
  traj /= cs['rigid_motion/psize_A'][i]  # Looks right.
/home/ylee/software/pyem/pyem/metadata/cryosparc2.py:263: RuntimeWarning: invalid value encountered in true_divide
  traj /= cs['rigid_motion/psize_A'][i]  # Looks right.

Do you normally run MC2 outside of CS? The wrapper doesn’t support the -LogDir option in ≥1.4.6 leaving no convenient way of issuing -OutStar 1 from within CS.

EDIT:
Never mind. Easier to edit the .cs file itself.

EDIT2:
I see. csparc2star.py --movie attempts to correct for the axis inversion, which isn’t appropriate for MC2 inputs.

Cheers,
Yang

leetleyang · March 8, 2023, 12:37pm

Hi @DanielAsarnow

Some further investigations. I repeated some tests with EMPIAR-11350 to see where things may be failing and which workflows are possible.

Scenario 1: RELION to CS to RELION (baseline)

tif movies motion-corrected with RelionCor.
→ rlnMicrographs (blob)-picked and processed in CS to obtain particle stack (3.1-3.2Å).
→ csparc2star.py --inverty particle conversion for polishing with RelionCor outputs.
→ half_maps and mask_fsc_auto used as is for postprocessing.
→ shiny particles yield better results (~2.8-2.9Å; better FSC at medium resolution).

Scenario 2: MC2(wrapper)/CS to RELION (sanity check)

tif movies motion-corrected with MotionCor2 within CS.
→ pre-shiny particle coordinates from S1 used to reextract from mc2Micrographs (Recenter using aligned shifts disabled).
→ Particles refined against original reference (3.1-3.2Å).
→ csparc2star.py --movie to generate mc2Micrograph and motion star files*.
→ cpsarc2star.py --inverty particle conversion for polishing.
→ half_maps and mask_fsc_auto used as is for postprocessing.
→ shiny particles yield better results (~2.8-2.9Å; better FSC at medium resolution).

*polishing attempted with positive and negative orientations of _rlnMicrographShiftX, both giving the same outcome, i.e. modelled particle trajectories and refinement FSC. Result seems insensitive to global_shift priors when 30-40 particles/mic to track?

Scenario 3: CS to RELION

a) tif movies Patch Motion-corrected in CS.
→ pre-shiny particle coordinates from S1 used to extract from csMicrographs (Flip mic in Y before extract enabled; Recenter using aligned shifts disabled).
→ Particles refined against original reference (3.1-3.2Å).
→ csparc2star.py --movie to generate csMicrograph and motion star files.
→ cpsarc2star.py --inverty particle conversion for polishing.
→ half_maps and mask_fsc_auto used as is for postprocessing.
→ shiny particles yield better results (~2.8-2.9Å; better FSC at medium resolution; maybe slightly worse than S1 and S2, which may or may not be stochastic, but at least it didn’t fail).

b) tif movies Patch Motion-corrected in CS.
→ pre-shiny particle coordinates from S1 Y-inverted and used to extract from csMicrographs (Recenter using aligned shifts disabled).
→ Particles refined against original reference (3.1-3.2Å).
→ csparc2star.py --movie to generate csMicrograph and motion star files.
→ cpsarc2star.py --flipy particle conversion for polishing.
→ half_maps and mask_fsc_auto ±(flipZ+Y) for postprocessing.
→ shiny particles, both ±(flipZ+Y) polish_reference, yield worse results (~4.5Å).

At least with this dataset, BPP seems largely insensitive to motion initialization inputs (see S2).

S3a gave an improvement from BPP, suggesting that Patch Motion in CS isn’t inherently the primary determinant when things blow up.

However, the differing S3a/b outcomes, depending namely on whether the mics are pre-flipped (a) or the coordinates/transformations are flipped later (b), suggests something isn’t quite right either with the way Relion is perceiving the --flipy-transformed particles, and/or the treatment of the reference?

Cheers,
Yang

DanielAsarnow · March 9, 2023, 2:46am

@leetleyang Thanks, this is great. Can you test 3b again with line 95 of csparc2star.py commented out? Like that:

95: # df[star.Relion.ORIGINY] = -df[star.Relion.ORIGINY]

leetleyang · March 9, 2023, 6:00pm

Hi @DanielAsarnow

Sure, I’ll give that a go.

In the mean time, I’ve tried to dig a bit into the discrepancy. I took the pre-shiny particle images from S3b, flipped them in Y (relion_image_handler --flipY), replaced the original stacks and refined against the original reference, in order to simulate the necessary shifts/transformations required if particles were sourced from Y-flipped mics.

Averaged across a few thousand particle images, the differences in rlnAngleRot, rlnAngleTilt, rlnAnglePsi, rlnOriginXAngst and rlnOriginYAngst, compared to right-side up particle images seemed best described by…

phi+180, 180-theta, -psi, XAngst (as is), -YAngst

Note, it’s similar to the transformation suggested by @sunch here, sans Y-shift inversion. That additional element is incorporated in --flipy. That one difference is perhaps why @olibclarke didn’t quite have success with just the Euler angle modifications at the time.

Scenario 3: CS to RELION

c) tif movies Patch Motion-corrected in CS.
→ pre-shiny particle coordinates from S1 Y-inverted and used to extract from csMicrographs (Recenter using aligned shifts disabled).
→ Particles refined against original reference (3.1-3.2Å).
→ csparc2star.py --movie to generate csMicrograph and motion star files.
→ csparc2star.py particle conversion for polishing.
→ application of the above .star file modifications using awk.
→ half_maps and mask_fsc_auto used as is for postprocessing.
→ shiny particles now yield comparable, if not identical results to S3a.

I’ve also been wondering about the slight difference in polishing outcomes when sourcing from RC/MC2 vs PM in CS.

The FSC is improved vs un-polished particles in both cases, but RC/MC2 has been giving slightly better FSCs. I wondered earlier if it was stochastic, but I’m beginning to think that it’s due to the fact that RC/MC2 and PM do not share the same reference frame–PM has no reference frame. Meaning if particles are extracted from csMicrographs, the particles won’t be exactly where they should be when Relion goes back to the movies. That is, unless csparc2star.py corrects for that?

Perhaps another reason to stick to RC/MC2, for consistency.

Cheers,
Yang

leetleyang · March 10, 2023, 2:52pm

Hi @DanielAsarnow

I gave this a go.

# Reference particle (Rot, Tilt, Psi, OriginX, OriginY)
-69.794205 65.145355 171.596817 -22.796064 12.710671

# Original csparc2star.py --flipy
69.794205 65.145354 171.596810 -22.796064 -12.710671

# Modified csparc2star.py --flipy
69.794205 65.145354 171.596810 -22.796064 12.710671

Polished against ±(flipZ+Y) maps/masks, unfortunately, both of which produced the same outcome as S3b.

Cheers,
Yang

DanielAsarnow · March 14, 2023, 8:50pm

Which step is the “reference” particle orientation from? What are the parameters for that particle in the other two scenarios?

I guess

[[1, 0, 0],
 [0, -1, 0],
 [0, 0, -1]]

might be the wrong transformation…or the idea about irreconcilable local motion differences is right (which it shouldn’t be, we should only have to care about the particle in the raw movie).

leetleyang · March 14, 2023, 9:04pm

S3b particles following refinement against original reference (i.e. 3.1-3.2Å map), then subject to csparc2star.py conversion without any additional flags.

And to confirm that the change to csparc2star.py has had the desired effect, the following two lines are from the same input particles but converted with a) the original --flipy, and b) --flipy with line 95 commented out.

Cheers,
Yang

DanielAsarnow · March 14, 2023, 9:15pm

OK, got it. Those changes are expected then. They’re just not right somehow.

leetleyang · March 14, 2023, 9:33pm

Beginning from the reference particle data from S3b, a favourable polishing outcome (see S3c above) can be attained with the following star file modifications:

(AngleRot)+180, 180-(AngleTilt), -(AnglePsi), OriginX, -(OriginY)

Apologies. I don’t know what the proper mathematical notation for it is.

At least with EMPIAR 11350, BPP seems largely insensitive to the fidelity of the motion initialization. However, what is different are the coordinates for where the particles are, even with --inverty properly applied.

RC/MC2 averages motion relative to the first movie frame. PM does not do this, as far as I can tell. So PM-derived coordinates will be off, possibly by quite a fair bit, when referencing the original movie frames.

A re-extraction from RC/MC2-derived micrographs probably allows this to be compensated for by absorbing the difference into the Euler angles and X/Y-shift transformations.

Cheers,
Yang

DanielAsarnow · March 14, 2023, 10:36pm

It’s the same transformation. Maybe the first-frame issue is key. I need to look at the polishing code again or maybe ask Takanori.

leetleyang · March 14, 2023, 11:02pm

Apologies. Is it meant to be the same transformation?

Comparing the outputs with and without --flipy, other than the change to OriginY, the flag appears to have the effect of 1) multiplying AngleRot by -1, which is quite different from +180°, 2) leaves AngleTilt untouched, which is also quite different from 180°-angletilt (except where the angle in question is 90°), and 3) leaves AnglePsi untouched as well.

Or perhaps is the transformation the same but relative to a reference that is flipped in Z and Y?

Cheers,
Yang

DanielAsarnow · March 15, 2023, 1:23am

Great questions. The transformation is applied to the rotation matrix derived from the particle Euler angles, and they’re not different, because of the degeneracy of the Euler angles. Here’s how I found the transformation (aside from the 1,-1,-1 signature “making sense” given what we are trying to do).

import numpy as np
from pyem import geom
r1 = np.array([-69.794205, 65.145355, 171.596817])
r2 = np.array([-69.794205 + 180, 180 - 65.145355, -171.596817])
m1 = geom.euler2rot(*np.deg2rad(r1))
m2 = geom.euler2rot(*np.deg2rad(r2))
tf = m1.dot(m2.T)
print(tf)
[[ 1.00000000e+00 -7.15192393e-17  1.10567260e-16]
 [-4.96956493e-17 -1.00000000e+00 -9.66247528e-17]
 [ 2.02288402e-16  5.73985227e-17 -1.00000000e+00]]

So we see tf has that [1, -1, -1] on the diagonal.

But now I realize that this transformation only does what we want if it’s used on the left of the particle matrix. My transformation code multiplies on the right - because that is what you need if you want the rotation to apply to the map. Here though we are considering the particle - and that is the inverse matrix!

Thus, the solution is to invert (transpose) the particle matrices before multiplying by our special 1,-1,-1 identity-ish matrix, then transpose them again afterwards.

Can you please update your pyem, and then repeat your test? We’ll need to check with and without that line 95 comment.