Adjust per-particle defocus of subparticles in volume alignment tools?

Hi,

Is there any prospect of incorporating adjustments for defocus gradients across a particle when defining/extracting subparticles (e.g. see section 3.1.1. here: https://www.sciencedirect.com/science/article/pii/S0079610720300390?via%3Dihub)?

This would be very helpful when extracting smaller sub-particles that are on the periphery of a larger particle, e.g. a viral capsid, and the appropriate adjustments ought to be possible to calculate given the consensus refinement of the whole assembly (assuming that it has been refined with the correct hand).

Cheers
Oli

1 Like

Would be interesting to compare to Local CTF…

1 Like

Yes - also because we have had inconsistent results applying local CTF to subparticles - sometimes they seem to just go wildly off and hit the bounds of the search range. I also wonder in some cases whether the local defocus search would benefit from some kind of prior or restraint - e.g in some cases I imagine a bimodal gaussian would do the trick, as particles tend to cluster at either interface, with less in the middle.

1 Like

Clustering at the boundaries happens for me with some samples because the defocus gradient across the particles in the micrograph is very large (fun times with large subjects!) but I’ve not seen it happen with smaller things, even when using very focussed refinements (by “smaller” I mean things <4MDa…)

What happens if you tighten up the search range? Rado mentioned once that narrowing the search range helped avoid it searching out local minima.

I’ve noticed that a particle going off wildly (both in CryoSPARC and RELION) is a fairly good indicator of a low quality particle which stands a better than average chance of being junk if checked visually.

1 Like

In this case, I am talking about a large particle, and the particles concerned are not junk (I separated them out and looked at them). Tightening up search range didn’t seem to help (still hits the boundaries, just for more iterations).

Interestingly, if I make the box very tight (really tight!) around the particle, local CTF search is stable - if I make it a bit larger, I get a high proportion hitting the boundaries of the search range. No idea why.

1 Like

That’s really interesting, yes!

Did you enable Ewald sphere correction during defocus refinement? I found for one sample that helped with alternating defocus shifts when two sides of the capsid overlapped in the micrograph (otherwise they would “flip” occasionally).

How large, out of interest? :smiley:

1 Like

Yes, this was with Ewald sphere correction enabled - only difference was box size

1 Like

I’ve had an odd thought; what’s the angpix and box size (both times)? Just wondering if angpix is fairly large whether it’s struggling to sample the CTF sufficiently well.

700px & 1.13 Ă… (when it worked), 768px & 1.10 Ă… (when it failed)

One other difference is that the particles from the failed job were first extracted at full scale, then downsampled using the Downsample job… I shouldn’t imagine that would make a difference but maybe?

Um. Not sure. Rarely use downsample job specifically, usually downsample during extraction for early work then directly reextract for unbinned. Likewise don’t see why that would cause issues though.

Striking out for bright ideas now though, 1.1 angpix shouldn’t cause appreciable CTF issues for defocus as its not particularly extreme… Was wondering if you were working at 4+ angpix. Sorry, will sleep on it. Maybe something else occurs.

Fascinating. Change is small but has such a big impact. Although reminds me of issues one of our students was having. Huh. I’ll run his CTF fits through a simulation tomorrow, see if what you’re experiencing could explain his trouble. Curious. Had thought he had grouped beam tilts sub-optimally given earlier results. Worth checking. Thanks!

1 Like

Does indeed seem to be something about downsampling… puzzling…

Using same 3D alignments, same initial CTF, just varying particle image input, reconstructing then running local CTF:

700px (downsampled from 960px during extraction):

768px (downsampled from 960px during extraction - you can see many particles hitting search boundaries):

768px (downsampled from 1024px after extraction - you can see many particles hitting search boundaries):

768px (downsampled from 1024px during extraction - you can see many particles hitting search boundaries):

1024px (as previous, just not downsampled):

So it seems like downsampling to 768px gives weird results in local CTF search, regardless of the initial box size. Odd… @mmclean @rbs_sci any thoughts?

It would be useful in cases like this if there were an easy way to select & inspect the outliers - some kind of particle sorting/curation tool would be very welcome.

1 Like

That’s very interesting. No change in defocus profile except the outliers. So uniform error across the whole dataset…? What’s the resolution of the reconstruction (whole object and focussed)?

I wonder if the particles exhibiting outliers are on high motion micrographs, or regions of micrographs with greater motion? If so, perhaps drift is not fully compensated and the CTF fit is more difficult…

Agreed. And without cryosparc-tools. Perhaps with an FFT option as well. The FFT view of particles in cisTEM can be very useful.

1 Like

So uniform error across the whole dataset…?

Pretty much. And in the past I tried to remove the particles that were outliers (using some careful awking) - but then more appeared when I redid local CTF…

What’s the resolution of the reconstruction (whole object and focussed)?

When local CTF behaves, it is 3.3 initially, 2.9 after per particle defocus, 2.5 after beam tilt refined in image shift groups.

Very odd. If it’s a reoccurring “quirk” even after having removed the outliers, that eliminates any possible issue with the data itself (in my opinion) and makes it entirely a (faulty) function of the software somehow.

Nice resolution improvement, though!

I’ve been having issues with recent builds of RELION failing to calculate per-particle defocus on many micrographs. Older builds seem fine. Been driving me crazy.

1 Like

In the meantime, might there be a way to do this using CS-tools or pyem (maybe something in subparticles.py)? @rposert @DanielAsarnow…?

EDIT: It seems like there is an option in subparticles.py ("--adjust-defocus", help="Add Z component of shifts to defocus") but I cannot get it to run - trying on a star file generated from a symmetry expansion job using csparc2star.py, I get the following error (where size & x,y,z are appropriate values for this sample):

subparticles.py particles_expanded.star --boxsize size --target x,y,z --recenter --adjust-defocus defocus_adjusted.star
Traceback (most recent call last):
  File "/usr/local/bin/subparticles.py", line 194, in <module>
    sys.exit(main(parser.parse_args()))
  File "/usr/local/bin/subparticles.py", line 45, in main
    df = star.parse_star(args.input, nrows=1)
  File "/usr/local/pyem/pyem/star.py", line 500, in parse_star
    augment_star_ucsf(df, inplace=True)
  File "/usr/local/pyem/pyem/star.py", line 632, in augment_star_ucsf
    df[[UCSF.IMAGE_INDEX, UCSF.IMAGE_PATH]] = \
  File "/usr/local/envs/pyem/lib/python3.10/site-packages/pandas/core/frame.py", line 3947, in __setitem__
    self._setitem_array(key, value)
  File "/usr/local/envs/pyem/lib/python3.10/site-packages/pandas/core/frame.py", line 3989, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/usr/local/envs/pyem/lib/python3.10/site-packages/pandas/core/indexers/utils.py", line 392, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

The input particle star file has all the appropriate columns as far as I can tell:

data_particles

loop_
_rlnImageName #1
_rlnMicrographName #2
_rlnCoordinateX #3
_rlnCoordinateY #4
_rlnAngleRot #5
_rlnAngleTilt #6
_rlnAnglePsi #7
_rlnOriginXAngst #8
_rlnOriginYAngst #9
_rlnDefocusU #10
_rlnDefocusV #11
_rlnDefocusAngle #12
_rlnPhaseShift #13
_rlnCtfBfactor #14
_rlnOpticsGroup #15
_rlnRandomSubset #16
_rlnClassNumber #17

I get the same error if I run it on a particle star file without symmetry expansion, and specify the symmetry using the --sym flag.

@olibclarke That option does what you want, I think the problem is just that you can’t mix positional and non-positional arguments. The positional args have to all be at the beginning or the end.

I tried that but it still failed with the same error, whether I had the input/output args at the start or the end…

EDIT: Okay, I think I found the issue @DanielAsarnow - it only works if I explicitly specify the pixel size with --apix - otherwise it fails with this error, even if I specify the box size.

Hmm - but now if I import this particle stack into cryosparc with the source mics connected, I get this error:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 115, in cryosparc_master.cryosparc_compute.run.main
  File "/home/exx/cryosparc/cryosparc_worker/cryosparc_compute/jobs/imports/run.py", line 466, in run_import_particles
    assert len(output_errors) == 0, "The following outputs were produced but not expected: " + ", ".join(output_errors)
AssertionError: The following outputs were produced but not expected: location

EDIT: Ok, got past that error by disabling strict output checking, and now it seems to be behaving, but the particle coords don’t seem to quite match the micrographs… the particles were downsampled, perhaps that is causing some issue with coordinate conversion in subparticles.py? @DanielAsarnow if the particle pixel size is different from the micrograph pixel size, will this cause a problem in the output coords?

EDIT: Ah got it! Needed --inverty in csparc2star.py. Working now! Only remaining issue is that I lose the per-group beam tilt etc upon import/export… I wish there was a more granular way to replace CTF information using the expanded inputs, e.g. a separate slot for defocus, tilt, trefoil etc… @rposert is there some way to copy over just the tilt/trefoil values using CS tools??

EDIT: Sort of working but not quite. The particle positions are still off when comparing to the original subparticle extraction in csparc - by perhaps 1/3 of the new subparticle box size, in a manner that seems like the z-value of the target is off somehow. Confirmed - the Z-value is offset by ~1/2 the subparticle box, but x and y are fine. Calculating the offset after extracting the mis-centered subparticles and adding it to the z-value of the target in subparticles.py (which gives a Z value larger than the box size in Å!) gives correctly centered subparticles. That does make me wonder whether the applied offsets to subparticle defocus will be correct, though…

EDIT: The resolution gets worse with --adjust-defocus in subparticles.py, which makes me think that the applied defocus changes are wrong - perhaps inverted? The hand of the initial reconstruction is correct, as is the reconstruction of the subparticles, so that isn’t the issue. Hmm. Tried using the --invert flag in subparticles.py, but the resolution is still worse than without --adjust-defocus (3.46 vs 3.29 Å). Resolution improves with per-particle CTF refinement, so I expect it ought to also improve when using --adjust-defocus instead, if the adjustments are correct.

RELION and CryoSPARC seem to handle the Zernike coefficients differently - this has been a source of serious frustration recently but I’ll post about it when I get time because I don’t have enough of an understanding on this point to know how to fix it (or work around it).

It’s the difference between CS-optimised CTF parameters giving me 3.7 Ang, while the same stack in RELION gives me 6.4 Ang, and a RELION-optimised CTF parameter set giving me 3.3 Ang while in CryoSPARC I get 4.6 Ang with a vicious dip in the FSC.

1 Like

In this case it’s not even a matter of converting higher order params between relion & CS - just that you lose them on conversion to star, and I would like to re-associate them after importing back to CS, without altering the modified defocus values

Hi @olibclarke, @rbs_sci! Thanks for this interesting conversation!

Box size

Thank you for bringing the issues which seem to arise with the specific box sizes to our attention! We’ll have to think about this one a bit…

Outlier filtering

I’ve recorded this feature request — and apologies for the cs-tools suggestion below :slight_smile:

Copying higher-order CTF params

Higher-order CTF params live in the following fields:

  • ctf/cs_mm
  • ctf/tilt_A
  • ctf/trefoil_A
  • ctf/tetra_A

Assuming that your subparticles retain their exposure group IDs, you can therefore just copy over the correct values into the field. Here’s a cryosparc-tools script to do this (you’ll need to save as external result afterword, of course).

from cryosparc.tools import CryoSPARC
import json
import numpy as np
from pathlib import Path
with open(Path('~/instance-info.json').expanduser(), 'r') as f:
    instance_info = json.load(f)

cs = CryoSPARC(**instance_info)
assert cs.test_connection()

project_number = "P345"
workspace_number = "W5"
job_number = "J1155"

project = cs.find_project(project_number)
job = project.find_job(job_number)
original_particles = job.load_output("particles")

# obviously, you'll be using a different dataset here,
# but with the same exposure group IDs
subparticles = original_particles.copy()

original_particles.filter_prefix("ctf")

# arbitrarily take single row from each exposure group to access their higher-order CTF params
# we can do this because we know that each exp group has identical values for these params
exposure_groups = list(np.unique(original_particles["ctf/exp_group_id"]))
exposure_group_rows = {
    exp_group: original_particles.query({"ctf/exp_group_id": exp_group})[0]
    for exp_group in exposure_groups
}

# you can change what is copied over here
higher_order_param_colnames = [
    "ctf/cs_mm",
    "ctf/tilt_A",
    "ctf/trefoil_A",
    "ctf/tetra_A",
]

higher_order_param_values = {}
# result: {param_name: {exposure_group: value}}
for colname in higher_order_param_colnames:
    higher_order_param_values[colname] = {
        exp_group: exposure_group_rows[exp_group][colname]
        for exp_group in exposure_groups
    }

for hop_name in higher_order_param_colnames:
    updated_values = [
        higher_order_param_values[hop_name][exp_group] for exp_group in subparticles["ctf/exp_group_id"]
    ]
    subparticles[hop_name] = updated_values

There’s likely a more elegant way to do it, but that’ll work!

Hope that helps!

2 Likes