Copy CTF parameters to re-imported particles?

Hi all,

Consider the following scenario:

  1. I convert a particle.cs file (originating from a refinement job with grouped beamtilt refinement) to star, and select some subset of particles (e.g. using CryoSieve).
  2. Higher order CTF params are lost upon conversion.
  3. I reimport the subset of particles (using import particle stack).

I would like to be able to copy CTF params (beamtilt & trefoil) from the original particle set to the newly imported subset, but I cannot figure out a way to do this. Does anyone have any suggestions? If it were possible to intersect based on particle location (rather than UID or path), that should do the trick, but it seems the only options are UID or path…
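Conceptually, this is a join: look up each subset particle in the original set and copy the higher-order CTF values across. A minimal sketch in plain Python, assuming both sets can already be keyed by the same (stack path, index within stack) pair; the record layout and the field names `path`, `idx`, `beam_tilt` and `trefoil` are hypothetical stand-ins, not cryoSPARC's actual field names.

```python
from typing import Dict, List, Tuple

# Hypothetical key: (image stack path, index within the stack).
Key = Tuple[str, int]


def copy_ctf_params(
    original: List[dict],
    subset: List[dict],
    fields: Tuple[str, ...] = ("beam_tilt", "trefoil"),  # hypothetical field names
) -> List[dict]:
    """Return copies of `subset` particles with `fields` taken from `original`.

    Particles are matched on (path, idx). A subset particle with no match in
    `original` raises KeyError, so mismatches are not silently dropped.
    """
    lookup: Dict[Key, dict] = {(p["path"], p["idx"]): p for p in original}
    result = []
    for p in subset:
        src = lookup[(p["path"], p["idx"])]
        merged = dict(p)  # keep the subset's own values for all other fields
        for f in fields:
            merged[f] = src[f]
        result.append(merged)
    return result


original = [
    {"path": "stack1.mrc", "idx": 0, "beam_tilt": (0.1, -0.2), "trefoil": (1.0, 0.5)},
    {"path": "stack2.mrc", "idx": 0, "beam_tilt": (0.3, 0.0), "trefoil": (0.2, 0.1)},
]
subset = [{"path": "stack2.mrc", "idx": 0, "beam_tilt": None, "trefoil": None}]
print(copy_ctf_params(original, subset))
```

The hard part in practice is building a key that is identical in both sets, which is what the rest of this thread is about.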

Cheers
Oli

I posted a similar question before, but never dared to try the method suggested :sweat_smile:

I second that it would be very helpful to use particle location (_rlnCoordinateX and _rlnCoordinateY) to intersect the particle set.
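A rough sketch of what location-based matching could look like, assuming each particle is reduced to a hypothetical (micrograph name, x, y) tuple in pixels, as in `_rlnCoordinateX`/`_rlnCoordinateY`. As far as I know, cryoSPARC's location slot stores particle centres as fractions of the micrograph dimensions, so those would need converting to pixels first. The tolerance `tol` (a hypothetical parameter) absorbs small rounding differences introduced by conversion; the grid bucketing only avoids comparing every pair of particles.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical particle layout: (micrograph name, x, y) with x, y in pixels.
Particle = Tuple[str, float, float]


def match_by_location(
    a: Sequence[Particle], b: Sequence[Particle], tol: float = 1.0
) -> List[Optional[int]]:
    """For each particle in b, return the index of the closest particle in a on
    the same micrograph within `tol` pixels, or None if there is no such particle."""
    if tol <= 0:
        raise ValueError("tol must be positive")
    # Bucket set A by (micrograph, grid cell of size tol) so each lookup only
    # needs to scan the 3x3 block of cells around the query point.
    grid: Dict[Tuple[str, int, int], List[int]] = defaultdict(list)
    for i, (mic, x, y) in enumerate(a):
        grid[(mic, math.floor(x / tol), math.floor(y / tol))].append(i)

    matches: List[Optional[int]] = []
    for mic, x, y in b:
        cx, cy = math.floor(x / tol), math.floor(y / tol)
        best: Optional[int] = None
        best_d = tol
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for i in grid.get((mic, cx + dx, cy + dy), []):
                    d = math.hypot(a[i][1] - x, a[i][2] - y)
                    if d <= best_d:
                        best, best_d = i, d
        matches.append(best)
    return matches
```

Each non-None entry pairs a subset particle with its counterpart in the original set, from which the CTF values could then be copied.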

Hi Oli,

Do the original image stacks have UIDs tacked onto their filenames?

Assuming I’ve understood the use-case correctly…

If the conversion to star in the first instance was done taking care to remove the first set of UIDs from the image stack paths, I think an intersect based on path with Ignore leading UID enabled should work as you described? However, it may also require some finagling back to mrc from mrcs if this was required when manipulating the star file.

Without that removal, one reason the intersect may fail is that Ignore leading UID may strip only the second UID prefix added during Import (and the single prefix on Particle set A), leaving the first prefix intact on the re-imported set. That is my impression of its logic.
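A toy illustration of that reasoning, under an assumed rule that "ignore leading UID" strips at most one run of digits plus underscore from the start of the file name. The regex and the paths below are hypothetical, not cryoSPARC's actual implementation: a re-imported stack that picked up a second prefix still carries the first one after stripping, so the names never match.

```python
import posixpath
import re

# Assumed rule for illustration only: a UID prefix is a run of 10+ digits
# followed by "_" at the start of the file name. Not cryoSPARC's actual code.
UID_PREFIX = re.compile(r"^\d{10,}_")


def name_ignoring_leading_uid(path: str) -> str:
    """Return the file name of `path` with at most one leading UID prefix removed."""
    return UID_PREFIX.sub("", posixpath.basename(path), count=1)


# Hypothetical paths: set A carries one UID prefix; re-importing the star file
# (with that prefix still in the path) adds a second prefix in front of it.
path_a = "J10/extract/111111111111_stack_0001.mrc"
path_b = "J20/imported/222222222222_111111111111_stack_0001.mrc"

print(name_ignoring_leading_uid(path_a))  # stack_0001.mrc
print(name_ignoring_leading_uid(path_b))  # 111111111111_stack_0001.mrc
```

Removing the first prefix from the star file paths before import would leave only one prefix on each side, and the stripped names would then agree.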

Cheers,
Yang

Hi Yang,

Yes, they do, and indeed editing the star file so it refers to the .mrc stacks rather than .mrcs gets me part of the way there, thanks!

However, intersecting on path while ignoring UIDs is still not quite working as I would expect. I have set A, with ~141k particles, and set B, a subset of ~58k particles (selected using CryoSieve). The intersection should contain the ~58k particles from A that match those in B, as B is a subset of A.

Previously, when the filenames were mismatched, it gave an intersection of 0 (as expected - no overlap).

Now, with the mrc/mrcs confusion fixed, it is giving an intersection of 141k particles, which is impossible… thoughts?

Here is the log:

Operating on particle. Action is intersect. Slot that will be used during output is blob.

Inputted 141271 items as set A

Action is Intersect/Difference

Inputted 57982 items as set B

Intersection will use path field.

Computing set operations on blob/path..

Intersection: 141077 items

Using blob (and all passthrough fields) from set A for intersect output

A-minus-B: 194 items

Using blob (and all passthrough fields) from set A for A_minus_B output

B-minus-A: 0 items

Using blob (and all passthrough fields) from set B for B_minus_A output

Creating output files...

Done!

Done in 6.59s

--------------------------------------------------------------

Compiling job outputs...

Passing through outputs for output group intersect from input group particles_A

This job outputted results ['blob']

  Loaded output dset with 141077 items

Passthrough results ['ctf', 'location', 'alignments3D', 'pick_stats', 'ml_properties']

  Loaded passthrough dset with 141271 items

  Intersection of output and passthrough has 141077 items

Passing through outputs for output group A_minus_B from input group particles_A

This job outputted results ['blob']

  Loaded output dset with 194 items

Passthrough results ['ctf', 'location', 'alignments3D', 'pick_stats', 'ml_properties']

  Loaded passthrough dset with 141271 items

  Intersection of output and passthrough has 194 items

Passing through outputs for output group B_minus_A from input group particles_B

This job outputted results ['blob']

  Loaded output dset with 0 items

Passthrough results ['ctf', 'location', 'alignments3D']

  Loaded passthrough dset with 57982 items

  Intersection of output and passthrough has 0 items

Checking outputs for output group intersect

Checking outputs for output group A_minus_B

Checking outputs for output group B_minus_A

Updating job size...

Exporting job and creating csg files...

***************************************************************

Job complete. Total time 10.52s

Hi Oli,

Strange. I’ve just done a little test myself and noticed the same. It seems to be matching whole image stacks and ignoring the stack indices, which feels like a bug to me rather than the intended behaviour. Perhaps @wtempel can help shed some light?

Otherwise, I’m afraid cryosparc-tools, which I’m hopeless at, may be the only way to compare the two arrays.
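A toy reproduction of the suspected behaviour, with particles reduced to hypothetical (stack path, index) tuples standing in for `blob/path` and `blob/idx`. If the key is the path alone, every particle in A that shares a stack with at least one particle in B counts as a match, which would be consistent with the log above, where only 194 particles end up in A-minus-B.

```python
from typing import List, Set, Tuple

# Hypothetical simplification of blob/path + blob/idx.
Particle = Tuple[str, int]  # (stack path, index within stack)


def intersect_on_path(a: List[Particle], b: List[Particle]) -> List[Particle]:
    """Keep particles of A whose stack path appears anywhere in B (index ignored)."""
    b_paths: Set[str] = {path for path, _ in b}
    return [p for p in a if p[0] in b_paths]


def intersect_on_path_and_index(a: List[Particle], b: List[Particle]) -> List[Particle]:
    """Keep particles of A whose exact (path, index) pair appears in B."""
    b_keys: Set[Particle] = set(b)
    return [p for p in a if p in b_keys]


# Set A: three stacks; set B: one particle from each of the first two stacks.
a = [(s, i) for s in ("stack1.mrc", "stack2.mrc") for i in range(3)]
a += [("stack3.mrc", 0), ("stack3.mrc", 1)]
b = [("stack1.mrc", 1), ("stack2.mrc", 0)]

print(len(intersect_on_path(a, b)))            # 6: whole stacks matched
print(len(intersect_on_path_and_index(a, b)))  # 2: only the selected particles
```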

EDIT:

Blindly cobbled together a cryosparc-tools workflow based on Example 9. The workflow assumes particles are reassociated with their exposures during import. This skirts the need for UID handling when comparing blob/path. Perhaps a starting point for what you want to do?

from cryosparc.tools import CryoSPARC
import numpy as np

cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    host="localhost",
    base_port=39000,
    email="ali@example.com",
    password="password123"
)

select_project = "P294"
select_workspace = "W2"
job_original_particles = "J31"
job_imported_particles = "J34"

original_particles = cs.find_job(select_project, job_original_particles).load_output("particles")
imported_particles = cs.find_job(select_project, job_imported_particles).load_output("imported_particles")

# Build a matching key from micrograph path + particle index within its stack
original_particles.add_fields(["intersect_field"], ["str"])
original_particles["intersect_field"] = [
    f"{r['location/micrograph_path']}.{r['blob/idx']}"
    for r in original_particles.rows()
]

imported_particles.add_fields(["intersect_field"], ["str"])
imported_particles["intersect_field"] = [
    f"{r['location/micrograph_path']}.{r['blob/idx']}"
    for r in imported_particles.rows()
]

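# Keep original particles (with their full CTF, incl. beam tilt/trefoil) whose key occurs in the imported subset
intersection = original_particles.query({"intersect_field": imported_particles["intersect_field"]})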

cs.save_external_result(
    select_project,
    select_workspace,
    intersection,
    type="particle",
    name="cryosieve_intersection",
    slots=["blob"],
    passthrough=(job_original_particles, "particles"),
    title="Cryosieved Subset",
)

Cheers,
Yang

@olibclarke @zhangrui_wustl: Our team agrees with @leetleyang’s finding that particle indices are (incorrectly) being ignored. We have noted this issue and also find the approach in Yang’s script reasonable.

Hi all,

CryoSPARC v4.5 has been released, which includes a fix for this bug: Particle Sets Tool now correctly uses both a particle’s file path and its index within the file to identify matches when intersecting by path. In addition, Particle Sets Tool and Exposure Sets Tool jobs now include two versions of the intersection output: one where all output slots are copied from the A input, and one where they are copied from the B input.
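A minimal sketch of the two intersection variants, assuming both inputs are dictionaries keyed by a hypothetical (stack path, index) pair; it illustrates the described behaviour, not the Particle Sets Tool's implementation. For the original use case, if the original particles are input A, the A-side output is the one that would keep the original beam tilt and trefoil values.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # hypothetical key: (stack path, index in stack)


def intersect_keep_a_and_b(
    a: Dict[Key, dict], b: Dict[Key, dict]
) -> Tuple[List[dict], List[dict]]:
    """Return the intersection twice: once with A's records, once with B's."""
    shared = [key for key in a if key in b]  # match on path AND index
    return [a[key] for key in shared], [b[key] for key in shared]


a = {("s1.mrc", 0): {"beam_tilt": (0.1, 0.0)}, ("s1.mrc", 1): {"beam_tilt": (0.2, 0.1)}}
b = {("s1.mrc", 1): {"beam_tilt": None}, ("s2.mrc", 0): {"beam_tilt": None}}
from_a, from_b = intersect_keep_a_and_b(a, b)
print(from_a, from_b)
```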

Best,
Michael