I convert a particle.cs file (originating from a refinement job with grouped beamtilt refinement) to star, and select some subset of particles (e.g. using CryoSieve).
Higher order CTF params are lost upon conversion.
I reimport the subset of particles (using import particle stack).
I would like to be able to copy CTF params (beamtilt & trefoil) from the original particle set to the newly imported subset, but I cannot figure out a way to do this - does anyone have any suggestions? If it were possible to intersect based on the particle location (rather than UID or path), this should do the trick, but it seems the only options are UID or path…
Do the original image stacks have UIDs tacked onto their filenames?
Assuming I’ve understood the use-case correctly…
If the conversion to star in the first instance was done taking care to remove the first set of UIDs from the image stack paths, I think an intersect based on path with Ignore leading UID enabled should work as you described? However, it may also require some finagling back to mrc from mrcs if this was required when manipulating the star file.
Without that removal, intersect may fail because Ignore leading UID strips only the outermost UID prefix: the one added during Import on the reimported set, and the original one on particle set A. That would leave the original prefix intact on the reimported paths, so they no longer match set A. This is my impression of its logic.
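To make that hypothesis concrete, here is a minimal sketch in plain Python. It is not CryoSPARC's implementation: it assumes a UID prefix is a run of digits followed by an underscore, and the helper name strip_leading_uid and the example paths are invented for illustration.

```python
import re

# Assumption: a UID prefix looks like "<digits>_" at the start of the file
# name. The exact format CryoSPARC uses may differ.
UID_PREFIX = re.compile(r"^\d+_")

def strip_leading_uid(path: str) -> str:
    """Remove ONE leading UID-like prefix from the file name part of a path."""
    directory, _, name = path.rpartition("/")
    stripped = UID_PREFIX.sub("", name, count=1)
    return f"{directory}/{stripped}" if directory else stripped

# Hypothetical paths: set A carries one prefix, the reimported set carries two.
path_a = "J31/extract/111_mic001_particles.mrc"
path_b = "J34/imported/222_111_mic001_particles.mrc"

name_a = strip_leading_uid(path_a).rpartition("/")[2]
name_b = strip_leading_uid(path_b).rpartition("/")[2]

print(name_a)            # mic001_particles.mrc
print(name_b)            # 111_mic001_particles.mrc
print(name_a == name_b)  # False: one prefix survives on the reimported path
```

If the tool behaves like this, removing the original prefixes before reimporting would leave both sides with exactly one prefix to strip, and the names would line up.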
Yes they do - and indeed, editing the star file so it refers to the mrc stacks rather than mrcs gets me part way there, thanks!
However, intersecting on path while ignoring UIDs is still not quite working as I would expect. I have set A, with ~141k particles, and set B, a subset of 52k particles (selected using CryoSieve). The intersection should contain the 52k particles from A that match those in B, as B is a subset of A.
Previously, when the filenames were mismatched, it gave an intersection of 0 (as expected - no overlap).
Now, with the mrc/mrcs confusion fixed, it is giving an intersection of 141k particles, which is impossible… thoughts?
Here is the log:
Operating on particle. Action is intersect. Slot that will be used during output is blob.
Inputted 141271 items as set A
Action is Intersect/Difference
Inputted 57982 items as set B
Intersection will use path field.
Computing set operations on blob/path..
Intersection: 141077 items
Using blob (and all passthrough fields) from set A for intersect output
A-minus-B: 194 items
Using blob (and all passthrough fields) from set A for A_minus_B output
B-minus-A: 0 items
Using blob (and all passthrough fields) from set B for B_minus_A output
Creating output files...
Done!
Done in 6.59s
--------------------------------------------------------------
Compiling job outputs...
Passing through outputs for output group intersect from input group particles_A
This job outputted results ['blob']
Loaded output dset with 141077 items
Passthrough results ['ctf', 'location', 'alignments3D', 'pick_stats', 'ml_properties']
Loaded passthrough dset with 141271 items
Intersection of output and passthrough has 141077 items
Passing through outputs for output group A_minus_B from input group particles_A
This job outputted results ['blob']
Loaded output dset with 194 items
Passthrough results ['ctf', 'location', 'alignments3D', 'pick_stats', 'ml_properties']
Loaded passthrough dset with 141271 items
Intersection of output and passthrough has 194 items
Passing through outputs for output group B_minus_A from input group particles_B
This job outputted results ['blob']
Loaded output dset with 0 items
Passthrough results ['ctf', 'location', 'alignments3D']
Loaded passthrough dset with 57982 items
Intersection of output and passthrough has 0 items
Checking outputs for output group intersect
Checking outputs for output group A_minus_B
Checking outputs for output group B_minus_A
Updating job size...
Exporting job and creating csg files...
***************************************************************
Job complete. Total time 10.52s
Strange. I’ve just done a little test myself and noticed the same. It seems to be matching whole image stacks and ignoring the stack indices, which feels like a bug to me rather than the intended behaviour. Perhaps @wtempel can help shed some light?
Otherwise, I’m afraid cryosparc-tools, which I’m hopeless at, may be the only way to compare the two arrays.
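A toy illustration of what the log suggests, in plain Python with made-up stack names (nothing here calls CryoSPARC): if only the stack path is compared, every particle in A whose stack appears anywhere in B counts as a match, whereas keying on the path plus the index within the stack recovers the true subset.

```python
# Hypothetical particles as (stack path, index within stack) pairs.
set_a = [("mic001.mrc", i) for i in range(4)] + [("mic002.mrc", i) for i in range(3)]
set_b = [("mic001.mrc", 1), ("mic001.mrc", 3)]  # a strict subset of A

# Path-only matching: a particle in A matches if its stack occurs in B at all.
paths_in_b = {path for path, _ in set_b}
by_path = [p for p in set_a if p[0] in paths_in_b]

# Path-plus-index matching: a particle in A matches only its exact counterpart.
keys_in_b = set(set_b)
by_path_and_index = [p for p in set_a if p in keys_in_b]

print(len(by_path))            # 4: every particle from mic001.mrc
print(len(by_path_and_index))  # 2: only the two particles actually in B
```

With most stacks represented in B, the path-only count approaches the size of A, which is consistent with an intersection of 141077 out of 141271.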
EDIT:
Blindly cobbled together a cryosparc-tools workflow based on Example 9. The workflow assumes particles are reassociated with their exposures during import. This skirts the need for UID handling when comparing blob/path. Perhaps a starting point for what you want to do?
from cryosparc.tools import CryoSPARC

# Connect to the CryoSPARC instance (replace the placeholders with your own details).
cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    host="localhost",
    base_port=39000,
    email="ali@example.com",
    password="password123",
)

select_project = "P294"
select_workspace = "W2"
job_original_particles = "J31"  # refinement job holding the full particle set
job_imported_particles = "J34"  # Import Particle Stack job holding the subset

original_particles = cs.find_job(select_project, job_original_particles).load_output("particles")
imported_particles = cs.find_job(select_project, job_imported_particles).load_output("imported_particles")

# Build a comparison key from the micrograph path and the particle's index in its stack.
original_particles.add_fields(["intersect_field"], ["str"])
original_particles["intersect_field"] = [
    f"{r['location/micrograph_path']}.{r['blob/idx']}"
    for r in original_particles.rows()
]

imported_particles.add_fields(["intersect_field"], ["str"])
imported_particles["intersect_field"] = [
    f"{r['location/micrograph_path']}.{r['blob/idx']}"
    for r in imported_particles.rows()
]

# Keep only the original particles whose key also occurs in the imported subset.
intersection = original_particles.query({"intersect_field": imported_particles["intersect_field"]})

# Save the subset as an external result, passing through the remaining fields
# (including ctf) from the original job.
cs.save_external_result(
    select_project,
    select_workspace,
    intersection,
    type="particle",
    name="cryosieve_intersection",
    slots=["blob"],
    passthrough=(job_original_particles, "particles"),
    title="Cryosieved Subset",
)
@olibclarke @zhangrui_wustl: Our team agrees with @leetleyang’s finding that particle indices are (incorrectly) being ignored. We took a note of this issue and also find the approach in Yang’s script reasonable.
CryoSPARC v4.5 has been released, which includes a fix for this bug: Particle Sets Tool now correctly uses both a particle’s file path and its index within the file to identify matches when intersecting by path. In addition, Particle and Exposure sets tools jobs now include outputs for two versions of the intersection dataset: one where all output slots are copied from the A input, and the other where output slots are copied from the B input.
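For the original goal of carrying the higher-order CTF terms over to the subset, the intersection copied from the A input would be the relevant one. A small sketch of the distinction, in plain Python with invented records rather than CryoSPARC datasets (the beam_tilt key and its values are just labels for this toy example):

```python
# Toy records keyed by (stack path, index). Set A holds refined values;
# set B is the reimported subset, where those values were lost on conversion.
set_a = {
    ("mic001.mrc", 0): {"beam_tilt": (0.12, -0.05)},
    ("mic001.mrc", 1): {"beam_tilt": (0.12, -0.05)},
    ("mic002.mrc", 0): {"beam_tilt": (0.08, 0.02)},
}
set_b = {
    ("mic001.mrc", 1): {"beam_tilt": (0.0, 0.0)},
    ("mic002.mrc", 0): {"beam_tilt": (0.0, 0.0)},
}

# The same particles are selected either way...
shared = [key for key in set_a if key in set_b]

# ...but the attached fields depend on which input they are copied from.
intersect_from_a = {key: set_a[key] for key in shared}
intersect_from_b = {key: set_b[key] for key in shared}

print(intersect_from_a[("mic001.mrc", 1)])  # {'beam_tilt': (0.12, -0.05)}
print(intersect_from_b[("mic001.mrc", 1)])  # {'beam_tilt': (0.0, 0.0)}
```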