Copy CTF parameters to re-imported particles?

Hi all,

Consider the following scenario:

  1. I convert a particle.cs file (originating from a refinement job with grouped beamtilt refinement) to star, and select some subset of particles (e.g. using CryoSieve).
  2. Higher order CTF params are lost upon conversion.
  3. I reimport the subset of particles (using import particle stack).

I would like to be able to copy the CTF params (beam tilt and trefoil) from the original particle set to the newly imported subset, but I cannot figure out a way to do this. Does anyone have any suggestions? If it were possible to intersect based on particle location (rather than UID or path), this should do the trick, but it seems the only options are UID or path…
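In the meantime, a location-based intersect can be approximated outside CryoSPARC. The sketch below (hypothetical helpers `location_keys` and `intersect_by_location`) keys each particle on its micrograph path plus its `location/center_x_frac` / `location/center_y_frac`, rounded to absorb small float differences. It assumes both sets report the same micrograph path for a given exposure (e.g. because the same exposures were connected on import). The rounding tolerance is an arbitrary choice, and values that happen to straddle a rounding boundary would be missed.

```python
import numpy as np

def location_keys(mic_paths, x_frac, y_frac, decimals=4):
    """One hashable key per particle: (micrograph path, rounded x, rounded y).

    Rounding absorbs small float differences picked up during format
    conversion; 4 decimals is an arbitrary tolerance chosen for illustration.
    """
    xs = np.round(np.asarray(x_frac, dtype=float), decimals)
    ys = np.round(np.asarray(y_frac, dtype=float), decimals)
    return [(str(m), float(x), float(y)) for m, x, y in zip(mic_paths, xs, ys)]

def intersect_by_location(keys_a, keys_b):
    """Indices into set A whose location key also appears in set B."""
    wanted = set(keys_b)
    return np.array([i for i, k in enumerate(keys_a) if k in wanted], dtype=int)

# Toy data: B keeps only the second particle of A.
keys_a = location_keys(["mic1.mrc", "mic1.mrc", "mic2.mrc"], [0.1, 0.5, 0.3], [0.2, 0.6, 0.4])
keys_b = location_keys(["mic1.mrc"], [0.50001], [0.6])
print(intersect_by_location(keys_a, keys_b).tolist())  # [1]
```

The returned indices select rows of the original set, which still carry their refined CTF values.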

Cheers
Oli

I posted a similar question before, but never dared to try the suggested method :sweat_smile:

I second that: it would be very helpful to be able to intersect particle sets by particle location (_rlnCoordinateX and _rlnCoordinateY).
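To compare against `_rlnCoordinateX`/`_rlnCoordinateY` directly, the cryoSPARC fractional centres first have to be converted to pixels. A minimal sketch, assuming `location/micrograph_shape` is stored as (ny, nx) and that the export multiplied the fractions by the micrograph width and height (my understanding of what pyem does, not verified here). It also assumes the particles were not re-centred between export and re-import. `cs_location_to_relion_xy` is a hypothetical name.

```python
import numpy as np

def cs_location_to_relion_xy(center_x_frac, center_y_frac, micrograph_shape):
    """Convert cryoSPARC fractional centres to RELION-style pixel coordinates.

    Assumes each micrograph_shape row is (ny, nx), so X scales with the
    width (nx) and Y with the height (ny).
    """
    shape = np.asarray(micrograph_shape, dtype=float).reshape(-1, 2)
    x = np.asarray(center_x_frac, dtype=float) * shape[:, 1]
    y = np.asarray(center_y_frac, dtype=float) * shape[:, 0]
    return x, y

x, y = cs_location_to_relion_xy([0.5], [0.25], [[4000, 6000]])
print(x, y)  # [3000.] [1000.]
```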


Hi Oli,

Do the original image stacks have UIDs tacked onto their filenames?

Assuming I’ve understood the use-case correctly…

If, when first converting to star, you took care to remove the first set of UIDs from the image stack paths, I think an intersect based on path with Ignore leading UID enabled should work as you described. However, it may also require some finagling to switch the paths back from .mrcs to .mrc, if that change was needed when manipulating the star file.

Without that removal, one reason the intersect may fail: my impression of its logic is that Ignore leading UID removes only one prefix per path, i.e. the second UID prefix added during Import on the reimported set (and the single prefix on particle set A), while leaving the first prefix intact on the reimported set, so the two never match.
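A toy illustration of stripping one leading UID versus two. The regex assumes a UID prefix is a run of digits followed by an underscore at the start of the file name (as in the `J86/imported/` names further down the thread). That pattern, the example paths and the helper `strip_leading_uids` are assumptions for illustration, not CryoSPARC's actual implementation.

```python
import os
import re

# Assumption: a UID prefix is a run of digits plus "_" at the start of the
# file name, e.g. "018444340407407909891_..." (this would also eat a genuine
# file name that happens to start with digits and an underscore).
UID_PREFIX = re.compile(r"^\d+_")

def strip_leading_uids(path, n=1):
    """Remove up to n leading UID prefixes from the file-name part of path."""
    head, name = os.path.split(path)
    for _ in range(n):
        name = UID_PREFIX.sub("", name)
    return os.path.join(head, name)

original = "J10/extract/111_stack.mrc"         # one UID, from the original job
reimported = "J20/imported/222_111_stack.mrc"  # second UID added on import
print(os.path.basename(strip_leading_uids(original)))         # stack.mrc
print(os.path.basename(strip_leading_uids(reimported)))       # 111_stack.mrc (no match)
print(os.path.basename(strip_leading_uids(reimported, n=2)))  # stack.mrc
```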

Cheers,
Yang


Hi Yang,

Yes, they do. And indeed, editing the star file so it refers to the .mrc stacks rather than .mrcs gets me part of the way there, thanks!
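The .mrcs to .mrc edit can be scripted rather than done by hand. A minimal sketch (hypothetical helper `mrcs_to_mrc_line`) that rewrites only `index@path.mrcs` tokens, i.e. `_rlnImageName` entries, and leaves every other line of the STAR file untouched:

```python
import re

# Matches an rlnImageName token such as "000002@J67/extract/x_particles.mrcs"
# and captures everything up to (but not including) the trailing "s".
_IMAGE_NAME_MRCS = re.compile(r"(\d+@\S+?\.mrc)s(?=\s|$)")

def mrcs_to_mrc_line(line):
    """Rewrite every 'N@path.mrcs' token on one STAR line to 'N@path.mrc'."""
    return _IMAGE_NAME_MRCS.sub(r"\1", line)

# Usage sketch (file names are placeholders):
# with open("in.star") as src, open("out.star", "w") as dst:
#     for line in src:
#         dst.write(mrcs_to_mrc_line(line))
```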

However, intersecting on path while ignoring UIDs is still not quite working as I would expect. I have set A, with ~141k particles, and set B, a subset of 52k particles (selected using CryoSieve). The intersection should contain the 52k particles from A that match those in B, as B is a subset of A.

Previously, when the filenames were mismatched, it gave an intersection of 0 (as expected - no overlap).

Now, with the mrc/mrcs confusion fixed, it is giving an intersection of 141k particles, which is impossible… thoughts?

Here is the log:

Operating on particle. Action is intersect. Slot that will be used during output is blob.
Inputted 141271 items as set A
Action is Intersect/Difference
Inputted 57982 items as set B
Intersection will use path field.
Computing set operations on blob/path..
Intersection: 141077 items
Using blob (and all passthrough fields) from set A for intersect output
A-minus-B: 194 items
Using blob (and all passthrough fields) from set A for A_minus_B output
B-minus-A: 0 items
Using blob (and all passthrough fields) from set B for B_minus_A output
Creating output files...
Done!
Done in 6.59s
--------------------------------------------------------------
Compiling job outputs...
Passing through outputs for output group intersect from input group particles_A
This job outputted results ['blob']
  Loaded output dset with 141077 items
Passthrough results ['ctf', 'location', 'alignments3D', 'pick_stats', 'ml_properties']
  Loaded passthrough dset with 141271 items
  Intersection of output and passthrough has 141077 items
Passing through outputs for output group A_minus_B from input group particles_A
This job outputted results ['blob']
  Loaded output dset with 194 items
Passthrough results ['ctf', 'location', 'alignments3D', 'pick_stats', 'ml_properties']
  Loaded passthrough dset with 141271 items
  Intersection of output and passthrough has 194 items
Passing through outputs for output group B_minus_A from input group particles_B
This job outputted results ['blob']
  Loaded output dset with 0 items
Passthrough results ['ctf', 'location', 'alignments3D']
  Loaded passthrough dset with 57982 items
  Intersection of output and passthrough has 0 items
Checking outputs for output group intersect
Checking outputs for output group A_minus_B
Checking outputs for output group B_minus_A
Updating job size...
Exporting job and creating csg files...
***************************************************************
Job complete. Total time 10.52s

Hi Oli,

Strange. I’ve just done a little test myself and noticed the same. It seems to be matching whole image stacks and ignoring the stack indices, which feels like a bug to me rather than the intended behaviour. Perhaps @wtempel can help shed some light?
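A toy example of why matching on path alone inflates the intersection: every particle in A whose stack contributes at least one particle to B gets counted. The data are made up and `intersect_count` is a hypothetical helper. This is consistent with the log above, where intersection plus A-minus-B (141077 + 194) accounts for all 141271 particles in A.

```python
def intersect_count(set_a, set_b, use_index):
    """Count particles of A that 'match' B.

    Each set is a list of (path, idx) tuples. With use_index=False only the
    stack path is compared (the behaviour described above); with
    use_index=True the position within the stack is compared too.
    """
    if use_index:
        keys_b = set(set_b)
        return sum(1 for particle in set_a if particle in keys_b)
    paths_b = {path for path, _ in set_b}
    return sum(1 for path, _ in set_a if path in paths_b)

# Stack s1 holds 3 particles, s2 holds 2; B keeps a single particle from s1.
A = [("s1.mrc", 0), ("s1.mrc", 1), ("s1.mrc", 2), ("s2.mrc", 0), ("s2.mrc", 1)]
B = [("s1.mrc", 1)]
print(intersect_count(A, B, use_index=False))  # 3: all of s1
print(intersect_count(A, B, use_index=True))   # 1: only the kept particle
```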

Otherwise, I’m afraid cryosparc-tools, which I’m hopeless at, may be the only way to compare the two arrays.

EDIT:

Blindly cobbled together a cryosparc-tools workflow based on Example 9. The workflow assumes particles are reassociated with their exposures during import. This skirts the need for UID handling when comparing blob/path. Perhaps a starting point for what you want to do?

from cryosparc.tools import CryoSPARC  # the cryosparc-tools package is imported as cryosparc.tools
import numpy as np

cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    host="localhost",
    base_port=39000,
    email="ali@example.com",
    password="password123"
)

select_project = "P294"
select_workspace = "W2"
job_original_particles = "J31"
job_imported_particles = "J34"

original_particles = cs.find_job(select_project, job_original_particles).load_output("particles")
imported_particles = cs.find_job(select_project, job_imported_particles).load_output("imported_particles")

original_particles.add_fields(["intersect_field"], ["str"])
original_particles["intersect_field"] = [
    f"{r['location/micrograph_path']}.{r['blob/idx']}"
    for r in original_particles.rows()
]

imported_particles.add_fields(["intersect_field"], ["str"])
imported_particles["intersect_field"] = [
    f"{r['location/micrograph_path']}.{r['blob/idx']}"
    for r in imported_particles.rows()
]

intersection = original_particles.query({"intersect_field": imported_particles["intersect_field"]})

cs.save_external_result(
    select_project,
    select_workspace,
    intersection,
    type="particle",
    name="cryosieve_intersection",
    slots=["blob"],
    passthrough=(job_original_particles, "particles"),
    title="Cryosieved Subset",
)

Cheers,
Yang


@olibclarke @zhangrui_wustl: Our team agrees with @leetleyang’s finding that particle indices are (incorrectly) being ignored. We took a note of this issue and also find the approach in Yang’s script reasonable.


Hi all,

CryoSPARC v4.5 has been released, which includes a fix for this bug: Particle Sets Tool now correctly uses both a particle’s file path and its index within the file to identify matches when intersecting by path. In addition, the Particle Sets Tool and Exposure Sets Tool jobs now output two versions of the intersection dataset: one in which all output slots are copied from input A, and another in which they are copied from input B.
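For anyone scripting the same thing outside the job, here is a sketch of those semantics as I read the announcement: match on (path, index), then emit one copy of the intersection carrying A's fields and one carrying B's. Rows are plain dicts here, `intersect_both` is a hypothetical name, and UID-prefix handling is ignored entirely.

```python
def intersect_both(rows_a, rows_b, key=lambda r: (r["blob/path"], r["blob/idx"])):
    """Return (intersection with A's fields, intersection with B's fields)."""
    keys_a = {key(r) for r in rows_a}
    keys_b = {key(r) for r in rows_b}
    from_a = [r for r in rows_a if key(r) in keys_b]
    from_b = [r for r in rows_b if key(r) in keys_a]
    return from_a, from_b

A = [
    {"blob/path": "s1.mrc", "blob/idx": 0, "ctf/tilt_A": (0.1, 0.2)},
    {"blob/path": "s1.mrc", "blob/idx": 1, "ctf/tilt_A": (0.3, 0.4)},
]
B = [{"blob/path": "s1.mrc", "blob/idx": 1, "ctf/tilt_A": (0.0, 0.0)}]
from_a, from_b = intersect_both(A, B)
print(from_a[0]["ctf/tilt_A"])  # (0.3, 0.4): refined values kept from A
print(from_b[0]["ctf/tilt_A"])  # (0.0, 0.0): values as they are in B
```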

Best,
Michael


I’m attempting to intersect particle sets to recover CTF parameters after import, but I’m really struggling to get the Import Particles job to keep the original blob/path. Any suggestions as to what I’m doing wrong?

I extract particles in J67 and export them using pyem for 3D classification in RELION (remembering to edit the star file to switch .mrc to .mrcs and to create symlinks ending in .mrcs). After classification, I select a single class and want to refine it in CryoSPARC. I edit the star file back to switch .mrcs to .mrc and make sure the paths still point to J67/extract/. I then import the star file in J86, setting the particle data path to the CryoSPARC project folder. The particles import successfully, but blob/path now points to J86/imported/ instead of the original J67/extract/ from the star file. Finally, when I try running an intersect job, it fails with the error “no common input slot connected”.
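One way to get back to the original paths: if the files in `J86/imported/` are symlinks to the original stacks (the `ls -l` output further down suggests they are), resolving each link gives a path that can be compared with the original particles' `blob/path`. A minimal sketch; `original_blob_path` is a hypothetical helper, and an exact match with J67's `blob/path` assumes the original stacks carry no UID prefix of their own.

```python
import os

def original_blob_path(project_dir, imported_rel_path):
    """Resolve an imported particle stack back to the file it links to.

    Returns the link target relative to the project directory, e.g.
    'J86/imported/<uid>_x.mrc' -> 'J67/extract/x.mrc'. If the file is not a
    symlink, its own path (relative to the project) is returned.
    """
    project_real = os.path.realpath(project_dir)
    target = os.path.realpath(os.path.join(project_dir, imported_rel_path))
    return os.path.relpath(target, project_real)
```

Pairing the result with `blob/idx` gives a (path, index) key for each re-imported particle; the original particles already have that key in their own `blob/path` and `blob/idx`.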

@rpiwowarcz Please can you post

  1. the output of the following commands, after having replaced P99 with the actual CryoSPARC project ID
    csprojectid=P99
    cryosparcm cli "get_job('$csprojectid', 'J86', 'job_type', 'version',  'params_spec')"
    ls -l $(cryosparcm cli "get_project_dir_abs('$csprojectid')")/J86/imported/ | tail -n 4
    
  2. the top of the to-be-imported star file, including the header and first two data lines

Thanks for your help! I’m attaching the outputs. All the CryoSPARC files are saved under /data07/rpiwowarcz/csparc/PROJECT-TITLE, and the RELION processing under /data07/rpiwowarcz/PROJECT-TITLE.

[klin_csparc@skgpu07 ~]$ csprojectid=P18
[klin_csparc@skgpu07 ~]$ cryosparcm cli "get_job('$csprojectid', 'J86', 'job_type', 'version',  'params_spec')"
{'_id': '67870c59d51891bc178488b3', 'job_type': 'import_particles', 'params_spec': {'alignments3D_exists': {'value': True}, 'blob_exists': {'value': True}, 'ctf_exists': {'value': True}, 'enable_validation': {'value': True}, 'location_exists': {'value': True}, 'particle_blob_path': {'value': '/data07/rpiwowarcz/csparc/CS-20241226-krios2-y518-b848-treat/'}, 'particle_meta_path': {'value': '/data07/rpiwowarcz/20241226-Krios2-Y518-B848-treat/Class3D/job014/run_it040_data_class003.star'}}, 'project_uid': 'P18', 'uid': 'J86', 'version': 'v4.6.0'}

[klin_csparc@skgpu07 ~]$ ls -l $(cryosparcm cli "get_project_dir_abs('$csprojectid')")/J86/imported/ | tail -n 4
lrwxrwxrwx 1 klin_csparc klin 145 Jan 14 22:08 018444340407407909891_rp2-3_282-56_011_X+1Y+1-11_patch_aligned_doseweighted_particles.mrc -> /data07/rpiwowarcz/csparc/CS-20241226-krios2-y518-b848-treat/J67/extract/rp2-3_282-56_011_X+1Y+1-11_patch_aligned_doseweighted_particles.mrc
lrwxrwxrwx 1 klin_csparc klin 145 Jan 14 22:08 018444619304891281571_rp2-3_163-42_010_X+1Y+1-10_patch_aligned_doseweighted_particles.mrc -> /data07/rpiwowarcz/csparc/CS-20241226-krios2-y518-b848-treat/J67/extract/rp2-3_163-42_010_X+1Y+1-10_patch_aligned_doseweighted_particles.mrc
lrwxrwxrwx 1 klin_csparc klin 145 Jan 14 22:08 018444921200662845718_rp2-3_179-17_033_X-1Y-1-11_patch_aligned_doseweighted_particles.mrc -> /data07/rpiwowarcz/csparc/CS-20241226-krios2-y518-b848-treat/J67/extract/rp2-3_179-17_033_X-1Y-1-11_patch_aligned_doseweighted_particles.mrc
lrwxrwxrwx 1 klin_csparc klin 144 Jan 14 22:08 018444984135198510753_rp2-3_143-60_020_X-1Y+1-9_patch_aligned_doseweighted_particles.mrc -> /data07/rpiwowarcz/csparc/CS-20241226-krios2-y518-b848-treat/J67/extract/rp2-3_143-60_020_X-1Y+1-9_patch_aligned_doseweighted_particles.mrc

[klin_csparc@skgpu07 ~]$ head -n 50 /data07/rpiwowarcz/20241226-Krios2-Y518-B848-treat/Class3D/job014/run_it040_data_class003.star

# version 30001

data_optics

loop_ 
_rlnVoltage #1 
_rlnImagePixelSize #2 
_rlnSphericalAberration #3 
_rlnAmplitudeContrast #4 
_rlnOpticsGroup #5 
_rlnImageSize #6 
_rlnImageDimensionality #7 
_rlnOpticsGroupName #8 
  300.000000     1.080000     0.000000     0.100000            1          384            2 opticsGroup1 
  300.000000     1.080000     0.000000     0.100000            2          384            2 opticsGroup2 
  300.000000     1.080000     0.000000     0.100000            3          384            2 opticsGroup3 
  300.000000     1.080000     0.000000     0.100000            4          384            2 opticsGroup4 
 

# version 30001

data_particles

loop_ 
_rlnImageName #1 
_rlnMicrographName #2 
_rlnCoordinateX #3 
_rlnCoordinateY #4 
_rlnAngleRot #5 
_rlnAngleTilt #6 
_rlnAnglePsi #7 
_rlnOriginXAngst #8 
_rlnOriginYAngst #9 
_rlnDefocusU #10 
_rlnDefocusV #11 
_rlnDefocusAngle #12 
_rlnPhaseShift #13 
_rlnCtfBfactor #14 
_rlnOpticsGroup #15 
_rlnRandomSubset #16 
_rlnClassNumber #17 
_rlnGroupNumber #18 
_rlnNormCorrection #19 
_rlnLogLikeliContribution #20 
_rlnMaxValueProbDistribution #21 
_rlnNrOfSignificantSamples #22 
000002@J67/extract/rp2-3_108-15_001_X-1Y-1-1_patch_aligned_doseweighted_particles.mrc csparc-mics/rp2-3_108-15_001_X-1Y-1-1_patch_aligned_doseweighted.mrc  4885.000000  3785.000000    -40.46490    86.717888    -79.20609     2.757375     1.731375 18454.978516 16399.255859    10.655390     0.000000     0.000000            1            1            3            1     0.627493 3.070349e+05     0.511870            4 
000005@J67/extract/rp2-3_108-15_001_X-1Y-1-1_patch_aligned_doseweighted_particles.mrc csparc-mics/rp2-3_108-15_001_X-1Y-1-1_patch_aligned_doseweighted.mrc  4602.000000  3013.000000    -22.90196    88.376793   -147.01588     -3.18382     5.707126 18553.867188 16498.144531    10.655390     0.000000     0.000000            1            2            3            1     0.628604 3.072666e+05     0.999988            1 
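When building matching keys straight from a STAR file like this one, note that the number before `@` in `_rlnImageName` counts from 1, whereas cryoSPARC's `blob/idx` counts from 0. A small sketch (hypothetical helper `parse_image_name`):

```python
def parse_image_name(image_name):
    """Split '000002@J67/extract/x.mrc' into (blob_idx, path).

    RELION numbers images within a stack from 1; cryoSPARC's blob/idx is
    0-based, so one is subtracted.
    """
    index_str, path = image_name.split("@", 1)
    return int(index_str) - 1, path

print(parse_image_name(
    "000002@J67/extract/rp2-3_108-15_001_X-1Y-1-1_patch_aligned_doseweighted_particles.mrc"
))
# (1, 'J67/extract/rp2-3_108-15_001_X-1Y-1-1_patch_aligned_doseweighted_particles.mrc')
```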

@wtempel Following up on the processing issues: after importing particles in J86, I used them for homogeneous refinement in J89. Afterwards I wanted to re-extract the particles, but the job failed with the error:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 116, in cryosparc_master.cryosparc_compute.run.main
  File "/home/klin_csparc/Cryosparc3/cryosparc_worker/cryosparc_compute/jobs/extract/run.py", line 325, in run_extract_micrographs_multi
    run_extract_micrographs(job)
  File "/home/klin_csparc/Cryosparc3/cryosparc_worker/cryosparc_compute/jobs/extract/run.py", line 670, in run_extract_micrographs
    particles_dset = rc.load_input_group(input_group_name='particles')
  File "/home/klin_csparc/Cryosparc3/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 698, in load_input_group
    assert is_connected[idx], "Slot %s.%s must be connected in connection %d!" % (input_group_name, slot['name'], idx)
AssertionError: Slot particles.location must be connected in connection 0!

From the looks of it, the J89 refinement stripped the location data.

Fields in: J86/imported_particles.cs

('uid',
 'blob/path',
 'blob/idx',
 'blob/shape',
 'blob/psize_A',
 'blob/sign',
 'blob/import_sig',
 'ctf/type',
 'ctf/exp_group_id',
 'ctf/accel_kv',
 'ctf/cs_mm',
 'ctf/amp_contrast',
 'ctf/df1_A',
 'ctf/df2_A',
 'ctf/df_angle_rad',
 'ctf/phase_shift_rad',
 'ctf/scale',
 'ctf/scale_const',
 'ctf/shift_A',
 'ctf/tilt_A',
 'ctf/trefoil_A',
 'ctf/tetra_A',
 'ctf/anisomag',
 'ctf/bfactor',
 'location/micrograph_uid',
 'location/exp_group_id',
 'location/micrograph_path',
 'location/micrograph_shape',
 'location/micrograph_psize_A',
 'location/center_x_frac',
 'location/center_y_frac',
 'alignments3D/split',
 'alignments3D/shift',
 'alignments3D/pose',
 'alignments3D/psize_A',
 'alignments3D/error',
 'alignments3D/error_min',
 'alignments3D/resid_pow',
 'alignments3D/slice_pow',
 'alignments3D/image_pow',
 'alignments3D/cross_cor',
 'alignments3D/alpha',
 'alignments3D/alpha_min',
 'alignments3D/weight',
 'alignments3D/pose_ess',
 'alignments3D/shift_ess',
 'alignments3D/class_posterior',
 'alignments3D/class',
 'alignments3D/class_ess')

Fields in: J89/J89_passthrough_particles.cs

('uid',
 'blob/path',
 'blob/idx',
 'blob/shape',
 'blob/psize_A',
 'blob/sign',
 'blob/import_sig')
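To spot which slots drop out between two field listings like these, grouping field names by the part before the first `/` is enough. A minimal sketch with hypothetical helpers. As far as I understand, a passthrough file is only part of a job's output (slots a job recomputes, such as its own alignments, are written to its main particles file), so the slots that matter are those missing from both.

```python
def slots(fields):
    """Collect slot names from field names such as 'ctf/df1_A' -> 'ctf'."""
    return {field.split("/", 1)[0] for field in fields if "/" in field}

def missing_slots(upstream_fields, downstream_fields):
    """Slots present upstream but absent downstream, sorted by name."""
    return sorted(slots(upstream_fields) - slots(downstream_fields))

# Abbreviated versions of the two field lists above.
j86 = ["uid", "blob/path", "blob/idx", "ctf/df1_A", "location/center_x_frac", "alignments3D/pose"]
j89_passthrough = ["uid", "blob/path", "blob/idx"]
print(missing_slots(j86, j89_passthrough))  # ['alignments3D', 'ctf', 'location']
```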

Hi @rpiwowarcz!

When you ran the Import Particles job (J86), did you attach your micrographs to the Source Exposures slot? Did the Import particles job have any warning or error messages in the event log?

Hi! Yeah I always connect the Source Exposures and there was one warning about particle blob path. Here’s the log from the job.

[CPU:   78.8 MB  Avail: 626.49 GB]
Importing job module for job type import_particles...
[CPU:  262.2 MB  Avail: 626.36 GB]
Job ready to run
[CPU:  262.2 MB  Avail: 626.36 GB]
***************************************************************
[CPU:  262.2 MB  Avail: 626.36 GB]
Importing particles from  /data07/rpiwowarcz/20241226-Krios2-Y518-B848-treat/Class3D/job014/run_it040_data_class003.star
[CPU:  262.4 MB  Avail: 626.36 GB]
File extension is  star
[CPU:  262.4 MB  Avail: 626.36 GB]
Importing star file.
[CPU:  520.6 MB  Avail: 626.10 GB]
--------------------------------------------------------------
[CPU:  520.6 MB  Avail: 626.10 GB]
Loaded star file with 303180 items
[CPU:  520.6 MB  Avail: 626.10 GB]
Fields loaded from star file:  ['rlnVoltage', 'rlnImagePixelSize', 'rlnSphericalAberration', 'rlnAmplitudeContrast', 'rlnOpticsGroup', 'rlnImageSize', 'rlnImageDimensionality', 'rlnOpticsGroupName', 'rlnImageName', 'rlnMicrographName', 'rlnCoordinateX', 'rlnCoordinateY', 'rlnAngleRot', 'rlnAngleTilt', 'rlnAnglePsi', 'rlnOriginXAngst', 'rlnOriginYAngst', 'rlnDefocusU', 'rlnDefocusV', 'rlnDefocusAngle', 'rlnPhaseShift', 'rlnCtfBfactor', 'rlnRandomSubset', 'rlnClassNumber', 'rlnGroupNumber', 'rlnNormCorrection', 'rlnLogLikeliContribution', 'rlnMaxValueProbDistribution', 'rlnNrOfSignificantSamples']
[CPU:  520.6 MB  Avail: 626.10 GB]
--------------------------------------------------------------
[CPU:  520.6 MB  Avail: 626.10 GB]
Reading particle data locations...
[CPU:  520.6 MB  Avail: 626.10 GB] Reading rlnImageName to get indices and paths..
[CPU:  622.9 MB  Avail: 626.00 GB] Warning: Parameter particle_blob_path was set, which overrides rlnImageName despite the latter being present in the input star file.
[CPU:  622.9 MB  Avail: 626.00 GB] Parameter particle_blob_path was set and is a directory, so will be used as the search base for finding referenced data paths in the star file.
[CPU:  622.9 MB  Avail: 626.00 GB]
Searching for linked data files...
[CPU:  623.0 MB  Avail: 626.02 GB]
--------------------------------------------------------------
[CPU:  623.0 MB  Avail: 626.02 GB]
Compiling CTF information...
[CPU:  623.0 MB  Avail: 626.02 GB]
--------------------------------------------------------------
[CPU:  623.0 MB  Avail: 626.02 GB]
Compiling particle location information...
[CPU:  623.0 MB  Avail: 626.02 GB] Attempting to find corresponding filenames in rlnMicrographName and connected input exposures..
[CPU:  626.2 MB  Avail: 626.01 GB] Example source exposure filename: 
[CPU:  626.2 MB  Avail: 626.01 GB]    rp2-3_108-15_001_X-1Y-1-1_patch_aligned_doseweighted.mrc
[CPU:  625.8 MB  Avail: 626.01 GB] Example query exposure filename: 
[CPU:  625.8 MB  Avail: 626.01 GB]    rp2-3_108-15_001_X-1Y-1-1_patch_aligned_doseweighted.mrc
[CPU:  652.4 MB  Avail: 625.99 GB]
--------------------------------------------------------------
[CPU:  652.4 MB  Avail: 625.99 GB]
Compiling particle pose information...
[CPU:  652.4 MB  Avail: 625.99 GB] Converting euler angles..
[CPU:  652.4 MB  Avail: 625.99 GB] Converting rlnOriginXAngst to pixels using parameter psize_A
[CPU:  706.6 MB  Avail: 625.92 GB]
--------------------------------------------------------------
[CPU:  706.6 MB  Avail: 625.92 GB]
Particle information has now been imported for 303180 particles, creating outputs...
[CPU:  707.7 MB  Avail: 625.92 GB]
Found references to 33878 unique data files
[CPU:  722.2 MB  Avail: 625.90 GB]
Import paths were unique at level -1
[CPU:  722.2 MB  Avail: 625.90 GB]
Example imported relative path:
 J86/imported/012764796875737557513_rp2-3_108-15_001_X-1Y-1-1_patch_aligned_doseweighted_particles.mrc
[CPU:  722.2 MB  Avail: 625.90 GB]
Reading MRC file header to check shape...
[CPU:  804.3 MB  Avail: 625.83 GB] Setting validator for blob to: True
[CPU:  804.3 MB  Avail: 625.83 GB] Setting validator for ctf to: True
[CPU:  804.3 MB  Avail: 625.83 GB] Setting validator for location to: True
[CPU:  804.3 MB  Avail: 625.83 GB] Setting validator for alignments3D to: True
[CPU:  804.3 MB  Avail: 625.83 GB] Setting validator for filament to: False
[CPU:  804.3 MB  Avail: 625.83 GB] Setting validator for pick_stats to: False
[CPU:  804.3 MB  Avail: 625.83 GB]
--------------------------------------------------------------
[CPU:  804.3 MB  Avail: 625.83 GB]
Loaded 303180 particles.
[CPU:  804.3 MB  Avail: 625.82 GB] Common fields: 
[CPU:  804.3 MB  Avail: 625.82 GB]         ctf/accel_kv :  {300.0}
[CPU:  804.3 MB  Avail: 625.81 GB]            ctf/cs_mm :  {0.0}
[CPU:  804.3 MB  Avail: 625.81 GB]     ctf/amp_contrast :  {0.1}
[CPU:  804.3 MB  Avail: 625.81 GB]     ctf/exp_group_id :  {1, 2, 3, 4}
[CPU:  804.3 MB  Avail: 625.80 GB]         blob/psize_A :  {1.08}
[CPU:  804.3 MB  Avail: 625.80 GB]           blob/shape :  [384 384]
[CPU:  804.3 MB  Avail: 625.80 GB]
--------------------------------------------------------------
[CPU:  804.3 MB  Avail: 625.80 GB]
Making plots...

I have a suspicion that it might be connected to Live processing as it’s not generating UIDs for particles. When I use particle set tools with polished particles, the job runs fine. Let me know what you think!

Hi @rpiwowarcz! Thanks for the log. I don’t see anything obviously wrong there…could you clarify what you mean when you say:

Do you observe that particles coming from CryoSPARC Live do not have UIDs? How do you detect this?

Apologies. I’m probably oversimplifying here by a lot, but when I export particles extracted during the live processing, the particle stack is named after the movie without adding the UID to the file name. On the other hand, when exporting particles after RBMC, they have UIDs in the file name and can be successfully used for matching.

I see, thanks for clarifying! So if I understand correctly, your workflow looks like this:

  1. Process particles in CryoSPARC
  2. Export them to RELION
  3. Perform classification or other jobs in RELION
  4. Import the .star file from RELION made during step (3) back into CryoSPARC

And the particles from step (4) are able to be refined etc. in CryoSPARC, but you’re unable to re-extract them because of a missing location field?

The workflow is correct but my main problem was that I couldn’t use the particle set tools to recover CTF-refinement information from the original (step 1) particle stack.
The job would return empty intersect even though the particles originated from cryosparc in the first place. The problem doesn’t exist when working with RBMC particles tho.

Hi @rpiwowarcz, thanks for confirming. I see what you meant about the UIDs earlier – you’re right that they are different when the particles are re-imported.

If you didn’t perform higher-order CTF estimations earlier in your workflow, the CTF information (including per-particle defocus) should remain the same after re-importing from RELION, so you shouldn’t need to re-connect the CTF from an earlier refinement.

If you did perform higher-order CTF estimations, the easiest way to recover these is probably to use CryoSPARC Tools to set the UIDs of the re-imported particles to match those of the original particles, and then use the low-level interface to replace the CTF values of the resulting UID-corrected particles. You can find an example of resetting the UIDs here.

You could also modify that script to replace the CTF values directly rather than use the UIDs, if you prefer.
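A sketch of that direct route: look up each re-imported particle's original by a shared key (for instance a (stack path, index) pair or a (micrograph, coordinates) pair) and copy the higher-order CTF columns across. Arrays are plain NumPy here; in practice they would be columns taken from the two datasets (e.g. loaded with cryosparc-tools as in Yang's script earlier in the thread), and writing the result back is left out. `copy_ctf_by_key` is a hypothetical name, and the default field list only covers the beam tilt and trefoil terms asked about at the top of the thread.

```python
import numpy as np

# Higher-order CTF terms mentioned in this thread (beam tilt, trefoil).
# Extend with e.g. "ctf/tetra_A" or "ctf/anisomag" if those were refined too.
DEFAULT_FIELDS = ("ctf/tilt_A", "ctf/trefoil_A")

def copy_ctf_by_key(orig_keys, orig_ctf, new_keys, fields=DEFAULT_FIELDS):
    """Return CTF columns for the re-imported particles, taken from the originals.

    orig_keys: one hashable key per original particle (e.g. (path, idx)).
    orig_ctf:  dict mapping field name -> array with one row per original particle.
    new_keys:  one key per re-imported particle, in that dataset's order.
    Raises KeyError if a re-imported particle has no original counterpart.
    If orig_keys contains duplicates, the last occurrence wins.
    """
    lookup = {key: i for i, key in enumerate(orig_keys)}
    rows = np.array([lookup[key] for key in new_keys], dtype=int)
    return {field: np.asarray(orig_ctf[field])[rows] for field in fields}

orig_keys = [("s1.mrc", 0), ("s1.mrc", 1), ("s2.mrc", 0)]
orig_ctf = {
    "ctf/tilt_A": np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]),
    "ctf/trefoil_A": np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),
}
copied = copy_ctf_by_key(orig_keys, orig_ctf, [("s2.mrc", 0), ("s1.mrc", 1)])
print(copied["ctf/tilt_A"].tolist())  # [[0.5, 0.6], [0.3, 0.4]]
```

Failing loudly on an unmatched particle is deliberate: a silent default would leave some particles with uncorrected CTF values.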

I hope that’s helpful!