Import particle stack from Relion to CryoSPARC

zhangrui_wustl · January 5, 2022, 8:13pm

Hi, I did 3D classification of a CryoSPARC-created particle set in Relion3 and want to import the selected subset of particles back into CS for further processing. Although the particle stack paths in the .star file point to the original stacks created by CS (in the JXXX folder), CS treats the imported particle stack as an independent one. There are two disadvantages: (1) in subsequent processing, CS will make another copy of the original particle stack in /scratch; (2) the imported stack loses some useful information that existed before entering Relion3, such as the “scale” for each particle.

I wonder whether I can use the Particle Sets Tool (intersect) in CS to reconnect the imported particle stack to the original one. This should be possible because both point to the same .mrcs particle files.

wtempel · January 12, 2022, 9:55pm

@zhangrui_wustl Pre-existing particle UIDs can be transferred to re-imported particles by manipulating the pre-export and post-(re-)import .cs files in Python, provided that some identifying information has been preserved throughout export from cryoSPARC, processing outside cryoSPARC, and re-importation into cryoSPARC.
Let A_particles.cs be the cryoSPARC metadata file that was earlier used for particle export from cryoSPARC and that includes the UIDs you’d like to use going forward. Let B_particles.cs be the result of a recent re-importation of particles into cryoSPARC, which includes interesting additional attributes, but also a new, unwanted set of cryoSPARC UIDs. Let A_data and B_data be cryosparc_compute.dataset.Dataset objects derived from A_particles.cs and B_particles.cs, respectively.
If A_data and B_data both include blob/path and blob/idx items, and those items’ values haven’t changed, one can:

1. convert A_data and B_data to A_df and B_df dataframes, respectively, using the cryosparc_compute.dataset.Dataset.to_dataframe() method
2. drop the unwanted uid column from B_df
3. keep just the ['uid', 'blob/path', 'blob/idx'] columns from A_df
4. “inner” merge A_df and B_df on ['blob/path', 'blob/idx']
5. create C_data = cryosparc_compute.dataset.Dataset().from_dataframe(merged_df)
6. write the new Dataset to disk: C_data.to_file("C_particles.cs")
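
The drop, column-selection and merge steps above can be sketched with pandas alone, so the join logic can be checked without a cryoSPARC installation. This is only an illustration on toy tables: the UIDs, paths and the new_attr column are made up, and the two dataframes stand in for what to_dataframe() would return.

```python
import pandas as pd

# Toy stand-ins for A_df and B_df; every value here is invented for illustration.
A_df = pd.DataFrame({
    "uid": [1001, 1002, 1003],  # original UIDs to keep
    "blob/path": ["J1/extract/a.mrcs", "J1/extract/a.mrcs", "J1/extract/b.mrcs"],
    "blob/idx": [0, 1, 0],
    "extra_A": [1.1, 0.9, 1.0],  # hypothetical column that is not carried over
})
B_df = pd.DataFrame({
    "uid": [9001, 9002],  # new, unwanted UIDs from the re-import
    "blob/path": ["J1/extract/a.mrcs", "J1/extract/b.mrcs"],
    "blob/idx": [1, 0],
    "new_attr": [0.5, 0.7],  # hypothetical attribute gained outside cryoSPARC
})

key = ["blob/path", "blob/idx"]

# Drop the unwanted uid column from B_df.
B_no_uid = B_df.drop(columns=["uid"])

# Keep just the uid and the identifying columns from A_df.
A_keys = A_df[["uid"] + key]

# Inner merge on the identifying columns; validate raises an error if a
# (path, idx) pair is duplicated on either side.
merged_df = A_keys.merge(B_no_uid, on=key, how="inner", validate="one_to_one")

print(merged_df)
```

With real data, A_df and B_df would come from the to_dataframe() calls, and merged_df would be passed on to from_dataframe() as outlined above; those cryoSPARC calls are left out here because they have not been tested. Comparing len(merged_df) with len(B_df) is a quick way to spot particles that failed to match.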

Caveats:

- The steps above are a motivational outline, not a tested sequence of commands that can be pasted verbatim into a script.
- Preservation of blob/path and blob/idx is an optimistic assumption and does not apply to all export/processing/import workflows.

zhangrui_wustl · January 14, 2022, 5:22am

Hi, I really appreciate your reply, but this seems too complicated for me.
I don’t think I have blob info for the re-imported particle set, assuming blob means the coordinates on the micrographs.
Is it possible for you guys to add an option to the Particle Sets Tool (intersect) that only checks the particle files (.mrc or .mrcs) while keeping all the metadata from A_data?

Thanks!

wtempel · January 14, 2022, 2:54pm

@zhangrui_wustl To clarify, 'blob/path' is a file path and 'blob/idx' is an integer. Taken together, these two values identify a particle and can serve as an alternative to the uid.
Even if 'blob/path' and 'blob/idx' are no longer present explicitly in the data to be reimported, inspection of those data may reveal that a straightforward transform of 'blob/path' and 'blob/idx', such as a concatenation, has been preserved.
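
As one possible example of such a transform, a Relion .star file usually stores each particle image as a single string of the form index@path. The sketch below splits such a string back into a (path, index) pair. It assumes that the Relion index is 1-based and that 'blob/idx' is 0-based, and the helper name split_image_name and the example string are made up; please check both conventions, and whether the path prefix still matches 'blob/path', against your own files.

```python
from typing import Tuple


def split_image_name(image_name: str) -> Tuple[str, int]:
    """Split 'NNNNNN@path/to/stack.mrcs' into (path, zero_based_index).

    Assumes a 1-based index before the '@' (Relion-style) and converts it
    to a 0-based index; this offset is an assumption to verify.
    """
    idx_str, path = image_name.split("@", 1)
    return path, int(idx_str) - 1


# Made-up example value, not taken from a real .star file.
path, idx = split_image_name("000002@J1/extract/a.mrcs")
print(path, idx)
```

The resulting pair could then be used as the merge key in place of explicit 'blob/path' and 'blob/idx' columns.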