Re-extract particles after losing correspondence to original micrographs

stephan · October 7, 2020, 7:11pm

If your old workflow was from an instance running a recent version of cryoSPARC (v2.11+), you should be able to hit the “Export” button under the particles group in the “Output” tab of the Refinement job. Doing so will consolidate the particle stack into a single folder (explained here).

You need to do this because you’ll have to modify the particle stack manually to re-associate the particles with the movies you re-imported. When you extract particles from micrographs, the particle .cs file stores the UID of each particle’s source micrograph so the two stay associated. If you ever want to re-extract the particles, you can simply connect the particle locations and the micrographs to an Extract from Micrographs job, and it will cut out those locations. Since you’ll be re-importing the movies and creating new micrographs (which have different UIDs), you’ll need to re-associate the particles with them. Luckily, the particles also store the path of the micrograph they came from, so we can use that path to do so.
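To see the problem concretely, here is a minimal sketch with toy numpy structured arrays standing in for the two .cs files. The UIDs and paths are made up, and it assumes the particles keep their micrograph UID in a `location/micrograph_uid` field next to `location/micrograph_path`. It checks which particles can still find their micrograph by UID and which by path:

```python
import numpy as np

# Toy stand-ins for the two .cs tables (hypothetical values). Real .cs files
# carry many more fields; only the ones discussed above are modelled here.
particles = np.array(
    [(101, "J3/motioncorrected/mic_a.mrc"),
     (101, "J3/motioncorrected/mic_a.mrc"),
     (102, "J3/motioncorrected/mic_b.mrc")],
    dtype=[("location/micrograph_uid", "<u8"), ("location/micrograph_path", "U64")],
)
# Re-imported micrographs: same paths, new UIDs.
micrographs = np.array(
    [(901, "J3/motioncorrected/mic_a.mrc"),
     (902, "J3/motioncorrected/mic_b.mrc")],
    dtype=[("uid", "<u8"), ("micrograph_blob/path", "U64")],
)

# None of the particles' stored UIDs exist among the new micrographs...
uid_found = np.isin(particles["location/micrograph_uid"], micrographs["uid"])
# ...but every stored path still does, so the path can serve as the join key.
path_found = np.isin(particles["location/micrograph_path"], micrographs["micrograph_blob/path"])

print(uid_found.any(), path_found.all())  # False True
```

Running the same two `np.isin` checks on your real arrays is a quick way to confirm that matching on paths will actually work before you edit anything.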

Once you’ve exported the particles, navigate to the “Overview” tab to find where they’ve been exported to:

Keep note of this path.

You will also need to find the .cs file of the micrographs you have just completed CTF Estimation on. To do this, navigate to the “Outputs” tab of the Patch CTF job, and press the first button in the micrograph_blob result group to copy the path of the .cs file:
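If you would rather inspect these files with plain numpy than with the cryoSPARC dataset tools, here is a minimal sketch. It assumes a .cs file is an ordinary .npy-format numpy structured array (check this against your cryoSPARC version), and the file name is hypothetical. Since no real .cs file is available here, it writes a tiny stand-in first, then loads it and lists its fields:

```python
import os
import tempfile

import numpy as np


def load_cs(path):
    """Load a .cs file as a numpy structured array (assumes .npy format)."""
    with open(path, "rb") as f:
        return np.load(f)


def field_names(arr):
    """Return the column names of a structured array, e.g. 'uid'."""
    return list(arr.dtype.names)


# Demo: write a tiny stand-in file, since no real .cs file is available here.
tmpdir = tempfile.mkdtemp()
demo_path = os.path.join(tmpdir, "J10_micrographs.cs")  # hypothetical name
demo = np.zeros(2, dtype=[("uid", "<u8"), ("micrograph_blob/path", "S64")])
with open(demo_path, "wb") as f:
    np.save(f, demo)

mics = load_cs(demo_path)
print(field_names(mics))  # ['uid', 'micrograph_blob/path']
```

Listing the field names this way is a quick check that the paths you copied point at the particle and micrograph files you expect.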

Using these paths, we will use Python to open and edit the file. I’ve already explained the preliminary steps to do this in the following article, so I will skip to the part where we actually edit the file.

You can also check out a similar topic where we need to manipulate .cs files:

Once you have the exported particle .cs file (particle_dset) and the micrograph_blob .cs file (micrograph_dset) opened in a Python shell, do the following:

Note: this code is stale; see the newer code below.

for index, particle in enumerate(particle_dset.get_items()):
    # get the micrograph path associated with the particle
    path_to_match = particle['location/micrograph_path']
    # find the re-imported micrograph with the same path
    corresponding_micrograph = micrograph_dset.subset_query(lambda x: x['micrograph_blob/path'] == path_to_match)
    # store this micrograph's uid (raises IndexError if no micrograph matched)
    new_mic_uid = corresponding_micrograph.data['uid'][0]
    # write the new uid into the particle's location/micrograph_uid field
    particle_dset.data['location/micrograph_uid'][index] = new_mic_uid
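The loop above calls subset_query once per particle, so every particle scans the whole micrograph set. That gets slow for large stacks. Here is a minimal sketch of the same re-association that builds a path-to-UID dictionary once and then does one lookup per particle. It works on numpy structured arrays rather than the cryoSPARC dataset object, and assumes the field names used above, including `location/micrograph_uid`. The function name is hypothetical:

```python
import os

import numpy as np


def remap_micrograph_uids(particles, micrographs, key=lambda p: p):
    """Return a copy of `particles` whose location/micrograph_uid points at
    the micrograph with a matching path. `key` normalises paths before they
    are compared (identity by default). Unmatched particles raise KeyError;
    if two micrographs share a key, the later one wins."""
    def norm(p):
        # .cs string fields may come back as bytes; compare them as str
        return key(p.decode() if isinstance(p, bytes) else str(p))

    # Build the path -> uid lookup once (a single pass over the micrographs)
    uid_by_path = {
        norm(path): uid
        for path, uid in zip(micrographs["micrograph_blob/path"], micrographs["uid"])
    }
    out = particles.copy()
    out["location/micrograph_uid"] = [
        uid_by_path[norm(path)] for path in particles["location/micrograph_path"]
    ]
    return out


particles = np.array(
    [(101, b"J3/motioncorrected/mic_a.mrc"), (102, b"J3/motioncorrected/mic_b.mrc")],
    dtype=[("location/micrograph_uid", "<u8"), ("location/micrograph_path", "S64")],
)
micrographs = np.array(
    [(902, b"J3/motioncorrected/mic_b.mrc"), (901, b"J3/motioncorrected/mic_a.mrc")],
    dtype=[("uid", "<u8"), ("micrograph_blob/path", "S64")],
)
fixed = remap_micrograph_uids(particles, micrographs)
print(fixed["location/micrograph_uid"])  # [901 902]

# If the new micrographs live under a different job directory, comparing only
# file names (an assumption to verify on your data) may work instead:
# remap_micrograph_uids(particles, micrographs, key=os.path.basename)
```

Raising KeyError on an unmatched path is deliberate. It surfaces particles whose micrograph was not re-imported, instead of silently leaving them pointing at a stale UID.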

To briefly cover the functions used:
get_items(): returns a “Spooling List” generator object containing each item in the original dataset as a “Dataitem”, which lets you access each row of the original dataset like a dictionary.
subset_query(match): returns the subset of the original dataset for which the match argument (in this case a lambda function) returns True, evaluating it on each item.
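To make the subset_query description concrete, here is a hypothetical stand-in written against a numpy structured array. It is not cryoSPARC’s implementation, only a sketch of the behaviour described above. It calls the match function once per row and keeps the rows for which it returns True. A vectorised boolean mask is shown alongside for comparison:

```python
import numpy as np


def subset_query_like(arr, match):
    """Hypothetical stand-in for subset_query: keep the rows for which
    match(row) is truthy, calling match once per row."""
    mask = np.array([bool(match(row)) for row in arr], dtype=bool)
    return arr[mask]


mics = np.array(
    [(901, "mic_a.mrc"), (902, "mic_b.mrc")],
    dtype=[("uid", "<u8"), ("micrograph_blob/path", "U32")],
)
hit = subset_query_like(mics, lambda x: x["micrograph_blob/path"] == "mic_b.mrc")
print(hit["uid"][0])  # 902

# The equivalent vectorised mask avoids calling Python code for every row:
print(mics[mics["micrograph_blob/path"] == "mic_b.mrc"]["uid"][0])  # 902
```

Because the match function runs once per row, calling it inside a per-particle loop multiplies the work by the number of micrographs.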

Once this is done, you can save the particle dataset (particle_dset) and import it back into cryoSPARC using the Import Result Group job (also covered in the tutorial linked above). You can then connect the imported particle group and the new micrographs to an Extract from Micrographs job to re-extract the particles.
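If you edited the particles as a plain numpy structured array rather than through the cryoSPARC dataset object, here is a minimal sketch for writing the result back. It first checks that every particle now points at a micrograph UID that exists, then writes through an open file handle. Given a string path that does not end in .npy, np.save appends a .npy extension, which would turn `particles.cs` into `particles.cs.npy`. The function and file names are hypothetical, and the sketch assumes a .cs file is plain .npy data. The Import Result Group step itself is covered in the tutorial linked above:

```python
import os
import tempfile

import numpy as np


def save_cs(arr, path, valid_mic_uids):
    """Write `arr` to `path` in .npy format after checking that every
    particle points at one of `valid_mic_uids`."""
    missing = ~np.isin(arr["location/micrograph_uid"], valid_mic_uids)
    if missing.any():
        raise ValueError(f"{int(missing.sum())} particles still reference unknown micrographs")
    # Passing a file handle keeps the exact file name (no '.npy' appended)
    with open(path, "wb") as f:
        np.save(f, arr)


particles = np.array([(901,), (902,)], dtype=[("location/micrograph_uid", "<u8")])
out_path = os.path.join(tempfile.mkdtemp(), "particles_remapped.cs")  # hypothetical name
save_cs(particles, out_path, valid_mic_uids=np.array([901, 902], dtype="<u8"))
print(os.path.exists(out_path), os.path.exists(out_path + ".npy"))  # True False
```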