Importing Optic Groups from EPU AFIS

PabloGallego · September 7, 2021, 10:32am

Hi guys,

To import optics groups when the data were collected with the EPU AFIS strategy (Creating Optics Groups from EPU AFIS data and more - Cryo-EM - SciLifeLab Forum), first run the EPU_Group_AFIS tool (GitHub - DustinMorado/EPU_group_AFIS: Makes RELION 3.1 Optics Groups from EPU AFIS data).
The output is a .star file meant for RELION/Scipion. Here I show how I import the correct exposure group information into cryoSPARC.

Step 1: Run EPU_Group_AFIS

$ python EPU_Group_AFIS.py --xml_dir '<PATH_TO_XML>' --apix 0.86  --ftype tiff --movie_dir '<PATH_TO_movies>' --output_fn '<PATH_TO_outputfolder>/movies.star'
#--algorithm hac

Step 2: Curate the output

I open the output file, delete the header so that only the movie rows remain, and then sort the rows by the second column (the optics group) for easier reading by running in the terminal:

$ sort -k2,2 movies.star > movies.good.sorted
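If you would rather script the header removal and sorting, the sketch below reads the movie rows from the data_movies block of movies.star. It assumes that block is a RELION 3.1-style loop_ table with _rlnMicrographMovieName and _rlnOpticsGroup columns (check the labels in your own file); read_movie_groups is a hypothetical helper. Unlike sort -k2,2, which compares text (so group 10 sorts before group 2), it sorts the groups numerically.

```python
def read_movie_groups(star_text, block="data_movies",
                      path_label="_rlnMicrographMovieName",
                      group_label="_rlnOpticsGroup"):
    """Return (movie_path, optics_group) pairs from one loop_ block of a STAR file.

    Assumes column labels are declared as '_rlnLabel #N' lines (RELION 3.1 style).
    """
    labels, rows, in_block = [], [], False
    for line in star_text.splitlines():
        s = line.strip()
        if s.startswith("data_"):
            in_block = (s == block)        # only read the requested block
            continue
        if not in_block or not s or s.startswith("#") or s == "loop_":
            continue                       # skip blanks, comments, loop_ marker
        if s.startswith("_"):
            labels.append(s.split()[0])    # keep the label, drop the '#N' index
        else:
            rows.append(s.split())         # a data row
    p, g = labels.index(path_label), labels.index(group_label)
    pairs = [(r[p], int(r[g])) for r in rows]
    # numeric sort by optics group, then by path
    return sorted(pairs, key=lambda pr: (pr[1], pr[0]))
```

Usage: pairs = read_movie_groups(open('<PATH_TO_outputfolder>/movies.star').read())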

Then I open the sorted file in NEdit. Using NEdit is important because it can select text in columns (Ctrl+Shift).

nedit '<PATH_TO_outputfolder>/movies.good.sorted'

Then I change EVERY line so that it matches the example below. You can do this by selecting text in columns (Ctrl+Shift+mouse click) and then using the Replace tool to change all lines at the same time.

Example of one line:

micrographs.data['ctf/exp_group_id'][micrographs.data['movie_blob/path'] == 'J1/imported/FoilHole_4113390_Data_3092199_3092201_20201204_100228_fractions.tiff'] = 20
#The fixed part (in bold in the original post) is the formula text you add around each micrograph path and optics group.
#The variable parts (in italics) are the micrograph path, as found in micrographs.data['movie_blob/path'], and the actual optics group number.

Save the file as ctf.py.
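For reference, each line of ctf.py is a NumPy boolean-mask assignment. The toy example below uses a plain dict of arrays as a stand-in for micrographs.data (an assumption for illustration, not the real cryoSPARC object). The comparison selects rows whose path matches exactly, and only those rows receive the new group. A path that matches nothing changes nothing and raises no error, so a typo in a path fails silently.

```python
import numpy as n

# Toy stand-in for micrographs.data (assumption): a dict of equal-length columns.
data = {
    "movie_blob/path": n.array(["J1/imported/a.tiff",
                                "J1/imported/b.tiff",
                                "J1/imported/c.tiff"]),
    "ctf/exp_group_id": n.zeros(3, dtype=n.uint32),
}

# One ctf.py-style line: == builds a boolean mask that is True only where the
# path matches exactly; the assignment writes 20 into those rows.
data["ctf/exp_group_id"][data["movie_blob/path"] == "J1/imported/b.tiff"] = 20

# A path matching no row gives an all-False mask: nothing changes, no error.
data["ctf/exp_group_id"][data["movie_blob/path"] == "J1/imported/missing.tiff"] = 7

print(data["ctf/exp_group_id"].tolist())  # [0, 20, 0]
```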

Duplicate ctf.py and, in the duplicate, change the marked field in every line from ctf/exp_group_id to mscope_params/exp_group_id:

Example of one line:

micrographs.data['mscope_params/exp_group_id'][micrographs.data['movie_blob/path'] == 'J1/imported/FoilHole_4113390_Data_3092199_3092201_20201204_100228_fractions.tiff'] = 20
#The substitution (in bold in the original post) is mscope_params/exp_group_id in place of ctf/exp_group_id.

Save the file as mscope_params.py.
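Instead of column-editing in NEdit, a short script can write both files from movies.good.sorted. This sketch assumes each line is '<movie path> <optics group>' and that the movies were imported into J1/imported/ as in the example lines; write_group_scripts and its prefix argument are hypothetical, so adjust the prefix to match your own import job.

```python
from pathlib import Path

def write_group_scripts(sorted_file, out_dir, prefix="J1/imported/"):
    """Write ctf.py and mscope_params.py from a header-free, two-column file.

    Each input line is assumed to be '<movie path> <optics group>'. Only the
    file name is kept and re-rooted under `prefix` (assumed import folder).
    """
    line_tmpl = ("micrographs.data['{field}']"
                 "[micrographs.data['movie_blob/path'] == '{path}'] = {group}\n")
    pairs = []
    for line in Path(sorted_file).read_text().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        pairs.append((prefix + Path(parts[0]).name, int(parts[1])))
    out = Path(out_dir)
    # same lines twice, differing only in the field that is set
    for field, name in [("ctf/exp_group_id", "ctf.py"),
                        ("mscope_params/exp_group_id", "mscope_params.py")]:
        with open(out / name, "w") as fh:
            for path, group in pairs:
                fh.write(line_tmpl.format(field=field, path=path, group=group))
    return len(pairs)
```

Usage: write_group_scripts('<PATH_TO_outputfolder>/movies.good.sorted', '<PATH_TO_outputfolder>')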

Step 3: Run the two files from Step 2 in Python to change the optics groups.

Open the accepted-exposures file '*_passthrough_exposures_accepted.cs' from the Curate Exposures job in Python. In a terminal, start the interactive shell:

$ cryosparcm icli

This opens a Python shell, in which I type the following:

import numpy as n
import sys
from cryosparc_compute import dataset
dataset_path = '<PATH_TO_JOB>/*_passthrough_exposures_accepted.cs'  # write out the full file name; '*' is only a placeholder here
micrographs = dataset.Dataset().from_file(dataset_path)
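Since '*' is only a placeholder, I would not expect from_file to expand it. A small helper (hypothetical, not part of cryoSPARC) can resolve the pattern with the standard glob module and insist on exactly one match, so a wrong directory or a second matching file is caught before anything is loaded.

```python
import glob

def resolve_single(pattern):
    """Expand a wildcard path and return its single match.

    Raises FileNotFoundError if the pattern matches zero or several files.
    """
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one match for {pattern!r}, found {matches}")
    return matches[0]
```

Usage: dataset_path = resolve_single('<PATH_TO_JOB>/*_passthrough_exposures_accepted.cs')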

Then I check that everything loaded correctly:

micrographs.data['movie_blob/path']

Then I run the two files:

exec(open('<PATH_TO_outputfolder>/ctf.py').read())
exec(open('<PATH_TO_outputfolder>/mscope_params.py').read())
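exec() runs every line but gives no feedback when a path matches no row. As an alternative, the sketch below applies a {path: group} dict directly and returns the paths that matched nothing. apply_groups is a hypothetical helper; it assumes micrographs.data can be indexed by column name and returns NumPy arrays (as the lines in ctf.py already use it), and it decodes paths stored as bytes before comparing.

```python
import numpy as n

def apply_groups(data, path_to_group,
                 fields=("ctf/exp_group_id", "mscope_params/exp_group_id")):
    """Assign exposure-group IDs from a {movie path: group} dict.

    `data` is assumed to be indexable by column name and to return NumPy
    arrays (like micrographs.data). Returns the paths that matched no row.
    """
    # decode bytes paths so they compare equal to ordinary str paths
    paths = n.array([p.decode() if isinstance(p, bytes) else str(p)
                     for p in data["movie_blob/path"]])
    unmatched = []
    for path, group in path_to_group.items():
        mask = paths == path              # exact match, as in ctf.py
        if not mask.any():
            unmatched.append(path)        # exec() would skip this silently
            continue
        for field in fields:
            data[field][mask] = group     # write the group into each field
    return unmatched
```

Usage: missing = apply_groups(micrographs.data, {'J1/imported/FoilHole_4113390_Data_3092199_3092201_20201204_100228_fractions.tiff': 20}); an empty list means every path was found.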

Then I check in the Python shell that the groups have changed:

n.set_printoptions(threshold=sys.maxsize)
micrographs.data['ctf/exp_group_id']

I should see a long array with the new optics group numbers.

micrographs.data['mscope_params/exp_group_id']

I should see a long array with the new optics group numbers.
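Printing the full array works, but a per-group count is easier to scan. This sketch (hypothetical helper names) uses numpy.unique with return_counts=True and also checks that the ctf and mscope_params columns agree element by element.

```python
import numpy as n

def summarize_groups(group_ids):
    """Return {group id: number of exposures} for an array of group IDs."""
    ids, counts = n.unique(n.asarray(group_ids), return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts)}

def groups_agree(a, b):
    """True if the two group-ID columns are identical."""
    return bool(n.array_equal(n.asarray(a), n.asarray(b)))
```

Usage: summarize_groups(micrographs.data['ctf/exp_group_id']) and groups_agree(micrographs.data['ctf/exp_group_id'], micrographs.data['mscope_params/exp_group_id']).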

To save the changes to '*_passthrough_exposures_accepted.cs', type in the Python shell:

micrographs.to_file(dataset_path)  # overwrites the .cs file in place; for a particle dataset: particles.to_file(dataset_path)
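Because to_file writes over the original .cs file, it may be worth copying it first. A minimal standard-library sketch; backup_file is a hypothetical helper and the .bak-<timestamp> naming is an arbitrary choice.

```python
import shutil
from datetime import datetime

def backup_file(path):
    """Copy `path` to '<path>.bak-YYYYmmdd-HHMMSS' and return the copy's path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = f"{path}.bak-{stamp}"
    shutil.copy2(path, dest)  # copy2 also preserves file timestamps
    return dest
```

Usage: run backup_file(dataset_path) just before micrographs.to_file(dataset_path); at that point the file on disk is still unmodified.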

Then I close the Python shell.

In cryoSPARC, I run an Import Result Group job, loading the .csg file that corresponds to the '<PATH_TO_JOB>/*_passthrough_exposures_accepted.cs' file we modified in Python.

That is: '<PATH_TO_JOB>/*_passthrough_exposures_accepted.csg'

To check the result in cryoSPARC, I run an Exposure Group Utilities job with:
input: the imported result
action: info_only

Then continue processing with these exposures.

apunjani · November 23, 2021, 4:54pm

Hi @PabloGallego Thank you for this useful post!

YYang · February 21, 2023, 7:15am

Hi @PabloGallego Thanks for this detailed instruction for importing EPU AFIS optic groups to cryoSPARC. However, in the current version of cryoSPARC, the imported movies or micrographs have a 21-digit UID added at the beginning of each file name. This leads to a mismatch of the movie_blob/path (or micrograph_blob/path if motion-corrected micrographs, instead of raw movies, were imported to cryoSPARC) between the *_exposures_accepted.cs file and the ctf.py and mscope_params.py files made in Step 2. I am wondering if there is any workaround for this issue? Thanks a lot.

nfrasser · February 21, 2023, 11:09pm

Hi @YYang, thanks for your post. I don’t have the exact steps of this procedure required for the latest version of CryoSPARC, but I can provide the following advice:

Try modifying the relevant steps with awk (step 2 perhaps?) to strip out CryoSPARC UIDs:
```
awk '{print substr($0, 22)}' movies.star | sort -k2,2  > movies.good.sorted
```
Modify the Python script to strip out the uids from the filenames when comparing
Check out the cryosparc-tools example for a single script that imports from an XML file
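To strip the UIDs when comparing, a regular expression can remove the prefix from each file name. This follows YYang's description (a 21-digit UID at the start of the file name) and additionally assumes the UID is followed by an underscore; strip_uid is a hypothetical helper.

```python
import re

# Assumed prefix format: exactly 21 digits followed by '_'.
UID_PREFIX = re.compile(r"^\d{21}_")

def strip_uid(path):
    """Return the file name of `path` without a leading 21-digit UID prefix."""
    name = path.rsplit("/", 1)[-1]          # drop the directory part
    return UID_PREFIX.sub("", name, count=1)
```

Names without such a prefix are returned unchanged, so the same function can be applied to both the star-file paths and the cryoSPARC paths before comparing them.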

Let me know if you run into any trouble with the above.

YYang · February 22, 2023, 10:47am

Hi @nfrasser Thanks for the suggestions. The movies.star file generated by the EPU_GROUP_AFIS tool does not have UID for each movie. So awk is probably not needed for the sort -k2,2 step. But I am not quite clear how I should modify the Python script to import the opticGroups information from the ctf.py and mscope_params.py, which won’t have UID for each movie filename, to *_passthrough_exposures_accepted.cs, which contains different UID for each movie filename in the 'movie_blob/path' column.
I will for sure check out the cryosparc-tools for importing from an XML file.

nfrasser · February 22, 2023, 2:44pm

@YYang I believe you’ll have to subtly change the comparison in each line of ctf.py and mscope_params.py. For example, instead of

micrographs.data['ctf/exp_group_id'][micrographs.data['movie_blob/path'] == 'J1/imported/filename.tiff'] = 20

You can instead find the item in ‘movie_blob/path’ with the same base name as in the star file:

micrographs.data['ctf/exp_group_id'][index_of_substr(micrographs.data['movie_blob/path'], base_name('J1/imported/filename.tiff'))] = 20

Define the index_of_substr and base_name functions in Step 3, just before the exec calls:

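def index_of_substr(arr, substr):
    # convert the path column to a unicode string array
    arr = n.array(arr, dtype="U")
    # indices of all entries that contain substr anywhere
    return n.where(n.char.find(arr, substr) != -1)[0]

def base_name(file):
    # file name without its directory and without everything after the first '.'
    return file.split('/')[-1].split('.')[0]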

You may have to tweak these depending on your file name format.
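One tweak worth considering: base_name drops the extension and index_of_substr matches substrings, so a star-file name such as movie_1.tiff would also match an imported movie_10.tiff. The sketch below (hypothetical index_of_name) instead requires the imported file name to equal the star-file name, or to end with '_' followed by that name, which covers a UID prefix if it is separated by an underscore (assumed).

```python
import numpy as n

def index_of_name(arr, star_path):
    """Row indices whose file name equals star_path's file name, optionally
    preceded by a '<prefix>_' such as a cryoSPARC UID (assumed separator '_')."""
    target = star_path.rsplit("/", 1)[-1]
    hits = []
    for i, p in enumerate(arr):
        if isinstance(p, bytes):
            p = p.decode()                  # paths may be stored as bytes
        name = str(p).rsplit("/", 1)[-1]
        if name == target or name.endswith("_" + target):
            hits.append(i)
    return n.array(hits, dtype=int)         # empty array if nothing matches
```

Usage: micrographs.data['ctf/exp_group_id'][index_of_name(micrographs.data['movie_blob/path'], 'J1/imported/filename.tiff')] = 20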

YYang · February 23, 2023, 8:47am

Hi @nfrasser Great. Thanks a lot for the advice. I will test it.

mmclean · July 6, 2023, 2:24pm

Hi @here,

We are currently working on direct CryoSPARC integration of exposure groups clustering from data collected in EPU via AFIS. To this end, we are looking for users who would be willing to provide data from a collection session in EPU (via AFIS), including both the XML files and raw movies. To anyone willing to provide such data to aid our development, please reach out to me through DM, or by email at mmclean@structura.bio.

For development purposes, we’re seeking datasets where exposure group clustering (through EPU_Group_AFIS or otherwise) shows a notable improvement in resolution after repeating Global CTF Refinement.

Thanks all!

Michael

olibclarke · July 6, 2023, 2:50pm

Hi Michael,

Our bovine Tg dataset was one such case, where image-shift groups gave a substantial improvement in resolution.

It is available on EMPIAR (EMPIAR-10833 The structure of natively iodinated bovine thyroglobulin), and we could provide the image shift groups if needed (the data was collected with leginon).

Cheers
Oli

rbs_sci · July 7, 2023, 7:12am

Hi Michael,

E-mail sent.