Beam tilt refinement by image shift groups for datasets acquired with the Leginon-Appion suite

kookjookeem · February 28, 2023, 10:44pm

Hello,

I often find that grouping micrographs into image shift groups and refining per-group CTF parameters improves resolution when a significant optical aberration is present in the dataset. We have had cases where the resolution improved from 3.4 Å to 2.8 Å and from 3 Å to 2.5 Å.

NU-Refinement with iterative optimization of per-particle defocus and per-group CTF parameters turned ON. The masked FSC looks comparable to the unmasked FSC:
[Image: Before]

Same NU-Refinement job after grouping the particles by image shift groups:
[Image: After]

I use K-means clustering in scikit-learn to group micrographs by similar image shift X and Y values. Then I edit the particles.star to run CTFRefine in RELION. The downside is that the refined CTF params do not carry over to cryoSPARC when you import the particles back, so I am stuck with RELION for further processing.
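As a rough illustration of the grouping step, here is a minimal pure-Python sketch of K-means (Lloyd's algorithm) applied to image shift (x, y) pairs. It is a stand-in written for this example, not scikit-learn's KMeans and not the kmeans_groups.py script; the function name `kmeans_groups`, the deterministic farthest-point seeding, and the sample values are all assumptions.

```python
import math


def kmeans_groups(shifts, k, n_iter=50):
    """Cluster (x, y) image shifts into k groups; returns one label per input."""
    if not 1 <= k <= len(shifts):
        raise ValueError("k must be between 1 and the number of points")

    # Deterministic farthest-point seeding (an assumption made so the example
    # is reproducible; scikit-learn uses randomized k-means++ by default).
    centers = [shifts[0]]
    while len(centers) < k:
        centers.append(
            max(shifts, key=lambda p: min(math.dist(p, c) for c in centers))
        )

    labels = None
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for each point.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in shifts
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(shifts, labels) if lab == j]
            if members:
                centers[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


# Made-up image shifts (same units as the microscope reports them)
shifts = [(-2.0e-6, 1.0e-6), (-2.1e-6, 1.1e-6), (3.0e-6, -1.0e-6), (3.1e-6, -0.9e-6)]
labels = kmeans_groups(shifts, k=2)
print(labels)
```

For real datasets the scikit-learn implementation is the better choice; the sketch only shows what the clustering does with the shift values.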

I wanted to do the same in cryoSPARC, and it involves:

1. Running a python script for K-Means clustering (kmeans_groups.py)
2. Adding class identifier numbers to the filenames of symlinked micrographs, then importing the micrographs with new names (add_class.sh)
3. Reassigning particles to the imported micrographs (without re-extracting particles)
4. Running Exposure Group Utilities to split particles by their location/micrograph_path

Building on the initial k-means script that Bill Rice at NYU kindly provided, I wrote Python and bash shell scripts for steps 1 & 2 (GitHub: kookjookeem/kmeans-beamtilt), and the page describing the steps can be found here.

I hope you find these scripts useful! Please try and let me know if you have any questions.

Best,
Kookjoo

Zhengshan · March 3, 2023, 8:35pm

Hi Kook,

This works great! I tried it with one of my datasets and the resolution improved from 3.1 Å to 2.8 Å. Thanks for sharing the scripts.

There is a small error on your instruction page:
When removing the UIDs, I think you meant “${file:22}”.

For the add_class script, when I run it as is, it fails with an “ambiguous redirect” error. I changed csvfile="" to csvfile="km_groups_01.csv" for it to work.

Also, because the first line of the input csv file is “name,class”, when the add_class script runs, it will show the output “name does not exist in mics”. I made a small change to make it skip the first line and the output would be cleaner:
{
read
while…

} < $csvfile
Hope the feedback helps!
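The same header-skipping idea can be sketched in Python. This is only an illustration under assumptions: the class column is taken to hold integers, and the `g007_` prefix produced by `tagged_name` is a hypothetical naming scheme, not necessarily what add_class.sh writes.

```python
import csv
import io


def read_groups(csv_text):
    """Parse 'name,class' rows into {micrograph name: class id}, skipping the header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # discard the "name,class" header row
    return {row[0]: int(row[1]) for row in reader if row}


def tagged_name(name, class_id):
    """Hypothetical naming scheme: prefix the class id to the file name."""
    return f"g{class_id:03d}_{name}"


# Made-up CSV content in the same layout as km_groups_01.csv
example = "name,class\nmic_a.mrc,0\nmic_b.mrc,7\n"
groups = read_groups(example)
for name, class_id in groups.items():
    print(name, "->", tagged_name(name, class_id))
```

Because the header row is consumed before the loop, no lookup is ever attempted for the literal string "name".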

kookjookeem · March 3, 2023, 10:22pm

Hey Zhengshan,

Thanks for your feedback! Great to hear that it made some resolution improvement. I edited the GitHub wiki and add_class.sh per your suggestions.

Best,
Kookjoo

wrice · March 5, 2023, 6:46pm

Hi Kookjoo,

That’s great that my script helped. Although it is distributed with Leginon, I realized my Tiltgroup_wrangler script is in a bit of an obscure location. Here is the GitHub link for the program and instructions:

You just need to download the CTF information from the Leginon website, and load the cryoSPARC particle set and passthrough particles .cs files. It outputs a new .cs file, and you can then replace either the particle set or the passthrough file with this file. If you then re-refine, cryoSPARC will divide the set into the number of groups specified. I recently updated it to be compatible with cryoSPARC 4.
Hope you and other Leginon users find this useful.

Bill

olibclarke · March 5, 2023, 8:27pm

I had no idea this existed and it looks super helpful, thanks Bill - we used your kmeans clustering script (thanks!), but I didn’t realize that tiltgroup_wrangler existed until now! (I also didn’t realize until now that Appion was able to directly output beam tilt groups!)

Re the built-in clustering in Leginon, is that using kmeans as well? Often we find we need to do a bit of tweaking to the raw output of kmeans - sometimes two clusters will be merged in one, or one cluster split in two, and having graphical feedback on the cluster center locations is handy for this purpose to make sure everything looks good.

Cheers
Oli

wrice · March 6, 2023, 12:15pm

Hi Oli,
The website version uses the same kmeans algorithm so it will give similar results. There is no plot but you get the clustering directly. I usually target in such a way that there are no obvious clusters by eye, so I just choose a large number like 50-100.

Bill

kookjookeem · November 8, 2023, 10:50pm

Hi friends,

I wrote a Jupyter notebook document for this workflow. This utilizes cryosparc-tools to load particles and directly update their ctf/exp_group_id without having to import a new set of beamtilt-grouped movies/micrographs. It creates an external job, which outputs the particle stack with the new exposure group assignments.

You can view and download assign_kmeans_exp_groups.ipynb from the GitHub repo: https://github.com/kookjookeem/kmeans-beamtilt/blob/main/assign_kmeans_exp_groups.ipynb
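For readers who want a feel for the idea before opening the notebook, here is a minimal sketch of the group assignment. It is not the notebook's code. The helpers `strip_uid` and `group_ids_for_particles` are hypothetical, the 21-digit UID prefix is inferred from the `${file:22}` tip earlier in this thread, and `save_regrouped_particles` is an untested outline of the cryosparc-tools calls that needs a live CryoSPARC instance.

```python
def strip_uid(path):
    """Drop the directory and the 21-digit UID prefix (assumed) from a file name."""
    name = str(path).rsplit("/", 1)[-1]
    prefix, sep, rest = name.partition("_")
    if sep and len(prefix) == 21 and prefix.isdigit():
        return rest
    return name


def group_ids_for_particles(mic_paths, mic_to_group):
    """Return one exposure group id per particle, looked up by micrograph name."""
    ids = []
    for path in mic_paths:
        name = strip_uid(path)
        if name not in mic_to_group:
            raise KeyError(f"no group assigned for micrograph {name}")
        ids.append(mic_to_group[name])
    return ids


def save_regrouped_particles(project, workspace_uid, job_uid, output_name, mic_to_group):
    """Outline only (not run here): `project` is a cryosparc-tools Project object."""
    particles = project.find_job(job_uid).load_output(output_name)
    particles["ctf/exp_group_id"] = group_ids_for_particles(
        particles["location/micrograph_path"], mic_to_group
    )
    return project.save_external_result(
        workspace_uid,
        particles,
        type="particle",
        name="particles",
        slots=["ctf"],
        passthrough=(job_uid, output_name),
        title="Particles with k-means exposure groups",
    )


# Made-up particle paths and k-means group assignments
paths = [
    "J1/imported/012345678901234567890_mic_a.mrc",
    "J1/imported/012345678901234567890_mic_b.mrc",
    "J1/imported/012345678901234567890_mic_a.mrc",
]
print(group_ids_for_particles(paths, {"mic_a.mrc": 3, "mic_b.mrc": 5}))
```

Micrograph names written by motion correction jobs may carry extra suffixes, so the name matching would likely need adjusting for a real project.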

Please try and let me know if there is any issue/question. Thanks!

Best,
Kookjoo

P.S. I have not tried it yet, but I imagine the workflow could be adapted to work with the new Import Beam Shift and updated Exposure Group Utilities jobs in v4.4, which are beautifully described in the guide. I’ll try some things…

mmclean · November 14, 2023, 9:45pm

Hi all,

As noted by @kookjookeem, AFIS beam shift import support has now been added to CryoSPARC v4.4. Currently, it supports sessions collected with EPU, as it requires an XML file for each movie that contains the applied beam shift information. The import code that pulls the beam shift values from the XML files is analogous to the EPU_Group_AFIS repository provided by Dustin Morado, which was described extremely well in his forum post on SciLifeLab. Support for reading in the beam shifts from Leginon/Appion hasn’t been added explicitly yet, but Exposure Group Utilities now allows for clustering exposures based on the beam shift values that are in the exposure dataset. The specific field storing the beam shift is the exposures’ mscope_params/beam_shift.

For sessions not collected with EPU, CryoSPARC Tools could be used to read these beam shift values from any external files. An existing exposure dataset could then be loaded into CryoSPARC Tools and, for each movie, the mscope_params/beam_shift field could be set. One would also have to set the mscope_params/beam_shift_known field to 1 for all exposures that have known beam shift values. These exposures could then be saved back to CryoSPARC via the project.save_external_result method and clustered via Exposure Group Utilities.
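A minimal sketch of that procedure, under several assumptions: the shifts come from a header-less `name,x,y` CSV, exposures are matched by file name after removing a 21-digit UID prefix, the dataset already carries both mscope_params fields (v4.4 or later), and `path_field` points at the right blob path. The helper names are hypothetical, and `save_with_beam_shift` is an untested outline that needs cryosparc-tools and a live instance.

```python
import csv
import io


def stem(path):
    """File name without directory, extension, or 21-digit UID prefix (assumed)."""
    name = str(path).rsplit("/", 1)[-1].rsplit(".", 1)[0]
    prefix, sep, rest = name.partition("_")
    return rest if sep and len(prefix) == 21 and prefix.isdigit() else name


def read_shifts(csv_text):
    """Parse header-less 'name,x,y' rows into {stem: (x, y)}."""
    rows = (row for row in csv.reader(io.StringIO(csv_text)) if row)
    return {stem(name): (float(x), float(y)) for name, x, y in rows}


def beam_shift_columns(exposure_paths, shifts):
    """Per-exposure beam shift values and 0/1 'known' flags, in input order."""
    values, known = [], []
    for path in exposure_paths:
        shift = shifts.get(stem(path))
        values.append(shift if shift is not None else (0.0, 0.0))
        known.append(0 if shift is None else 1)
    return values, known


def save_with_beam_shift(project, workspace_uid, job_uid, output_name, shifts,
                         path_field="micrograph_blob/path"):
    """Outline only (not run here): `project` is a cryosparc-tools Project object."""
    exposures = project.find_job(job_uid).load_output(output_name)
    values, known = beam_shift_columns(exposures[path_field], shifts)
    exposures["mscope_params/beam_shift"] = values
    exposures["mscope_params/beam_shift_known"] = known
    return project.save_external_result(
        workspace_uid,
        exposures,
        type="exposure",
        name="exposures",
        slots=["mscope_params"],
        passthrough=(job_uid, output_name),
        title="Exposures with beam shift values",
    )


# Made-up values to exercise the pure helpers
shifts = read_shifts("micrograph1.mrc,1e-06,-2e-06\n")
values, known = beam_shift_columns(
    ["J2/imported/012345678901234567890_micrograph1.mrc",
     "J2/imported/012345678901234567890_other.mrc"],
    shifts,
)
print(values, known)
```

Exposures without a matching CSV row keep a zero shift and a `known` flag of 0, so they can be told apart from measured values.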

Michael

kookjookeem · November 16, 2023, 7:49pm

Dear all,

I was able to import the beam shift/image shift values for a dataset collected via Leginon/Appion and wanted to share here.

In line with what @mmclean explained in this topic and in another post, I generated XML for each micrograph in the dataset, imported the beam shift values in the XML files via Import Beam Shift, and ran the clustering & exp_group_id mapping procedures via Exposure Group Utilities (gave both the exposures and particles as input).

The image above shows that the agglomerative clustering method separates this particular dataset into 225 exposure groups.

To generate XML files, I first prepared the data (micrograph name, image shift X & Y) obtained from Appion as a CSV file called mics_imageshift.csv. The CSV format follows:

micrograph1.mrc,-0.00000022151753017022197,-0.0000003004722777211951
micrograph2.mrc,-0.000006418716276264378,-0.0000032146108280230503
...

Next, we just need to write one XML file for each line in the CSV list.
Here’s a Python script that reads the CSV and generates the XML files. lxml is required to run the script (pip install lxml):

#!/usr/bin/env python3
import csv
from lxml import etree
import sys
import argparse

def update_xml(xml_content, x, y):
    try:
        root = etree.fromstring(xml_content)
        for element in root.xpath('.//a:_x | .//a:_y', namespaces={'a': 'http://schemas.datacontract.org/2004/07/Fei.Types'}):
            element.text = str(x) if element.tag.endswith('_x') else str(y)

        updated_xml = etree.tostring(root, encoding='utf-8', xml_declaration=False).decode('utf-8')
        return updated_xml
    except Exception as e:
        print(f"Error updating XML: {e}")
        return None

def generate_xml_from_csv(input_csv):
    try:
        with open(input_csv, newline='') as csvfile:
            csv_reader = csv.reader(csvfile)
            for idx, row in enumerate(csv_reader, 1):
                filename, x, y = row
                xml_content = """
                <MicroscopeImage xmlns="http://schemas.datacontract.org/2004/07/Fei.SharedObjects">
                <microscopeData>
                <optics>
                <BeamShift xmlns:a="http://schemas.datacontract.org/2004/07/Fei.Types">
                <a:_x></a:_x>
                <a:_y></a:_y>
                </BeamShift>
                </optics>
                </microscopeData>
                </MicroscopeImage>
                """

                updated_xml = update_xml(xml_content, float(x), float(y))

                # Strip the extension whatever its length (.mrc, .tiff, ...)
                stem = filename.rsplit('.', 1)[0]
                if updated_xml is not None:
                    with open(f'{stem}.xml', 'w') as xmlfile:
                        xmlfile.write(updated_xml)
                else:
                    print(f"Skipped XML {idx}: {stem}.xml")

        print("Done generating XML files!")
    except FileNotFoundError:
        print(f"Error: File '{input_csv}' not found.")
        sys.exit(1)
    except csv.Error as e:
        print(f"Error: reading CSV file '{input_csv}': {e}")
        sys.exit(1)
    except Exception as e:
        print(f"Unexpected error: {e}")
        sys.exit(1)

def main():
    parser = argparse.ArgumentParser(description="Generate XML files from a CSV file.")
    parser.add_argument("input_file", help="Path to the input CSV file")

    args = parser.parse_args()

    if args.input_file.lower().endswith('.csv'):
        generate_xml_from_csv(args.input_file)
    else:
        print("Error: Unsupported file format. Please provide a CSV file.")
        sys.exit(1)

if __name__ == "__main__":
    main()

I saved the script above as csv2xml.py and ran the script as:
./csv2xml.py mics_imageshift.csv
The script will write micrograph1.xml, micrograph2.xml, … in the same directory.
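Before importing, it can help to read a value back from one of the generated files. The following is a small sketch using only the standard library; the function name `read_beam_shift` and the sample values are assumptions, while the element names and namespace are the ones used in the script above.

```python
import xml.etree.ElementTree as ET

# Namespace of the _x/_y elements, as used in the template above
NS = {"a": "http://schemas.datacontract.org/2004/07/Fei.Types"}


def read_beam_shift(xml_text):
    """Return (x, y) from the BeamShift element of an EPU-style XML string."""
    root = ET.fromstring(xml_text.strip())
    x = root.find(".//a:_x", NS)
    y = root.find(".//a:_y", NS)
    if x is None or y is None or not x.text or not y.text:
        raise ValueError("BeamShift x/y values not found")
    return float(x.text), float(y.text)


# Same structure as the template, filled with made-up values
sample = """
<MicroscopeImage xmlns="http://schemas.datacontract.org/2004/07/Fei.SharedObjects">
<microscopeData>
<optics>
<BeamShift xmlns:a="http://schemas.datacontract.org/2004/07/Fei.Types">
<a:_x>1.5e-06</a:_x>
<a:_y>-2.5e-07</a:_y>
</BeamShift>
</optics>
</microscopeData>
</MicroscopeImage>
"""
print(read_beam_shift(sample))
```

To check a real file, pass the contents of, for example, micrograph1.xml to the function and compare the result with the corresponding CSV row.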

With the generated XML files, we can now import beam shift with movies/micrographs from the start of the processing pipeline, or import beam tilt for micrographs and particles that were processed before v4.4.

Kookjoo

parrot · November 23, 2023, 1:10am

Hi Kookjoo,

This topic is really interesting to me. I’m relatively new to cryoEM, and I have a question: can we do the same thing for a dataset collected via Latitude S?

Thank you

kookjookeem · November 23, 2023, 5:10am

Hi @parrot,

I apologize in advance as I am not familiar with Latitude S, but assuming your dataset was collected using the beam tilt/image shift strategy, I’d expect you’d be able to extract movie/micrograph filenames and the beam tilt x & y from .gtg files (are these similar to xml files from EPU?).

After extracting the beam tilt data to the csv format shown above, I imagine you’d be able to generate per-micrograph XMLs as shown in this workflow, using the csv2xml.py script.

Best,
Kookjoo

parrot · November 23, 2023, 8:33pm

Thank you for your advice, I will check those files to extract xy information.

wrice · November 28, 2023, 7:04pm

Hi Kookjoo,

That looks great. I just updated the code in the Leginon GitHub archive to allow direct download of image-shift XML files as described by Michael. Until it gets merged, the new branch comes off of the myami-beta branch and is called download_imageshift_xml. It adds a download link under the Image Shift section of the Summary page. When clicked, it creates an XML file for each micrograph, as described above, then downloads them all as a zip archive. You then just unzip this and import, either at the start of processing or at a later point.