Beam tilt refinement by image shift groups for datasets acquired with the Leginon-Appion suite

Hello,

I often find that grouping micrographs into image shift groups and refining per-group CTF parameters improves resolution when significant optical aberrations are present in the dataset. We have had cases where the resolution improved from 3.4 Å to 2.8 Å and from 3 Å to 2.5 Å.

NU-Refinement with iterative per-particle defocus and per-group CTF parameter optimization turned on. The masked FSC looks comparable to the unmasked FSC:
[Figure: FSC curves, before grouping]

Same NU-Refinement job after grouping the particles by image shift groups:
[Figure: FSC curves, after grouping]

I use k-means clustering in scikit-learn to group micrographs with similar image shift X and Y values, then edit particles.star to run CTFRefine in RELION. The downside is that the refined CTF parameters do not carry over to cryoSPARC when the particles are imported back, so I am stuck with RELION for further processing.
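A minimal sketch of the k-means step with scikit-learn, assuming the image shift X/Y for each micrograph has already been pulled out of the Leginon metadata (that part is not shown). The function names are hypothetical and this is not the actual kmeans_groups.py; it only writes the same "name,class" CSV layout that the add_class script discussed below reads.

```python
import csv

import numpy as np
from sklearn.cluster import KMeans


def kmeans_image_shift_groups(names, shifts_xy, n_groups, seed=0):
    """Cluster micrographs on their (x, y) image shift.

    Returns ({micrograph name: group index}, cluster centers).
    """
    shifts = np.asarray(shifts_xy, dtype=float)
    if shifts.shape != (len(names), 2):
        raise ValueError("expected one (x, y) image shift per micrograph")
    # n_init and random_state fixed so repeated runs give the same grouping
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=seed)
    labels = km.fit_predict(shifts)
    groups = {name: int(label) for name, label in zip(names, labels)}
    return groups, km.cluster_centers_


def write_groups_csv(groups, path):
    """Write a "name,class" CSV (header row first) with Unix line endings."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(["name", "class"])
        for name in sorted(groups):
            writer.writerow([name, groups[name]])
```

The line terminator is set to "\n" because Python's csv module defaults to "\r\n", and a Bash `while read` loop would otherwise pick up a stray carriage return on every class number.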


I wanted to do the same in cryoSPARC, and it involves:

  1. Running a Python script for k-means clustering (kmeans_groups.py)
  2. Adding class identifier numbers to the filenames of symlinked micrographs, then importing the micrographs under their new names (add_class.sh)
  3. Reassigning particles to the imported micrographs (without re-extracting particles)
  4. Running Exposure Group Utilities to split particles by their location/micrograph_path
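Step 2 can be sketched in Python as follows. This is not add_class.sh: the naming scheme (a zero-padded class number prepended as "<class>_<original name>") and the function names are hypothetical, so adapt the pattern to whatever you later split on in Exposure Group Utilities.

```python
import csv
import os
from pathlib import Path


def read_groups_csv(path):
    """Read a "name,class" CSV; DictReader consumes the header row itself."""
    with open(path, newline="") as fh:
        return {row["name"]: int(row["class"]) for row in csv.DictReader(fh)}


def link_with_class(groups, mic_dir, out_dir):
    """Create out_dir/<class>_<name> -> mic_dir/<name>; return missing names."""
    mic_dir, out_dir = Path(mic_dir).resolve(), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for name, cls in groups.items():
        src = mic_dir / name
        if not src.exists():
            missing.append(name)  # micrograph listed in the CSV but not on disk
            continue
        dst = out_dir / f"{cls:02d}_{name}"  # hypothetical naming scheme
        if not dst.is_symlink():  # skip links left over from a previous run
            os.symlink(src, dst)
    return missing
```

Because `csv.DictReader` treats the first row as field names, the "name,class" header never reaches the loop as a micrograph name.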

Building on the initial k-means script that Bill Rice at NYU kindly provided, I wrote Python and Bash scripts for steps 1 and 2 (GitHub - kookjookeem/kmeans-beamtilt); the page describing the steps can be found here.

I hope you find these scripts useful! Please give them a try and let me know if you have any questions.

Best,
Kookjoo

Hi Kook,

This works great! I tried it with one of my datasets and the resolution improved from 3.1 Å to 2.8 Å. Thanks for sharing the scripts.

There is a small error on your instruction page:
When removing the UIDs, I think you meant “${file:22}”.
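`${file:22}` is Bash substring expansion: it drops the first 22 characters of `$file`. A Python equivalent with a sanity check is sketched below, assuming those 22 characters are a 21-digit cryoSPARC UID followed by an underscore; the prefix pattern and function name are assumptions, not taken from the scripts.

```python
import re

# Assumed prefix shape: 21 digits + "_" (22 characters), matching ${file:22}.
UID_PREFIX = re.compile(r"^\d{21}_")


def strip_uid(filename):
    """Drop the 22-character UID prefix, refusing names that lack it."""
    if not UID_PREFIX.match(filename):
        raise ValueError(f"unexpected prefix: {filename!r}")
    return filename[22:]
```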

When I ran the add_class script as is, it gave an “ambiguous redirect” error. Changing csvfile="" to csvfile="km_groups_01.csv" made it work.

Also, because the first line of the input CSV file is the header “name,class”, the add_class script prints “name does not exist in mics” when it runs. I made a small change so that it skips the first line, which makes the output cleaner:
{
  read  # skip the "name,class" header line
  while…
} < "$csvfile"

Hope the feedback helps!

Hey Zhengshan,

Thanks for your feedback! Great to hear that it improved the resolution. I edited the GitHub wiki and add_class.sh per your suggestions.

Best,
Kookjoo

Hi Kookjoo,

That’s great that my script helped. Although it is distributed with Leginon, I realized my Tiltgroup_wrangler script is in a bit of an obscure location. Here is the GitHub link for the program and instructions:

You just need to download the CTF information from the Leginon website and load the cryoSPARC particle .cs file and the passthrough particles .cs file. It outputs a new .cs file that can replace either the particle file or the passthrough file. When you then re-refine, cryoSPARC will divide the particles into the specified number of groups. I recently updated it to be compatible with cryoSPARC 4.
Hope you and other Leginon users find this useful.
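Tiltgroup_wrangler's own code is not reproduced here. As a rough illustration of the general idea of writing group assignments into a .cs file, here is a sketch that assumes .cs files are NumPy .npy structured arrays and that particles carry `location/micrograph_path` and `ctf/exp_group_id` fields (the latter field name is an assumption; check it against your own files). All function names are hypothetical.

```python
import os

import numpy as np


def load_cs(path, allow_pickle=False):
    """Load a .cs file (assumed .npy format); object-dtype fields need pickle."""
    return np.load(path, allow_pickle=allow_pickle)


def save_cs(arr, path):
    """Save through a file handle so NumPy does not append '.npy' to the name."""
    with open(path, "wb") as fh:
        np.save(fh, arr)


def assign_exp_groups(particles, groups, strip_chars=0):
    """Return a copy of `particles` with ctf/exp_group_id set per micrograph.

    groups maps micrograph file name -> class index (e.g. from a k-means CSV).
    strip_chars drops a fixed-length prefix from each file name before lookup.
    A micrograph missing from `groups` raises KeyError.
    """
    out = particles.copy()
    ids = []
    for path in out["location/micrograph_path"]:
        if isinstance(path, bytes):  # fixed-width byte strings in the array
            path = path.decode()
        ids.append(groups[os.path.basename(path)[strip_chars:]])
    out["ctf/exp_group_id"] = ids
    return out
```

For example, `save_cs(assign_exp_groups(load_cs("particles.cs"), groups, strip_chars=22), "particles_grouped.cs")`, with hypothetical file names and `strip_chars=22` for the UID prefix that `${file:22}` removes.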

Bill

This looks super helpful, thanks Bill! We used your k-means clustering script (thanks!), but I had no idea tiltgroup_wrangler existed until now. (I also didn’t realize Appion could directly output beam tilt groups!)

Regarding the built-in clustering in Leginon, does it use k-means as well? We often need to tweak the raw k-means output: sometimes two clusters get merged into one, or one cluster gets split in two. Graphical feedback on the cluster center locations is handy for making sure everything looks good.
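The kind of manual tweaking described here can be sketched with a few hypothetical NumPy/scikit-learn helpers: report per-cluster centers, merge two clusters, re-split one cluster with 2-means, and renumber labels afterwards. It is a text-only stand-in; a scatter plot of the image shifts colored by label (e.g. with matplotlib, not shown) would give the graphical view.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_centers(points, labels):
    """Mean (x, y) image shift per label, for checking where each group sits."""
    return {int(c): points[labels == c].mean(axis=0) for c in np.unique(labels)}


def merge_clusters(labels, keep, absorb):
    """Relabel every member of cluster `absorb` as `keep`."""
    out = labels.copy()
    out[out == absorb] = keep
    return out


def split_cluster(points, labels, target, seed=0):
    """Re-run 2-means on one cluster; one half gets a new label."""
    out = labels.copy()
    idx = np.flatnonzero(labels == target)
    sub = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(points[idx])
    out[idx[sub == 1]] = labels.max() + 1
    return out


def relabel_consecutive(labels):
    """Renumber labels as 0..n-1 after merges leave gaps."""
    _, inverse = np.unique(labels, return_inverse=True)
    return inverse
```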

Cheers
Oli

Hi Oli,

The website version uses the same k-means algorithm, so it will give similar results. There is no plot, but you get the clustering directly. I usually target in a way that leaves no obvious clusters by eye, so I just choose a large number of groups, like 50–100.

Bill