Patch CTF Estimation graph descriptions

mroy · November 8, 2023, 7:01pm

Hi,

I’m new to the cryoEM field and am learning how to use cryoSPARC. I’d like to understand the outputs of Patch CTF Estimation. I’ve gleaned descriptions of 3 of the 5 outputs from various papers and cryoSPARC discussion boards, but I’m still stuck on two of them. I have had trouble finding any description of the following one.

Does anyone know what the axes of each of these graphs are, and what the lines represent? That would help narrow down the search.

rwaldo · November 8, 2023, 8:04pm

Hi @mroy and welcome to cryoEM! I’d be happy to walk you through the outputs of a patch CTF job, and I’ve made a note to add a bit more explanation to our guide!

I’ll include an example image of my own here — it looks like you’re working with super-resolution data which scrunches the values up to fit in the higher resolutions. You may want to consider using Fourier cropping during motion correction to go back to physical pixel size, as it will speed up CTF and particle picking, but that’s up to you!
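For anyone curious what Fourier cropping does, here is a minimal NumPy sketch. It is only an illustration of the general idea (keep the low-frequency centre of the Fourier transform, then transform back), not cryoSPARC’s implementation; the function name `fourier_crop` is made up for this example, and it assumes even image dimensions.

```python
import numpy as np

def fourier_crop(image, factor=2):
    """Downsample a 2D image by keeping only the central (low-frequency)
    part of its Fourier transform. Hypothetical helper for illustration."""
    ny, nx = image.shape
    new_ny, new_nx = ny // factor, nx // factor

    # Shift so the zero-frequency term sits at the centre of the array
    ft = np.fft.fftshift(np.fft.fft2(image))

    # Keep the central new_ny x new_nx block (the low frequencies)
    y0 = (ny - new_ny) // 2
    x0 = (nx - new_nx) // 2
    cropped = ft[y0:y0 + new_ny, x0:x0 + new_nx]

    # Transform back; divide by factor**2 so the mean intensity is preserved
    out = np.fft.ifft2(np.fft.ifftshift(cropped)).real
    return out / factor**2

# A super-resolution-sized image cropped back to half the size on each axis
small = fourier_crop(np.full((8, 8), 3.0), factor=2)
```

The output has half as many pixels along each axis, so each pixel covers twice the distance and the Nyquist frequency is halved.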

The plot in question is mostly used to ensure that background subtraction and envelope function fitting have succeeded — it is more diagnostic than informative.

Here, the X axis is the frequency in inverse Å. What we would typically call “higher resolution” signals are further to the right in this plot. The three things plotted in these graphs are:

In black, the radially-averaged power spectrum. The bright parts of the Thon rings would have a high value here, and the dark parts a low value.
In orange, the envelope function. Theoretically, Thon rings should continue all the way out to the Nyquist resolution, but because of various aberrations they fall off at higher frequencies. The envelope function is a way of modeling this falloff.
Finally, in blue, we plot the fitted CTF scaled by the envelope function. Here it is plotted oscillating between the envelope and 0, since this plot is mostly to ensure that background subtraction and envelope fitting have proceeded as expected.
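To make the relationship between the orange and blue curves concrete, here is a rough 1D sketch. It uses one common textbook CTF convention and a Gaussian (B-factor-style) falloff as a stand-in envelope; both are assumptions for illustration, not the model cryoSPARC actually fits, and the function names and parameter values are made up.

```python
import numpy as np

def electron_wavelength_A(kv):
    """Relativistic electron wavelength in Å for an accelerating voltage in kV."""
    volts = kv * 1e3
    return 12.2643 / np.sqrt(volts * (1 + 0.978466e-6 * volts))

def ctf_1d(freqs, defocus_A, kv=300.0, cs_mm=2.7, amp_contrast=0.1):
    """1D CTF at spatial frequencies `freqs` (1/Å).
    Sign conventions vary between packages; this is one common form."""
    lam = electron_wavelength_A(kv)
    cs_A = cs_mm * 1e7  # mm -> Å
    chi = (np.pi * lam * defocus_A * freqs**2
           - 0.5 * np.pi * cs_A * lam**3 * freqs**4)
    return -np.sin(chi + np.arcsin(amp_contrast))

freqs = np.linspace(0.0, 0.5, 1000)          # 1/Å
envelope = np.exp(-100.0 * freqs**2 / 4.0)   # assumed Gaussian falloff, B = 100 Å^2
scaled_ctf = envelope * ctf_1d(freqs, defocus_A=15000.0)**2  # the "blue" curve
```

Because the squared CTF lies between 0 and 1, `scaled_ctf` oscillates between 0 and `envelope`, which is the behaviour described for the blue line.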

The final CTF fit, which is more important for interpreting the results, is shown in this plot instead:

which plots the power spectrum (black) and the fitted CTF (red), as well as how well they agree with each other (blue).

I know that’s a lot of information! Please let me know if you have any more questions!

subhrob15 · March 8, 2024, 10:13am

Hello @rwaldo,
I have a question regarding this diagnostic plot. How, and from where, can one extract the values behind this plot so that the same plot could be made in graphing software such as GraphPad?

Best regards,
Subhro

rwaldo · March 11, 2024, 7:30pm

Hi @subhrob15! If you’re asking about the plot with the power spectrum, CTF, and cross correlation, you can indeed access that data. In the directory of the Patch CTF job, you’ll see a directory called ctfestimated. Within that directory are a number of .npy files (one for each micrograph) that contain the raw data, which is then transformed to produce these plots.

For instance, I have a Patch CTF job J49. I can get a list of all the relevant files like so:

import numpy as np
from pathlib import Path

ctf_job_path = Path("/bulk1/data/cryosparc_personal/rposert/CS-rposert-guide-work/J49")
all_ctf_files = (ctf_job_path / "ctfestimated").glob('*diag_plt.npy')

Then, if I write a function to plot them:

import matplotlib.pyplot as plt
def make_ctf_plot(data, filename):

    # note that raw data must be transformed
    power_spectrum = np.cbrt(data['EPA_trim'] - data['BGINT']) + 0.5
    ctf = np.cbrt(data['ENVINT'] * (2 * data['CTF']**2 - 1)) + 0.5
    cc = data['CC']

    # make plot
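    plt.figure(figsize=(12, 6))
    plt.plot(data['freqs_trim'], power_spectrum, color='black', label='Power spectrum')
    plt.plot(data['freqs_trim'], ctf, color='red', label='Fitted CTF')
    plt.plot(data['freqs_trim'], cc, color='blue', label='Cross correlation')
    plt.ylim([-0.1, 1.1])
    plt.xlim(0, data['freqs_trim'].max())
    plt.legend()
    plt.tight_layout()
    plt.savefig(filename)
    plt.close()  # release the figure so memory does not grow across many micrographs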

and/or save to CSV for use in other software:

import pandas as pd
def write_csv(data, filename):
    df = pd.DataFrame({
        'freq': data['freqs_trim'],
        'ps': np.cbrt(data['EPA_trim'] - data['BGINT']) + 0.5,
        'ctf': np.cbrt(data['ENVINT'] * (2 * data['CTF']**2 - 1)) + 0.5,
        'cc': data['CC']
    })
    df.to_csv(filename, index = False)

I can write out the data for each micrograph like so:

for ctf_file in all_ctf_files:
    data = np.load(ctf_file)

    plot_name = ctf_file.name.replace('.npy', '.png')
    make_ctf_plot(data, plot_name)

    csv_name = ctf_file.name.replace('.npy', '.csv')
    write_csv(data, csv_name)

Note that this script would create a plot and CSV for every single one of your micrographs — you probably do not want to generate that many files!
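One way to avoid that is to process only a handful of files. Here is a small sketch using itertools.islice; the helper name `first_n_files` and the default of 5 are made up for this example.

```python
from itertools import islice
from pathlib import Path

def first_n_files(directory, pattern="*diag_plt.npy", n=5):
    """Return up to n files matching pattern, sorted by name so that
    the same files are chosen on every run. Hypothetical helper."""
    return list(islice(sorted(Path(directory).glob(pattern)), n))
```

You could then loop over `first_n_files(ctf_job_path / "ctfestimated")` instead of `all_ctf_files` in the loop above.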

If you were referring to the other plot (with Power Spectrum, Parametric |CTF|^2, and Envelope), that data is not available outside the job.

I hope that helps!