Import ImageShift from .mdoc files (SerialEM)

Short script using cryosparc-tools to import ImageShifts from micrograph-associated .mdoc files. To use, fill out the variables at the top, and run this in the directory with the .mdoc files. It should make a new job in the specified workspace with an exposure output that you can then put into the Exposure Group Utilities job, and use to cluster via image shift.

Example job:
image

Exposure group utilities follow-up job:


(hopefully your shift groups look better - we were having hole centering issues on this dataset)

from pathlib import Path
import re
from cryosparc.tools import CryoSPARC
from getpass import getpass

##### Instructions #####
# Place a copy of this file in each directory you have .mdoc
# files in, load the cryosparc-tools conda environment (if using)
# and run this script: python3 cs_mdoc_image_shift.py

##### Variables #####
expected_groups = 25
cs_license = 'asdf1348-replace-me-123lkj12'
cs_hostname = 'mycryosparcinstance.mysupercoolinstitution.com'
cs_port = 39000
cs_email = 'myemail@mydomain.com'
cs_project = 'P83'
cs_workspace = 'W1'
cs_job = 'J56'
cs_job_output = 'exposure'

##### (il)Logic #####
#Set directory and get list of mdocs
mdoc_directory = Path(".")
mdoc_list = list(mdoc_directory.glob('**/*.mdoc'))

#Initialize list that will hold dictionary with exposure information
exposures = []

#Iterate through list of mdoc files
for i in range(len(mdoc_list)):

    #Open individual mdoc file
    with open(mdoc_list[i]) as mdoc:

        #Read each line in mdoc file
        for line in mdoc:

            #if the line matches, store the name of the file and also the image shift values into a list
            if re.search("ImageShift",line):
                shift_data = {
                    "exposure": str(mdoc_list[i]).rsplit(sep='.',maxsplit = 1)[0],
                    "x_shift": line.rsplit()[2],
                    "y_shift": line.rsplit()[3],
                    }
                exposures.append(shift_data)

#Get CS password
print("Please enter CryoSPARC password: ")
cs_password = getpass()

#Connect to CS
cs = CryoSPARC(license=cs_license, email=cs_email, password=cs_password, host=cs_hostname, base_port=cs_port)
assert cs.test_connection()

#Find dataset and copy it to a new dataset
project = cs.find_project(cs_project)
job = cs.find_job(cs_project, cs_job)
cs_exposures = job.load_output(cs_job_output)
grouped_exposures = cs_exposures.copy()

#For every row in the dataset, search entire mdoc data and see if it fits a known movie and image shift. There's probably a much faster way to do this.
for row in grouped_exposures.rows():
    for i in range(len(exposures)):
        if exposures[i]['exposure'] == row['movie_blob/path'].rsplit(sep='/')[-1]:
            row['mscope_params/beam_shift'][0] = exposures[i]['x_shift']
            row['mscope_params/beam_shift'][1] = exposures[i]['y_shift']
            row['mscope_params/beam_shift_known'] = 1
            break

#Make a new external job with the updated values, and passthrough the remaining, non-updated values
project.save_external_result(
    workspace_uid=cs_workspace,
    dataset=grouped_exposures,
    type="exposure",
    name="exposure",
    slots=["mscope_params"],
    passthrough=(cs_job,cs_job_output),
    title="ImageShift Updated Exposures"
    )
5 Likes

@ccgauvin94 Thanks for sharing your script!

@wtempel is it possible to incorporate this script to the future versions of CS as protocol?

Thank you!

hello @ccgauvin94 and @wtempel

Could you please clarify what “J” job has to be specified?

I tried the Import micrographs J job // Patch CTF J job but I am always having the following error -

File “/home/cryosparc_user/.local/lib/python3.11/site-packages/cryosparc/job.p y”, line 444, in load_output
raise TypeError(f"Job {self.project_uid}-{self.uid} does not have any result s for output {name}")
TypeError: Job P15-J28 does not have any results for output all_exposures

Thank you.

Kind regards,
Dmitry

The full command I am running looks like that

cryosparc_user@cryoem1:/data/dmitry//Movies$ python cs_mdoc_image_shift.py -e dmitry.XXXXX@XXXXXXX -p P15 -w W1 -j J28 -d /data/dmitry/Roman/PSI_lhca1_lhca4/set1_160424/240413_PS1_SPA/Movies

@ccgauvin94 have you been able to test if the CTF refinement is better using clustered shifts vs. shift pattern indices? Do the nominal shift groups differ from the index based groups at all?

I don’t normally save MDOC files per movie because it slows down data collection but interested to know!

1 Like

I’ve seen improvements in the 5/100ths of an Angstrom on some datasets - nothing super noteworthy.

Any job that outputs all_exposures

If you don’t have that available, just run a curate exposures job.

1 Like

hello @ccgauvin94 , I dont know - something does not work here -
I am receiving the following error -

(cryosparc-tools-env) cryosparc_user@cryoem1:/data/dmitry/XX/XXX/set1_160424/240413_PS1_SPA/Movies$ python cs_mdoc_image_shift_4.py -e dmitry.XXXXX@XXXX -p P15 -w W1 -j J27 -d .
Please enter CryoSPARC password:
Connection succeeded to CryoSPARC command_core at http://cryoem1.itqb.unl.pt:39002
Connection succeeded to CryoSPARC command_vis at http://cryoem1.itqb.unl.pt:39003
Connection succeeded to CryoSPARC command_rtp at http://cryoem1.itqb.unl.pt:39005
Traceback (most recent call last):
File “cs_mdoc_image_shift_4.py”, line 83, in
cs_project, cs_job, grouped_exposures = get_exposures_from_job(args.project, args.job)
File “cs_mdoc_image_shift_4.py”, line 58, in get_exposures_from_job
print(f"Available output types for job {project}-{job}: {cs_job.output_types}")
AttributeError: ‘Job’ object has no attribute ‘output_types’

I tried both scripts. (by the way - what is the difference between them?)

Any ideas?

Thank you.

Kind regards,
Dmitry

Hi Dmitry,

One script clusters locally, the other uses clustering in CryoSPARC. I prefer the latter.

print(f"Available output types for job {project}-{job}: {cs_job.output_types}")

AttributeError: ‘Job’ object has no attribute ‘output_types’
It looks like the job you selected has no outputs. Can you verify the job number has exposure outputs?

1 Like

Thank you @ccgauvin94,

  1. I have revised the script and used the last one.
  2. here is my command -

(cryosparc-tools-env) cryosparc_user@cryoem1:/data/dmitry/XXXX/XXXXXX/set1_160424/240413_XXXXX/Movies$ python u_cs_mdoc_image_shift.py -e dmitry.XXX@XXX.XXX.XXX -p P15 -w W1 -j J311 -d .

  1. here is the image of J311
    image

  2. The error is the following -

(cryosparc-tools-env) cryosparc_user@cryoem1:/data/dmitry/XXXXX/XXXX/set1_160424/240413_XXXX/Movies$ python u_cs_mdoc_image_shift.py -e dmitry.XXXXX@XXX.XXX.XX -p P15 -w W1 -j J311 -d .
Please enter CryoSPARC password:
Connection succeeded to CryoSPARC command_core at http://cryoem1.XXX.XXX.XXX:39002
Connection succeeded to CryoSPARC command_vis at http://cryoem1.XXX.XXX.XX:39003
Connection succeeded to CryoSPARC command_rtp at http://cryoem1.XXX.XXX.XX:39005
Traceback (most recent call last):
File “u_cs_mdoc_image_shift.py”, line 79, in
cs_project, cs_job, grouped_exposures = get_exposures_from_job(args.project,args.job)
File “u_cs_mdoc_image_shift.py”, line 61, in get_exposures_from_job
cs_exposures = cs_job.load_output(CS_JOB_OUTPUT_TYPE)
File “/home/cryosparc_user/.conda/envs/cryosparc-tools-env/lib/python3.7/site-packages/cryosparc/job.py”, line 444, in load_output
raise TypeError(f"Job {self.project_uid}-{self.uid} does not have any results for output {name}")
TypeError: Job P15-J311 does not have any results for output all_exposures