Reference based motion correction error: All movies must have the same number of frames

Hi All,

I tried to run Reference Based Motion Correction, but the job stopped at the beginning with the error: All movies must have the same number of frames.
I ran the job with default settings; the inputs are the volume from a Non-Uniform Refinement and the output from a Patch CTF job. The dataset is a single dataset.

Thank you

Error:
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_reference_motion.py", line 265, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_reference_motion.run_reference_motion_correction
AssertionError: All movies must have the same number of frames

Maybe run the "Check For Corrupt Particles" job? Or split the dataset with Exposure Tools and see if there is a subset that doesn't work? Be sure the import was correct (not importing low-mag images, for example), and redo Patch CTF or try CTFFIND just to see if it works. Sorry, these are all self-diagnosis suggestions; it's not obvious to me from the information given why this would happen. The devs will ask for more information from the logs and inputs.

I tested some other datasets and they showed the same error. It seems something is not compatible with our data.

I am experiencing the same problem on three different microscopes with all our datasets (I tried 8 different ones). They are all EER datasets from a Falcon IV.

I also got the same error with micrographs collected on a Krios with the K3 detector (40 frames per movie). I used the exposures exported from a Live session.

@martinpacesa and @matt1
Thanks for sharing. My datasets were collected on a Krios with a K3 detector at 60 frames/movie. We use Latitude S for data collection, and our facility has an internal setup to convert raw files (.gtg) to .tif files.

We had exactly the same issue with one of our datasets. It worked fine for other jobs, but would fail in this instance. We ended up checking the headers of each TIFF file and found that a very small number were missing frames; we had skipped the header check on import, which I guess would have caught this. They were all from the same grid square, and they all had 49 frames instead of the 54 that all the other images had. Once we excluded these, the polishing ran successfully.
If it helps, we made the Bash script below to loop through all the files and use IMOD's header command to record each file that contains the wrong number of frames. Change the directory, extension, and frame count to match your data. It's pretty slow, but we never worked out a quicker approach.

#!/bin/bash
# Report movies whose frame (section) count differs from the expected value.
# Requires IMOD's `header` command on the PATH.

dir="GridSquare*/Data"
frame_count=54
ext=tiff

for file in ${dir}/*.${ext}; do
    if [ -f "$file" ]; then
        echo "$file"
        # The "sections" line of the header holds the frame count in its last field
        frames=$(header "$file" | grep sections | awk '{print $NF}')
        echo "$frames"
        if [ "$frames" -ne "$frame_count" ]; then
            echo "File: $file - Frames: $frames" >> bad_frames_count.txt
        fi
    fi
done

@mathewmclaren
Thanks for your help! I’ve never thought there could be missing frames. I’ll definitely try to check my datasets.

This looks very handy. To make it faster, you might want to try GNU parallel; it is useful as a rough-and-ready way to parallelize tasks where you need to loop through a bunch of files, e.g.

find ./ -maxdepth 1 -name "*mrc" -print | parallel -j 28 'mrc2tif -s -c zip {/} {/.}.tif' >& log &

(this is just an example code snippet, not related to the subject at hand, but could maybe be adapted)
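For instance, here is a rough sketch of the frame-count check above adapted to GNU parallel; the directory, extension, expected frame count, and number of job slots are placeholders to adjust to your own data, as before:

# Check frame counts in parallel; mismatches are appended to bad_frames_count.txt.
# Assumes IMOD's `header` and GNU parallel are on the PATH, and a Bash shell
# (exported functions are a Bash feature that GNU parallel can pick up).
check_frames() {
    frames=$(header "$1" | grep sections | awk '{print $NF}')
    if [ "$frames" -ne 54 ]; then
        echo "File: $1 - Frames: $frames"
    fi
}
export -f check_frames
find GridSquare*/Data -name "*.tiff" -print | \
    parallel -j 28 check_frames {} >> bad_frames_count.txt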


This is a great idea and I somehow didn’t think of it! It took a bit of effort, but I’ve managed to update it and add some basic help and user inputs so people don’t need to edit the script by hand.
I’ve uploaded the script here if anyone wants to use it.


Hi All,

I found my problem could also be solved by running an Import Movies job without the Skip Header Check option. This helped me identify the bad movies. I manually removed those movies, and then everything ran successfully.
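If there are too many bad movies to remove by hand, here is a minimal sketch for moving them aside in bulk, assuming the bad_frames_count.txt produced by the script earlier in this thread and file paths without spaces:

# Move every movie listed in bad_frames_count.txt into excluded_movies/.
# Each line looks like "File: <path> - Frames: <n>", so field 2 is the path.
mkdir -p excluded_movies
awk '{print $2}' bad_frames_count.txt | while read -r f; do
    mv "$f" excluded_movies/
done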


@parrot and @mathewmclaren

Thank you for sharing this very useful script. I gave it a try, and it looped through all of my 9,000+ movies; they all seem to have 40 fractions. I also ran a movie import job without "skip_header_check", and it did not find any bad movies.

Could we get an option to simply ignore the micrographs with inconsistent frame numbers?


@martinpacesa, thanks for the suggestion. I’ve recorded this request and we’ll consider implementing it or something functionally similar in the future.


@matt1 Just a check: are all the movies that provide your particles included in your inputs? I have a job that keeps crashing, so I decided to split my movies into subsets to see if I can get at least some of my particles to extract. However, my job crashed with the frame error (and the number of frames is definitely correct). It might be worth counting your files and seeing whether the count matches the number of exposures.

Any progress on this issue?

Thanks and Happy New Year!

I have a map from Non-Uniform Refinement at 2.6 Å; the full dataset (~380k particles) is an accumulation of several separate acquisition sessions over 18 months (mostly at tilt, with few particles per micrograph). As alluded to above, Reference Based Motion Correction returned the error "All movies must have the same number of frames".

The job did proceed smoothly when applying a small "test" collection of particles from a SINGLE acquisition. I'm guessing the problem is maybe just a few rogue micrographs somewhere within the full dataset?

Is there a way to work backwards, taking the full particle stack and separating the particles according to their acquisition sessions (or import jobs), such that I can narrow down the culprit and/or motion-correct at least some of the particles?

Alternatively, as suggested above by @martinpacesa, could the job be modified so that rogue frame counts are simply ignored? It would be a shame not to be able to apply Reference Based Motion Correction after such a long haul.

Thanks for your suggestions!
Scott

There may be, as described under Particle sets job - limit number of micrographs used - #5 by mmclean.
If you ran Patch CTF Estimation separately for each acquisition session’s data, you could connect

  1. a single session’s Patch CTF Estimation exposures output
  2. the combined particle set

to a Manually Curate Exposures job and, inside the job, push the Done button without modifying any selection criteria. The Particles Accepted output should contain particles derived from that single session only.

@wtempel

Thanks heaps - this is very handy and simple too.

Scott (CMRI down under)

Hi @parrot and others,

Here is a small script that @rposert devised as a workaround for this issue, which you may find helpful. You will need to run it in a terminal using cryosparc-tools.

Steps:

  1. Reimport movies with Skip Header Check disabled, if that was not initially done. This will output a set of "failed_movies" whose frame count does not match the majority of the movies imported.
  2. Next, run the attached script, replacing the instance information, project number, workspace number, and job numbers (job_1 = the new Import Movies job, job_2 = the Patch CTF Estimation job run with all movies). This will create a new external job in the appropriate workspace whose exposures output contains only movies with the expected number of frames.
  3. Create a Manually Curate Exposures job with the exposures output from the External Results job and the particle stack you would like to perform RBMC on. Set the parameter Number of picked particles to "1,10000". This will output exposures that have the expected number of frames and at least one picked particle.
  4. Connect the exposures and particles outputs from the Manually Curate Exposures job, along with the volume associated with those particles, to the RBMC job.

This should alleviate the error caused by an inconsistent number of frames.

Script:

from cryosparc.tools import CryoSPARC

cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    host="localhost",
    base_port=39000,
    email="ali@example.com",
    password="password123"
)

project_number = "P317"
workspace_number = "W15"
job_1_number = "J1179"
job_2_number = "J1079"
project = cs.find_project(project_number)

# Movies with bad number of frames
imported_movies_job = project.find_job(job_1_number)
failed_movies = imported_movies_job.load_output("failed_movies")

# All exposures from patch CTF estimation
patch_ctf_job = project.find_job(job_2_number)
patch_ctf_exposures = patch_ctf_job.load_output("exposures")

# Keep only the exposures whose movies had the expected frame count
failed_sigs = set(failed_movies["movie_blob/import_sig"])
good_exposures = patch_ctf_exposures.query(
    lambda row: row["movie_blob/import_sig"] not in failed_sigs
)

cs.save_external_result(
    project_number,
    workspace_number,
    good_exposures,
    type="exposure",
    name="desired_number_frame_exposures",
)