UTF-8 error when importing super-resolution movies (*.mrcs)

Hi all,
I am testing cryoSPARC v3.3.1 on a 64-CPU/4-GPU workstation with CUDA 11.1 and RTX 3090 cards. I downloaded a test set of movies onto it and tried to import them into cryoSPARC.
The import failed with “UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xfc in position 0: invalid start byte”. I could not find any related past postings and wonder whether some of you have encountered similar errors. Is it due to a difference in byte order?
Thanks. Qiu-Xing

@jiangq9992003 Is this a publicly available movie set?
Please can you share the content of the import job’s Overview (from top to bottom) and Inputs and Parameters tabs.

Yes, it is a public dataset, which contains 32 movies from: EMPIAR-10025 T20S Proteasome at 2.8 Å Resolution.
Here is the complete content of overview:

License is valid.

Running job on master node

[CPU: 68.2 MB] Project P2 Job J2 Started

[CPU: 68.3 MB] Master running v3.3.1, worker running v3.3.1

[CPU: 68.3 MB] Working in directory: /raid1/cryospacworkruns/chgbmem/T20movies/P2/J2

[CPU: 68.3 MB] Running on lane default

[CPU: 68.3 MB] Resources allocated:

[CPU: 68.3 MB] Worker: dhcp-128-205-48-236

[CPU: 68.3 MB] --------------------------------------------------------------

[CPU: 68.3 MB] Importing job module for job type import_movies…

[CPU: 207.0 MB] Job ready to run

[CPU: 207.0 MB] ***************************************************************

[CPU: 207.0 MB] Importing movies from /raid1/cryospacworkruns/chgbmem/T20movies/14sep05c_raw_196/movies/*.mrcs

[CPU: 207.0 MB] Importing 32 files

[CPU: 207.1 MB] Import paths were unique at level -1

[CPU: 207.1 MB] Importing 34 files

[CPU: 207.1 MB] Reading header for each exposure…

[CPU: 207.1 MB] Spawning worker processes to read headers in parallel…

[CPU: 207.1 MB] Processed 32 headers…

[CPU: 207.4 MB] Processing results…

[CPU: 207.4 MB] Reading headers of gain reference file /raid1/cryospacworkruns/chgbmem/T20movies/14sep05c_raw_196/norm-amibox05-0.mrc

[CPU: 207.5 MB] Reading defect file /raid1/cryospacworkruns/chgbmem/T20movies/14sep05c_raw_196/dark-amibox05-0.mrc
[CPU: 207.5 MB] Traceback (most recent call last):
File “cryosparc_master/cryosparc_compute/run.py”, line 85, in cryosparc_compute.run.main
File “/home/cryosparc_user/cryosparc3.3/cryosparc_master/cryosparc_compute/jobs/imports/run.py”, line 961, in run_import_movies_or_micrographs
lines = [line.strip() for line in defect_file]
File “/home/cryosparc_user/cryosparc3.3/cryosparc_master/cryosparc_compute/jobs/imports/run.py”, line 961, in
lines = [line.strip() for line in defect_file]
File “/home/cryosparc_user/cryosparc3.3/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/codecs.py”, line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xfc in position 0: invalid start byte

Inputs and Parameters tab:
Movies

Movies data path

Absolute path, wildcard-expression (e.g. /mount/data/somewhere/*.mrcs) that will be imported. MRC (mrc, mrcs, stk) and TIFF format supported.

Gain reference path

Absolute path to a single gain reference for all the raw data, in MRC format. Leave blank if data is already gain-corrected.

Defect file path

Absolute path to a defect file for all the raw data. This should be a .txt file. Leave blank if not applicable.

Flip gain ref & defect file in X?

Flip gain ref and defect file left-to-right (in X axis)

Flip gain ref & defect file in Y?

Flip gain ref and defect file top-to-bottom (in Y axis)

Rotate gain ref?

Rotate gain ref counter-clockwise by 90 degrees this many times

Raw pixel size (A)

Pixel size of the raw movie data in Angstroms

Accelerating Voltage (kV)

Spherical Aberration (mm)

Total exposure dose (e/A^2)

Negative Stain Data

If Negative Stain Data is on, this indicates that there are light particles on dark background. If it’s off, this indicates the movies have dark particles on light background (cryo-em data).

Phase Plate Data

Override Exposure Group ID

Skip Header Check

Skip reading of every header file to increase import speed. WARNING: this assumes exposure shapes and extensions are consistent across the entire dataset.

EER Number of Fractions

Number of fractions to make out of the EER input data.

EER Upsampling Factor

Upsampling factor when decoding EER input data. Note that the pixel size you provide should be the raw pixel size at the nominal 4k sensor, not the pixel size after EER upsampling.

Compute settings

Number of CPUs to parallelize

Use this many CPUs to read headers in parallel

@jiangq9992003

It appears that an mrc-format file has been supplied as a “defect file”. What happens if you re-attempt that job, leaving the Defect file path field blank?
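
The traceback shows the defect file being iterated line by line as text, which fails on binary input. As a rough sketch (the helper name `looks_like_text` and the `probe_bytes` parameter are hypothetical, not part of cryoSPARC), a file can be checked before it is entered in the Defect file path field:

```python
import codecs


def looks_like_text(path, encoding="utf-8", probe_bytes=4096):
    """Return True if the start of the file decodes cleanly as text.

    Hypothetical helper for illustration only; not a cryoSPARC function.
    """
    with open(path, "rb") as fh:
        chunk = fh.read(probe_bytes)
    # An incremental decoder tolerates a multi-byte character that is
    # cut off at the end of the probed chunk (final=False).
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    # Binary headers usually contain NUL bytes even if they happen to decode.
    return b"\x00" not in chunk
```

Incidentally, 7420 is 0x1CFC, so a little-endian MRC header whose first field (nx) is 7420 begins with the byte 0xfc. Assuming the dark reference has the same width as the movies, that would match the "byte 0xfc in position 0" in the error and points to a binary file being read as text rather than to a byte-order problem.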

I will test it after the GPU is free tomorrow. Thanks.

@wtempel
I tested that. The job now gets past the defect file step, but it cannot reshape the movie data (.mrcs).
Here is the error message:
[CPU: 641.8 MB] Traceback (most recent call last):
File “cryosparc_master/cryosparc_compute/run.py”, line 85, in cryosparc_compute.run.main
File “/home/cryosparc_user/cryosparc3.3/cryosparc_master/cryosparc_compute/jobs/imports/run.py”, line 1049, in run_import_movies_or_micrographs
imgdata = mrc.read_mrc(abs_path)[1].sum(axis=0) * gainref
File “/home/cryosparc_user/cryosparc3.3/cryosparc_master/cryosparc_compute/blobio/mrc.py”, line 140, in read_mrc
data = read_mrc_data(file_obj, header, start_page, end_page, out)
File “/home/cryosparc_user/cryosparc3.3/cryosparc_master/cryosparc_compute/blobio/mrc.py”, line 100, in read_mrc_data
data = n.fromfile(file_obj, dtype=dtype, count= num_pages * ny * nx).reshape(num_pages, ny, nx)
ValueError: cannot reshape array of size 1023015936 into shape (38,7676,7420)

This reshape error did not show up with a large dataset of compressed TIFF files.

@jiangq9992003 Is it possible that your Movies data path specification (including wildcards?) mistakenly captures a file that is not a movie in (38,7676,7420) format, such as an mrc-format gain reference?

@wtempel That is unlikely, because the movies are named *.mrcs and the gain reference is named *.mrc.

@jiangq9992003 Please can you post the output that precedes the error message.

Here is the output from reading 32 mrcs files. Thanks for looking into it.
[CPU: 207.3 MB] ===========================================================

[CPU: 207.3 MB] Loaded 32 movies.

[CPU: 207.3 MB] Common fields:

[CPU: 207.3 MB] mscope_params/accel_kv : {300.0}

[CPU: 207.3 MB] mscope_params/cs_mm : {2.7}

[CPU: 207.3 MB] mscope_params/total_dose_e_per_A2 : {52.0}

[CPU: 207.3 MB] mscope_params/exp_group_id : {2}

[CPU: 207.3 MB] mscope_params/phase_plate : {0}

[CPU: 207.3 MB] mscope_params/neg_stain : {0}

[CPU: 207.3 MB] movie_blob/psize_A : {0.66}

[CPU: 207.3 MB] movie_blob/shape : [ 38 7676 7420]

[CPU: 207.3 MB] movie_blob/is_gain_corrected : {0}

[CPU: 207.3 MB] ===========================================================

[CPU: 207.3 MB] Making example plots. Exposures will be displayed without defect correction.

[CPU: 207.3 MB] Reading file…

[CPU: 641.8 MB] Traceback (most recent call last):
File “cryosparc_master/cryosparc_compute/run.py”, line 85, in cryosparc_compute.run.main
File “/home/cryosparc_user/cryosparc3.3/cryosparc_master/cryosparc_compute/jobs/imports/run.py”, line 1049, in run_import_movies_or_micrographs
imgdata = mrc.read_mrc(abs_path)[1].sum(axis=0) * gainref
File “/home/cryosparc_user/cryosparc3.3/cryosparc_master/cryosparc_compute/blobio/mrc.py”, line 140, in read_mrc
data = read_mrc_data(file_obj, header, start_page, end_page, out)
File “/home/cryosparc_user/cryosparc3.3/cryosparc_master/cryosparc_compute/blobio/mrc.py”, line 100, in read_mrc_data
data = n.fromfile(file_obj, dtype=dtype, count= num_pages * ny * nx).reshape(num_pages, ny, nx)
ValueError: cannot reshape array of size 1023015936 into shape (38,7676,7420)
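
For reference, the numbers in this traceback can be compared directly. The short calculation below uses only values from the log (the reported movie shape and the size of the array that was actually read). It suggests the file being read holds fewer data values than its header promises, which would be consistent with a truncated or incompletely downloaded movie, though other causes such as a mismatched header cannot be ruled out from the log alone.

```python
# Values copied from the log: movie_blob/shape and the ValueError message
n_frames, ny, nx = 38, 7676, 7420
elements_read = 1023015936

elements_per_frame = ny * nx                      # pixels in one frame
elements_expected = n_frames * elements_per_frame  # pixels the header implies
frames_present = elements_read / elements_per_frame

print(elements_per_frame)        # 56955920
print(elements_expected)         # 2164324960
print(round(frames_present, 2))  # 17.96
```

So the read returned slightly under 18 of the expected 38 frames, and not a whole number of frames.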

If this problem has not yet been resolved, you may wish to re-run the job with a fully patched version 3.3.2 of cryoSPARC, which should indicate the path of the MRC file for which the ValueError occurred.
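
Independently of cryoSPARC, the offending file can be looked for by comparing each file's size on disk with the size implied by its header. The sketch below is not cryoSPARC code: `mrc_size_report` is a hypothetical helper, it assumes little-endian files laid out according to the MRC2014 convention (1024-byte header, extended header length in bytes 93 to 96), and the glob pattern at the bottom is a placeholder.

```python
import glob
import os
import struct

# Bytes per value for the common MRC2014 data modes (mode 101 handled below)
BYTES_PER_VALUE = {0: 1, 1: 2, 2: 4, 3: 4, 4: 8, 6: 2, 12: 2}


def mrc_size_report(path):
    """Compare the on-disk size with the size implied by the MRC header.

    Hypothetical helper; assumes a little-endian MRC2014-style header.
    """
    with open(path, "rb") as fh:
        header = fh.read(1024)
    if len(header) < 1024:
        return {"path": path, "ok": False, "reason": "shorter than header"}
    nx, ny, nz, mode = struct.unpack("<4i", header[:16])
    (ext_bytes,) = struct.unpack("<i", header[92:96])  # extended header size
    known_mode = mode in BYTES_PER_VALUE or mode == 101
    if not known_mode or min(nx, ny, nz) <= 0 or ext_bytes < 0:
        return {"path": path, "ok": False, "reason": "unrecognised header"}
    # Mode 101 packs two 4-bit values per byte, rows padded to whole bytes
    row_bytes = (nx + 1) // 2 if mode == 101 else nx * BYTES_PER_VALUE[mode]
    expected = 1024 + ext_bytes + row_bytes * ny * nz
    actual = os.path.getsize(path)
    return {
        "path": path,
        "shape": (nz, ny, nx),
        "missing_bytes": max(expected - actual, 0),
        "ok": actual >= expected,
    }


if __name__ == "__main__":
    # Placeholder pattern; substitute the Movies data path used in the job
    for movie in sorted(glob.glob("/path/to/movies/*.mrcs")):
        report = mrc_size_report(movie)
        if not report["ok"]:
            print(report)
```

Any file reported with a non-zero `missing_bytes` is a candidate for re-downloading before the import job is repeated.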