Patch CTF: DatasetLoadError

The job runs almost to completion, and the error occurs at the very end. I also tried CTFFIND4 with only 5 files and got the same error. We are running CryoSPARC v4.6.0.

[CPU: 357.5 MB]
Checking outputs for output group exposures

[CPU: 357.5 MB]
Checking outputs for output group exposures_incomplete

Traceback (most recent call last):
  File "/software/cryosparc_worker/cryosparc_tools/cryosparc/dataset.py", line 615, in load
    return cls._load_numpy(file, prefixes=prefixes, fields=fields, cstrs=cstrs)
  File "/software/cryosparc_worker/cryosparc_tools/cryosparc/dataset.py", line 645, in _load_numpy
    indata = n.load(f, mmap_mode=mmap_mode, allow_pickle=False)
  File "/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numpy/lib/npyio.py", line 428, in load
    return format.open_memmap(file, mode=mmap_mode)
  File "/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numpy/lib/format.py", line 886, in open_memmap
    marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
  File "/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numpy/core/memmap.py", line 267, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap offset is greater than file size

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 121, in cryosparc_master.cryosparc_compute.run.main
  File "/software/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1238, in check_outputs
    output_dsets = [load_output_group_direct(_project_uid, _job_uid, output_group_name, [resname], [resname], memoize=True) for resname in outputted_result_names]
  File "/software/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1238, in
    output_dsets = [load_output_group_direct(_project_uid, _job_uid, output_group_name, [resname], [resname], memoize=True) for resname in outputted_result_names]
  File "/software/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 615, in load_output_group_direct
    d = load_output_result_dset(project_uid, output_result, version, slot_name, memoize=memoize)
  File "/software/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 586, in load_output_result_dset
    d = load_dataset_cached(abspath).copy()
  File "/software/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 572, in load_dataset_cached
    return dataset.Dataset.load(path)
  File "/software/cryosparc_worker/cryosparc_tools/cryosparc/dataset.py", line 617, in load
    raise DatasetLoadError(f"Could not load dataset from file {file}") from err
cryosparc_tools.cryosparc.errors.DatasetLoadError: Could not load dataset from file /home/mlzhao/group/jlodwick/process/P21/J61/J61_passthrough_exposures_incomplete.cs

I then tried importing the motion-corrected micrographs in a new job, and Patch CTF ran on them without any problem.

Welcome to the forum @mingleizhao .
Please can you

  1. Post the output of the commands
    ls -lh /home/mlzhao/group/jlodwick/process/P21/J61/J61_passthrough_exposures_incomplete.cs
    file /home/mlzhao/group/jlodwick/process/P21/J61/J61_passthrough_exposures_incomplete.cs
    
  2. Describe the steps, programs and CryoSPARC job types involved in

Thank you for the reply.

Here is the output of the commands:

(base) [mlzhao@beagle3-login3 data]$ ls -lh /home/mlzhao/group/jlodwick/process/P21/J61/J61_passthrough_exposures_incomplete.cs

-rw-rw-r-- 1 mlzhao pi-mlzhao 4.0K Jul 23 22:16 /home/mlzhao/group/jlodwick/process/P21/J61/J61_passthrough_exposures_incomplete.cs
(base) [mlzhao@beagle3-login3 data]$ file /home/mlzhao/group/jlodwick/process/P21/J61/J61_passthrough_exposures_incomplete.cs
/home/mlzhao/group/jlodwick/process/P21/J61/J61_passthrough_exposures_incomplete.cs: data
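
The `ls` and `file` output above gives only the file size (4.0K) and a generic type. For a closer look, here is a minimal sketch that compares what the file header declares with what is actually on disk. It assumes the `.cs` file is stored in NumPy `.npy` format, as the `_load_numpy` frame in the traceback suggests; `inspect_npy_file` is a hypothetical helper written for this post, not part of CryoSPARC or cryosparc-tools, and the demonstration runs on a throwaway file rather than a real dataset.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def inspect_npy_file(path):
    """Report header details of a .npy-format file and the size they imply.

    Hypothetical helper for illustration only (not a CryoSPARC function).
    """
    actual_size = os.path.getsize(path)
    with open(path, "rb") as fp:
        # read_magic raises ValueError if the file does not start with
        # the .npy magic string.
        version = npy_format.read_magic(fp)
        if version == (1, 0):
            shape, _, dtype = npy_format.read_array_header_1_0(fp)
        elif version == (2, 0):
            shape, _, dtype = npy_format.read_array_header_2_0(fp)
        else:
            raise ValueError(f"unsupported .npy version {version}")
        data_offset = fp.tell()  # array data starts right after the header
    n_items = int(np.prod(shape, dtype=np.int64))
    return {
        "version": version,
        "shape": shape,
        "data_offset": data_offset,
        "expected_size": data_offset + n_items * dtype.itemsize,
        "actual_size": actual_size,
    }


# Demonstration on a temporary file (assumed field name "uid" is illustrative).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npy")
    np.save(demo_path, np.zeros(10, dtype=[("uid", "<u8")]))
    intact = inspect_npy_file(demo_path)

    # Simulate a truncated file: keep the header plus one of the ten rows.
    with open(demo_path, "r+b") as fp:
        fp.truncate(intact["data_offset"] + 8)
    truncated = inspect_npy_file(demo_path)

print(intact)
print(truncated)
```

To check the file from this thread, one would call `inspect_npy_file` on the path of `J61_passthrough_exposures_incomplete.cs`. If `actual_size` is smaller than `expected_size`, the file is shorter than its header claims; a `shape` of `(0,)` would mean the dataset simply has no rows. Neither result by itself identifies the cause.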

This is an old project we want to reprocess. The original raw files were archived and the Live session was compacted. We re-uploaded the raw files and performed Patch Motion Correction using the output from a previous Live exposure export. We did not try restoring the Live session. Motion correction was successful; however, the subsequent CTF estimation was not.

I then tried importing the micrographs from the Patch Motion Correction job folder, and I can run CTF estimation on these imported micrographs without the error.

Thanks @mingleizhao .

Please can you check if the error still occurs after updating CryoSPARC to the latest version, currently v4.7.1?

Thank you. I will post an update here later. In this case CryoSPARC is installed on a GPU cluster, and it will take some time to process the update request.

Hello, the job completed successfully in the newest CryoSPARC version v4.7.1.

Just curious, could you briefly explain what might have caused the problem?
Thank you.

@mingleizhao A bug that was fixed in v4.7 may have led to the DatasetLoadError.