Job Error, Patch Motion Correction, EER format

Dear Support Team,

One of our users is currently experiencing an issue with a Patch Motion Correction job. I also have a feature request, which you will find below.
The user collected two independent datasets. Both were imported successfully with identical settings.
For the first dataset, the patch motion correction (PMC) job worked perfectly. For the second dataset, PMC did not fail, but it did not process a single movie and rejected each one with the following error:

Error occurred while processing J37/imported/007282014867148127012_FoilHole_18191598_Data_18192004_18192006_20230718_152240_EER.eer
Traceback (most recent call last):
  File "/$PATHTOWORKER/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 60, in exec
    return self.process(item)
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 117, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 132, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/blobio/prefetch.py", line 68, in cryosparc_compute.blobio.prefetch.synchronous_native_read
RuntimeError: Error ocurred (State not recoverable) at line 973 in eer_codec_create

See job log for further details.
EER I/O error, see job log

IO request details:
filename:    /scratch/tmp/$PATHTOFILE/J37/imported/007282014867148127012_FoilHole_18191598_Data_18192004_18192006_20230718_152240_EER.eer
filetype:    2
header_only: 0
idx_start:   0
idx_limit:   -1
eer_upsampfactor: 1
eer_numfractions: 40
num_threads: 6
buffer:      (nil)
nx, ny, nz:  0 0 0
dtype:       0
total_time:  -1.000000


Marking J37/imported/007282014867148127012_FoilHole_18191598_Data_18192004_18192006_20230718_152240_EER.eer as incomplete and continuing...

The job completes with 0 processed movies. For the second dataset, ~30,000 movies were recorded in .eer 8k format and then imported in 4k as usual to save space. Could the number of movies be a problem, given that the first dataset with ~10,000 movies went through? Please let me know if you need further information to solve the issue.

Feature request:
Could CryoSPARC report, or otherwise make transparent, how many raw .eer frames are grouped into each movie frame, so that we can confirm the import worked without data loss?
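To illustrate what such a report could contain, here is a minimal arithmetic sketch. It assumes that raw frames are split evenly across the requested number of fractions (eer_numfractions, 40 in the log above); how CryoSPARC actually treats frames that do not divide evenly is not stated in this thread. The function name group_raw_frames and the frame counts in the example are hypothetical.

```python
from typing import Tuple


def group_raw_frames(total_raw_frames: int, num_fractions: int) -> Tuple[int, int]:
    """Return (raw frames per fraction, raw frames left over).

    Assumption: fractions are equally sized. A non-zero remainder means
    some raw frames do not fit into an even grouping.
    """
    if num_fractions <= 0:
        raise ValueError("num_fractions must be positive")
    if total_raw_frames < 0:
        raise ValueError("total_raw_frames must not be negative")
    return divmod(total_raw_frames, num_fractions)


# Hypothetical movie with 1010 raw frames, grouped into 40 fractions.
per_fraction, leftover = group_raw_frames(1010, 40)
print(f"{per_fraction} raw frames per fraction, {leftover} left over")
```

A non-zero remainder is the case worth surfacing to the user, since those frames might be dropped or folded into another fraction depending on the implementation.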

Thank you and best regards
Maximilian
Edit: The job log file (job.log) did not contain further information.

Could you please open a separate topic for this interesting feature request?

Hi @mruetter,

The file path looks suspect, specifically, /scratch/tmp/$PATHTOFILE. Perhaps double check your file paths in import? Or perhaps you inserted $PATHTOFILE to obfuscate the true path, in which case disregard this suggestion :slight_smile:

In either event, are you sure there was no other information in the job (text) log? The fact that the IO error message says State not recoverable suggests to me that something went wrong inside the actual EER decoding library, and messages from that library go to the job log…

Harris

Hi Harris,
Thank you for your response! The $PATHTOFILE was just a placeholder for the actual path, to hide sensitive data.
The job.log contained the following further information:

ElectronCountedFramesDecompressor: reading using TIFF-EER mode.
/dev/shm/cryosparc_blobio_61630_61663: Not a TIFF or MDI file, bad magic number 21336 (0x5358)
C++ Exception: std::exception
Error ocurred (State not recoverable) at line 973 in eer_codec_create

I looked into the issue in more detail, and the problem seems to come from the Import Movies job. The import job creates the dummy files but does not create the symlink to the path of the actual file. Creating a simple symlink from the master did not work either, but creating one from a worker node did. Have you experienced a similar situation before?

The issue is still not fixed, but I have applied a workaround for the users:
import jobs now run on the worker nodes.

Best
Maximilian
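The symlink problem described above could be checked with a short script. This is a minimal sketch, assuming that every entry in the import job's imported directory is supposed to be a symlink that resolves to a raw movie. The function name audit_import_dir and the report keys are hypothetical and not part of CryoSPARC.

```python
import os
from typing import Dict, List


def audit_import_dir(import_dir: str) -> Dict[str, List[str]]:
    """Classify each entry of import_dir by its symlink status."""
    report: Dict[str, List[str]] = {"ok": [], "not_symlink": [], "broken_symlink": []}
    for name in sorted(os.listdir(import_dir)):
        path = os.path.join(import_dir, name)
        if not os.path.islink(path):
            # A regular file where a symlink was expected.
            report["not_symlink"].append(name)
        elif not os.path.exists(path):
            # os.path.exists follows the link, so False means a dangling target.
            report["broken_symlink"].append(name)
        else:
            report["ok"].append(name)
    return report
```

Running the same check against the same directory (for example J37/imported) on the master and on a worker node, and comparing the two reports, would show whether the two machines see the entries differently.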

Hi Maximilian,

Interesting, I’m glad you found a workaround. In your description regarding the import movies job, what do you mean by dummy files? Is there any error message in the import job logs that suggest something went wrong while trying to symlink?

If you’re unable to create symlinks into your data directories from your master node, there might be something wrong at the filesystem or OS level. I have not experienced an issue like that before. Could you confirm that your master and worker installations are on different computers?

Harris

Hi Harris,

Sure, sorry, I forgot to attach them yesterday. Please find below the messages from the import job.log.

===========================================================================
========= monitor process now starting main process at 2023-08-22 10:45:22.003239
MAINPROCESS PID 1173628
========= monitor process now waiting for main process
MAIN PID 1173628
imports.run cryosparc_compute.jobs.jobregister
========= sending heartbeat at 2023-08-22 10:45:34.867818
========= sending heartbeat at 2023-08-22 10:45:44.893617
ElectronCountedFramesDecompressor: reading using TIFF-EER mode.
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered

Could you confirm that your master and worker installations are on different computers?

Yes, they are on different machines. The shared disk space (a shared BeeGFS file system to which all 40 workers have access) is mounted on the master node via CIFS (with the option mfsymlinks set in the fstab file). I am wondering why this issue appeared suddenly (without any updates on either worker or master) and is limited to some new projects. Does the issue I reported previously have something to do with it?

Best,
Maximilian

Hi @mruetter,

I apologize for the delay!

First of all, going way back to your feature request re: explicitly outputting the number of EER frames in each movie frame, I’ve recorded that request. I can’t promise a delivery timeline, but we agree this would be a useful addition.

The “Unknown field with tag 65002” messages are actually normal when reading EER data. We plan to suppress them in a future version but in the meantime they don’t indicate any problem. Ultimately, though, I think that some file in your dataset is corrupt. The message Not a TIFF or MDI file, bad magic number 21336 means that the patch motion job encountered a file that wasn’t actually in a valid EER format (EER is a modified TIFF format, as you may know).

Harris
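The magic number in the job log can be decoded and checked by hand. The sketch below assumes the standard TIFF and BigTIFF header bytes, and that the 16-bit value in the libtiff message is read little-endian. Under that assumption 21336 (0x5358) corresponds to the bytes "XS", which would be consistent with the "XSym" header that CIFS mfsymlinks use to emulate symlinks as regular files. This is an inference from the log and the mount options mentioned above, not something confirmed in this thread. The function names classify_header and classify_file are hypothetical.

```python
def classify_header(header: bytes) -> str:
    """Classify a file by its first four bytes."""
    magic = header[:4]
    if magic in (b"II*\x00", b"MM\x00*"):
        return "tiff"  # classic TIFF, version 42
    if magic in (b"II+\x00", b"MM\x00+"):
        return "bigtiff"  # BigTIFF, version 43
    if magic == b"XSym":
        return "mfsymlink"  # emulated symlink stored as a regular file
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first four bytes of path and classify them."""
    with open(path, "rb") as handle:
        return classify_header(handle.read(4))


# Decode the magic number reported in the job log.
print((21336).to_bytes(2, "little"))
```

If classify_file returns "mfsymlink" for an entry in the imported directory when run on a worker node, the worker is reading the emulated link file itself rather than the movie it points to, which would point to a filesystem mismatch rather than corrupt data.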