Hi everyone,
I am processing a large dataset (33k movies) collected on a Falcon 4i. When running Patch Motion Correction, I repeatedly hit the same problem: the job either fails partway through, or completes with only about half of the micrographs processed.
I have tried restarting cryosparcm and updating to the latest version, but the problem persists.
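For reference, these are roughly the commands I used (standard cryoSPARC CLI, run on the master node):

cryosparcm restart    # restart all master services
cryosparcm update     # update the master to the latest release
cryosparcm status     # confirm the version and that all services are up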
The jobs are running on a cluster.
I would appreciate any advice on this matter.
The error message for the failed micrographs:
Error occurred while processing J53/imported/013886693208130898513_FoilHole_30446358_Data_29377140_29377142_20230430_104130_EER.eer
Traceback (most recent call last):
File "/projappl/project_2006450/usrappl/kyrybisi/cryoSPARC/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 60, in exec
return self.process(item)
File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 324, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 3463378 terminating self.
Marking J53/imported/013886693208130898513_FoilHole_30446358_Data_29377140_29377142_20230430_104130_EER.eer as incomplete and continuing…
And some messages from the log:
================= CRYOSPARCW ======= 2023-05-25 03:01:24.188944 =========
Project P6 Job J73
Master puhti-login12.bullx Port 40402
===========================================================================
========= monitor process now starting main process at 2023-05-25 03:01:24.188997
MAINPROCESS PID 3463303
========= monitor process now waiting for main process
MAIN PID 3463303
motioncorrection.run_patch cryosparc_compute.jobs.jobregister
========= sending heartbeat at 2023-05-25 03:02:15.437499
Running job on hostname %s 2006450-gpu 2d
Allocated Resources : {'fixed': {'SSD': False}, 'hostname': '2006450-gpu 2d', 'lane': '2006450-gpu 2d', 'lane_type': 'cluster', 'license': True, 'licenses_acquired': 1, 'slots': {'CPU': [0, 1, 2, 3, 4, 5], 'GPU': [0], 'RAM': [0, 1]}, 'target': {'cache_path': '/run/nvme/job_15912870/data', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': ['command'], 'custom_vars': {}, 'desc': None, 'hostname': '2006450-gpu 2d', 'lane': '2006450-gpu 2d', 'name': '2006450-gpu 2d', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/usr/bin/env bash\n \n#SBATCH --account=Project_2006450\n#SBATCH --job-name=cryosparc_{{ project_uid }}{{ job_uid }}\n#SBATCH --time=2-00:00:00\n#SBATCH -n {{ num_cpu }}\n#SBATCH --gres=gpu:v100:{{ num_gpu }},nvme:3600\n#SBATCH -p gpu\n#SBATCH --mem=0\n#SBATCH -o {{ job_dir_abs }}/cryosparc{{ project_uid }}{{ job_uid }}.out\n#SBATCH -e {{ job_dir_abs }}/cryosparc{{ project_uid }}_{{ job_uid }}.err\n \necho Local scratch directory path is: $LOCAL_SCRATCH\n \nexport CUDA_VISIBLE_DEVICES=0,1,2,3\nexport CRYOSPARC_SSD_PATH=$LOCAL_SCRATCH\n \n{{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': '2006450-gpu 2d', 'tpl_vars': ['job_dir_abs', 'num_cpu', 'run_cmd', 'job_uid', 'project_uid', 'cluster_job_id', 'num_gpu', 'command'], 'type': 'cluster', 'worker_bin_path': '/projappl/project_2006450/usrappl/kyrybisi/cryoSPARC/cryosparc_worker/bin/cryosparcw'}}
ElectronCountedFramesDecompressor: reading using TIFF-EER mode.
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
========= sending heartbeat at 2023-05-25 12:05:05.792681
Unknown field with tag 65002 (0xfdea) encountered