Hi everyone,
I am processing a big dataset (33k movies) collected on a Falcon 4i. When I run Patch Motion Correction, I repeatedly hit the same problem: the job either fails half-way, or completes with only half of the micrographs processed.
I have tried restarting cryosparcm and updating to the latest version, but the problem persists.
The jobs are running on a cluster.
I would appreciate any advice on this matter.
This is the error message reported for the failed micrographs:
Error occurred while processing J53/imported/013886693208130898513_FoilHole_30446358_Data_29377140_29377142_20230430_104130_EER.eer
Traceback (most recent call last):
File "/projappl/project_2006450/usrappl/kyrybisi/cryoSPARC/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 60, in exec
return self.process(item)
File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 324, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 3463378 terminating self.
Marking J53/imported/013886693208130898513_FoilHole_30446358_Data_29377140_29377142_20230430_104130_EER.eer as incomplete and continuing…
And here are some messages from the job log:
================= CRYOSPARCW ======= 2023-05-25 03:01:24.188944 =========
Project P6 Job J73
Master puhti-login12.bullx Port 40402
===========================================================================
========= monitor process now starting main process at 2023-05-25 03:01:24.188997
MAINPROCESS PID 3463303
========= monitor process now waiting for main process
MAIN PID 3463303
motioncorrection.run_patch cryosparc_compute.jobs.jobregister
========= sending heartbeat at 2023-05-25 03:02:15.437499
Running job on hostname %s 2006450-gpu 2d
Allocated Resources : {'fixed': {'SSD': False}, 'hostname': '2006450-gpu 2d', 'lane': '2006450-gpu 2d', 'lane_type': 'cluster', 'license': True, 'licenses_acquired': 1, 'slots': {'CPU': [0, 1, 2, 3, 4, 5], 'GPU': [0], 'RAM': [0, 1]}, 'target': {'cache_path': '/run/nvme/job_15912870/data', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': ['command'], 'custom_vars': {}, 'desc': None, 'hostname': '2006450-gpu 2d', 'lane': '2006450-gpu 2d', 'name': '2006450-gpu 2d', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/usr/bin/env bash\n \n#SBATCH --account=Project_2006450\n#SBATCH --job-name=cryosparc_{{ project_uid }}{{ job_uid }}\n#SBATCH --time=2-00:00:00\n#SBATCH -n {{ num_cpu }}\n#SBATCH --gres=gpu:v100:{{ num_gpu }},nvme:3600\n#SBATCH -p gpu\n#SBATCH --mem=0\n#SBATCH -o {{ job_dir_abs }}/cryosparc{{ project_uid }}{{ job_uid }}.out\n#SBATCH -e {{ job_dir_abs }}/cryosparc{{ project_uid }}_{{ job_uid }}.err\n \necho Local scratch directory path is: $LOCAL_SCRATCH\n \nexport CUDA_VISIBLE_DEVICES=0,1,2,3\nexport CRYOSPARC_SSD_PATH=$LOCAL_SCRATCH\n \n{{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': '2006450-gpu 2d', 'tpl_vars': ['job_dir_abs', 'num_cpu', 'run_cmd', 'job_uid', 'project_uid', 'cluster_job_id', 'num_gpu', 'command'], 'type': 'cluster', 'worker_bin_path': '/projappl/project_2006450/usrappl/kyrybisi/cryoSPARC/cryosparc_worker/bin/cryosparcw'}}
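In case it is easier to read, this is the cluster script template from the allocation dump above (the script_tpl value, with the escaped newlines expanded; content unchanged):

```shell
#!/usr/bin/env bash

#SBATCH --account=Project_2006450
#SBATCH --job-name=cryosparc_{{ project_uid }}{{ job_uid }}
#SBATCH --time=2-00:00:00
#SBATCH -n {{ num_cpu }}
#SBATCH --gres=gpu:v100:{{ num_gpu }},nvme:3600
#SBATCH -p gpu
#SBATCH --mem=0
#SBATCH -o {{ job_dir_abs }}/cryosparc{{ project_uid }}{{ job_uid }}.out
#SBATCH -e {{ job_dir_abs }}/cryosparc{{ project_uid }}_{{ job_uid }}.err

echo Local scratch directory path is: $LOCAL_SCRATCH

export CUDA_VISIBLE_DEVICES=0,1,2,3
export CRYOSPARC_SSD_PATH=$LOCAL_SCRATCH

{{ run_cmd }}
```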
ElectronCountedFramesDecompressor: reading using TIFF-EER mode.
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
========= sending heartbeat at 2023-05-25 12:05:05.792681
Unknown field with tag 65002 (0xfdea) encountered
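To rule out corrupt or truncated movie files on my side, I was thinking of a quick header check along these lines. This is just a sketch using only the Python standard library: EER files are TIFF/BigTIFF containers, so a bad magic number would point at a damaged file rather than a cryoSPARC problem. The directory J53/imported is taken from the error message above, and tiff_kind is simply a name I made up:

```python
import struct
from pathlib import Path

def tiff_kind(path):
    """Return 'tiff', 'bigtiff', or None based on the file's magic bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if len(head) < 4 or head[:2] not in (b"II", b"MM"):
        return None
    # Bytes 2-3 hold the TIFF version: 42 = classic TIFF, 43 = BigTIFF.
    fmt = "<H" if head[:2] == b"II" else ">H"
    version = struct.unpack(fmt, head[2:4])[0]
    return {42: "tiff", 43: "bigtiff"}.get(version)

# Report any imported EER file whose container header does not parse.
for eer in sorted(Path("J53/imported").glob("*.eer")):
    kind = tiff_kind(eer)
    if kind is None:
        print(f"{eer}: bad or missing TIFF header")
```

Note that this only validates the container header, not the per-frame data, but it is cheap enough to run on all 33k movies.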