Abnormal termination in import movies

We have run into a problem importing movies from our Tundra. We are trying to import *.eer formatted files, and the job crashes with a “Job process terminated abnormally.” error. We can import the movies using CryoSPARC Live. We are using CryoSPARC v4.6.0 under Rocky Linux 8.

Len Thomas

@lmthomas Please can you post the output of these commands:

csprojectid=P99 # replace with actual project ID
csjobid=J199 # replace with actual job id
cryosparcm eventlog $csprojectid $csjobid | head -n 40
cryosparcm eventlog $csprojectid $csjobid | tail -n 40
cryosparcm joblog $csprojectid $csjobid | tail -n 40
cryosparcm cli "get_job('$csprojectid', '$csjobid', 'job_type', 'version', 'instance_information', 'params_spec')"
free -h
[spuser@spgpu CS-tfs-workflow-validataion]$ csprojectid=P13
[spuser@spgpu CS-tfs-workflow-validataion]$ csjobid=J1
[spuser@spgpu CS-tfs-workflow-validataion]$ cryosparcm eventlog $csprojectid $csjobid | head -n 40
[Wed, 02 Oct 2024 15:58:11 GMT]  License is valid.
[Wed, 02 Oct 2024 15:58:12 GMT]  Launching job on lane default target spgpu ...
[Wed, 02 Oct 2024 15:58:12 GMT]  Running job on master node hostname spgpu
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 91 MB] Job J1 Started
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 91 MB] Master running v4.6.0, worker running v4.6.0
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB] Working in directory: /data/UserData/spuser/DataValidation/CS-tfs-workflow-validataion/J1
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB] Running on lane default
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB] Resources allocated:
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB]   Worker:  spgpu
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB]   CPU   :  [0]
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB]   GPU   :  []
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB]   RAM   :  [0, 1, 2]
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB]   SSD   :  False
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB] --------------------------------------------------------------
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB] Importing job module for job type import_movies...
[Wed, 02 Oct 2024 15:58:21 GMT] [CPU RAM used: 318 MB] Job ready to run
[Wed, 02 Oct 2024 15:58:21 GMT] [CPU RAM used: 318 MB] ***************************************************************
[Wed, 02 Oct 2024 15:58:21 GMT] [CPU RAM used: 318 MB] Importing movies from /data/UserData/spuser/DataValidation/Images-Disc1/GridSquare_*/Data/*eer
[Wed, 02 Oct 2024 15:58:22 GMT] [CPU RAM used: 318 MB] Importing 1779 files
[Wed, 02 Oct 2024 15:58:23 GMT] [CPU RAM used: 319 MB] Import paths were unique at level -1
[Wed, 02 Oct 2024 15:58:23 GMT] [CPU RAM used: 319 MB] Importing 1780 files
[Wed, 02 Oct 2024 15:58:23 GMT] [CPU RAM used: 319 MB] 'Skip Header Check' parameter enabled, checking first header only
[Wed, 02 Oct 2024 15:58:24 GMT] [CPU RAM used: 88 MB] ====== Job process terminated abnormally.
[spuser@spgpu CS-tfs-workflow-validataion]$ cryosparcm eventlog $csprojectid $csjobid | tail -n 40
[Wed, 02 Oct 2024 15:58:11 GMT]  License is valid.
[Wed, 02 Oct 2024 15:58:12 GMT]  Launching job on lane default target spgpu ...
[Wed, 02 Oct 2024 15:58:12 GMT]  Running job on master node hostname spgpu
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 91 MB] Job J1 Started
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 91 MB] Master running v4.6.0, worker running v4.6.0
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB] Working in directory: /data/UserData/spuser/DataValidation/CS-tfs-workflow-validataion/J1
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB] Running on lane default
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB] Resources allocated:
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB]   Worker:  spgpu
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB]   CPU   :  [0]
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB]   GPU   :  []
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB]   RAM   :  [0, 1, 2]
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB]   SSD   :  False
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB] --------------------------------------------------------------
[Wed, 02 Oct 2024 15:58:14 GMT] [CPU RAM used: 92 MB] Importing job module for job type import_movies...
[Wed, 02 Oct 2024 15:58:21 GMT] [CPU RAM used: 318 MB] Job ready to run
[Wed, 02 Oct 2024 15:58:21 GMT] [CPU RAM used: 318 MB] ***************************************************************
[Wed, 02 Oct 2024 15:58:21 GMT] [CPU RAM used: 318 MB] Importing movies from /data/UserData/spuser/DataValidation/Images-Disc1/GridSquare_*/Data/*eer
[Wed, 02 Oct 2024 15:58:22 GMT] [CPU RAM used: 318 MB] Importing 1779 files
[Wed, 02 Oct 2024 15:58:23 GMT] [CPU RAM used: 319 MB] Import paths were unique at level -1
[Wed, 02 Oct 2024 15:58:23 GMT] [CPU RAM used: 319 MB] Importing 1780 files
[Wed, 02 Oct 2024 15:58:23 GMT] [CPU RAM used: 319 MB] 'Skip Header Check' parameter enabled, checking first header only
[Wed, 02 Oct 2024 15:58:24 GMT] [CPU RAM used: 88 MB] ====== Job process terminated abnormally.
[spuser@spgpu CS-tfs-workflow-validataion]$ cryosparcm joblog $csprojectid $csjobib | tail -n 40
*** (http://spgpu:39002, code 400) Encountered ServerError from JSONRPC function "get_job_log_path_abs" with params ('P13', ''):
ServerError: P13  does not exist.
Traceback (most recent call last):
  File "/spshared/apps/cryosparc4/cryosparc_master/cryosparc_command/commandcommon.py", line 196, in wrapper
    res = func(*args, **kwargs)
  File "/spshared/apps/cryosparc4/cryosparc_master/cryosparc_command/command_core/__init__.py", line 8181, in get_job_log_path_abs
    job_dir_abs = get_job_dir_abs(project_uid, job_uid)
  File "/spshared/apps/cryosparc4/cryosparc_master/cryosparc_command/commandcommon.py", line 187, in wrapper
    return func(*args, **kwargs)
  File "/spshared/apps/cryosparc4/cryosparc_master/cryosparc_command/command_core/__init__.py", line 8165, in get_job_dir_abs
    job_doc = get_job(project_uid, job_uid, 'job_dir')
  File "/spshared/apps/cryosparc4/cryosparc_master/cryosparc_command/commandcommon.py", line 187, in wrapper
    return func(*args, **kwargs)
  File "/spshared/apps/cryosparc4/cryosparc_master/cryosparc_command/command_core/__init__.py", line 6132, in get_job
    raise ValueError(f"{project_uid} {job_uid} does not exist.")
ValueError: P13  does not exist.

[spuser@spgpu CS-tfs-workflow-validataion]$ cryosparcm cli "get_job('$csprojectid', '$csjobid', 'job_type', 'version', 'instance_information', 'params_spec')"
{'_id': '66fd6c70220c6fa35f0ec16b', 'instance_information': {'available_memory': '117.21GB', 'cpu_model': 'Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz', 'ofd_hard_limit': 262144, 'ofd_soft_limit': 1024, 'physical_cores': 24, 'platform_architecture': 'x86_64', 'platform_node': 'spgpu', 'platform_release': '4.18.0-477.15.1.el8_8.x86_64', 'platform_version': '#1 SMP Wed Jun 28 15:04:18 UTC 2023', 'total_memory': '251.00GB', 'used_memory': '132.00GB'}, 'job_type': 'import_movies', 'params_spec': {'accel_kv': {'value': 100}, 'blob_paths': {'value': '/data/UserData/spuser/DataValidation/Images-Disc1/GridSquare_*/Data/*eer'}, 'cs_mm': {'value': 1.6}, 'eer_upsamp_factor': {'value': 1}, 'gainref_path': {'value': '/data/UserData/spuser/EPU_Preference_SAT_Settings.sxml'}, 'psize_A': {'value': 0.69}, 'total_dose_e_per_A2': {'value': 40}}, 'project_uid': 'P13', 'uid': 'J1', 'version': 'v4.6.0'}
[spuser@spgpu CS-tfs-workflow-validataion]$ free -h
              total        used        free      shared  buff/cache   available
Mem:          251Gi       133Gi       1.2Gi        39Mi       116Gi       116Gi
Swap:          31Gi       1.0Mi        31Gi

There was a typo in the command. Please can you try:

cryosparcm joblog P13 J1 | tail -n 40
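
The empty job ID in the error above (params ('P13', '')) came from the misspelled variable name $csjobib, which the shell silently expanded to an empty string. Below is a minimal sketch of one way to guard against that, assuming a POSIX-style shell. The name joblog_tail is a hypothetical wrapper for illustration, not a CryoSPARC command.

```shell
# Hypothetical wrapper (not part of CryoSPARC) around the joblog command.
joblog_tail() {
    # ${name:?message} stops with an error if the variable is unset or empty,
    # so a missing ID is reported instead of being passed on as ''.
    : "${csprojectid:?project ID is not set}" "${csjobid:?job ID is not set}"
    cryosparcm joblog "$csprojectid" "$csjobid" | tail -n 40
}
```

With csprojectid=P13 and csjobid=J1 set, calling joblog_tail runs the same command as above. Because the variable names are typed only once inside the wrapper, a one-off misspelling on the command line cannot turn into an empty argument. Running set -u beforehand is an alternative that makes any reference to an unset variable an error.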

Sorry, this looks a bit better.

[spuser@spgpu CS-tfs-workflow-validataion]$ cryosparcm joblog P13 J1 | tail -n 40
/spshared/apps/cryosparc4/cryosparc_worker/cryosparc_compute/ioengine/core.so(_ZN3Fei11Acquisition9EerReader7EerFile15GetNextEerFrameEv+0x4b)[0x7fc1501c21cb]
/spshared/apps/cryosparc4/cryosparc_worker/cryosparc_compute/ioengine/core.so(_ZN33ElectronCountedFramesDecompressor11prepareReadEv+0x4a)[0x7fc1501c280a]
/spshared/apps/cryosparc4/cryosparc_worker/cryosparc_compute/ioengine/core.so(_ZN33ElectronCountedFramesDecompressorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x282)[0x7fc1501c3012]
/spshared/apps/cryosparc4/cryosparc_worker/cryosparc_compute/ioengine/core.so(eer_codec_create+0x6c)[0x7fc1501c322c]
/spshared/apps/cryosparc4/cryosparc_worker/cryosparc_compute/ioengine/core.so(eer_get_inherent_size+0x1d)[0x7fc1501baacd]
/spshared/apps/cryosparc4/cryosparc_worker/cryosparc_compute/ioengine/core.so(wrap_eer_get_inherent_size+0xc1)[0x7fc1501df111]
python(+0x1445a6)[0x564487bfb5a6]
python(_PyObject_MakeTpCall+0x26b)[0x564487bf4a6b]
python(_PyEval_EvalFrameDefault+0x54a6)[0x564487bf09d6]
python(_PyFunction_Vectorcall+0x6c)[0x564487bfba2c]
python(_PyEval_EvalFrameDefault+0x4c12)[0x564487bf0142]
python(_PyFunction_Vectorcall+0x6c)[0x564487bfba2c]
python(_PyEval_EvalFrameDefault+0x13ca)[0x564487bec8fa]
python(_PyFunction_Vectorcall+0x6c)[0x564487bfba2c]
/spshared/apps/cryosparc4/cryosparc_worker/cryosparc_compute/run.cpython-310-x86_64-linux-gnu.so(+0x20e91)[0x7fc166e19e91]
/spshared/apps/cryosparc4/cryosparc_worker/cryosparc_compute/run.cpython-310-x86_64-linux-gnu.so(+0x12c31)[0x7fc166e0bc31]
python(_PyEval_EvalFrameDefault+0x4c12)[0x564487bf0142]
python(+0x1d7c60)[0x564487c8ec60]
python(PyEval_EvalCode+0x87)[0x564487c8eba7]
python(+0x20812a)[0x564487cbf12a]
python(+0x203523)[0x564487cba523]
python(PyRun_StringFlags+0x7d)[0x564487cb291d]
python(PyRun_SimpleStringFlags+0x3c)[0x564487cb275c]
python(Py_RunMain+0x26b)[0x564487cb166b]
python(Py_BytesMain+0x37)[0x564487c821f7]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x7fc165cb5d85]
python(+0x1cb0f1)[0x564487c820f1]
rax 0000000000000001 rbx 000000000000fdf0 rcx 0000000000000001 rdx 000056448b9829d0
rsi 000000000000fdf0 rdi 000056448cfbd9b0 rbp 000056448c593f60 rsp 00007ffc2f432110
r8 00007fc154101bb0 r9 0000000000000800 r10 0000000000000007 r11 0000000000000006
r12 0000000000000001 r13 000056448cfbd9b0 r14 00007ffc2f432140 r15 000056448cfbd9b0
00 00 83 f8 2f 0f 87 13 10 00 00 89 c2 83 c0 08 49 03 56 10 41 89 06 48 8b 12 66 89 0a
83 f8 2f 0f 87 b7 0f 00 00 89 c2 83 c0 08 49 03 56 10 41 89 06 48 8b 02 49 8b 54 24 10
41 bc 01 00 00 00
→ 48 89 10 eb 3e 0f 1f 00 41 8b 95 90 01 00 00 85 d2 7e 2d 49 8b 85 98 01 00 00 ff
ca 48 8d 14 52 48 8d 4c d0 18 48 8b 10 81 3a 4e 01 00 00 74 28 48 83 c0 18 48 39 c8 75
ec 0f 1f 80 00 00 00 00

========= main process now complete at 2024-10-02 10:58:24.358552.
========= monitor process now complete at 2024-10-02 10:58:24.407053.
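
The frames from core.so in the trace above carry mangled C++ names (the ones starting with _ZN). As a reading aid, here is a small sketch that pipes the log through c++filt from GNU binutils, if it is installed. The name demangle_trace is a hypothetical helper, and the fallback to cat is a choice made for this example.

```shell
# Hypothetical helper: make C++ symbol names in a backtrace readable.
# Falls back to passing the text through unchanged when c++filt is missing.
demangle_trace() {
    if command -v c++filt >/dev/null 2>&1; then
        c++filt    # rewrites mangled names such as _ZN...Ev found on stdin
    else
        cat
    fi
}
```

Usage would be: cryosparcm joblog P13 J1 | tail -n 40 | demangle_trace. The symbol in the first frame decodes to Fei::Acquisition::EerReader::EerFile::GetNextEerFrame(), which places the crash inside the EER reading code.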

@lmthomas Please can you try two other Import Movies jobs with different Movies data path parameters, chosen so that the two jobs import non-overlapping subsets of the dataset whose import failed in J1?
Do both jobs also fail? For each of the failing jobs, please post the outputs of these commands:

csprojectid=P99 # replace with actual project ID
csjobid=J199 # replace with actual job id
cryosparcm eventlog $csprojectid $csjobid | tail -n 20
cryosparcm joblog $csprojectid $csjobid | tail -n 60
cryosparcm cli "get_job('$csprojectid', '$csjobid', 'job_type', 'version', 'instance_information', 'params_spec', 'errors_run')"
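
Before launching the two jobs, a quick check outside CryoSPARC can confirm that two candidate wildcard patterns really select non-overlapping sets of files. The sketch below assumes a POSIX-style shell. The helper names list_matches and compare_globs are hypothetical and not part of CryoSPARC.

```shell
# Hypothetical helpers, not part of CryoSPARC.

# Print every existing path matched by the wildcard pattern in $1.
list_matches() (
    IFS=                # no word splitting, so paths with spaces stay intact
    for f in $1; do     # unquoted on purpose: the shell expands the wildcard
        if [ -e "$f" ]; then printf '%s\n' "$f"; fi
    done
)

# Report how many files each pattern matches and how many both match.
compare_globs() {
    n_a=$(list_matches "$1" | wc -l)
    n_b=$(list_matches "$2" | wc -l)
    # a path that appears twice in the combined list is matched by both
    n_both=$( { list_matches "$1"; list_matches "$2"; } | sort | uniq -d | wc -l)
    echo "first: $((n_a)) second: $((n_b)) overlap: $((n_both))"
}
```

For example, if the GridSquare directory names end in a digit (an assumption about this dataset), the two quoted patterns could be '/data/UserData/spuser/DataValidation/Images-Disc1/GridSquare_*[0-4]/Data/*eer' and the same path with [5-9]. The overlap should be 0, and the two counts should add up to the number of files matched by the original pattern.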