Reference-based motion correction crashing

Hi all,

I extracted particles from two datasets in RELION and imported them into CryoSPARC for further analysis. When I run Reference-Based Motion Correction (RBMC), the job always crashes on one of the datasets with the error message "====== Job process terminated abnormally."

After re-running several times and checking job.log, I found that the error varies. One time it was an assertion error: nmov > movie_no. This time it was: gain_ref_blob/flip_x: invalid handle 1125899906842637, no heap at index 13 (errno 2: No such file or directory).

I have checked that none of the soft links to the movies are broken. I also tried excluding the movie that might be causing the crash from processing, but to no avail. Any suggestions for troubleshooting?
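A check like the one described above can be scripted. The following is only a minimal sketch: the function name `find_broken_symlinks` and the directory passed to it are hypothetical, and it only tests whether each link's target exists, not whether the movie file itself is readable or complete.

```python
import os

def find_broken_symlinks(root):
    """Return the sorted paths of symlinks under `root` whose targets do not exist."""
    broken = []
    # os.walk does not follow symlinked directories by default, so every link
    # is inspected exactly once, whether it points at a file or a directory.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() is True even for dangling links; exists() follows the
            # link and is False when the target is missing.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

Calling `find_broken_symlinks` on the directory that holds the linked movies returns an empty list when every link resolves.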

@kpsleung Could you please post the version and patch level of your CryoSPARC installation, and the complete error trace?

The version is v4.4.1. The following is the end of the job.log:

ElectronCountedFramesDecompressor::prepareRead: found 1085 frames in EER-TIFF file.

refmotion worker 0 (NVIDIA RTX A4000)
BFGS iterations:      300
scale (alpha):        4.861825
noise model (sigma2): 39.205906
     TIME (s)  SECTION
noise model (sigma2): 39.205906
     TIME (s)  SECTION
  0.000145032  sanity
  9.167947948  read movie
  0.045142473  get gain, defects
  0.060268143  read bg
  0.002517539  read rigid
  0.893805941  prep_movie
  0.407442085  extract from frames
  0.000571716  extract from refs
  0.000000474  adj
  0.000000150  bfactor
  0.007225606  rigid motion correct
  0.000213075  get noise, scale
  0.738295133  optimize trajectory
  0.155823695  shift_sum patches
  0.002365920  ifft
  0.001661418  unpad
  0.000250805  fill out dataset
  0.011622373  write output files
 11.495299527  --- TOTAL ---

ElectronCountedFramesDecompressor: reading using TIFF-EER mode.
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered     [repeated many times]
ElectronCountedFramesDecompressor::prepareRead: found 1085 frames in EER-TIFF file.
gain_ref_blob/flip_x: invalid handle 1125899906842637, no heap at index 13 (errno 2: No such file or directory)
========= main process now complete at 2024-01-12 18:16:10.201676.
========= monitor process now complete at 2024-01-12 18:16:20.703899.

@wtempel Here is another set of error messages from job.log, from the most recent crash:

ElectronCountedFramesDecompressor: reading using TIFF-EER mode.
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered       [repeated many times]
HOST ALLOCATION FUNCTION: using numba.cuda.pinned_array
**** handle exception rc
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
/home/wcyl/cryosparc_worker/cryosparc_compute/jobs/motioncorrection/mic_utils.py:95: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @jit(nogil=True)
/home/wcyl/cryosparc_worker/cryosparc_compute/micrographs.py:563: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def contrast_normalization(arr_bin, tile_size = 128):
/home/wcyl/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py:2919: UserWarning: NVRTC log messages whilst compiling kernel:

kernel(963): warning #177-D: variable "Nb2p1" was declared but never referenced


  warnings.warn(msg)
/home/wcyl/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/multiprocessing/process.py:108: UserWarning: Kernel function slice_volume called with very small array <cryosparc_compute.gpu.gpuarray.GPUArray object at 0x7f0dd8252310> (size 2). Array will be passed to kernel as a pointer. Consider modifying the kernel to accept individual scalar arguments instead.
  self._target(*self._args, **self._kwargs)

[this message repeats many times; the address (starting with 0x7...) varies, and the size varies from 1 to 3]
 
/home/wcyl/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/ctypeslib.py:518: RuntimeWarning: A builtin ctypes object gave a PEP3118 format string that does not match its itemsize, so a best-guess will be made of the data type. Newer versions of python may behave correctly.
  return asarray(obj)
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_reference_motion.py", line 495, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_reference_motion.run_reference_motion_correction
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/refmotion.py", line 1220, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.refmotion.mainfn_reconstruct
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/refmotion.py", line 1235, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.refmotion.mainfn_reconstruct
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/refmotion.py", line 610, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.refmotion.slice_vol
  File "/home/wcyl/cryosparc_worker/cryosparc_compute/particles.py", line 70, in init
    self.N_input = int(self[self.blob_key + '/shape'][0,0])
IndexError: index 0 is out of bounds for axis 0 with size 0
set status to failed
========= main process now complete at 2024-01-13 22:31:46.834807.
========= monitor process now complete at 2024-01-13 22:31:46.840172.
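For reference, the final IndexError in the traceback above is the message NumPy gives when `[0, 0]` is requested from an array that has zero rows. The sketch below reproduces only that message; the array `empty_shapes` and the helper `first_box_size` are hypothetical stand-ins, not CryoSPARC code. One possible reading, not confirmed in this thread, is that an empty particle set reached that line.

```python
import numpy as np

# Hypothetical stand-in for a particle table's "<blob_key>/shape" column
# that contains zero rows.
empty_shapes = np.zeros((0, 2), dtype=np.uint32)

def first_box_size(shapes):
    """Mimic the failing expression from the traceback: int(shapes[0, 0])."""
    return int(shapes[0, 0])

try:
    first_box_size(empty_shapes)
    message = None
except IndexError as exc:
    # Row 0 does not exist, so NumPy complains about axis 0.
    message = str(exc)

print(message)
```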

Hi @kpsleung. Are you still experiencing these issues? And just to clarify, you’re getting a different error message on each re-run, even if you just re-run the job without changing any parameters or inputs?

Hi @hsnyder. Yes, running the same job with the same inputs and parameters gave different error messages. For now, the problem seems to be resolved by redoing the extraction and removing any particles on the edge.
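For anyone who wants to try the same workaround, here is a minimal sketch of one way to flag particles on the edge before re-extracting. Everything in it is an assumption made for illustration: the function name `keep_fully_inside`, the use of pixel coordinates for particle centers, and the criterion itself (the whole extraction box must lie inside the micrograph). The thread does not state which criterion was actually used.

```python
import numpy as np

def keep_fully_inside(x_px, y_px, mic_width, mic_height, box_size):
    """Return a boolean mask that is True where a box_size x box_size box
    centered at (x, y) lies entirely inside the micrograph."""
    x = np.asarray(x_px, dtype=float)
    y = np.asarray(y_px, dtype=float)
    half = box_size / 2.0
    inside_x = (x - half >= 0) & (x + half <= mic_width)
    inside_y = (y - half >= 0) & (y + half <= mic_height)
    return inside_x & inside_y

# Example with made-up coordinates on a 4096 x 4096 micrograph and a 256 px box:
# the first and last particles are too close to the left and right edges.
mask = keep_fully_inside([10, 200, 4090], [200, 200, 200], 4096, 4096, 256)
```

The resulting mask can be used to drop the flagged rows from the coordinate table before extraction.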