Reference-based motion correction crashing

Hi all,

I have extracted particles from 2 datasets in RELION and loaded them into CryoSPARC for further analysis. When I got to reference-based motion correction (RBMC), the job always crashed on one of the datasets with the error message "====== Job process terminated abnormally."

After re-running several times and checking job.log, I saw different errors. One time it was an assertion error: nmov > movie_no. This time it showed gain_ref_blob/flip_x: invalid handle 1125899906842637, no heap at index 13 (errno 2: No such file or directory).

I have checked that none of the soft links to the movies are broken. I also tried removing the movie that may be causing the crash from processing, but to no avail. Any suggestions for troubleshooting?
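For anyone wanting to repeat the soft-link check in a scripted way, here is a minimal sketch in Python. It assumes the movies are reachable as symlinks somewhere under a single directory; the function name `find_broken_symlinks` and the example path are hypothetical and not part of CryoSPARC.

```python
import os
from pathlib import Path


def find_broken_symlinks(root):
    """Return paths under `root` that are symlinks whose targets do not exist."""
    broken = []
    # os.walk does not follow directory symlinks by default, so every link
    # shows up as a plain entry in dirnames or filenames.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # is_symlink() inspects the link itself; exists() follows it to the target.
            if path.is_symlink() and not path.exists():
                broken.append(path)
    return sorted(broken)
```

Calling `find_broken_symlinks("/path/to/imported/movies")` (placeholder path) returns an empty list when every link resolves. Note that this only confirms the targets exist, not that the files are complete or readable.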

@kpsleung Please can you post the version and patch level for your CryoSPARC installation, and the complete error trace.

The version is v4.4.1. The following is the end of the job.log:

ElectronCountedFramesDecompressor::prepareRead: found 1085 frames in EER-TIFF file.

refmotion worker 0 (NVIDIA RTX A4000)
BFGS iterations:      300
scale (alpha):        4.861825
noise model (sigma2): 39.205906
     TIME (s)  SECTION
noise model (sigma2): 39.205906
     TIME (s)  SECTION
  0.000145032  sanity
  9.167947948  read movie
  0.045142473  get gain, defects
  0.060268143  read bg
  0.002517539  read rigid
  0.893805941  prep_movie
  0.407442085  extract from frames
  0.000571716  extract from refs
  0.000000474  adj
  0.000000150  bfactor
  0.007225606  rigid motion correct
  0.000213075  get noise, scale
  0.738295133  optimize trajectory
  0.155823695  shift_sum patches
  0.002365920  ifft
  0.001661418  unpad
  0.000250805  fill out dataset
  0.011622373  write output files
 11.495299527  --- TOTAL ---

ElectronCountedFramesDecompressor: reading using TIFF-EER mode.
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered     (repeated a lot of lines)
ElectronCountedFramesDecompressor::prepareRead: found 1085 frames in EER-TIFF file.
gain_ref_blob/flip_x: invalid handle 1125899906842637, no heap at index 13 (errno 2: No such file or directory)
========= main process now complete at 2024-01-12 18:16:10.201676.
========= monitor process now complete at 2024-01-12 18:16:20.703899.

@wtempel here is another set of error messages from job.log in the recent crash:

ElectronCountedFramesDecompressor: reading using TIFF-EER mode.
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered       [repeat a lot of times]
HOST ALLOCATION FUNCTION: using numba.cuda.pinned_array
**** handle exception rc
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
/home/wcyl/cryosparc_worker/cryosparc_compute/jobs/motioncorrection/ NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See for details.
/home/wcyl/cryosparc_worker/cryosparc_compute/ NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See for details.
  def contrast_normalization(arr_bin, tile_size = 128):
/home/wcyl/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/ UserWarning: NVRTC log messages whilst compiling kernel:

kernel(963): warning #177-D: variable "Nb2p1" was declared but never referenced

/home/wcyl/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/multiprocessing/ UserWarning: Kernel function slice_volume called with very small array <cryosparc_compute.gpu.gpuarray.GPUArray object at 0x7f0dd8252310> (size 2). Array will be passed to kernel as a pointer. Consider modifying the kernel to accept individual scalar arguments instead.
  self._target(*self._args, **self._kwargs)

[this message repeats many times; the addresses (starting 0x7...) vary, and the size varies from 1 to 3]
/home/wcyl/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/ RuntimeWarning: A builtin ctypes object gave a PEP3118 format string that does not match its itemsize, so a best-guess will be made of the data type. Newer versions of python may behave correctly.
  return asarray(obj)
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/", line 95, in
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/", line 495, in
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/", line 1220, in
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/", line 1235, in
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/", line 610, in
  File "/home/wcyl/cryosparc_worker/cryosparc_compute/", line 70, in init
    self.N_input = int(self[self.blob_key + '/shape'][0,0])
IndexError: index 0 is out of bounds for axis 0 with size 0
set status to failed
========= main process now complete at 2024-01-13 22:31:46.834807.
========= monitor process now complete at 2024-01-13 22:31:46.840172.
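As a side note on reading the traceback above: the final IndexError is the message NumPy raises when a two-dimensional array with zero rows is indexed at [0, 0]. That would be consistent with the `.../shape` field holding no entries at that point, although the log alone does not show why it was empty. The sketch below reproduces the message with a plain NumPy array; `shape_column` is a stand-in I made up, not CryoSPARC's dataset class.

```python
import numpy as np

# Stand-in for an empty per-particle shape column: zero rows, two columns.
# (Assumption for illustration only; the real object is a CryoSPARC dataset field.)
shape_column = np.empty((0, 2), dtype=np.uint32)

try:
    n_input = int(shape_column[0, 0])
except IndexError as err:
    # Same wording as the last line of the traceback in job.log.
    message = str(err)
    print(message)
```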

Hi @kpsleung. Are you still experiencing these issues? And just to clarify, you’re getting different error messages each re-run, even if you just re-run the job without changing any parameters or inputs?

Hi @hsnyder. Yes, running the same job with the same inputs and parameters gave different error messages. For now, the problem seems to go away after redoing the extraction and removing any particles on the edge.
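To illustrate what "removing particles on the edge" can mean in practice, here is a rough sketch, assuming each particle has a fractional center position (0 to 1) and that the micrograph shape and extraction box size are known in pixels. The function name `inside_micrograph` and its arguments are hypothetical; this is not the filter CryoSPARC or RELION applies internally.

```python
import numpy as np


def inside_micrograph(center_x_frac, center_y_frac, mic_shape, box_size):
    """Boolean mask: True where a box of `box_size` pixels centred on the
    particle lies fully inside a micrograph of shape (height, width)."""
    height, width = mic_shape
    half = box_size / 2.0
    # Convert fractional coordinates to pixel coordinates.
    x = np.asarray(center_x_frac, dtype=float) * width
    y = np.asarray(center_y_frac, dtype=float) * height
    return (
        (x - half >= 0) & (x + half <= width)
        & (y - half >= 0) & (y + half <= height)
    )


# Three particles on a 4096 x 4096 micrograph with a 400-pixel box:
mask = inside_micrograph([0.5, 0.01, 0.98], [0.5, 0.5, 0.5],
                         mic_shape=(4096, 4096), box_size=400)
print(mask.tolist())  # [True, False, False]
```

Only the first particle is kept; the other two have boxes that would extend past the left and right borders.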

Hi folks, I have gotten a similar error with RBMC a couple of times now. Below is everything job.log says. Any thoughts?

**** handle exception rc
set status to failed
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/", line 115, in
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/", line 469, in
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/", line 1259, in
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/", line 1275, in
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/", line 637, in
  File "/opt/applications/cryosparc2_worker/cryosparc_compute/", line 70, in init
    self.N_input = int(self[self.blob_key + '/shape'][0,0])
IndexError: index 0 is out of bounds for axis 0 with size 0
srun: error: emnoded38: task 0: Exited with exit code 1

Do you get this specific error at the same point of job progress every time you clone the failed job and run the failed job’s clone?

Not sure, I’ll test and report back

Hi @wtempel, I cloned and re-ran the job, and it failed at a different point with the same error. The first run failed at 1% progress into computing empirical dose weights; the cloned second run failed at 92% progress into motion correcting particles (oof, it almost finished).
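When comparing where two runs failed, it can help to collapse the long runs of identical warnings in job.log so that the last distinct messages stand out. The following is a minimal sketch of that idea in plain Python; the helper names `collapse_repeats` and `tail_summary` are made up for illustration and are not CryoSPARC tools.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into (line, count) pairs."""
    return [(line, sum(1 for _ in run)) for line, run in groupby(lines)]


def tail_summary(log_text, keep=20):
    """Return the last `keep` collapsed entries of a log as printable strings."""
    # Drop blank lines and trailing whitespace before grouping.
    lines = [ln.rstrip() for ln in log_text.splitlines() if ln.strip()]
    collapsed = collapse_repeats(lines)[-keep:]
    return [f"{line}  [x{count}]" if count > 1 else line
            for line, count in collapsed]
```

Reading each clone's job.log into a string and printing `tail_summary(text)` for both gives two short lists that can be compared side by side.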

Here are the settings I used for the job, in case it’s helpful:
Save results in 16-bit floating point ON
Skip movies with wrong frame count ON
Fourier crop to box size 400 ON (super-res movies)
Parallelized over 4 GPUs

All other settings were default