Patch Motion Correction Fails

Hi,

I updated my NVIDIA driver to 435.21 and then ran a patch motion correction job, which failed with the error messages below. The problem seems to be limited to patch motion correction, since full-frame motion correction runs fine.

This is with the tutorial dataset, and I also tried cropping to 1/4 as suggested in a previous post. The system has two Quadro P4000 GPUs and 256 GB of RAM.

Thanks,
Sundhar

License is valid.
Launching job on lane default target 
Running job on master node 
Project P1 Job J5 Started
Master running v2.12.4, worker running v2.12.4
Running on lane default
Resources allocated: 
  Worker:  cryoem2.bch.msu.edu
  CPU   :  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
  GPU   :  [0, 1]
  RAM   :  [0, 1, 2, 3]
  SSD   :  False
--------------------------------------------------------------

Importing job module for job type patch_motion_correction_multi...
Job ready to run

***************************************************************

Job will process this many movies:  20
parent process is 6624
Calling CUDA init from 6663
Calling CUDA init from 6664

-- 1.0: processing 0 of 20: J2/imported/14sep05c_00024sq_00003hl_00002es.frames.tif
        loading /users/kparent/Test/csparc/P1/J2/imported/14sep05c_00024sq_00003hl_00002es.frames.tif
        Loading raw movie data from J2/imported/14sep05c_00024sq_00003hl_00002es.frames.tif ...
        Done in 2.68s
        Loading gain data from J2/imported/norm-amibox05-0.mrc ...
        Done in 0.11s
        Processing ...

-- 0.0: processing 1 of 20: J2/imported/14sep05c_00024sq_00003hl_00005es.frames.tif
        loading /users/kparent/Test/csparc/P1/J2/imported/14sep05c_00024sq_00003hl_00005es.frames.tif
        Loading raw movie data from J2/imported/14sep05c_00024sq_00003hl_00005es.frames.tif ...
        Done in 2.71s
        Loading gain data from J2/imported/norm-amibox05-0.mrc ...
        Done in 0.09s
        Processing ...

Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1490, in run_with_except_hook
    run_old(*args, **kw)
  File "/users/kparent/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 154, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 157, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 158, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 77, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 393, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc2_worker/cryosparc2_compute/engine/newgfourier.py", line 22, in cryosparc2_compute.engine.newgfourier.get_plan_R2C_2D
  File "/users/kparent/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/fft.py", line 127, in __init__
    onembed, ostride, odist, self.fft_type, self.batch)
  File "/users/kparent/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 742, in cufftMakePlanMany
    cufftCheckStatus(status)
  File "/users/kparent/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 117, in cufftCheckStatus
    raise e
cufftAllocFailed

Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1490, in run_with_except_hook
    run_old(*args, **kw)
  File "/users/kparent/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 154, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 157, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 158, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 77, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 393, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc2_worker/cryosparc2_compute/engine/newgfourier.py", line 22, in cryosparc2_compute.engine.newgfourier.get_plan_R2C_2D
  File "/users/kparent/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/fft.py", line 127, in __init__
    onembed, ostride, odist, self.fft_type, self.batch)
  File "/users/kparent/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 742, in cufftMakePlanMany
    cufftCheckStatus(status)
  File "/users/kparent/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 117, in cufftCheckStatus
    raise e
cufftAllocFailed

Outputting partial results now...

Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 78, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 312, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
AssertionError: Child process with PID 6663 has terminated unexpectedly!

Hi @Sundhar,
The P4000 is an 8 GB GPU, and unfortunately the tutorial data is K2 super-resolution (8K x 8K frames), which does not fit in 8 GB of GPU memory for patch motion correction. We are working on optimizing memory usage.
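
For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory one raw movie stack would occupy. It is only an estimate under stated assumptions: frames are taken as exactly 8192 x 8192 pixels held as 32-bit floats, and the frame count of 30 is hypothetical, since the thread does not say how many frames each movie has. The helper name `movie_bytes` is made up for this illustration and is not part of cryoSPARC.

```python
GIB = 1024 ** 3  # bytes per GiB


def movie_bytes(n_frames, ny, nx, bytes_per_pixel=4):
    """Bytes needed to hold one movie stack (default: 32-bit floats)."""
    return n_frames * ny * nx * bytes_per_pixel


# Assumption: 8192 x 8192 frames ("8K x 8K"), stored as float32.
per_frame = movie_bytes(1, 8192, 8192)

# Assumption: 30 frames per movie (hypothetical, not stated in the thread).
n_frames = 30
per_movie = movie_bytes(n_frames, 8192, 8192)

print("one frame : %.2f GiB" % (per_frame / GIB))
print("%d frames : %.2f GiB" % (n_frames, per_movie / GIB))
```

Under these assumptions a single frame is 0.25 GiB, so a few dozen frames already approach the 8 GB on the card before any FFT plans or work buffers are allocated. That is consistent with `cufftAllocFailed`, which is the error cuFFT reports when it cannot allocate memory for a plan.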

Hi
I am getting the same error at the end of the job. Can anyone suggest a fix?

Hi @PRIYANKA, can you please provide as much information as possible so we can help troubleshoot? Please see "Before You Post: Troubleshooting Guidelines".

Hi Punjani,
Thanks for your help. My problem is resolved: the issue was with the size of one MRC file. After I deleted that file, the job ran and finished.
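
In case it helps anyone else, one way to spot a file like that is to compare file sizes in the folder and flag any that differ a lot from the rest. This is just a minimal sketch using the Python standard library, not a cryoSPARC tool. The function name `find_size_outliers`, the 5% tolerance, and the example path are all made up for illustration, and it assumes every file in the folder should be about the same size.

```python
import os
import statistics


def find_size_outliers(directory, suffix=".mrc", tolerance=0.05):
    """Return (name, size) for files whose size differs from the median
    size by more than `tolerance` (a fraction of the median)."""
    sizes = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.endswith(suffix) and os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    if not sizes:
        return []
    median = statistics.median(sizes.values())
    return [(name, size) for name, size in sizes.items()
            if abs(size - median) > tolerance * median]


# Example call (hypothetical path, replace with your own data folder):
# for name, size in find_size_outliers("/path/to/movies"):
#     print(name, size)
```

A file that is much smaller than the others is often truncated or incomplete, so it is worth checking before deleting it.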

Thank you

Thanks for the update @PRIYANKA!