Patch Motion failure in v2.13.2


#1

Hello All,

I am experiencing this error message when trying to do patch motion with the tutorial dataset.

I have updated my CUDA environment to 9.1 and have also tried 10.0.
I have tried to kill any ghost jobs with ps -ax | grep "supervisord".
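
In case it is useful for diagnosis, here is a minimal check (a sketch only, run with the worker's bundled Python; the install path in the comment is just an example) that the worker environment can still initialize CUDA after the toolkit change:

```python
# Minimal GPU sanity check. Intended to be run with the cryoSPARC worker's
# own Python, e.g. (example path, adjust to the actual install):
#   /path/to/cryosparc2_worker/deps/anaconda/bin/python check_gpus.py
from __future__ import print_function
import pycuda.driver as cuda

cuda.init()  # raises here if the driver / CUDA toolkit combination is broken
print("Detected %d CUDA device(s)" % cuda.Device.count())
for i in range(cuda.Device.count()):
    dev = cuda.Device(i)
    major, minor = dev.compute_capability()
    print("  GPU %d: %s (compute capability %d.%d, %.1f GB)"
          % (i, dev.name(), major, minor,
             dev.total_memory() / (1024.0 ** 3)))
```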

Any help is appreciated.

[CPU: 86.2 MB]   --------------------------------------------------------------

[CPU: 86.2 MB]   Importing job module for job type patch_motion_correction_multi...

[CPU: 165.0 MB]  Job ready to run

[CPU: 165.0 MB]  ***************************************************************

[CPU: 165.3 MB]  Job will process this many movies:  20

[CPU: 165.3 MB]  parent process is 2778287

[CPU: 133.3 MB]  Calling CUDA init from 2778321

[CPU: 133.3 MB]  Calling CUDA init from 2778324

[CPU: 133.3 MB]  Calling CUDA init from 2778323

[CPU: 133.3 MB]  Calling CUDA init from 2778322

[CPU: 165.6 MB]  Outputting partial results now...

[CPU: 165.6 MB]  Traceback (most recent call last):
  File "cryosparc2_master/cryosparc2_compute/run.py", line 78, in cryosparc2_compute.run.main
  File "cryosparc2_master/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 349, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
AssertionError: Child process with PID 2778321 has terminated unexpectedly!

#2

Hi @ElyseF,

Can you try the following:

  1. run the same job with only one GPU and see what error messages appear
  2. run a "Rigid Motion Correction" job (not multi-GPU) and see if any errors show up

Unfortunately, the true error message from the failing subprocess is not surfaced in the multi-GPU Patch Motion job.


#3

I am seeing a similar failure.

[CPU: 906.4 MB] Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1547, in run_with_except_hook
    run_old(*args, **kw)
  File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 157, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 160, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 161, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 77, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 393, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc2_worker/cryosparc2_compute/engine/newgfourier.py", line 22, in cryosparc2_compute.engine.newgfourier.get_plan_R2C_2D
  File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/fft.py", line 127, in __init__
    onembed, ostride, odist, self.fft_type, self.batch)
  File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 742, in cufftMakePlanMany
    cufftCheckStatus(status)
  File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 117, in cufftCheckStatus
    raise e
cufftAllocFailed

[CPU: 196.2 MB] Outputting partial results now...

[CPU: 181.1 MB] Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 78, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 349, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
AssertionError: Child process with PID 14219 has terminated unexpectedly!


#4

Also, MotionCor2 works fine on the same data. It seems to be a memory issue, since these are full-size K3 images.
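
As a rough sanity check of the memory argument, here is a small sketch (the K3 frame dimensions below are nominal values I am assuming, not numbers taken from the job log) of how much memory a single frame and its R2C FFT output need:

```python
# Back-of-the-envelope GPU memory footprint for a single K3 frame.
# Frame sizes are the nominal K3 dimensions (an assumption here):
# counting mode 5760 x 4092 pixels, super-resolution 11520 x 8184 pixels.

def frame_footprint_mb(nx, ny):
    real = nx * ny * 4             # float32 input frame
    cplx = nx * (ny // 2 + 1) * 8  # complex64 output of an R2C FFT
    return real / 1e6, cplx / 1e6

for label, (nx, ny) in [("counting (5760 x 4092)", (5760, 4092)),
                        ("super-res (11520 x 8184)", (11520, 8184))]:
    real_mb, cplx_mb = frame_footprint_mb(nx, ny)
    # cuFFT additionally needs a work area, typically on the order of the transform size
    print("%s: ~%.0f MB real + ~%.0f MB complex per frame (plus cuFFT workspace)"
          % (label, real_mb, cplx_mb))
```

Multiplied over the frames of a movie, plus the cuFFT work area, this adds up quickly; whether it actually exhausts a given card depends on how many frames and intermediates the job keeps resident at once, but it at least makes a memory explanation plausible.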


#5

We are also experiencing GPU memory problems with Patch Motion Correction in v2.13.2 on K3 super-resolution movies. However, we notice different behaviour between a workstation with 4x 2080 Ti (fails immediately unless cropped to 1/4) and one with 3x 1080 Ti (runs without any cropping, although it fails if run on the same GPU that the X server is on). The data are super-resolution movies with a physical pixel size of 0.826 Å/pixel (0.413 Å/pixel in super-resolution). The workstations (CentOS 7) have different hardware, but CUDA, drivers, and kernel are all the same. The only difference in cryoSPARC is that the 1080 Ti workstation has cryoSPARC Live and the 2080 Ti one does not. It fails on the first job with the error message below. We can provide more information if needed. Thanks.

  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 157, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 160, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 161, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 77, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 446, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 197, in cryosparc2_compute.engine.cuda_core.transfer_ndarray_to_cudaarray
MemoryError: cuArrayCreate failed: out of memory
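
If it is useful, here is a small sketch (assuming the worker's bundled pycuda) that reports free versus total memory on each GPU; running it while the X server is up would show how much memory the display GPU has already lost:

```python
# Report free / total memory per GPU using pycuda (bundled with the
# cryoSPARC worker). Run with the worker's Python while the machine is
# otherwise idle, and again while the X server / other jobs are running.
from __future__ import print_function
import pycuda.driver as cuda

cuda.init()
for i in range(cuda.Device.count()):
    ctx = cuda.Device(i).make_context()  # mem_get_info needs an active context
    try:
        free, total = cuda.mem_get_info()
        print("GPU %d (%s): %.2f / %.2f GB free"
              % (i, cuda.Device(i).name(),
                 free / (1024.0 ** 3), total / (1024.0 ** 3)))
    finally:
        ctx.pop()  # release the context again
```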