I am experiencing this error message when trying to do patch motion with the tutorial dataset.
I have tried updating my CUDA environment to 9.1 and to 10.0.
I have tried to kill any ghost jobs with ps -ax | grep "supervisord".
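In case it is useful, this is roughly the check I run before resubmitting, to see whether anything is still running on the worker (a quick sketch in Python; the process-name patterns are just guesses and may need adjusting for other installs):

import subprocess

# Process-name fragments to look for; these are assumptions based on a
# typical cryoSPARC install and may need adjusting.
PATTERNS = ("supervisord", "cryosparc", "mongod")

def find_leftovers():
    # 'ps -axo pid,command' prints one "PID COMMAND" line per process on Linux
    out = subprocess.check_output(["ps", "-axo", "pid,command"]).decode("utf-8", "replace")
    return [line for line in out.splitlines()[1:]
            if any(p in line for p in PATTERNS)]

if __name__ == "__main__":
    hits = find_leftovers()
    if hits:
        print("Possible leftover cryoSPARC-related processes:")
        for line in hits:
            print("  " + line.strip())
    else:
        print("No leftover cryoSPARC-related processes found.")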
Any help is appreciated.
[CPU: 86.2 MB] --------------------------------------------------------------
[CPU: 86.2 MB] Importing job module for job type patch_motion_correction_multi...
[CPU: 165.0 MB] Job ready to run
[CPU: 165.0 MB] ***************************************************************
[CPU: 165.3 MB] Job will process this many movies: 20
[CPU: 165.3 MB] parent process is 2778287
[CPU: 133.3 MB] Calling CUDA init from 2778321
[CPU: 133.3 MB] Calling CUDA init from 2778324
[CPU: 133.3 MB] Calling CUDA init from 2778323
[CPU: 133.3 MB] Calling CUDA init from 2778322
[CPU: 165.6 MB] Outputting partial results now...
[CPU: 165.6 MB] Traceback (most recent call last):
File "cryosparc2_master/cryosparc2_compute/run.py", line 78, in cryosparc2_compute.run.main
File "cryosparc2_master/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 349, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
AssertionError: Child process with PID 2778321 has terminated unexpectedly!
[CPU: 906.4 MB] Traceback (most recent call last):
File "cryosparc2_compute/jobs/runcommon.py", line 1547, in run_with_except_hook
run_old(*args, **kw)
File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
work = processor.process(item)
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 157, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 160, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 161, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 77, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 393, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
File "cryosparc2_worker/cryosparc2_compute/engine/newgfourier.py", line 22, in cryosparc2_compute.engine.newgfourier.get_plan_R2C_2D
File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/fft.py", line 127, in __init__
onembed, ostride, odist, self.fft_type, self.batch)
File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 742, in cufftMakePlanMany
cufftCheckStatus(status)
File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 117, in cufftCheckStatus
raise e
cufftAllocFailed
[CPU: 196.2 MB] Outputting partial results now...
[CPU: 181.1 MB] Traceback (most recent call last):
File "cryosparc2_worker/cryosparc2_compute/run.py", line 78, in cryosparc2_compute.run.main
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 349, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
AssertionError: Child process with PID 14219 has terminated unexpectedly!
We're also experiencing GPU memory problems with Patch Motion Correction in v2.13.2 on K3 super-resolution movies. However, we notice different behaviour between a workstation with 4x 2080 Ti (fails immediately unless the movies are cropped to 1/4) and one with 3x 1080 Ti (runs without any cropping, although it fails if run on the same GPU that the X server uses). The data are super resolution with a pixel size of 0.826 Å/pixel (0.413 Å/pixel in super resolution). The workstations (CentOS 7) have different hardware, but CUDA, drivers, and kernel are all the same. The only difference in cryoSPARC is that the 1080 Ti workstation runs cryoSPARC Live and the 2080 Ti one does not. It fails on the first job with the error message below. We can provide more information if needed. Thanks.
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 157, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 160, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 161, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 77, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 446, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 197, in cryosparc2_compute.engine.cuda_core.transfer_ndarray_to_cudaarray
MemoryError: cuArrayCreate failed: out of memory
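For a sense of scale, here is a rough back-of-the-envelope estimate of what one uncropped movie costs in memory (a sketch that assumes the standard K3 super-resolution frame size of 11520 x 8184 and, hypothetically, 40 frames held as float32; actual frame counts vary):

# Back-of-the-envelope memory estimate for one K3 super-resolution movie.
# All numbers are illustrative assumptions, not measurements from our data.
SUPER_RES_SHAPE = (11520, 8184)  # standard K3 super-resolution frame size in pixels
N_FRAMES = 40                    # hypothetical frame count; varies per collection
BYTES_PER_PIXEL = 4              # float32, once counts are converted

frame_gb = SUPER_RES_SHAPE[0] * SUPER_RES_SHAPE[1] * BYTES_PER_PIXEL / 1024.0 ** 3
print("One float32 super-res frame: %.2f GB" % frame_gb)                          # ~0.35 GB
print("Full %d-frame stack:         %.2f GB" % (N_FRAMES, N_FRAMES * frame_gb))   # ~14 GB

Presumably the job does not hold the whole stack on the GPU at once, but this gives an idea of how tight 11 GB is for uncropped super-resolution data once FFT workspace is added on top.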
We have some K3 test data for this issue and are trying to optimize the memory layout on the GPU, but it would be very helpful to have more data that is known to fail on a 1080 Ti / 2080 Ti.
Would either of you be able to share around a dozen of the troublesome movies?
I’ll email with upload instructions.
We received 12 files, thank you! Are you able to upload a file with the microscope parameters (pixel size, total dose rate, accelerating voltage, spherical aberration in mm)? Also, is a gain reference file necessary?
We ran some tests with @MHB’s data on an 11 GB 1080 Ti and measured a max memory requirement of around 10 GB. We get the same allocation error if another process uses more than ~1.5 GB while patch motion runs.
Does that align with what everyone else is seeing? For anyone still experiencing this issue, can you post the output of nvidia-smi just before the Patch Motion job runs?
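If it helps, something along these lines captures a timestamped snapshot of GPU memory right before the job is queued (a sketch built around nvidia-smi's CSV query mode; the log file name is an arbitrary choice):

# Snapshot GPU memory usage just before queuing a Patch Motion job.
import datetime
import subprocess

QUERY = "index,name,memory.total,memory.used,memory.free"

def snapshot(path="gpu_memory_before_patch_motion.log"):
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv"]
    ).decode("utf-8", "replace")
    stamp = datetime.datetime.now().isoformat()
    with open(path, "a") as fh:
        fh.write("=== %s ===\n%s\n" % (stamp, out))
    print(out)

if __name__ == "__main__":
    snapshot()

Running it just before hitting Queue, and again when the job fails, should show whether the X server or another process is already holding part of the card.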
I am running on a worker node and get the same error whether the job runs on GPU 0 or GPU 1. Also, I cannot choose the GPU: the job fails with this error without ever giving me the option to select one.
Looks like there’s something different in your 1080Ti GPU configuration compared to ours. Can you send us a full listing of your GPU information by running this shell command on the worker?
bash -c 'eval $(cryosparcw env) && python -c "import pycuda.driver as pycu; pycu.init(); print [(pycu.Device(i).name(), pycu.Device(i).compute_capability(), pycu.Device(i).total_memory(), pycu.Device(i).get_attributes()) for i in range(pycu.Device.count())]"'
Copy the full output and paste it here (it will not contain any personally identifiable information).
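If the one-liner is awkward to read, the same information (plus current free memory) can also be gathered with a short script run in the same environment, e.g. saved as list_gpus.py and launched with bash -c 'eval $(cryosparcw env) && python list_gpus.py' (a sketch; the get_attributes() listing is long, so it is printed last):

# list_gpus.py -- enumerate CUDA devices and report memory, as seen by pycuda.
import pycuda.driver as cuda

cuda.init()
for i in range(cuda.Device.count()):
    dev = cuda.Device(i)
    ctx = dev.make_context()   # a context is required to query free memory
    try:
        free, total = cuda.mem_get_info()
    finally:
        ctx.pop()
    print("GPU %d: %s (compute capability %d.%d)" % ((i, dev.name()) + dev.compute_capability()))
    print("  total memory: %.2f GB" % (total / 1024.0 ** 3))
    print("  free memory:  %.2f GB" % (free / 1024.0 ** 3))
    print("  attributes: %r" % (dev.get_attributes(),))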
My apologies for all the back-and-forth; hopefully all this information will help us resolve the issue for you and for all other cryoSPARC users experiencing it.