Hello all,
I am running into a strange issue with the Patch Motion job. Partway through the job, after a number of movies have been processed successfully, it fails with a CUDA error ("cuStreamSynchronize failed: an illegal memory access was encountered") that I do not understand.
Here is an excerpt from the event log:
##################################
[CPU: 3.78 GB]
-- 0.0: processing 42 of 1000: J11/imported/009991550500362512406_m22nov25a_g_00013gr_00065sq940_v01_00005hl_00009enn.frames.tif
loading /home/uwm/tmalla/Data/tmalla/Work/NYSBC/2022/NOV/cryosparc-jobs/CS-phytochrome-pcm/J11/imported/009991550500362512406_m22nov25a_g_00013gr_00065sq940_v01_00005hl_00009enn.frames.tif
Loading raw movie data from J11/imported/009991550500362512406_m22nov25a_g_00013gr_00065sq940_v01_00005hl_00009enn.frames.tif ...
Done in 14.45s
Loading gain data from J11/imported/m22nov25a_25123045_01_8184x11520_norm_0.mrc ...
Done in 0.00s
Processing ...
[CPU: 4.14 GB]
-- 0.0: processing 43 of 1000: J11/imported/003194348880911734508_m22nov25a_g_00013gr_00065sq940_v01_00005hl_00010enn.frames.tif
loading /home/2022/NOV/cryosparc-jobs/CS-pcm/J11/imported/003194348880911734508_m22nov25a_g_00013gr_00065sq940_v01_00005hl_00010enn.frames.tif
Loading raw movie data from J11/imported/003194348880911734508_m22nov25a_g_00013gr_00065sq940_v01_00005hl_00010enn.frames.tif ...
Done in 16.83s
Loading gain data from J11/imported/m22nov25a_25123045_01_8184x11520_norm_0.mrc ...
Done in 0.00s
Processing ...
[CPU: 367.1 MB]
Error occurred while processing J11/imported/009991550500362512406_m22nov25a_g_00013gr_00065sq940_v01_00005hl_00009enn.frames.tif
Traceback (most recent call last):
  File "/tank/data/Programs/cryosparc/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 60, in exec
    return self.process(item)
  File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 177, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 180, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 182, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 255, in cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 669, in cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 313, in cryosparc_compute.engine.cuda_core.EngineBaseThread.toc
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 309, in cryosparc_compute.engine.cuda_core.EngineBaseThread.wait
pycuda._driver.LogicError: cuStreamSynchronize failed: an illegal memory access was encountered
Marking J11/imported/009991550500362512406_m22nov25a_g_00013gr_00065sq940_v01_00005hl_00009enn.frames.tif as incomplete and continuing...
#############################################################
I have split the raw movies into blocks of 1000 movies each and run one job per block, with the jobs running in parallel. Some of these jobs run to completion, while others fail with the error above.
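For anyone trying to reproduce the setup, the block splitting described above amounts to something like the minimal Python sketch below. It assumes the blocks are consecutive chunks of a sorted file list; the helper names split_into_blocks and list_movies and the *.frames.tif glob pattern are hypothetical and are not part of CryoSPARC.

```python
from pathlib import Path
from typing import List, Sequence


def split_into_blocks(items: Sequence[str], block_size: int = 1000) -> List[List[str]]:
    """Split a list of movie paths into consecutive blocks of at most block_size.

    Hypothetical helper for illustration; not a CryoSPARC function.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Step through the list in strides of block_size; the last block may be shorter.
    return [list(items[i:i + block_size]) for i in range(0, len(items), block_size)]


def list_movies(movie_dir: str, pattern: str = "*.frames.tif") -> List[str]:
    """Return movie paths in sorted order so block membership is reproducible.

    The glob pattern is an assumption based on the file names in the log.
    """
    return sorted(str(p) for p in Path(movie_dir).glob(pattern))


if __name__ == "__main__":
    # Hypothetical example: 2500 movie names split into blocks of 1000.
    movies = [f"movie_{i:05d}.frames.tif" for i in range(2500)]
    blocks = split_into_blocks(movies, 1000)
    print([len(b) for b in blocks])  # [1000, 1000, 500]
```

Each resulting block would then be imported and processed by its own job.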