Hi @hsnyder & @mmclean, we are having the same issue on one of our systems: on a 2080Ti where we previously had no issues, Patch Motion now always fails on super-res K3 data, even with Low Memory mode set. Nvidia driver version is 525.60.13 according to nvidia-smi.
This is a new job, not cloned, CS v4.4.0. Here is the error message:
Error occurred while processing J509/imported/003325998359943907799_23dec20b_2_00004gr_00035sq_v03_00002hln_00003enn.frames.tif
Traceback (most recent call last):
File "/home/user/software/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/pipeline.py", line 61, in exec
return self.process(item)
File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 192, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 195, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 224, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 201, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 292, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 710, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 188, in cryosparc_master.cryosparc_compute.gpu.gpucore.transfer_ndarray_to_cudaarray
File "/home/user/software/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 232, in _require_cuda_context
return fn(*args, **kws)
File "/home/user/software/cryosparc/cryosparc2_worker/cryosparc_compute/gpu/driver.py", line 151, in create_array
handle = allocator()
File "/home/user/software/cryosparc/cryosparc2_worker/cryosparc_compute/gpu/driver.py", line 137, in <lambda>
allocator = lambda: cuda_check_error(cuda.cuArrayCreate(desc), "Could not allocate GPU array")
File "/home/user/software/cryosparc/cryosparc2_worker/cryosparc_compute/gpu/driver.py", line 265, in cuda_check_error
raise RuntimeError(f"{msg}: {err.name}")
RuntimeError: Could not allocate GPU array: CUDA_ERROR_OUT_OF_MEMORY
Marking J509/imported/003325998359943907799_23dec20b_2_00004gr_00035sq_v03_00002hln_00003enn.frames.tif as incomplete and continuing...
Thanks Harris - is there a patch in the meantime to revert Patch Motion to the previous version? Or any suggestions for reducing memory requirements beyond just using Low Memory mode?
Unfortunately not… The change that caused this had nothing to do with patch motion specifically; it came from the changes involved in shipping our own CUDA version. The workaround would be to downgrade CryoSPARC versions.
After talking to our neighboring lab, we learned that they have been having the same issue but have been working around it by Fourier cropping to 1/2. This worked for us!
Our neighboring lab also recommended overriding the knots (Z=5, Y=5, X=7) as well as cropping to 1/2, but we found with our data only the cropping was necessary. Both our lab and their lab are using super-res K3 data.
Maybe it has to do with the number of movie frames? For us (50-frame super-res K3 movies on 2080Ti cards) it only works with X=6, Y=4, low memory mode, and F-crop=1/2. Any more knots, switching off low memory mode, or altered F-crop, and it crashes. Glad to have a workaround!
EDIT: I spoke too soon - it ran OK for 15 mics and then started failing again. Back to tweaking params.
Hi @olibclarke, I have a potential workaround that may address this. For background, v4.4 includes a new GPU memory management system (using the numba Python library) that does not immediately free memory when it’s no longer required. Instead, it frees in batches or when memory is low.
Your Patch Motion job appears to fail during a special allocation step that is unaware of this memory management system. So there may be some GPU memory that could be freed to make this work.
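As a rough illustration of that deferral in isolation, here is a minimal numba-only sketch (not the actual CryoSPARC code; the array size is arbitrary):
import numpy as np
from numba import cuda

buf = cuda.to_device(np.zeros((8192, 8192), dtype=np.float32))  # ~256 MB resident on the GPU
del buf  # with default settings the free is queued, not performed right away
# numba flushes its queue of pending frees in batches (once enough frees accumulate
# or their total size grows large), so an allocation path that bypasses numba can
# still see this memory as occupied in the meantime.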
We should have a fix for this in a future version of CryoSPARC, but in the meantime you could try disabling batched-memory deallocation by adding the following line to cryosparc_worker/config.sh:
export NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT=0
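For reference, here is a minimal numba-only sketch of what that variable changes (again, not CryoSPARC code; the variable is set from Python here purely for illustration, whereas for CryoSPARC it should go in config.sh as above):
import os
os.environ["NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT"] = "0"  # must be set before numba is imported

import numpy as np
from numba import cuda

buf = cuda.to_device(np.zeros((8192, 8192), dtype=np.float32))
del buf  # with the override in place, the allocation should be released immediately instead of queued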
Let me know if you get a chance to try this and it works for you.
@AlexHouser I am not sure whether the same fix will apply to you; based on the text, the error appears to be coming from a different place that is correctly managed. We are still investigating other memory usage changes in v4.4.
I am using CS v4.4.1. I tried adding the above-mentioned line to cryosparc_worker/config.sh, but the problem was not solved - the same error shows up as mentioned by @olibclarke.
Thank you for your reply. After reading this thread, I realized that I am using GPUs with 8 GB of VRAM. I ran the same patch motion correction job with F-crop = 1/4, and it worked for me. I also noticed that both of my GPUs were fully in use. Some suggestions above mention that patch motion correction works fine in CS v4.3.1 or below. Would it be better for me to roll back to CS v4.3.1 for my hardware configuration, or should I keep using F-crop = 1/4? I have attached the nvidia-smi output below:
Adding that line to cryosparc_worker/config.sh did not fix it for us either. Interestingly, we are no longer able to fix it with Fourier cropping, low memory mode, and overriding knots during Patch Motion correction for the latest dataset we collected. We ended up having to roll back to v4.3.1.