Patch Motion Correction - RuntimeError: Could not allocate GPU array: CUDA_ERROR_OUT_OF_MEMORY

NVIDIA GeForce RTX 2080

After talking to our neighboring lab, we learned they have been having the same issue but have been working around it by Fourier cropping to 1/2. This worked for us too!

Overriding the number of knots (to X=6, Y=4) seemed to do the trick for us with super-res K3 data on a 2080Ti (also using F-crop=1/2).

Just using F-crop=1/2 with patch motion didn't do the trick on its own, but reducing the number of knots as well worked.

Our neighboring lab also recommended overriding the knots (Z=5, Y=5, X=7) as well as cropping to 1/2, but we found with our data only the cropping was necessary. Both our lab and their lab are using super-res K3 data.

Maybe it has to do with the number of movie frames? For us (50-frame super-res K3 movies on 2080Ti cards) it only works with X=6, Y=4, low-memory mode, and F-crop=1/2. Any more knots, switching off low-memory mode, or altering F-crop, and it crashes. Glad to have a workaround!

EDIT: I spoke too soon - it ran ok for 15 mics and then started failing again :frowning: Back to tweaking params

Hi @olibclarke, I have a potential workaround that may address this. For background, v4.4 includes a new GPU memory management system (using the numba Python library) that does not immediately free memory when it’s no longer required. Instead, it frees in batches or when memory is low.

Your Patch Motion job appears to fail during a special allocation step that is unaware of this memory management system. So there may be some GPU memory that could be freed to make this work.

We should have a fix for this in a future version of CryoSPARC, but in the meantime you could try disabling batched memory deallocation by adding the following line to cryosparc_worker/config.sh:

export NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT=0
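
For example, appending it from a shell on the worker node could look something like this (the /path/to/cryosparc_worker below is only a placeholder; substitute your actual worker install path):

# Append the override to the worker configuration (placeholder path)
echo 'export NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT=0' >> /path/to/cryosparc_worker/config.sh
# Confirm the line was added
grep NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT /path/to/cryosparc_worker/config.sh

Since config.sh is sourced when worker jobs launch, the change should apply to newly started jobs.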

Let me know if you get a chance to try this and it works for you.

@AlexHouser I am not sure whether the same fix will apply to you; based on the error text, it appears to be coming from a different place that is correctly managed. We are still investigating other memory usage changes in v4.4.

Great, I will give this a go, thanks!!

I am using CryoSPARC v4.4.1. I tried adding the above-mentioned line to cryosparc_worker/config.sh, but the problem was not solved; the same error shows up as mentioned by @olibclarke.

Welcome to the forum @Suyog.

What is the output of the command
nvidia-smi
on the CryoSPARC worker?

Thank you for your reply. After reading this thread, I realized that I am using GPUs with 8 GB of VRAM. I ran the same patch motion correction job with F-crop = 1/4, and it worked for me. I also noticed that both of my GPUs were fully in use. Some suggestions above indicate that patch motion correction works fine in CryoSPARC v4.3.1 or below. Is it better for me to roll back to v4.3.1 for my hardware configuration, or should I keep using F-crop = 1/4? I have attached the nvidia-smi output below:

These are tough choices. As a potential alternative, have you already tried the config.sh workaround suggested above?

Yes, I have tried that. Here is the screenshot of the file:

Adding that line to cryosparc_worker/config.sh did not fix it for us either. Interestingly, for our latest dataset we are no longer able to work around it with Fourier cropping, low-memory mode, and overriding knots during Patch Motion Correction. We ended up having to roll back to v4.3.1.

Thank you for the suggestion. We also downgraded to v4.3.1, and Patch motion correction is working fine. Thank you @AlexHouser @nfrasser @wtempel.

This problem still persists in v4.5.

Movies: superres K3, 8184x11520 px, 80 frames.
System:
Linux GPU-4X-2080Ti 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
NVIDIA-SMI 530.30.02
Driver Version: 530.30.02
CUDA Version: 12.1
GPUs: 4xRTX 2080Ti 11GB, RAM: 256GB

Using F-crop=1/2, 1/8, or 1/16 and/or different numbers of knots doesn't help.
Adding NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT=0 also doesn't change anything.
The same task used to run fine in v4.3.

Here is the full output:

Traceback (most recent call last):
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 851, in _attempt_allocation
    return allocator()
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1054, in allocator
    return driver.cuMemAlloc(size)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_OUT_OF_MEMORY] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eugene/cryosparc/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 59, in exec
    return self.process(item)
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 210, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 213, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 242, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 219, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 292, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 628, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 390, in cryosparc_master.cryosparc_compute.gpu.gpucore.EngineBaseThread.ensure_allocated
  File "/home/eugene/cryosparc/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 270, in empty
    return device_array(shape, dtype, stream=stream)
  File "/home/eugene/cryosparc/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 226, in device_array
    arr = GPUArray(shape=shape, strides=strides, dtype=dtype, stream=stream)
  File "/home/eugene/cryosparc/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 21, in __init__
    super().__init__(shape, strides, dtype, stream, gpu_data)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/devicearray.py", line 103, in __init__
    gpu_data = devices.get_context().memalloc(self.alloc_size)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1372, in memalloc
    return self.memory_manager.memalloc(bytesize)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1056, in memalloc
    ptr = self._attempt_allocation(allocator)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 863, in _attempt_allocation
    return allocator()
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1054, in allocator
    return driver.cuMemAlloc(size)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_OUT_OF_MEMORY] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

Was the job

  1. submitted to an external workload manager (like slurm)?
  2. submitted to the CryoSPARC built-in cluster manager?
  3. or launched directly on GPU(s)?

Do non-CryoSPARC applications or jobs from another CryoSPARC instance also use the GPUs on this host?

Is any of these failed attempts an exact clone

  • same worker
  • same data
  • same parameters

of a successful CryoSPARC v4.3 job?

If not, can you please confirm that a job with the same worker, data, and parameters does not fail after downgrading (prerequisites, downgrade instructions) your instance to v4.3.1:

cryosparcm update --version=v4.3.1

Until the issue is resolved, you may want to preserve the failed jobs for comparison (that is, neither delete nor re-run them).
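
Regarding the question above about other GPU consumers: one way to check (a suggestion, not a required step) is to run an nvidia-smi process query on the worker while a Patch Motion Correction job is active, for example

nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

which lists every compute process currently holding GPU memory on that host, so any non-CryoSPARC processes would show up there.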

We are experiencing this error when running Patch Motion Correction on a multi-GPU job. With CryoSPARC Live (v4.5.3), everything functions correctly, even with multiple preprocessing workers. However, once the job is started in the workspace, all multi-GPU jobs result in a CUDA_ERROR_OUT_OF_MEMORY. Currently, I am running a single-GPU job, and this error has not occurred. Additionally, the low-memory option does not seem to have any effect. It would be helpful if this issue could be resolved. Before upgrading to CryoSPARC 4.5, everything worked as expected.

@dzyla What is the output of the command

nvidia-smi --query-gpu=index,name --format=csv

on the affected worker(s)?

The result is:

workstation 1:

index, name
0, NVIDIA GeForce RTX 3070
1, NVIDIA GeForce RTX 3070
2, NVIDIA GeForce RTX 3070
3, NVIDIA GeForce RTX 3070

workstation 2:

index, name
0, NVIDIA GeForce RTX 2080 Ti
1, NVIDIA GeForce RTX 2080 Ti
2, NVIDIA GeForce RTX 2080 Ti
3, NVIDIA GeForce RTX 2080 Ti

Both worked well previously, and we have never had issues with this error.

@dzyla We expect a modest increase in VRAM usage after an upgrade to CryoSPARC v4.4+. On GPUs with VRAM sizes below, or barely at, the (by now fairly dated) minimum recommendation of 11 GB, certain job types may fail due to insufficient VRAM. We are considering an increase in the minimum VRAM recommendation for recent versions of CryoSPARC.
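
As a quick check of where your cards fall relative to that recommendation, the memory.total field can be added to the earlier query:

nvidia-smi --query-gpu=index,name,memory.total --format=csv

The RTX 3070 reports 8 GB and the RTX 2080 Ti 11 GB, so workstation 1 sits below the current 11 GB recommendation and workstation 2 is right at it.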