Patch Motion Correction - RuntimeError: Could not allocate GPU array: CUDA_ERROR_OUT_OF_MEMORY

This problem still persists in v4.5.

Movies: superres K3, 8184x11520 px, 80 frames.
System:
Linux GPU-4X-2080Ti 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
NVIDIA-SMI 530.30.02
Driver Version: 530.30.02
CUDA Version: 12.1
GPUs: 4xRTX 2080Ti 11GB, RAM: 256GB

Using F-crop = 1/2, 1/8, or 1/16 and/or a different number of knots doesn’t help.
Adding NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT=0 also doesn’t change anything.
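For anyone trying the same thing: NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT is a Numba environment variable, so it only takes effect if it reaches the worker process environment. A minimal sketch of one way to do that (the `config.sh` path here is a local stand-in; CryoSPARC workers source `cryosparc_worker/config.sh`, so adjust to your install):

```shell
# Sketch: Numba reads NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT from the
# environment of the worker process. CryoSPARC workers source
# cryosparc_worker/config.sh, so the export can go there.
# CONFIG below is a stand-in file, not a real install path.
CONFIG=./config.sh
echo 'export NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT=0' >> "$CONFIG"
# Confirm the line landed in the file:
grep 'NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT' "$CONFIG"
```

Setting the value to 0 disables Numba's deferred deallocation, so freed GPU buffers are released immediately rather than batched.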
The same task used to run fine in v4.3.

Here is the full output:

Traceback (most recent call last):
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 851, in _attempt_allocation
    return allocator()
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1054, in allocator
    return driver.cuMemAlloc(size)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_OUT_OF_MEMORY] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eugene/cryosparc/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 59, in exec
    return self.process(item)
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 210, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 213, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 242, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 219, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 292, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 628, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 390, in cryosparc_master.cryosparc_compute.gpu.gpucore.EngineBaseThread.ensure_allocated
  File "/home/eugene/cryosparc/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 270, in empty
    return device_array(shape, dtype, stream=stream)
  File "/home/eugene/cryosparc/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 226, in device_array
    arr = GPUArray(shape=shape, strides=strides, dtype=dtype, stream=stream)
  File "/home/eugene/cryosparc/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 21, in __init__
    super().__init__(shape, strides, dtype, stream, gpu_data)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/devicearray.py", line 103, in __init__
    gpu_data = devices.get_context().memalloc(self.alloc_size)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1372, in memalloc
    return self.memory_manager.memalloc(bytesize)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1056, in memalloc
    ptr = self._attempt_allocation(allocator)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 863, in _attempt_allocation
    return allocator()
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1054, in allocator
    return driver.cuMemAlloc(size)
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/home/eugene/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_OUT_OF_MEMORY] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

Was the job

  1. submitted to an external workload manager (like SLURM),
  2. submitted to the CryoSPARC built-in cluster manager, or
  3. launched directly on GPU(s)?

Do non-CryoSPARC applications or jobs from another CryoSPARC instance also use the GPUs on this host?

Is any of these failed attempts an exact clone of a successful CryoSPARC v4.3 job, that is, with the

  • same worker
  • same data
  • same parameters?

If not, can you please confirm that a job with the same worker, data, and parameters does not fail after downgrading your instance (see prerequisites and downgrade instructions) to v4.3.1:

cryosparcm update --version=v4.3.1

Until the issue is resolved, you may want to preserve the failed jobs for comparison (that is, neither delete nor re-run them).

We are experiencing this error when running Patch Motion Correction as a multi-GPU job. In CryoSPARC Live (v4.5.3), everything functions correctly, even with multiple preprocessing workers. However, once the job is started from the workspace, every multi-GPU job fails with CUDA_ERROR_OUT_OF_MEMORY. Currently I am running a single-GPU job, and the error has not occurred. Additionally, the low-memory option does not seem to have any effect. It would be helpful if this issue could be resolved; before upgrading to CryoSPARC 4.5, everything worked as expected.

@dzyla What is the output of the command

nvidia-smi --query-gpu=index,name --format=csv

on the affected worker(s)?
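As an aside, the CSV output of that query is easy to check programmatically if you manage many workers. A small sketch, not part of CryoSPARC: the `low_vram_gpus` helper and the sample values below are illustrative (memory.total was added to the query to make the check meaningful):

```python
import csv, io

# Sample CSV as produced by:
#   nvidia-smi --query-gpu=index,name,memory.total --format=csv
# (values are illustrative, matching an 11 GB RTX 2080 Ti)
sample = """index, name, memory.total [MiB]
0, NVIDIA GeForce RTX 2080 Ti, 11264 MiB
1, NVIDIA GeForce RTX 2080 Ti, 11264 MiB
"""

def low_vram_gpus(csv_text, min_mib=12288):
    """Return indices of GPUs whose total VRAM is below min_mib."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    flagged = []
    for row in rows[1:]:                            # skip the header row
        idx = int(row[0])
        total = int(row[2].strip().split()[0])      # "11264 MiB" -> 11264
        if total < min_mib:
            flagged.append(idx)
    return flagged

print(low_vram_gpus(sample))  # -> [0, 1]
```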

The result is:

workstation 1:

index, name
0, NVIDIA GeForce RTX 3070
1, NVIDIA GeForce RTX 3070
2, NVIDIA GeForce RTX 3070
3, NVIDIA GeForce RTX 3070

workstation 2:

index, name
0, NVIDIA GeForce RTX 2080 Ti
1, NVIDIA GeForce RTX 2080 Ti
2, NVIDIA GeForce RTX 2080 Ti
3, NVIDIA GeForce RTX 2080 Ti

Both worked well previously, and we have never had issues with this error.

@dzyla We expect a modest increase in VRAM usage after an upgrade to CryoSPARC v4.4+. On GPUs whose VRAM is below, or barely at, the (by now fairly dated) minimum recommendation of 11 GB, certain job types may fail due to insufficient VRAM. We are considering an increase in the minimum VRAM recommendation for recent versions of CryoSPARC.

Was the change in version 4.4 and later so significant that the motion correction feature, which previously worked perfectly, is now showing errors? Is the live version still using the old algorithm? I have not encountered any issues with GPU memory in the live version. I would appreciate the addition of a legacy Patch Motion Correction to avoid the need for hardware upgrades.

Hi, I am having the same error. I read the thread and changed some of the settings for the job: I turned on low-memory mode, turned on saving results in 16-bit floating point, turned off outputting denoiser training data, selected an Output F-crop factor of 1/2, and set Z = 5, Y = 5, X = 7 for the override knots.
Still got the error.
CryoSPARC version: 4.6.2
Error msg:

[CPU:  282.8 MB  Avail: 242.05 GB]
Error occurred while processing J1/imported/006894924300175400923_25jun09a_00004hl_00003ex.tif
Traceback (most recent call last):
  File "/scratch/users/cryosparc/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 59, in exec
    return self.process(item)
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 213, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 216, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 245, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 222, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 292, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 710, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 207, in cryosparc_master.cryosparc_compute.gpu.gpucore.transfer_ndarray_to_cudaarray
  File "/scratch/users/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 232, in _require_cuda_context
    return fn(*args, **kws)
  File "/scratch/users/cryosparc/cryosparc_worker/cryosparc_compute/gpu/driver.py", line 169, in create_array
    handle = allocator()
  File "/scratch/users/cryosparc/cryosparc_worker/cryosparc_compute/gpu/driver.py", line 155, in <lambda>
    allocator = lambda: cuda_check_error(cuda.cuArrayCreate(desc), "Could not allocate GPU array")
  File "/scratch/users/cryosparc/cryosparc_worker/cryosparc_compute/gpu/driver.py", line 284, in cuda_check_error
    raise RuntimeError(f"{msg}: {err.name}")
RuntimeError: Could not allocate GPU array: CUDA_ERROR_OUT_OF_MEMORY
Marking J1/imported/006894924300175400923_25jun09a_00004hl_00003ex.tif as incomplete and continuing...
**nvidia-smi output:**
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:3B:00.0 Off |                  N/A |
| 31%   35C    P8     1W / 250W |      6MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:5E:00.0 Off |                  N/A |
| 30%   34C    P8     2W / 250W |      6MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:86:00.0 Off |                  N/A |
| 29%   30C    P8    12W / 250W |      6MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:AF:00.0  On |                  N/A |
| 31%   37C    P8    19W / 250W |    229MiB / 11264MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3150      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      3150      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      3150      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      3150      G   /usr/lib/xorg/Xorg                 68MiB |
|    3   N/A  N/A    122305      G   /usr/lib/firefox/firefox          158MiB |

Hi @Nishat1, how large are your micrographs?

Micrographs are in the range of 300-600 MB each (some are 300 MB, some are 600 MB), and there are 28k raw movies.

The dataset is 11 TB in total.

@Nishat1, sorry I meant what’s the resolution in pixels, and how many frames? I should have been more specific, I apologize.

The physical pixel size is 0.825 Å; the raw data is in super-resolution, so I used 0.4125 Å when importing the raw frames, and set Output F-crop to 1/2 while running Patch Motion Correction.
63 frames per exposure.

@Nishat1 are these K3 super-resolution movies? 8184x11520? I tried a similar setup using K3 super-res movies, with low-memory mode and F-crop 1/2 on, and I never saw more than 8 GiB of VRAM in use. Is it possible that another process was using your GPU at the same time?
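For scale, a rough back-of-envelope on the frame sizes discussed in this thread. This assumes float32 buffers and says nothing about how patch motion actually allocates memory internally; it only shows why F-crop shrinks the per-frame footprint so sharply:

```python
# Rough memory footprint of one K3 super-res frame (8184 x 11520 px),
# with and without F-crop 1/2. Assumes float32 (4 bytes/px); the real
# allocation pattern inside patch motion differs.
w, h, bpp = 8184, 11520, 4
frame = w * h * bpp / 2**30                   # GiB per full-res frame
cropped = (w // 2) * (h // 2) * bpp / 2**30   # GiB after F-crop 1/2
print(f"full-res: {frame:.2f} GiB, F-crop 1/2: {cropped:.2f} GiB")
```

So a full-resolution float32 frame is about 0.35 GiB, and F-crop 1/2 cuts the pixel count (and hence any per-frame buffer) by 4x.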


Yes, these are K3 super-resolution movies, and yes, 8184x11520. I checked for other processes; nothing else is running. However, I switched to another workstation with 16 GB of VRAM, and it seems to be working for now.
