Hi,
I recently got a new GPU workstation with 4x RTX 3090 GPUs (24 GB VRAM each), a 64-core/128-thread CPU, and 256 GB RAM. The current CUDA version is 11.4. I keep getting the same error when running 2D Classification and Heterogeneous Refinement.
2D Classification:
CPU : [0, 1]
GPU : [0, 1]
RAM : [0, 1, 2]
SSD : True
Error at iteration 20:
[CPU: 13.84 GB] Traceback (most recent call last):
File "/home/exx/cryoSPARC/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1811, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 1090, in cryosparc_compute.engine.engine.process.work
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 306, in cryosparc_compute.engine.engine.EngineThread.compute_resid_pow
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 353, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
File "/home/exx/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 210, in __init__
self.gpudata = self.allocator(self.size * self.dtype.itemsize)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
The error shows up when I use 4 GPUs or 2 GPUs, but the job finishes successfully when I use only 1 GPU.
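In case it helps with diagnosis, here is a minimal sketch I can run to check free memory on each GPU before launching a multi-GPU job (assuming pycuda is importable, e.g. from the cryosparc_worker conda environment):

# Report free/total memory on every visible GPU.
# Assumes pycuda is available (e.g. from the cryosparc_worker env).
import pycuda.driver as cuda

cuda.init()
for i in range(cuda.Device.count()):
    ctx = cuda.Device(i).make_context()   # create a context on GPU i
    free, total = cuda.mem_get_info()     # bytes free / total on this device
    print("GPU %d: %.1f GB free of %.1f GB" % (i, free / 1e9, total / 1e9))
    ctx.pop()                             # release the context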
Heterogeneous Refinement:
CPU : [0, 1, 2, 3]
GPU : [0]
RAM : [0, 1]
SSD : True
Error:
[CPU: 4.51 GB] Traceback (most recent call last):
File "/home/exx/cryoSPARC/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1811, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 1090, in cryosparc_compute.engine.engine.process.work
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 306, in cryosparc_compute.engine.engine.EngineThread.compute_resid_pow
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 353, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
File "/home/exx/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 210, in __init__
self.gpudata = self.allocator(self.size * self.dtype.itemsize)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
I noticed the CPU has 64 cores / 128 threads, and cryoSPARC recognized the machine as having 128 cores. Could that be related to the error?
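For reference, a quick sketch of what Python reports for the CPU on this box (assuming psutil is available in the environment; os.cpu_count() by itself only reports logical CPUs):

# Check logical vs. physical CPU counts Python sees on this machine.
import os
import psutil  # assumed to be installed in this environment

print("Logical CPUs (threads):", os.cpu_count())           # expect 128
print("Physical cores:", psutil.cpu_count(logical=False))  # expect 64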
Thanks.