CUDA_ERROR_OUT_OF_MEMORY error for 2D classification after upgrading to v5

Hi All,

I am getting a CUDA_ERROR_OUT_OF_MEMORY error during 2D classification after upgrading to v5 (currently on v5.0.2). The error occurs consistently with 400 classes, whether I use 1 or 2 GPUs (GeForce RTX 4090, 24 GB VRAM each). The same jobs run fine with 300 or 250 classes. On v4.7.1 I could run two instances of 2D classification with 400 classes on a single such GPU, but now I have to run the same job on GPUs with 48 GB of VRAM (RTX A6000). Below is the error message I received:

Traceback (most recent call last):
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/driver.py", line 851, in _attempt_allocation
    return allocator()
           ^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/driver.py", line 1054, in allocator
    return driver.cuMemAlloc(size)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cli/run.py", line 105, in cli.run.run_job
  File "cli/run.py", line 210, in cli.run.run_job_function
  File "compute/jobs/class2D/run.py", line 295, in compute.jobs.class2D.run.run_class_2D
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/compute/alignment.py", line 652, in greedy_align_2D_noqueue
    align_res = align_pairs(
                ^^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/compute/alignment.py", line 440, in align_pairs
    NET.ensure_allocated("denom", (N_H, N_KK, N_S, N_R), n.float32)
  File "compute/gpu/gpucore.py", line 399, in compute.gpu.gpucore.EngineBaseThread.ensure_allocated
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/compute/gpu/gpuarray.py", line 377, in empty
    return device_array(shape, dtype, stream=stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/compute/gpu/gpuarray.py", line 333, in device_array
    arr = GPUArray(shape=shape, strides=strides, dtype=dtype, stream=stream)  # type: ignore
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/compute/gpu/gpuarray.py", line 122, in __init__
    super().__init__(shape, strides, dtype, stream, gpu_data)
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/devicearray.py", line 103, in __init__
    gpu_data = devices.get_context().memalloc(self.alloc_size)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/driver.py", line 1372, in memalloc
    return self.memory_manager.memalloc(bytesize)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/driver.py", line 1056, in memalloc
    ptr = self._attempt_allocation(allocator)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/driver.py", line 863, in _attempt_allocation
    return allocator()
           ^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/driver.py", line 1054, in allocator
    return driver.cuMemAlloc(size)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cryosparcuser/Applications/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
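For context, the allocation that fails in align_pairs is a single dense float32 buffer of shape (N_H, N_KK, N_S, N_R). If one of those dimensions grows with the class count (which would be consistent with 400 classes failing while 300 succeeds), the buffer's footprint grows linearly with it. A rough sketch of the arithmetic, using purely made-up dimension values (the real N_H, N_KK, N_S, N_R depend on the dataset and job parameters):

```python
import math

def float32_buffer_bytes(shape):
    """Size in bytes of a dense float32 array with the given shape."""
    return 4 * math.prod(shape)  # 4 bytes per float32 element

# Hypothetical dimensions purely for illustration -- not the actual values
# used by the job. Here the second axis stands in for the class count.
for n_classes in (300, 400):
    shape = (1, n_classes, 32, 512)  # stand-in for (N_H, N_KK, N_S, N_R)
    gib = float32_buffer_bytes(shape) / 2**30
    print(f"{n_classes} classes -> {gib:.3f} GiB for this one buffer")
```

The point is only that a 400-class job needs ~33% more memory than a 300-class one for any buffer that scales with the class count, which can push a 24 GB card over the edge even when the 300-class run leaves headroom.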

I have been monitoring VRAM usage: it sits around 7-8 GB during the initial iterations and jumps to ~15 GB as soon as the job starts its first full iteration, which is immediately followed by the CUDA_ERROR_OUT_OF_MEMORY error. I never saw the full 24 GB of VRAM filled before the crash.
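In case it helps anyone reproduce this kind of monitoring, here is a minimal sketch that polls per-GPU memory usage via nvidia-smi (assuming nvidia-smi is on the PATH; the query flags are standard, but the polling logic here is just illustrative):

```python
import subprocess
import time

def parse_vram(csv_text):
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    into a list of used-VRAM values in MiB, one entry per GPU."""
    return [int(line) for line in csv_text.splitlines() if line.strip()]

def query_vram_mib():
    """Query current per-GPU VRAM usage (MiB) via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vram(out)
```

Calling query_vram_mib() in a loop with a time.sleep(1) between samples, alongside the running 2D job, gives a per-second trace of usage on each card.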

For now I have downgraded to v4.7.1 because of this issue, but I really hope it can be fixed soon in v5.

Thanks.

Thanks @YYang for your post. In v5.0.2, one may be able to avoid "Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY" in 2D classification with hundreds of classes by setting the Plotting sort method parameter to size.

Thanks @wtempel for the suggestion. I will try it and report back on whether it solves the issue.

Best,

Yang