Call to cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE

Hi,

I encountered an error that was also previously reported:

Traceback (most recent call last):
  File "/home/em5/Software/cryosprac/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2306, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 136, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 137, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/jobs/class2D/newrun.py", line 642, in cryosparc_master.cryosparc_compute.jobs.class2D.newrun.class2D_engine_run.work
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 1399, in cryosparc_master.cryosparc_compute.engine.newengine.EngineThread.compute_resid_pow
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 382, in cryosparc_master.cryosparc_compute.gpu.gpucore.EngineBaseThread.ensure_allocated
  File "/home/em5/Software/cryosprac/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 232, in _require_cuda_context
    return fn(*args, **kws)
  File "/home/em5/Software/cryosprac/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/api.py", line 189, in pinned_array
    buffer = current_context().memhostalloc(bytesize)
  File "/home/em5/Software/cryosprac/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1378, in memhostalloc
    return self.memory_manager.memhostalloc(bytesize, mapped, portable, wc)
  File "/home/em5/Software/cryosprac/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 889, in memhostalloc
    pointer = allocator()
  File "/home/em5/Software/cryosprac/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 884, in allocator
    return driver.cuMemHostAlloc(size, flags)
  File "/home/em5/Software/cryosprac/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/home/em5/Software/cryosprac/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_INVALID_VALUE] Call to cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE

This error occurs on only one workstation (2x RTX 4000 Ada with 20 GB VRAM each, 256 GB RAM). Other workstations with exactly the same configuration, or with less VRAM and RAM, run the same job without issues. There also seems to be some randomness: only some jobs fail, with no obvious correlation to box size, particle count, or batch size, but once a job fails, it fails reproducibly.

The issue is also new. It appeared only after we had to move all workstations to v4.7.1-cuda+250814, which required updating the NVIDIA driver from 550.xx to 580.95.05. With the CUDA 11 build of cryoSPARC, everything worked.
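One way to compare the affected workstation with a working one is to measure how large a page-locked (pinned) host allocation succeeds on each, since numba's `pinned_array` is the call that reaches `cuMemHostAlloc` in the traceback above. The following is a hypothetical diagnostic sketch, not part of cryoSPARC: `probe_max_alloc` and `pinned_alloc` are illustrative names. It assumes allocation success is monotonic in size (if one size fails, larger sizes fail too), which may not hold if the failure is intermittent.

```python
from typing import Callable


def probe_max_alloc(alloc: Callable[[int], object], start: int = 1 << 20,
                    limit: int = 1 << 34, resolution: int = 1 << 20) -> int:
    """Return the largest size (bytes, <= limit) for which alloc(size) succeeded.

    Doubles the request from `start` until a call fails or `limit` is passed,
    then bisects down to `resolution` bytes. Returns 0 if `start` itself fails.
    Assumes that if a size fails, every larger size fails too.
    """
    if start <= 0 or limit < start or resolution < 1:
        raise ValueError("need 0 < start <= limit and resolution >= 1")
    good, size = 0, start
    # Phase 1: grow the request geometrically.
    while size <= limit:
        try:
            buf = alloc(size)
        except Exception:
            break
        del buf  # drop our reference before trying a bigger buffer
        good = size
        size *= 2
    if good == 0:
        return 0
    # Phase 2: `lo` is known to succeed; `hi` failed or lies beyond `limit`.
    lo, hi = good, min(size, limit + 1)
    while hi - lo > resolution:
        mid = (lo + hi) // 2
        try:
            buf = alloc(mid)
        except Exception:
            hi = mid
        else:
            del buf
            lo = mid
    return lo


if __name__ == "__main__":
    try:
        import numpy as np
        from numba import cuda
    except ImportError:
        print("numba/numpy not found; run this in the cryoSPARC worker environment")
    else:
        def pinned_alloc(nbytes: int):
            # cuda.pinned_array allocates via cuMemHostAlloc, the call in the traceback
            return cuda.pinned_array(nbytes, dtype=np.uint8)

        best = probe_max_alloc(pinned_alloc)
        print(f"largest pinned allocation that succeeded: {best / 2**30:.2f} GiB")
```

Run it inside the worker's Python environment on both machines. numba may defer freeing released buffers, so treat the result as a rough lower bound. Raise `limit` with care, because pinned memory is unavailable to the rest of the system while the probe holds it.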

As suggested in a previous thread, I added

export CRYOSPARC_NO_PAGELOCK=true

to the worker configuration (cryosparc_worker/config.sh), which resolved the error. I am curious why this issue suddenly appeared.
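As a rough illustration of what a switch like `CRYOSPARC_NO_PAGELOCK` trades off: page-locked (pinned) host buffers let the GPU copy data by DMA and allow asynchronous transfers, while ordinary pageable buffers avoid `cuMemHostAlloc` entirely at the cost of typically slower host-device copies. This sketch is hypothetical and is not cryoSPARC's implementation; `env_flag`, `make_host_allocator`, and the set of accepted "true" values are assumptions for illustration.

```python
import os
from typing import Callable, Mapping, Optional

Alloc = Callable[[int], object]


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Treat 1/true/yes/on (any case, surrounding spaces ignored) as enabled."""
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


def make_host_allocator(pinned_alloc: Alloc, pageable_alloc: Alloc,
                        environ: Optional[Mapping[str, str]] = None) -> Alloc:
    """Return the pageable allocator if CRYOSPARC_NO_PAGELOCK is set, else the pinned one.

    Hypothetical sketch of the trade-off, not cryoSPARC's actual code.
    """
    if env_flag("CRYOSPARC_NO_PAGELOCK", environ):
        # Pageable memory: no cuMemHostAlloc call, but typically slower copies.
        return pageable_alloc
    # Pinned memory: faster DMA transfers, but cuMemHostAlloc can fail.
    return pinned_alloc
```

In the worker environment, `pinned_alloc` could be `lambda n: numba.cuda.pinned_array(n, dtype=numpy.uint8)` and `pageable_alloc` could be `lambda n: numpy.empty(n, dtype=numpy.uint8)`.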

Thanks, @OleUns.

We are unsure of the cause, but we are glad you let us and other users know that the workaround was effective.
