We have an issue with CryoSPARC v4.4.1 that we did not have with previous versions. We are running non-uniform (NU) refinement with a box size of 648, which a 2080 Ti with 11 GB of VRAM should be able to handle. The following settings are enabled: minimize over per-particle scale, optimize per-particle defocus, and optimize per-group CTF params.
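For context, here is our own back-of-envelope estimate of what a single 648³ volume costs in VRAM (this is only illustrative arithmetic; CryoSPARC's actual allocation pattern is internal and will hold several such buffers at once):

```python
def volume_bytes(box: int, bytes_per_element: int) -> int:
    """Bytes needed to store one box^3 cubic volume at the given element size."""
    return box ** 3 * bytes_per_element

box = 648
real32 = volume_bytes(box, 4)      # float32 real-space volume
complex64 = volume_bytes(box, 8)   # complex64 (Fourier-space) volume

print(f"float32   {box}^3 volume: {real32 / 1e9:.2f} GB")    # ~1.09 GB
print(f"complex64 {box}^3 volume: {complex64 / 1e9:.2f} GB") # ~2.18 GB
```

So even a handful of working copies (half-maps, FFT workspaces, projection buffers) would approach the 11 GB limit, which is why we expected it to fit only with some headroom, not comfortably.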
On a server with 768 GB RAM and 4x 2080 Ti GPUs, we get the following error:
Traceback (most recent call last):
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 855, in _attempt_allocation
    return allocator()
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 1058, in allocator
    return driver.cuMemAlloc(size)
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 352, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 412, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_OUT_OF_MEMORY] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2192, in run_with_except_hook
    run_old(*args, **kw)
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2702, in cryosparc_master.cryosparc_compute.engine.newengine.process.work
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2868, in cryosparc_master.cryosparc_compute.engine.newengine.process.work
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 1148, in cryosparc_master.cryosparc_compute.engine.newengine.EngineThread.project_model
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 390, in cryosparc_master.cryosparc_compute.gpu.gpucore.EngineBaseThread.ensure_allocated
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 270, in empty
    return device_array(shape, dtype, stream=stream)
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 226, in device_array
    arr = GPUArray(shape=shape, strides=strides, dtype=dtype, stream=stream)
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 21, in __init__
    super().__init__(shape, strides, dtype, stream, gpu_data)
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/devicearray.py", line 103, in __init__
    gpu_data = devices.get_context().memalloc(self.alloc_size)
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 1376, in memalloc
    return self.memory_manager.memalloc(bytesize)
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 1060, in memalloc
    ptr = self._attempt_allocation(allocator)
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 867, in _attempt_allocation
    return allocator()
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 1058, in allocator
    return driver.cuMemAlloc(size)
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 352, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 412, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_OUT_OF_MEMORY] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
Running the same job with low-memory mode enabled crashes with the same error.
I can run it on a different machine with a Quadro M6000 (24 GB), but only with low-memory mode enabled.
Curiously, we also tried our most powerful server, which has 1 TB RAM and 8x A100 80 GB GPUs. There too, the job only runs with low-memory mode enabled.
What has changed from previous versions of CryoSPARC? With earlier versions, we could run such a job on our 2080 Ti GPUs with low-memory mode disabled.
Any advice will be greatly appreciated. Thank you.