Hi,
I recently got a new GPU workstation with 4x RTX 3090 GPUs (24 GB VRAM each), a 64-core/128-thread CPU, and 256 GB RAM. The current CUDA version is 11.4. I keep getting the same error when running 2D Classification and Heterogeneous Refinement.
2D Classification:
CPU : [0, 1]
GPU : [0, 1]
RAM : [0, 1, 2]
SSD : True
Error at iteration 20:
[CPU: 13.84 GB] Traceback (most recent call last):
File "/home/exx/cryoSPARC/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1811, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 1090, in cryosparc_compute.engine.engine.process.work
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 306, in cryosparc_compute.engine.engine.EngineThread.compute_resid_pow
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 353, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
File "/home/exx/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 210, in __init__
self.gpudata = self.allocator(self.size * self.dtype.itemsize)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
The error shows up when I use 4 GPUs or 2 GPUs, but the job finishes successfully when I use only 1 GPU.
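In case it helps with diagnosis, here is a minimal sketch I can run to check free memory on each GPU before launching a multi-GPU job (assuming pycuda is importable, e.g. from the cryosparc_worker conda environment):

# Report free/total memory on every visible GPU.
# Assumes pycuda is available (e.g. from the cryosparc_worker env).
import pycuda.driver as cuda

cuda.init()
for i in range(cuda.Device.count()):
    ctx = cuda.Device(i).make_context()   # create a context on GPU i
    free, total = cuda.mem_get_info()     # bytes free / total on this device
    print("GPU %d: %.1f GB free of %.1f GB" % (i, free / 1e9, total / 1e9))
    ctx.pop()                             # release the context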
Heterogeneous Refinement:
CPU : [0, 1, 2, 3]
GPU : [0]
RAM : [0, 1]
SSD : True
Error:
[CPU: 4.51 GB] Traceback (most recent call last):
File "/home/exx/cryoSPARC/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1811, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 1090, in cryosparc_compute.engine.engine.process.work
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 306, in cryosparc_compute.engine.engine.EngineThread.compute_resid_pow
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 353, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
File "/home/exx/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 210, in __init__
self.gpudata = self.allocator(self.size * self.dtype.itemsize)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
I noticed the CPU has 64 cores / 128 threads, and cryoSPARC recognized the machine as having 128 cores. Could that be related to the error?
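For reference, a quick sketch of what Python reports for the CPU on this box (assuming psutil is available in the environment; os.cpu_count() by itself only reports logical CPUs):

# Check logical vs. physical CPU counts Python sees on this machine.
import os
import psutil  # assumed to be installed in this environment

print("Logical CPUs (threads):", os.cpu_count())           # expect 128
print("Physical cores:", psutil.cpu_count(logical=False))  # expect 64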
Thanks.