I have been getting errors that all end in “cuMemHostAlloc failed: out of memory” in both Ab-Initio and Homogeneous refinement jobs.
I don’t think I am running out of GPU memory. I am running a 1080Ti GPU with CUDA 8, and my dataset is in a 512-pixel box. At the point of crashing, GPU memory usage seems to peak at ~8GB, far below the 11GB capacity. I also get these errors when using the same dataset with 2x binned particles. I am running v0.6.3.
The errors are somewhat sporadic, and I can sometimes get jobs to run to completion simply by cloning and restarting them.
The error for Ab-Initio is:
Traceback (most recent call last):
  File "/raid/cryosparc/cryosparc-compute/sparc/streamlog.py", line 438, in run_with_except_hook
    run_old(*args, **kw)
  File "/raid/cryosparc/cryosparc-compute/engine/cuda_core.py", line 98, in run
    self.target(*self.args, dev=self.dev, thidx=self.thidx)
  File "/raid/cryosparc/cryosparc-compute/engine/engine.py", line 860, in work
    rs, ts = ET.cull_candidates(r_factor, t_factor)
  File "/raid/cryosparc/cryosparc-compute/engine/engine.py", line 433, in cull_candidates
    self.ensure_allocated('error_r', (self.N_D, self.N_K, self.N_RS_aligned), n.float32)
  File "/raid/cryosparc/cryosparc-compute/engine/engine.py", line 46, in ensure_allocated
    new = cuda_core.allocate_cpu(shape, dtype, curr)
  File "/raid/cryosparc/cryosparc-compute/engine/cuda_core.py", line 106, in allocate_cpu
    ret = cudrv.pagelocked_empty(shape, dtype=dtype)
MemoryError: cuMemHostAlloc failed: out of memory
and for Homogeneous refinement:
Traceback (most recent call last):
  File "/raid/cryosparc/cryosparc-compute/sparc/streamlog.py", line 438, in run_with_except_hook
    run_old(*args, **kw)
  File "/raid/cryosparc/cryosparc-compute/engine/cuda_core.py", line 98, in run
    self.target(*self.args, dev=self.dev, thidx=self.thidx)
  File "/raid/cryosparc/cryosparc-compute/engine/engine.py", line 855, in work
    if do_align: ET.setup_current_poses_and_shifts(rs, dr, ts, dt, shared=(bnb_iter == 0)) # rs and ts are shared in first iter
  File "/raid/cryosparc/cryosparc-compute/engine/engine.py", line 234, in setup_current_poses_and_shifts
    self.ensure_allocated('Rs', (self.N_D, self.N_K, self.N_RS_aligned, 6), n.float32)
  File "/raid/cryosparc/cryosparc-compute/engine/engine.py", line 46, in ensure_allocated
    new = cuda_core.allocate_cpu(shape, dtype, curr)
  File "/raid/cryosparc/cryosparc-compute/engine/cuda_core.py", line 106, in allocate_cpu
    ret = cudrv.pagelocked_empty(shape, dtype=dtype)
MemoryError: cuMemHostAlloc failed: out of memory
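Both tracebacks end in allocate_cpu and pagelocked_empty, so the failing allocation looks like page-locked host (CPU) memory rather than GPU memory. As a rough sketch for gauging how large such a buffer could get, the snippet below computes the byte size of a float32 array from its shape. The dimension values are made up for illustration (the real N_D, N_K and N_RS_aligned are not shown in the logs), and host_buffer_bytes is a hypothetical helper, not part of cryoSPARC.

```python
import math

FLOAT32_BYTES = 4  # size of one n.float32 element


def host_buffer_bytes(shape, itemsize=FLOAT32_BYTES):
    """Return the size in bytes of a dense array with the given shape."""
    return math.prod(shape) * itemsize


# Hypothetical dimensions, NOT taken from the failing jobs.
n_d, n_k, n_rs_aligned = 500, 1, 100_000

# Shapes copied from the two ensure_allocated calls in the tracebacks.
error_r_bytes = host_buffer_bytes((n_d, n_k, n_rs_aligned))
rs_bytes = host_buffer_bytes((n_d, n_k, n_rs_aligned, 6))

print(f"error_r: {error_r_bytes / 2**30:.2f} GiB")
print(f"Rs:      {rs_bytes / 2**30:.2f} GiB")
```

If numbers like these come close to the free system RAM at the time of the crash, that might explain why the failures are sporadic, although this is only a guess.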
Do you have any suggestions to get these jobs to complete?
Thanks,
Donald.