My multirefine jobs are failing early in the run, though only after several iterations have completed, with a CUDA illegal memory access error like the one below. Has anyone else seen this, or found a workaround?
Traceback (most recent call last):
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/sparc/streamlog.py", line 321, in run_with_except_hook
    run_old(*args, **kw)
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/engine/cuda_core.py", line 86, in run
    self.target(*self.args, dev=self.dev, thidx=self.thidx)
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/engine/engine.py", line 626, in work
    ET.compute_resid_pow() # do this even if not do_align because we have to compare the different structures
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/engine/engine.py", line 264, in compute_resid_pow
    self.toc('compute_resid_pow')
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/engine/engine.py", line 42, in toc
    self.wait()
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/engine/engine.py", line 38, in wait
    self.stream.synchronize()
LogicError: cuStreamSynchronize failed: an illegal memory access was encountered