LogicError: cuStreamSynchronize failed: an illegal memory access was encountered

cryoem006 · December 26, 2018, 10:45pm

Hello,

Recently I have been running into a problem during hetero refinement (error message at the end). It does not seem to be purely a matter of available memory: it doesn’t happen with certain large datasets using five input volumes, but it does happen with others using only two volumes.
I have tried several different machines.
With five input volumes, for instance, the error is thrown twice right after "Number of BnB iterations 1, DEV 2 THR 0 NUM 500 TOTAL 10.455205 ELAPSED 11.232560 – ". With any combination of only two input volumes, the job progresses further, only to give the error after “Iteration 20 […] Testing for assignment convergence … DEV 2 THR 0 NUM 500 TOTAL 11.991543 ELAPSED 12.086112 --”

Traceback (most recent call last):
File “cryosparc2_compute/jobs/runcommon.py”, line 738, in run_with_except_hook
run_old(*args, **kw)
File “cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py”, line 92, in cryosparc2_compute.engine.cuda_core.GPUThread.run
File “cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py”, line 93, in cryosparc2_compute.engine.cuda_core.GPUThread.run
File “cryosparc2_worker/cryosparc2_compute/engine/engine.py”, line 1049, in cryosparc2_compute.engine.engine.process.work
File “cryosparc2_worker/cryosparc2_compute/engine/engine.py”, line 329, in cryosparc2_compute.engine.engine.EngineThread.compute_resid_pow
File “cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py”, line 228, in cryosparc2_compute.engine.cuda_core.EngineBaseThread.toc
File “cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py”, line 224, in cryosparc2_compute.engine.cuda_core.EngineBaseThread.wait
LogicError: cuStreamSynchronize failed: an illegal memory access was encountered

Any ideas? The forum has one report of this problem, but no solution.

hansenbry · March 4, 2019, 4:48pm

Hi, does anyone have a solution for this one? I’m seeing the same behavior on my machines. I’m curious whether there is a “bad” box size for this calculation that we should stay away from.

hansenbry · March 4, 2019, 9:08pm

I thought I would post an update on this. I increased my box size from 100 to 256 and hetero refinement started working.