Illegal memory access in multirefine

My multirefine jobs fail early, though only after several iterations, with a CUDA illegal memory access error like the one below. Has anyone else seen this, or found a workaround?

Traceback (most recent call last):
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/sparc/", line 321, in run_with_except_hook
    run_old(*args, **kw)
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/engine/", line 86, in run
    *self.args,, thidx=self.thidx)
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/engine/", line 626, in work
    ET.compute_resid_pow() # do this even if not do_align because we have to compare the different structures
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/engine/", line 264, in compute_resid_pow
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/engine/", line 42, in toc
  File "/mnt/cache/cryosparc/cryosparc/cryosparc-compute/engine/", line 38, in wait
LogicError: cuStreamSynchronize failed: an illegal memory access was encountered

I believe this bug is triggered when multirefine is launched with structures and/or particles from more than one experiment. If all structures and particles come from a single experiment, my job completes. If a structure or particles are also taken from another experiment (same dataset), the job fails.

Just as another data point, I haven’t taken particles from multiple datasets, but I’ve used structures from multiple datasets in multirefine without seeing this.


@spunjani Let me know what other information I could give. I’ve had trouble replicating the error with new jobs, but clones of jobs that had the error always fail the same way.
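
In case it helps with diagnosis: the traceback ends in cuStreamSynchronize, but that is only where the error is reported. CUDA kernels run asynchronously, so the kernel that made the illegal access may have been launched earlier than the line shown. Setting the standard CUDA environment variable CUDA_LAUNCH_BLOCKING=1 disables asynchronous kernel launches, which should make the error surface closer to the offending call. Below is a minimal sketch of how that variable could be passed to a child process from Python. I am assuming the worker can be started with extra environment variables; I have not checked how cryoSPARC exposes this, and the helper names blocking_cuda_env and run_with_blocking_launches are made up for illustration.

```python
import os
import subprocess
import sys


def blocking_cuda_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_LAUNCH_BLOCKING=1 is a standard CUDA setting: each kernel launch
    # blocks until the kernel finishes, so a failure is reported at the
    # launch that caused it rather than at a later synchronize call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking_launches(command):
    """Run `command` (a list of arguments) in a child process with the variable set."""
    return subprocess.run(command, env=blocking_cuda_env(),
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Placeholder child process that only echoes the variable back.
    # The real command that starts a cryoSPARC worker is not shown here.
    result = run_with_blocking_launches(
        [sys.executable, "-c",
         "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"])
    print(result.stdout.strip())
```

Running a cloned failing job this way would be slower, since launches no longer overlap, but the resulting traceback should be more informative than the current one.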