LogicError: cuMemHostAlloc failed: invalid argument during Heterogeneous Refinement

donghuachen · February 20, 2020, 6:45am

Hi all, I got the following error after iteration 0 of a Heterogeneous Refinement job. I am using CryoSPARC v2.13.2 and cuda/9.2.148 on CentOS Linux release 7.6.1810. Can anyone help? Thanks.

[CPU: 10.83 GB] Done in 21.410s.
[CPU: 10.83 GB] Outputting files…
[CPU: 10.79 GB] Done in 7.830s.
[CPU: 10.79 GB] Done iteration 0 in 203.650s. Total time so far 203.650s
[CPU: 10.79 GB] – Iteration 1
[CPU: 10.79 GB] Batch size 6000
[CPU: 10.79 GB] Using Alignment Radius 23.670 (27.363A)
[CPU: 10.79 GB] Using Reconstruction Radius 35.504 (18.242A)
[CPU: 10.79 GB] Randomizing assignments for identical classes…
[CPU: 10.79 GB] Number of BnB iterations 3
[CPU: 10.79 GB] Engine Started.
[CPU: 13.97 GB] Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1547, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 110, in cryosparc2_compute.engine.cuda_core.GPUThread.run
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 111, in cryosparc2_compute.engine.cuda_core.GPUThread.run
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 1053, in cryosparc2_compute.engine.engine.process.work
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 308, in cryosparc2_compute.engine.engine.EngineThread.compute_resid_pow
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 293, in cryosparc2_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
LogicError: cuMemHostAlloc failed: invalid argument

apunjani · February 26, 2020, 7:57pm

Hi @donghuachen,

What was the box size, pixel size and number of images in the dataset? What GPU model was this running on? The error message generically indicates an out-of-memory condition.

donghuachen · February 27, 2020, 6:02am

The box size is 256, the pixel size is 2.53, the number of particles is 126,238. The GPU is Titan V.
Thanks.

apunjani · March 4, 2020, 7:06pm

Hi @donghuachen,

This is strange because the box size and number of particles are not very large. How much CPU RAM do you have on this node? Is this still an issue, or did you find a way to get things working?
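
For reference, a rough back-of-the-envelope estimate supports this. The sketch below assumes each particle image is held as a square array of 32-bit floats; that is an assumption for illustration and not necessarily how CryoSPARC lays out its buffers, and `stack_bytes` is a hypothetical helper name.

```python
def stack_bytes(box_size, n_images, bytes_per_pixel=4):
    """Bytes needed for n_images square images of box_size x box_size pixels.

    bytes_per_pixel=4 assumes float32 storage (an assumption, see above).
    """
    return box_size * box_size * bytes_per_pixel * n_images


GIB = 1024 ** 3

# Values reported in this thread: box size 256, batch size 6000, 126,238 particles.
batch = stack_bytes(256, 6000)
full = stack_bytes(256, 126238)

print("one batch:    %.2f GiB" % (batch / GIB))
print("full dataset: %.2f GiB" % (full / GIB))
```

Under that assumption a single batch of 6000 images is roughly 1.5 GiB and the whole particle stack is about 31 GiB. Real working buffers (references for each class, Fourier-space copies) would add to this, so treat the numbers only as an order-of-magnitude check.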

donghuachen · March 6, 2020, 6:52am

There is 256GB RAM. I have not found a solution for this problem.

apunjani · March 11, 2020, 6:48pm

@donghuachen Sorry we haven’t been able to help much with this so far. One question: when you created the heterogeneous refinement job, did you connect multiple copies of the same input reference volume, multiple different reference volumes, or a combination?

donghuachen · March 16, 2020, 10:13pm

I have 6 volumes which are from the same input reference.

apayne97 · September 28, 2020, 9:17pm

Is there any update on this?

I have been looking at other threads about this error and can’t quite piece together what the problem is. We are getting this error when running on our RTX 2080s on a local workstation. It happens whether I use 1 or 4 GPUs, on several different job types, and sometimes if I just re-run the job it works without a problem.

[CPU: 966.1 MB] Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1685, in run_with_except_hook
    run_old(*args, **kw)
  File "/home/hiter/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 165, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 157, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 160, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 161, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 77, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/patchmotion.py", line 387, in cryosparc2_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 312, in cryosparc2_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
  File "/home/hiter/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/gpuarray.py", line 210, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
MemoryError: cuMemAlloc failed: out of memory
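
Because the failure is intermittent, one thing that may be worth ruling out is other processes holding GPU memory when the job starts. Below is a minimal sketch for logging per-GPU memory use, assuming `nvidia-smi` is on the PATH. `parse_gpu_memory` and `query_gpu_memory` are hypothetical helper names and not part of CryoSPARC.

```python
import subprocess

# nvidia-smi query returning one "index, used, total" line per GPU, in MiB.
QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines (MiB) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = [int(field) for field in line.split(",")]
        gpus.append({"index": index,
                     "used_mib": used,
                     "free_mib": total - used,
                     "total_mib": total})
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory usage (needs an NVIDIA driver)."""
    out = subprocess.check_output(QUERY).decode()
    return parse_gpu_memory(out)
```

Calling `query_gpu_memory()` just before launching a job, and again after a failure, would show whether a GPU was already close to full. This does not explain the error on its own; it only helps narrow down whether the out-of-memory condition comes from the job itself or from something else on the workstation.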