pycuda._driver.Error: cuMemGetInfo failed: unknown error from NU Refinement of v3.1.0

donghuachen · February 13, 2021, 7:55am

Hi All,

I got the error shown at the bottom of this post during NU Refinement in v3.1.0. After the error, nvidia-smi no longer listed any GPU and printed this message:
Unable to determine the device handle for GPU 0000:67:00.0: GPU is lost. Reboot the system to recover this GPU

I have a GeForce RTX 2080 Ti and CUDA release 10.1, V10.1.105.
Could CUDA version 10.1 be causing the problem? Please help. Thanks.

File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
File "cryosparc_worker/cryosparc_compute/jobs/nonuniform_refine/run.py", line 434, in cryosparc_compute.jobs.nonuniform_refine.run.run_non_uni_refine
File "cryosparc_worker/cryosparc_compute/engine/newengine.py", line 2069, in cryosparc_compute.engine.newengine.process
File "cryosparc_worker/cryosparc_compute/engine/newengine.py", line 1886, in cryosparc_compute.engine.newengine.get_current_GPU_memory
File "cryosparc_worker/cryosparc_compute/engine/newengine.py", line 1887, in cryosparc_compute.engine.newengine.get_current_GPU_memory
pycuda._driver.Error: cuMemGetInfo failed: unknown error
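
For anyone hitting the same traceback: the failing call is cuMemGetInfo, which pycuda exposes as pycuda.driver.mem_get_info(). A minimal sketch for probing that call outside CryoSPARC might look like the following. The helper name query_gpu_memory and the default of device 0 are assumptions for illustration, not part of CryoSPARC, and the script only does real work in an environment where pycuda is installed (for example the worker's conda environment).

```python
def query_gpu_memory(device_index=0):
    """Return (free_bytes, total_bytes) for one GPU, or None if the query fails.

    Hypothetical helper for illustration; not part of CryoSPARC.
    """
    try:
        import pycuda.driver as cuda  # needs pycuda and the NVIDIA driver library
    except ImportError:
        return None

    try:
        cuda.init()
        # A context must be active before memory can be queried.
        ctx = cuda.Device(device_index).make_context()
    except cuda.Error:
        return None

    try:
        return cuda.mem_get_info()  # wraps cuMemGetInfo
    except cuda.Error:
        return None
    finally:
        try:
            ctx.pop()  # release the context even if the query failed
        except cuda.Error:
            pass
```

If this returns None right after a failed job, that would be consistent with the "GPU is lost" state reported by nvidia-smi, although it does not say why the GPU dropped out.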

donghuachen · February 14, 2021, 7:16pm

Hi All,

I re-imported the same particles from a different Relion JoinStar file and imported a different cryoSPARC map as the initial model. The job failed again, this time with a new error:

[CPU: 26.97 GB] Traceback (most recent call last):
File "/data/donghua/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/runcommon.py", line 1726, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 130, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 1027, in cryosparc_compute.engine.engine.process.work
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 106, in cryosparc_compute.engine.engine.EngineThread.load_image_data_gpu
File "cryosparc_worker/cryosparc_compute/engine/gfourier.py", line 33, in cryosparc_compute.engine.gfourier.fft2_on_gpu_inplace
File "/data/donghua/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/skcuda/fft.py", line 127, in __init__
onembed, ostride, odist, self.fft_type, self.batch)
File "/data/donghua/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/skcuda/cufft.py", line 742, in cufftMakePlanMany
cufftCheckStatus(status)
File "/data/donghua/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/skcuda/cufft.py", line 117, in cufftCheckStatus
raise e
skcuda.cufft.cufftInternalError

donghuachen · February 15, 2021, 4:45pm

Hi All,

Good news! I updated CUDA to V10.2.89, and CryoSPARC v3.1.0 seems to be working fine after the update. Thanks.
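
In case it helps others compare their setup, here is a rough sketch for reading the toolkit version from `nvcc --version` and comparing it with the version that worked in this thread. The function names and the WORKED_IN_THREAD constant are made up for illustration. The comparison reflects only what was observed on this one system, not a documented CryoSPARC requirement, and the nvcc found on PATH may not be the toolkit the CryoSPARC worker was configured with.

```python
import re
import subprocess

# Version that worked on the system in this thread; an observation, not a documented minimum.
WORKED_IN_THREAD = (10, 2, 89)


def parse_nvcc_release(text):
    """Extract (major, minor, patch) from `nvcc --version` output, or None."""
    match = re.search(r"\bV(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_nvcc_release():
    """Run `nvcc --version` and parse it; returns None if nvcc is not on PATH."""
    try:
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return parse_nvcc_release(out.stdout)
```

With the version string from the first post, parse_nvcc_release("Cuda compilation tools, release 10.1, V10.1.105") gives (10, 1, 105), which compares as older than WORKED_IN_THREAD.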