NU refinement cufftAllocFailed error

Omid · June 9, 2019, 1:59am

Dear cryosparc community,

I keep hitting the error below during NU refinement and would really appreciate some advice.
This is a 4-GPU RTX 2080 workstation running CUDA 10 and cryoSPARC 2.8.0. The error occurred during dynamic masking.

Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 830, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 109, in cryosparc2_compute.engine.cuda_core.GPUThread.run (/home/installtest/deps_manage/cryosparc2_package/deploy/stage/cryosparc2_worker/cryosparc2_compute/engine/cuda_core.c:4599)
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 110, in cryosparc2_compute.engine.cuda_core.GPUThread.run (/home/installtest/deps_manage/cryosparc2_package/deploy/stage/cryosparc2_worker/cryosparc2_compute/engine/cuda_core.c:4550)
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 990, in cryosparc2_compute.engine.engine.process.work (/home/installtest/deps_manage/cryosparc2_package/deploy/stage/cryosparc2_worker/cryosparc2_compute/engine/engine.c:27207)
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 108, in cryosparc2_compute.engine.engine.EngineThread.load_image_data_gpu (/home/installtest/deps_manage/cryosparc2_package/deploy/stage/cryosparc2_worker/cryosparc2_compute/engine/engine.c:5658)
  File "cryosparc2_worker/cryosparc2_compute/engine/gfourier.py", line 33, in cryosparc2_compute.engine.gfourier.fft2_on_gpu_inplace (/home/installtest/deps_manage/cryosparc2_package/deploy/stage/cryosparc2_worker/cryosparc2_compute/engine/gfourier.c:1866)
  File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/fft.py", line 126, in __init__
    onembed, ostride, odist, self.fft_type, self.batch)
  File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 741, in cufftMakePlanMany
    cufftCheckStatus(status)
  File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 116, in cufftCheckStatus
    raise e
cufftAllocFailed

Cheers,
Omid

stephan · June 10, 2019, 2:57pm

Hi @Omid,

Thanks for reporting this. Could you let us know which parameters you used for this job?

donghuachen · February 9, 2020, 2:39am

Hi, I got the same error during NU refinement:

[CPU: 55.05 GB] Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1547, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 110, in cryosparc2_compute.engine.cuda_core.GPUThread.run
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 111, in cryosparc2_compute.engine.cuda_core.GPUThread.run
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 991, in cryosparc2_compute.engine.engine.process.work
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 109, in cryosparc2_compute.engine.engine.EngineThread.load_image_data_gpu
  File "cryosparc2_worker/cryosparc2_compute/engine/gfourier.py", line 33, in cryosparc2_compute.engine.gfourier.fft2_on_gpu_inplace
  File "/data/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/fft.py", line 127, in __init__
    onembed, ostride, odist, self.fft_type, self.batch)
  File "/data/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 742, in cufftMakePlanMany
    cufftCheckStatus(status)
  File "/data/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 117, in cufftCheckStatus
    raise e
cufftAllocFailed

I was using all default parameters. Running CryoSPARC v2.13.2 on CentOS Linux release 7.6.1810 with CUDA Version 10.1.105.
Thanks.

stephan · February 10, 2020, 3:51pm

Hi @donghuachen,

What is the size of your particles (box size)?
Are you also running this job on an RTX 2080?

donghuachen · February 10, 2020, 4:05pm

My particle box size is 480.
Yes, my GPU is an RTX 2080 Ti.

BTW, when I re-ran the same job (as a clone) with only one change, setting “Refinement box size (Voxels)” to 480, it completed successfully. Originally the parameter was left at its default (NONE), and the job failed.

Thanks.

donghuachen · February 10, 2020, 9:25pm

Hi, I was running a Local Refinement with the NU Refinement option turned on, and it failed twice with exactly the same error. However, the first failure was during Iteration 1 and the second was during Iteration 5. I am not sure whether it is related to the NU Refinement option. I am now trying the same job with the NU Refinement option turned off.
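
Since the same job failed at different iterations, one way to narrow this down is to log GPU memory use while the job runs. The sketch below is not part of cryoSPARC; it assumes `nvidia-smi` is on the PATH, and the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration. It only reports what the driver sees, so it cannot say which allocation inside cuFFT failed.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" line per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used_MiB, total_MiB' lines into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = [field.strip() for field in line.split(",")]
        rows.append({
            "gpu": int(index),
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return rows


def query_gpu_memory():
    """Run nvidia-smi once and return the parsed per-GPU memory usage."""
    out = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_memory(out)


# Usage (on the worker node, while the refinement is running):
#     for row in query_gpu_memory():
#         print(row)
```

If another process is already holding memory on the GPU the job was assigned to, that would show up here as a high `used_mib` before the refinement starts.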

donghuachen · February 11, 2020, 1:48am

Just an update: my Local Refinement with the NU Refinement option turned off ran successfully. However, the resolution did not improve. So the error is related to NU Refinement.

crescalante · February 13, 2020, 2:26pm

I have the same issue, running local refinement with cryoSPARC v2.13.2 and CUDA 10.2 (the GPUs are GTX 1080 Ti). I had run a similar job before upgrading cryoSPARC, CUDA and the NVIDIA drivers. The job runs fine without the Non-uniform refinement option. Has this issue been solved?

stephan · February 13, 2020, 5:51pm

Hi @donghuachen, @crescalante,

Take a look at this post:

Could these issues be related?

donghuachen · February 14, 2020, 3:18pm

Hi All,
When you run NU Refinement, or Local Refinement with the NU Refinement option turned on, in cryoSPARC v2.13.2, try changing “Computational minibatch size” from 2000 to 1000. It worked for me.
Hope this helps.
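
A rough back-of-the-envelope estimate of why halving the minibatch could help: the traceback fails while cuFFT is creating a batched 2D plan (`cufftMakePlanMany`), and the data for such a plan scales linearly with the batch size. The sketch below assumes single-precision complex values (8 bytes each), images at the full 480-pixel box size, and that the whole minibatch is transformed in one batch. None of this is confirmed by the thread, and cuFFT's own work area comes on top of it, so treat the numbers as a lower bound under those assumptions.

```python
def fft_batch_bytes(box_size, batch_size, bytes_per_value=8):
    """Bytes needed to hold a batch of square 2D complex images.

    bytes_per_value=8 assumes complex64 (two 32-bit floats per pixel).
    """
    return box_size * box_size * batch_size * bytes_per_value


def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / float(2 ** 30)


# Compare the two minibatch sizes mentioned above at box size 480.
for batch in (2000, 1000):
    size = fft_batch_bytes(480, batch)
    print("minibatch %d: %.2f GiB" % (batch, to_gib(size)))
```

Under these assumptions the image data alone comes to about 3.4 GiB for a minibatch of 2000 and about 1.7 GiB for 1000, before any other buffers the job keeps on the GPU.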