2D classification error: cryosparc_compute.skcuda_internal.cufft.cufftAllocFailed

yurotakagi · February 20, 2022, 1:49am

Hi
I have encountered the following error in 2D classification. The problem seems to be related to CentOS: it started after I installed an update for CentOS 7. Apparently, the same problem was discussed almost a year ago. It was indicated that the bug causing this problem had been fixed, but I have the latest version, 3.3.1, and CUDA is 11.6. Any idea how to fix this? Thanks for your help
Best Yuro

[CPU: 3.41 GB] Traceback (most recent call last):
  File "/home/software/cryoem/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/runcommon.py", line 1837, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 1028, in cryosparc_compute.engine.engine.process.work
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 107, in cryosparc_compute.engine.engine.EngineThread.load_image_data_gpu
  File "cryosparc_worker/cryosparc_compute/engine/gfourier.py", line 32, in cryosparc_compute.engine.gfourier.fft2_on_gpu_inplace
  File "/home/software/cryoem/cryosparc/cryosparc2_worker/cryosparc_compute/skcuda_internal/fft.py", line 134, in __init__
    onembed, ostride, odist, self.fft_type, self.batch)
  File "/home/software/cryoem/cryosparc/cryosparc2_worker/cryosparc_compute/skcuda_internal/cufft.py", line 749, in cufftMakePlanMany
    cufftCheckStatus(status)
  File "/home/software/cryoem/cryosparc/cryosparc2_worker/cryosparc_compute/skcuda_internal/cufft.py", line 124, in cufftCheckStatus
    raise e
cryosparc_compute.skcuda_internal.cufft.cufftAllocFailed

wtempel · February 22, 2022, 4:19pm

@yurotakagi Does the same job still fail if you configure your worker(s) with CUDA-11.2?
The toolkit can be installed independently from the Linux kernel driver as a non-root user, as explained in another forum (for a different CUDA version), subject to a minimum driver version.
Following toolkit installation, please run cryosparcw newcuda <cuda-path>.
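
Since the toolkit choice depends on the installed driver, it may help to record the driver version and GPU memory first. Below is a minimal Python sketch that wraps nvidia-smi's `--query-gpu` CSV output. The helper names `parse_gpu_csv` and `query_gpus` are hypothetical and not part of CryoSPARC, and the parser assumes every field is reported as a plain value (no `[N/A]` entries, no commas in the GPU name).

```python
import subprocess

# Fields requested from nvidia-smi; with "nounits", memory is reported in MiB.
QUERY = "name,driver_version,memory.total,memory.used"


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        name, driver, total, used = [field.strip() for field in line.split(",")]
        gpus.append(
            {
                "name": name,
                "driver_version": driver,
                "total_mib": int(total),
                "used_mib": int(used),
            }
        )
    return gpus


def query_gpus():
    """Run nvidia-smi on this machine and return the parsed GPU list."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits with an error
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` on the worker node returns the driver version to compare against the minimum listed in NVIDIA's release notes for the chosen toolkit, along with how much GPU memory is already in use before the job starts.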

yurotakagi · May 26, 2022, 1:13pm

Hi Wtempel

I did a clean re-installation of cryosparc with CUDA 11.2. However, the same problem, "cryosparc_compute.skcuda_internal.cufft.cufftAllocFailed", persists for jobs that require a GPU. I read your comment on the same issue in another discussion indicating that it could be an OS issue. We are using CentOS 7.9. Do you think we should switch to Ubuntu to solve this problem?

Thanks for your help
Best

Yuro

wtempel · May 26, 2022, 2:54pm

While there are anecdotes of problems with CentOS 7.9, given the history of the issue I am not sure the OS is the cause; the error could instead be related to the GPU memory demands of the job or to the CUDA toolkit/driver installation.

Please can you provide additional information:

run these three commands, compare output of final two:

nvidia-smi output
particle box size
maximum resolution and batch size for 2D classification
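
Box size and batch size matter here because the traceback fails in `cufftMakePlanMany`, where cuFFT allocates memory for a batched 2D transform. As a rough, hedged illustration only: the sketch below estimates the size of the image buffer alone, assuming single-precision complex values (8 bytes each) and square images. The function names and the example batch of 500 are hypothetical, and cuFFT's own work area and everything else the job keeps on the GPU come on top of this figure.

```python
def fft_batch_bytes(box_size, batch, bytes_per_value=8):
    """Bytes to hold `batch` square images of box_size x box_size values.

    bytes_per_value=8 assumes complex64 (two 32-bit floats per pixel).
    """
    return batch * box_size * box_size * bytes_per_value


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical batch of 500 images at several box sizes.
    for box in (128, 256, 512):
        gib = to_gib(fft_batch_bytes(box, batch=500))
        print(f"box {box}: {gib:.3f} GiB for the image buffer alone")
```

The buffer grows with the square of the box size, so doubling the box size quadruples the estimate. Comparing it with the free memory shown by nvidia-smi gives a first indication of whether the allocation failure is simply a capacity problem.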

yurotakagi · May 26, 2022, 3:14pm

I decided to switch the OS from CentOS 7.9 to Ubuntu to see how things go.