I have encountered the following error in 2D classification. The problem seems to be related to CentOS: it started after I installed an update for CentOS 7. Apparently, the same problem was discussed almost a year ago. It was indicated that the bug causing it had been fixed, but I am running the latest version (3.3.1) with CUDA 11.6. Any idea how to fix this? Thanks for your help.
Best, Yuro
[CPU: 3.41 GB] Traceback (most recent call last):
File "/home/software/cryoem/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/runcommon.py", line 1837, in run_with_except_hook
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 1028, in cryosparc_compute.engine.engine.process.work
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 107, in cryosparc_compute.engine.engine.EngineThread.load_image_data_gpu
File "cryosparc_worker/cryosparc_compute/engine/gfourier.py", line 32, in cryosparc_compute.engine.gfourier.fft2_on_gpu_inplace
File "/home/software/cryoem/cryosparc/cryosparc2_worker/cryosparc_compute/skcuda_internal/fft.py", line 134, in __init__
onembed, ostride, odist, self.fft_type, self.batch)
File "/home/software/cryoem/cryosparc/cryosparc2_worker/cryosparc_compute/skcuda_internal/cufft.py", line 749, in cufftMakePlanMany
File "/home/software/cryoem/cryosparc/cryosparc2_worker/cryosparc_compute/skcuda_internal/cufft.py", line 124, in cufftCheckStatus
@yurotakagi Does the same job still fail if you configure your worker(s) with CUDA-11.2?
The toolkit can be installed independently from the Linux kernel driver as a non-root user, as explained in another forum (for a different CUDA version), subject to a minimum driver version.
Following toolkit installation, please run
cryosparcw newcuda <cuda-path>.
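Since the toolkit depends on a minimum driver version, it may help to compare the installed driver against that minimum before switching toolkits. The following is only a sketch: the helper names are hypothetical, and the minimum version used in the usage note is a placeholder, so take the real value from NVIDIA's CUDA release notes for your toolkit version. It assumes `nvidia-smi` is on the PATH.

```python
import subprocess


def parse_version(text):
    """Turn a dotted version string such as '460.32.03' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_sufficient(installed, minimum):
    """Return True if the installed driver version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version (first GPU listed).

    Assumes nvidia-smi is installed; raises if the command is missing or fails.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

On a GPU node one could then call `driver_is_sufficient(installed_driver_version(), "450.80.02")`, replacing the placeholder minimum with the value documented for the toolkit being installed.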
I did a clean reinstallation of cryoSPARC with CUDA 11.2. However, the same problem persists for jobs that require a GPU: "cryosparc_compute.skcuda_internal.cufft.cufftAllocFailed". I read your comment on the same issue in another discussion, indicating that it could be an OS issue. We are using CentOS 7.9. Do you think we should switch to Ubuntu to solve this problem?
Thanks for your help
While there are anecdotal reports of problems with CentOS 7.9, given the history of the issue I cannot rule out that this is simply a problem with the GPU memory demands of the job or with the CUDA toolkit/driver installation.
Please can you provide additional information:
- run these three commands, compare output of final two:
- particle box size
- maximum resolution and batch size for 2D classification
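To see why box size and batch size matter for an allocation failure in the FFT step, here is a rough back-of-the-envelope estimate. It is a sketch under stated assumptions, not cryoSPARC's actual memory accounting: it assumes single-precision complex data (8 bytes per element), a square box, and a hypothetical `workspace_factor` to leave room for cuFFT's scratch space. The batch here means the number of images in a single FFT plan, which may differ from the batch size parameter of the 2D classification job.

```python
def fft_batch_bytes(box_size, batch_size, bytes_per_element=8):
    """Bytes needed to hold a batch of square complex64 images."""
    return box_size * box_size * batch_size * bytes_per_element


def fits_in_gpu(box_size, batch_size, free_bytes, workspace_factor=2.0):
    """Rough check: data buffer times an assumed workspace factor vs. free memory.

    workspace_factor is a guess, not a documented cuFFT constant.
    """
    needed = fft_batch_bytes(box_size, batch_size) * workspace_factor
    return needed <= free_bytes


GIB = 1024 ** 3

# 256-pixel box, 200 images per plan: 256 * 256 * 200 * 8 bytes = 100 MiB
print(fft_batch_bytes(256, 200) / 1024 ** 2, "MiB")
print(fits_in_gpu(256, 200, free_bytes=11 * GIB))
print(fits_in_gpu(1024, 2000, free_bytes=11 * GIB))
```

Because the requirement grows with the square of the box size, a larger box can exhaust memory even when the batch size is unchanged, and other processes holding GPU memory reduce `free_bytes` further.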
I decided to switch the OS from CentOS 7.9 to Ubuntu to see how things go.