Apparently, I found that I had CUDA loaded via the module system. Once I ran
module unload cuda
before installing the 3DFlex dependencies, things started to work. I will update here if that changes or I get the same error again.
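A minimal sketch of the sequence that worked for me, assuming a module-managed CUDA and a worker installed under a path like the placeholder below (adjust to your own installation):

# make sure no site-wide CUDA module shadows the CUDA toolkit the worker was configured with
module unload cuda
# then (re)install the 3DFlex dependencies from the worker's bin directory
/path/to/cryosparc_worker/bin/cryosparcw install-3dflex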
bassem
Replicated the error. It affects some micrographs but not all; the older version, v4.1.0, did not produce the same error.
================= CRYOSPARCW ======= 2022-12-20 19:24:57.947508 =========
Project P46 Job J436
Master spgpu Port 39002
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 41734
MAIN PID 41734
extract.run cryosparc_compute.jobs.jobregister
========= monitor process now waiting for main process
***************************************************************
========= sending heartbeat
[... repeated "sending heartbeat" messages omitted ...]
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
========= sending heartbeat
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
========= sending heartbeat
min: -147877.784576 max: 147933.160736
min: -23476.686974 max: 23480.147987
***************************************************************
========= main process now complete.
========= monitor process now complete.
More updates: the error where some of the images are not processed persists even after executing forcedeps on the worker. All images are processed if I revert to v4.1.0.
The command only works with cryosparcm, not cryosparcw. Is that what you meant?
cryosparcw may not be in your $PATH, so you may have to run
/opt/cryosparc/cryosparc_worker/bin/cryosparcw call /usr/bin/env | grep -v CRYOSPARC_LICENSE_ID
The grep command prevents the display of your (confidential) CryoSPARC license id.
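If the concern is a stray module-loaded CUDA leaking into the worker environment (as earlier in this thread), one way to narrow the same output down to CUDA-related variables might be:

/opt/cryosparc/cryosparc_worker/bin/cryosparcw call /usr/bin/env | grep -i cuda

The grep -i cuda filter is just a suggestion; it also keeps the license id out of the output, since that line does not contain "cuda".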
I am getting a similar error trying to extract particles with v4.1.1.
Many micrographs fail with the following error:
Error occurred while processing micrograph S5/motioncorrected/FoilHole_13208278_Data_13091718_13091720_20200814_153325_fractions_patch_aligned_doseweighted.mrc
Traceback (most recent call last):
File "/scratch/cryosoft/cryosparc/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 60, in exec
return self.process(item)
File "/scratch/cryosoft/cryosparc/cryosparc_worker/cryosparc_compute/jobs/extract/run.py", line 498, in process
result = extraction_gpu.do_extract_particles_single_mic_gpu(mic=mic, bg_bin=bg_bin,
File "/scratch/cryosoft/cryosparc/cryosparc_worker/cryosparc_compute/jobs/extract/extraction_gpu.py", line 136, in do_extract_particles_single_mic_gpu
fft_plan = skcuda_fft.Plan(shape=(patch_size, patch_size),
File "/scratch/cryosoft/cryosparc/cryosparc_worker/cryosparc_compute/skcuda_internal/fft.py", line 132, in __init__
self.worksize = cufft.cufftMakePlanMany(
File "/scratch/cryosoft/cryosparc/cryosparc_worker/cryosparc_compute/skcuda_internal/cufft.py", line 749, in cufftMakePlanMany
cufftCheckStatus(status)
File "/scratch/cryosoft/cryosparc/cryosparc_worker/cryosparc_compute/skcuda_internal/cufft.py", line 124, in cufftCheckStatus
raise e
cryosparc_compute.skcuda_internal.cufft.cufftInternalError
Marking S5/motioncorrected/FoilHole_13208278_Data_13091718_13091720_20200814_153325_fractions_patch_aligned_doseweighted.mrc as incomplete and continuing…
Hi,
I am getting the same error during particle extraction (with a slightly different error message):
Error occurred while processing micrograph J1/imported/015139662710773897517_FoilHole_7700328_Data_7666634_7666636_20221208_055455_EER.mrc
Traceback (most recent call last):
File "/home/changliu/Applications/cryosparc/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 60, in exec
return self.process(item)
File "/home/changliu/Applications/cryosparc/cryosparc_worker/cryosparc_compute/jobs/extract/run.py", line 498, in process
result = extraction_gpu.do_extract_particles_single_mic_gpu(mic=mic, bg_bin=bg_bin,
File "/home/changliu/Applications/cryosparc/cryosparc_worker/cryosparc_compute/jobs/extract/extraction_gpu.py", line 143, in do_extract_particles_single_mic_gpu
ifft_plan = skcuda_fft.Plan(shape = (bin_size, bin_size),
File "/home/changliu/Applications/cryosparc/cryosparc_worker/cryosparc_compute/skcuda_internal/fft.py", line 132, in __init__
self.worksize = cufft.cufftMakePlanMany(
File "/home/changliu/Applications/cryosparc/cryosparc_worker/cryosparc_compute/skcuda_internal/cufft.py", line 749, in cufftMakePlanMany
cufftCheckStatus(status)
File "/home/changliu/Applications/cryosparc/cryosparc_worker/cryosparc_compute/skcuda_internal/cufft.py", line 124, in cufftCheckStatus
raise e
cryosparc_compute.skcuda_internal.cufft.cufftInternalError
Marking J1/imported/015139662710773897517_FoilHole_7700328_Data_7666634_7666636_20221208_055455_EER.mrc as incomplete and continuing...
As I wrote in another post, this error seems to be related to the GPU running out of memory: I noticed that the GPU memory in use was about 5 GB when the job was first launched and gradually increased as the job ran. Eventually, as one or more of the GPUs ran out of memory, I started to see the error messages. Using the CPU version of the job, particle extraction completed successfully for all the micrographs without any issue, and the system memory usage stayed roughly constant for the whole job. It seems that the GPU version of the particle extraction job is unable to release GPU memory after it finishes extraction from previous batches of micrographs.
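For anyone who wants to watch this behaviour, one simple way to log per-GPU memory use while the job runs is a looping nvidia-smi query (the 30-second interval is arbitrary):

nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 30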
Thanks.
Hi,
Just an update: the installation guide now lists this new requirement:
“Nvidia driver version is 460.32.03 or newer on all GPU machines. Run nvidia-smi to verify”
We have 3 workstations; the one machine that fulfils this requirement and has CUDA 11.7 (plus the other listed requirements) is giving no error. We are in the process of updating the Nvidia drivers on the other workstations and will test whether that is what we were missing.
Hi @Bassem, thanks for sharing this information. We actually already have relatively new versions of the Nvidia driver (520.61.05) and CUDA (11.8) installed on both of the workstations that gave this error.
Another thing I noticed is that the error generally happens when the extraction job reaches 2000-3000 micrographs. This is also when my GPU runs out of VRAM (24 GB). So depending on the amount of VRAM your GPUs have, you may not have this error, especially if you extract from a relatively small dataset. But I would definitely be interested in knowing if updating the nvidia drivers on the other workstations you have solves this error.
Thanks.
Will update, @YYang. I am also waiting on the CryoSPARC team regarding some diagnostics that I submitted.
The update on the other machine did not help with the error during particle extraction.
I tried to keep an eye on the GPU memory. With v4.1.0, the maximum usage I noticed on any given GPU during that job was ~7 GB/24 GB, while with v4.1.1 it went all the way up to 19-20 GB/24 GB doing the same job. I do not know what that means; I am stretching my nerdy computer self here.
I downgraded CryoSPARC to v4.1.0. Now the particle extraction job can finish without an issue. The GPU memory usage stayed about the same (~7-8 GB, consistent with what you observed) throughout the job.
Hi @YYang, @Bassem @wangyan16 @RD_Cryo,
Can you please provide us with:
- the output of nvidia-smi
- the output of uname -a
- whether you ran cryosparcw install-3dflex or not
- the instance_information field from the failing job’s metadata (example commands sketched below)
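A rough sketch of how one might gather these; the project/job UIDs are just the ones from the log earlier in this thread, and the cryosparcm cli call is an assumption that may vary between CryoSPARC versions:

nvidia-smi
uname -a
# instance_information from the failing job's metadata (hypothetical UIDs P46/J436)
cryosparcm cli "get_job('P46', 'J436', 'instance_information')"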
1- The error happens without installing the 3DFlex dependencies, just by updating to v4.1.1.
2- output of nvidia-smi
Thu Dec 29 21:16:30 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57 Driver Version: 515.57 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:1B:00.0 Off | N/A |
| 30% 28C P0 104W / 350W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:1C:00.0 Off | N/A |
| 30% 29C P0 115W / 350W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:1D:00.0 Off | N/A |
| 30% 30C P0 108W / 350W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ... Off | 00000000:1E:00.0 Off | N/A |
| 30% 29C P0 103W / 350W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA GeForce ... Off | 00000000:B2:00.0 Off | N/A |
| 30% 28C P0 105W / 350W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA GeForce ... Off | 00000000:B3:00.0 Off | N/A |
| 30% 28C P0 106W / 350W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA GeForce ... Off | 00000000:B4:00.0 Off | N/A |
| 30% 28C P0 101W / 350W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA GeForce ... Off | 00000000:B5:00.0 Off | N/A |
| 30% 28C P0 106W / 350W | 0MiB / 24576MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
3- Linux spgpu 3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
4- instance_information → that was deleted by the user when we rolled back to v4.1.0, but I have the entire job report folder. Which part can I share with you?
Hi @stephan,
nvidia-smi
Sun Jan 1 04:02:57 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A5500 Off | 00000000:01:00.0 On | Off |
| 80% 55C P2 124W / 230W | 10338MiB / 24564MiB | 12% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A5500 Off | 00000000:2B:00.0 Off | Off |
| 30% 43C P8 21W / 230W | 8MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA RTX A5500 Off | 00000000:41:00.0 Off | Off |
| 30% 44C P8 19W / 230W | 8MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA RTX A5500 Off | 00000000:61:00.0 Off | Off |
| 80% 36C P8 22W / 230W | 8MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2912 G /usr/lib/xorg/Xorg 352MiB |
| 0 N/A N/A 3044 G /usr/bin/gnome-shell 84MiB |
| 0 N/A N/A 4804 G ...903988018181927716,131072 131MiB |
| 0 N/A N/A 729147 C python 9646MiB |
| 1 N/A N/A 2912 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 2912 G /usr/lib/xorg/Xorg 4MiB |
| 3 N/A N/A 2912 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
uname -a
Linux cryows1 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
I have not installed the 3DFlex dependencies yet.
instance_information
"instance_information": {
"platform_node": "cryows1",
"platform_release": "5.15.0-56-generic",
"platform_version": "#62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022",
"platform_architecture": "x86_64",
"physical_cores": 32,
"max_cpu_freq": 3600.0,
"total_memory": "251.53GB",
"available_memory": "230.69GB",
"used_memory": "18.55GB"
}
Hi @YYang, @Bassem @wangyan16 @RD_Cryo,
Thank you all for your help with this.
We’ve released a patch for v4.1.1 that fixes this issue. Please see:
It is running smoothly after the patch. Thanks to you and the CryoSPARC team for looking into this.
cheers,
bassem