CryoSPARC crashes with cufftExceptions

bcanax · May 8, 2017, 11:13am

I have successfully run cryoSPARC on a few different data sets. For the last few days, cryoSPARC has been crashing with the following error during refinement (ab-initio reconstruction runs fine):

Engine Started.
Traceback (most recent call last):
  File "/data1/progs/cryosparc/cryosparc-compute/sparc/streamlog.py", line 321, in run_with_except_hook
    run_old(*args, **kw)
  File "/data1/progs/cryosparc/cryosparc-compute/engine/cuda_core.py", line 86, in run
    self.target(self.args, dev=self.dev, thidx=self.thidx)
  File "/data1/progs/cryosparc/cryosparc-compute/engine/engine.py", line 619, in work
    ET.load_image_data_gpu(batch)
  File "/data1/progs/cryosparc/cryosparc-compute/engine/engine.py", line 113, in load_image_data_gpu
    gfourier.fft2_on_gpu_inplace(self.data_full_gpu, stream=self.stream)
  File "/data1/progs/cryosparc/cryosparc-compute/engine/gfourier.py", line 45, in fft2_on_gpu_inplace
    ostride=1, odist=NN)
  File "/data1/progs/cryosparc/anaconda2/lib/python2.7/site-packages/skcuda/fft.py", line 115, in __init__
    onembed, ostride, odist, self.fft_type, self.batch)
  File "/data1/progs/cryosparc/anaconda2/lib/python2.7/site-packages/skcuda/cufft.py", line 222, in cufftPlanMany
    cufftCheckStatus(status)
  File "/data1/progs/cryosparc/anaconda2/lib/python2.7/site-packages/skcuda/cufft.py", line 110, in cufftCheckStatus
    raise cufftExceptions[status]
cufftAllocFailed

apunjani · May 8, 2017, 4:51pm

Hi @bcanax,

Can you give some info about box-size? And the version number of cryosparc you’re running?
Any chance another process is using the GPUs? Did you only start getting the errors after updating? Nothing has really changed in the FFT parts since 0.3.9.

Thanks,
Ali

bcanax · May 8, 2017, 5:34pm

Hi Ali,

I updated from version 0.3.9 and am now running the current version, 0.4.1. The box size is 386, and I have only about 8000 particles.
I tested a refinement that I had previously run with the older version, and it completes without any errors.

Thanks

apunjani · May 8, 2017, 6:01pm

GPU model and CUDA version?

bcanax · May 8, 2017, 6:28pm

GeForce GTX 1080, CUDA version 8.0.

apunjani · May 15, 2017, 8:26pm

Hi @bcanax,
Sorry for the long delay in getting back to you.
Our testing on GTX1080 hasn’t revealed a similar issue. Can you try running the refinement that is failing, but set the refinement box size (parameter in the refinement section) to a smaller value, and see if the memory error disappears?
Can you also check via nvidia-smi that nothing else is using up RAM on the GPU?
Finally, you can force downgrade the cryosparc version back to 0.3.9 like so:

cryosparc stop
cryosparc update --version=v0.3.9

Let us know what happens.
Thanks,
Ali
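
For anyone who wants to script the nvidia-smi check suggested above, here is a minimal Python sketch. It relies on nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options; the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for this example and are not part of cryoSPARC.

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU, memory values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV rows printed by QUERY into a list of dicts (MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        parts = [p.strip() for p in line.split(",")]
        # The first field is the index and the last two are memory values;
        # everything in between is the GPU name.
        used, total = int(parts[-2]), int(parts[-1])
        gpus.append({
            "index": int(parts[0]),
            "name": ", ".join(parts[1:-2]),
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi (must be on PATH) and return the parsed rows."""
    out = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_memory(out)
```

Calling `query_gpu_memory()` right before launching the refinement shows how much memory is already taken on each card. If `used_mib` is far from zero while no cryoSPARC job is running, some other process is holding GPU memory.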

bcanax · May 18, 2017, 1:07pm

When I reduced the box size, the refinement worked. However, after running a couple of those jobs I am now getting the "hardware not registered" error. I am not sure whether the two issues are related. Restarting cryoSPARC doesn't resolve it.

bcanax · May 18, 2017, 1:25pm

@apunjani, deactivating and then reactivating resolves the "hardware not registered" error. Everything seems to be working fine now.
However, I haven't yet experimented with different box sizes to find the size limit for my data set.
Thanks.
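
As a rough guide to how GPU memory grows with box size, here is a back-of-the-envelope sketch in Python. It assumes single-precision complex data (8 bytes per element). cryoSPARC's actual buffer layout and batch size are not known from this thread, and cuFFT needs extra workspace on top of the data itself, so treat the numbers as lower bounds. The function names are made up for this example.

```python
BYTES_PER_COMPLEX64 = 8  # assumption: single-precision complex values


def image_batch_bytes(box_size, batch_size):
    """Bytes for a batch of box_size x box_size complex images."""
    return box_size * box_size * BYTES_PER_COMPLEX64 * batch_size


def volume_bytes(box_size):
    """Bytes for one box_size^3 complex volume."""
    return box_size ** 3 * BYTES_PER_COMPLEX64


def to_mib(n_bytes):
    """Convert bytes to MiB."""
    return n_bytes / (1024.0 * 1024.0)


# Compare a few box sizes; 386 is the one reported in this thread.
# The batch size of 500 is purely illustrative.
for box in (256, 320, 386):
    print("box %d: batch of 500 images = %.0f MiB, one volume = %.0f MiB"
          % (box, to_mib(image_batch_bytes(box, 500)), to_mib(volume_bytes(box))))
```

Image buffers scale with the square of the box size and volumes with the cube, so going from 256 to 386 makes each volume roughly 3.4 times larger. That is consistent with a smaller refinement box size avoiding the cufftAllocFailed error.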

spunjani · October 11, 2018, 7:52pm