Error in Homogeneous Refinement (v2.12)


#1

The following error occurs after the first round when including tetra:

Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1489, in run_with_except_hook
    run_old(*args, **kw)
  File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_worker/cryosparc2_compute/engine/newengine.py", line 1388, in cryosparc2_compute.engine.newengine.process.work
  File "cryosparc2_worker/cryosparc2_compute/engine/newengine.py", line 1406, in cryosparc2_compute.engine.newengine.process.work
  File "cryosparc2_worker/cryosparc2_compute/engine/newengine.py", line 428, in cryosparc2_compute.engine.newengine.EngineThread.preprocess_image_data
  File "cryosparc2_worker/cryosparc2_compute/engine/newgfourier.py", line 22, in cryosparc2_compute.engine.newgfourier.get_plan_R2C_2D
  File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/fft.py", line 124, in __init__
    cufft.cufftSetAutoAllocation(self.handle, auto_allocate)
  File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 758, in cufftSetAutoAllocation
    cufftCheckStatus(status)
  File "/home/cryosparc_user/V2.X/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 117, in cufftCheckStatus
    raise e
cufftInvalidPlan

#2

@MHB thanks for reporting - did this bug go away after you ran the job a second time (presumably with tetrafoil support off)?
The bug is related to the number of images in each GPU batch that gets processed concurrently. Can you try running the job again with the parameter “Batchsize snrfactor” set to a larger number, like 100?
We are trying to find a case where we can reproduce this so it can be fixed.


#3

Yes, it runs fine when tetra is off. I will adjust Batchsize snrfactor and report back.


#4

Adjusting snrfactor to 100 runs without error. Any other tests I can do to help?


#5

Thanks!
Unfortunately we still haven’t been able to reproduce this error here. We did encounter it in an early build of v2.12, but have since made several changes that seemed to fix it.