3D classification running out of GPU memory in latest version

Hi,

3D classification jobs that were running fine in previous versions seem to be running out of GPU memory in the latest version. 10 classes, target resolution 6 Å, particle box size 300 px, 400k particles total - nothing excessive. This is running on a 12 GB GPU. The last lines of the job log are attached below.

Cheers
Oli

```
exception in force_free_cufft_plan: 'NoneType' object has no attribute 'handle'
exception in cufft.Plan.__del__:
exception in force_free_cufft_plan: 'NoneType' object has no attribute 'handle'
exception in cufft.Plan.__del__:
exception in force_free_cufft_plan: 'NoneType' object has no attribute 'handle'
exception in cufft.Plan.__del__:
exception in force_free_cufft_plan: 'NoneType' object has no attribute 'handle'
exception in cufft.Plan.__del__:
exception in force_free_cufft_plan: 'NoneType' object has no attribute 'handle'
exception in cufft.Plan.__del__:
exception in force_free_cufft_plan: 'NoneType' object has no attribute 'handle'
exception in cufft.Plan.__del__:
exception in force_free_cufft_plan: 'NoneType' object has no attribute 'handle'
exception in cufft.Plan.__del__:
exception in force_free_cufft_plan: 'NoneType' object has no attribute 'handle'
exception in force_free_cufft_plan:
exception in cufft.Plan.__del__:
exception in cufft.Plan.__del__:
exception in cufft.Plan.__del__:
exception in force_free_cufft_plan: 'NoneType' object has no attribute 'handle'
exception in cufft.Plan.__del__:
exception in cufft.Plan.__del__:
exception in force_free_cufft_plan: 'NoneType' object has no attribute 'handle'
exception in cufft.Plan.__del__:
exception in cufft.Plan.__del__:
exception in force_free_cufft_plan: 'NoneType' object has no attribute 'handle'
**** handle exception rc
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/class3D/run.py", line 502, in cryosparc_compute.jobs.class3D.run.run_class_3D
  File "cryosparc_master/cryosparc_compute/jobs/class3D/run.py", line 1072, in cryosparc_compute.jobs.class3D.run.class3D_engine_run
  File "cryosparc_master/cryosparc_compute/jobs/class3D/run.py", line 1096, in cryosparc_compute.jobs.class3D.run.class3D_engine_run
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 601, in cryosparc_compute.engine.newengine.EngineThread.extract_data
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 357, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
  File "/home/user/software/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/gpuarray.py", line 210, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
set status to failed
========= main process now complete.
========= monitor process now complete.
```


Hi Oli,
Please can you email us the job report.
Thanks.

Hey @olibclarke,

Thanks for reporting this! Your job log pointed to class similarity tuning as the culprit. This out-of-memory error is caused by a bug: during this tuning we currently (and erroneously) use a batch size of 10K instead of the batch size of 500 we use elsewhere. With your 300 px box, a 10K batch probably requires just over 12 GB of GPU memory. A fix for this will be included in an upcoming dot release.
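
For a sense of scale, here is a rough back-of-envelope sketch (not CryoSPARC's actual allocation logic): it assumes each particle in a batch occupies two single-precision complex buffers (image + CTF) over the full box, which is purely an illustrative guess.

```python
# Rough back-of-envelope only, not CryoSPARC's actual allocation logic.
# Assumes each particle in the batch occupies two single-precision complex
# buffers (image + CTF) covering the full box -- an illustrative guess.
def batch_gpu_gib(batch_size, box_px, buffers_per_particle=2, bytes_per_px=8):
    """Approximate GPU memory (GiB) needed to hold one batch of particles."""
    return batch_size * buffers_per_particle * box_px**2 * bytes_per_px / 1024**3

box = 300  # particle box size in pixels, as in the job above
for batch in (500, 10_000):
    print(f"batch {batch:>6}: ~{batch_gpu_gib(batch, box):.1f} GiB")

# batch    500: ~0.7 GiB
# batch  10000: ~13.4 GiB  -> already over a 12 GB card before any other buffers
```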

For now, you should be able to get around this by turning off Auto-tune initial class similarity (or downsampling particles a bit).
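
To illustrate why downsampling also helps (same rough model and assumptions as the sketch above, so treat the numbers as illustrative only): per-batch memory scales with the square of the box size, so Fourier-cropping the particles shrinks the oversized 10K batch accordingly.

```python
# Same illustrative assumptions as above: two single-precision complex
# buffers per particle; per-batch memory therefore scales with box_px**2.
def batch_gpu_gib(batch_size, box_px, buffers_per_particle=2, bytes_per_px=8):
    return batch_size * buffers_per_particle * box_px**2 * bytes_per_px / 1024**3

for box in (300, 240, 200):
    print(f"box {box} px: ~{batch_gpu_gib(10_000, box):.1f} GiB for a 10K batch")

# box 300 px: ~13.4 GiB
# box 240 px: ~8.6 GiB
# box 200 px: ~6.0 GiB  -> comfortably under 12 GB even with the buggy batch size
```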

Valentin


Got it, testing now! Will update to confirm whether it has fixed the issue. Thanks!

UPDATE: Confirmed, switching off class similarity tuning fixed the problem. Thanks @wtempel & @vperetroukhin!


@olibclarke this should be fixed in v4.1.2!
