Homogeneous refinement out of memory

Hi, we had a homogeneous refinement job failing in CryoSPARC 3.1 with:

[CPU: 15.20 GB]  Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/refine/newrun.py", line 447, in cryosparc_compute.jobs.refine.newrun.run_homo_refine
  File "cryosparc_worker/cryosparc_compute/jobs/refine/newrun.py", line 448, in cryosparc_compute.jobs.refine.newrun.run_homo_refine
  File "cryosparc_worker/cryosparc_compute/jobs/ctf_refinement/run_local.py", line 203, in cryosparc_compute.jobs.ctf_refinement.run_local.full_defocus_refine
  File "cryosparc_worker/cryosparc_compute/engine/newengine.py", line 312, in cryosparc_compute.engine.newengine.EngineThread.load_models_rspace
  File "cryosparc_worker/cryosparc_compute/engine/newgfourier.py", line 152, in cryosparc_compute.engine.newgfourier.rfft3_on_gpu_inplace
  File "cryosparc_worker/cryosparc_compute/engine/newgfourier.py", line 71, in cryosparc_compute.engine.newgfourier.get_plan_R2C_3D
  File "/home/tomo/Software/cryosparc/cryosparc_worker/cryosparc_compute/skcuda_internal/fft.py", line 134, in __init__
    onembed, ostride, odist, self.fft_type, self.batch)
  File "/home/tomo/Software/cryosparc/cryosparc_worker/cryosparc_compute/skcuda_internal/cufft.py", line 749, in cufftMakePlanMany
    cufftCheckStatus(status)
  File "/home/tomo/Software/cryosparc/cryosparc_worker/cryosparc_compute/skcuda_internal/cufft.py", line 124, in cufftCheckStatus
    raise e
cryosparc_compute.skcuda_internal.cufft.cufftAllocFailed

So we ran heterogeneous refinement instead, which worked very well.

We recently updated CryoSPARC to 3.3 (with the patch applied) and tried homogeneous refinement again, but it failed, this time with:

[CPU: 11.09 GB]  Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/refine/newrun.py", line 467, in cryosparc_compute.jobs.refine.newrun.run_homo_refine
  File "cryosparc_master/cryosparc_compute/jobs/refine/newrun.py", line 468, in cryosparc_compute.jobs.refine.newrun.run_homo_refine
  File "cryosparc_worker/cryosparc_compute/jobs/ctf_refinement/run_local.py", line 215, in cryosparc_compute.jobs.ctf_refinement.run_local.full_defocus_refine
  File "cryosparc_worker/cryosparc_compute/engine/newengine.py", line 313, in cryosparc_compute.engine.newengine.EngineThread.load_models_rspace
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 353, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
  File "/home/tomo/Software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 210, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory

We restarted CryoSPARC as suggested in another topic, but that did not fix it.

The GPU is used without problems in the earlier iterations of the job.

Any ideas on how to solve this?
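
Since both failures are raw GPU allocation errors (cufftAllocFailed and cuMemAlloc out of memory), one quick diagnostic is to confirm how much device memory is actually free on each card at the time. Below is a minimal sketch using the pycuda package already present in the worker environment (run it with the worker environment's Python shown in the second traceback); it is purely illustrative and does not change any CryoSPARC behaviour.

# Diagnostic sketch: report free vs. total memory on every visible GPU.
# Assumes the pycuda from the cryosparc_worker environment is importable.
import pycuda.driver as cuda

cuda.init()
for i in range(cuda.Device.count()):
    ctx = cuda.Device(i).make_context()
    free, total = cuda.mem_get_info()
    print("GPU %d: %.2f GiB free of %.2f GiB" % (i, free / 1024**3, total / 1024**3))
    ctx.pop()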

@pconesa1 Please can you provide:

  • output of uname -a
  • output of nvidia-smi
  • box size
  • CUDA version used for that worker

Sure, happy new year by the way!

uname -a

Linux alter 5.4.0-91-generic #102~18.04.1-Ubuntu SMP Thu Nov 11 14:46:36 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

nvidia-smi

Mon Jan 10 14:40:31 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 35%   51C    P8    12W / 250W |      5MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:5E:00.0 Off |                  N/A |
| 29%   46C    P8     5W / 250W |      5MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:86:00.0 Off |                  N/A |
| 24%   38C    P8    10W / 250W |      5MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  Off  | 00000000:AF:00.0 Off |                  N/A |
| 34%   50C    P8     8W / 250W |     20MiB / 11016MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3930      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      3930      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      3930      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      3930      G   /usr/lib/xorg/Xorg                 18MiB |
+-----------------------------------------------------------------------------+

Box size
600x600

CUDA:
ls -l /usr/local | grep cuda
lrwxrwxrwx 1 root root 21 Jan 15 2021 cuda -> /usr/local/cuda-10.2/
drwxr-xr-x 18 root root 4096 Nov 20 2020 cuda-10.1
drwxr-xr-x 15 root root 4096 Jan 15 2021 cuda-10.2

Not sure which one CryoSPARC is picking up; let me know how to check if needed (one way to verify is sketched after this reply).

All the best, Pablo.
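
For reference, the worker records the CUDA toolkit it was installed against in cryosparc_worker/config.sh (the CRYOSPARC_CUDA_PATH variable). Here is a small sketch that reads it out, assuming the standard config.sh layout under the install path visible in the tracebacks above; adjust the path for other installs.

# Sketch: print the CUDA toolkit path the CryoSPARC worker was configured with.
# Assumes the default config.sh layout; the install path is taken from the tracebacks.
import re
from pathlib import Path

config = Path("/home/tomo/Software/cryosparc/cryosparc_worker/config.sh")
for line in config.read_text().splitlines():
    match = re.search(r'CRYOSPARC_CUDA_PATH\s*=\s*"?([^"\s]+)"?', line)
    if match:
        print("worker CUDA path:", match.group(1))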

@pconesa1 Happy New Year to you also.
If you had “Optimize per-particle defocus” enabled during the refinement that crashed, would disabling that option allow the refinement job to complete?

Disabling per-particle defocus works fine.

@pconesa1 You may next want to try enabling per-particle defocus optimization and specifying a “small” number in the 50–100 range for the three GPU batch size of images parameters (Advanced mode settings under Homogeneous Refinement, Defocus Refinement, and Global CTF Refinement, respectively). Does the refinement job complete with those settings?
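
For a sense of scale, the sketch below is rough back-of-the-envelope arithmetic for a 600-voxel box, showing why per-particle defocus refinement can strain an 11 GB card and why a smaller GPU image batch helps. The sizes assumed here (float32 real-space volumes, complex64 Fourier data, an FFT workspace about the size of the transform output) are illustrative assumptions, not CryoSPARC's actual allocation scheme.

# Rough GPU-memory arithmetic for a 600-voxel box (orders of magnitude only).
GIB = 1024 ** 3
N = 600                                   # box size in voxels

real_vol = N ** 3 * 4                     # one float32 real-space volume
fourier_vol = N * N * (N // 2 + 1) * 8    # its complex64 R2C transform
fft_scratch = fourier_vol                 # assumed cuFFT workspace, ~output-sized

per_volume = real_vol + fourier_vol + fft_scratch
print("per reference volume : %.1f GiB" % (per_volume / GIB))      # ~2.4 GiB
print("two half-maps        : %.1f GiB" % (2 * per_volume / GIB))  # ~4.8 GiB

# A batch of Fourier-space particle images adds roughly batch * N^2 * 8 bytes:
for batch in (500, 100, 50):
    images = batch * N * N * 8
    print("batch %3d image buffers: %.2f GiB" % (batch, images / GIB))

With a few GiB already committed to reference volumes and FFT workspace, trimming the image batch into the 50–100 range is what keeps the peak under the ~11 GB reported by nvidia-smi above; the exact savings depend on CryoSPARC's actual allocation pattern.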

That went well. Sorry for bothering you with this; I wasn’t aware of those parameters, and since heterogeneous refinement worked fine I thought there was something worth reporting.

No worries @pconesa1. This discussion may help users who encounter similar errors in the future.