NU refinement failing in higher symmetries

marinegor · October 28, 2021, 11:31am

Hi everyone,

I’m trying to run an NU refinement job on 960 px particles with internal icosahedral symmetry, and it fails with an out-of-memory error. It also fails in some of the high-order subgroups of the I group (I tried T and D20). The full traceback is:

[CPU: 29.44 GB]  Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/refine/newrun.py", line 330, in cryosparc_compute.jobs.refine.newrun.run_homo_refine
  File "/opt/cryosparc/cryosparc_worker/cryosparc_compute/alignment.py", line 204, in align_symmetry
    gpuarray.to_gpu(n.copy(fVnopre.imag).astype(n.float32)))
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 1049, in to_gpu
    result = GPUArray(ary.shape, ary.dtype, allocator, strides=_compact_strides(ary))
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 210, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory

I tried reducing the batch size to as few as 50 particles, but it seems to have no effect. The job fails at:

[CPU: 12.72 GB]  ====== Initial Model ======
[CPU: 12.72 GB]    Resampling initial model to specified volume representation size and pixel-size...
[CPU: 15.56 GB]    Aligning initial model to symmetry.

which is the very beginning.
Also, the same job runs fine in C1 symmetry, which is a bit unintuitive: I’d expect symmetry to reduce memory consumption by roughly a factor of the point-group order. However, that does not seem to be the case.

As for hardware, we have 4x 2080 Ti GPUs and 256 GB RAM + 256 GB swap.

Any advice on what we should do, other than replacing the GPUs?

mmclean · November 2, 2021, 6:09pm

Hi @marinegor

Apologies that you’re encountering this error, and thanks for including the traceback and logs. The symmetry alignment subroutine (where the failure is happening, during the “Aligning initial model to symmetry” step) will be made more precise and more GPU-memory efficient in our upcoming release.

In the meantime, can you run the job with the “Do symmetry alignment” parameter turned off?

If the volume isn’t already aligned to the symmetry axes, you could work around this by:

1. downsampling the particles to half the current box size,
2. running a regular refinement with symmetry alignment on, using the downsampled particles,
3. connecting the full-size particles + the output volume to a new non-uniform refinement with symmetry alignment off.

Best,
Michael

marinegor · November 3, 2021, 12:18pm

Hi Michael, thanks a lot for your reply!

We managed to do the processing by reducing the particle size a little bit (to 850-ish), but I’ll be glad to try the updated version as soon as it goes live!
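
For a rough sense of scale, here is a minimal sketch of the memory needed for one cubic float32 volume at the box sizes mentioned in this thread (960, 850-ish, and half of 960). It is only back-of-the-envelope arithmetic: the helper names are hypothetical, and it counts a single N^3 array. The traceback shows the alignment routine uploading the real and imaginary parts as separate float32 arrays, and the total number and padding of arrays it keeps on the GPU is not stated here, so the actual footprint will be some multiple of these numbers. For comparison, a 2080 Ti has 11 GB of memory.

```python
# Back-of-the-envelope estimate, NOT cryoSPARC's actual memory accounting.
# Assumption: one cubic volume of box_size^3 voxels stored as float32 (4 bytes).

def volume_bytes(box_size: int, bytes_per_voxel: int = 4) -> int:
    """Bytes needed for one cubic volume with box_size voxels per edge."""
    return box_size ** 3 * bytes_per_voxel


def volume_gib(box_size: int, bytes_per_voxel: int = 4) -> float:
    """Same quantity expressed in GiB (2**30 bytes)."""
    return volume_bytes(box_size, bytes_per_voxel) / 2 ** 30


if __name__ == "__main__":
    # 960 px: original box; 850 px: the reduced box; 480 px: half the original.
    for box in (960, 850, 480):
        print(f"{box:4d} px box: {volume_gib(box):.2f} GiB per float32 volume")
```

Because the cost scales with the cube of the box size, halving the box cuts each volume by a factor of 8, while going from 960 to 850 cuts it by roughly 30%.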