ArrayMemoryError: System Memory Issues on a 128GB system

Hi,

Our lab recently ordered a cryo workstation from a vendor, and for the most part it has worked well on test data and some new data. However, when running the new 3D homogeneous refinement on a VLP with an 800 px box size, we get the error copied below. The system has 128 GB of memory and is only processing ~24k particles for this job. It also always fails on the same round of refinement (round 2), at which point it is processing only ~5k particles. Is this a settings issue where the system cannot allocate further memory to the process, or is the system really running out of memory this quickly while processing an 800 px box?

Thank you for any help you can provide. Please let me know if further details of the system, or any logs, would be helpful and I can add them.

Best,
Justas

[CPU: 61.39 GB]  Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/refine/newrun.py", line 608, in cryosparc_compute.jobs.refine.newrun.run_homo_refine
  File "/home/cryosparc_user/software/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py", line 1088, in compute_all_fscs
    radwns, fsc_loosemask = get_fsc(rMA, rMB, radwn_max, mask, mask)
  File "/home/cryosparc_user/software/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py", line 1038, in get_fsc
    fMB = fourier.fft(rMB*maskB)
  File "/home/cryosparc_user/software/cryosparc/cryosparc_worker/cryosparc_compute/fourier.py", line 110, in fft
    return fftcenter3(x, fft_threads)
  File "/home/cryosparc_user/software/cryosparc/cryosparc_worker/cryosparc_compute/fourier.py", line 74, in fftcenter3
    fv = fftmod.fftn(tmp, threads=th)
  File "/home/cryosparc_user/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pyfftw/interfaces/numpy_fft.py", line 169, in fftn
    calling_func, **_norm_args(norm))
  File "/home/cryosparc_user/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pyfftw/interfaces/_utils.py", line 128, in _Xfftn
    FFTW_object = getattr(builders, calling_func)(*planner_args)
  File "/home/cryosparc_user/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pyfftw/builders/builders.py", line 382, in fftn
    avoid_copy, inverse, real, **_norm_args(norm))
  File "/home/cryosparc_user/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pyfftw/builders/_utils.py", line 197, in _Xfftn
    output_array = pyfftw.empty_aligned(output_shape, output_dtype)
  File "pyfftw/utils.pxi", line 173, in pyfftw.pyfftw.empty_aligned
  File "pyfftw/utils.pxi", line 204, in pyfftw.pyfftw.empty_aligned
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 7.63 GiB for an array with shape (8192000032,) and data type int8

Hi @jr10,

This is likely a genuine out-of-memory error. Our current implementation of FSC computation is quite CPU-memory intensive, requiring many allocations of the full volume; with a large box size of 800 px and 128 GB of RAM, it's unfortunately not uncommon to run into memory limitations like this. Speeding up FSC computation and reducing its memory requirements is a priority for the next release.
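For context on the size of the failing allocation, here is a rough back-of-the-envelope sketch (not taken from your logs): a single full 800³ volume stored as complex128 needs 800³ × 16 bytes, which matches the ~7.63 GiB in the traceback. The extra 32 bytes in the reported shape is presumably alignment padding added by `pyfftw.empty_aligned`; the exact internal dtype cryoSPARC uses is an assumption here.

```python
# Rough estimate of one full-volume FFT buffer for an 800-px box.
# Assumes the FFT output is complex128 (16 bytes per voxel).
box = 800
voxels = box ** 3              # 512,000,000 voxels
bytes_needed = voxels * 16     # complex128 = 16 bytes per voxel
gib = bytes_needed / 2 ** 30
print(f"{gib:.2f} GiB per full-volume complex buffer")  # -> 7.63 GiB
```

Since the FSC computation makes several such full-volume allocations on top of the volumes already held in memory (the `[CPU: 61.39 GB]` marker shows the job was already using most of the available RAM), a further 7.63 GiB request can plausibly fail on a 128 GB system.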

As a workaround, the following may help you get a final structure at the 800 px box size:

  • Run a “Downsample Particles” job and downsample the data to a smaller box size (perhaps between 500 and 600 px – ideally the largest that still allows the refinement to run)
  • Use the output particles from the downsample job to run another homogeneous refinement job
  • When the refinement completes, take the output particles and the mask from the job and connect them to a “Homogeneous Reconstruction Only” job. To reconstruct at the full box size, override the low-level blob output in the input particles with the full-size data: drag the blob result from the original particle stack and drop it onto the blob input of the reconstruction job (it should turn green); you can find more information about this specific use case on our guide page. This job only reconstructs density from the existing alignments, but it uses significantly less memory than a refinement, hopefully enabling reconstruction at the full box size! The one caveat is that higher-resolution information can’t be used for alignment, but you’ll likely see a resolution improvement if your refinement hit the Nyquist limit at the smaller box size.
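To sanity-check the Nyquist caveat in the last step, here is a small sketch of the effective pixel size and Nyquist limit after downsampling. The 1.0 Å original pixel size is a hypothetical placeholder; substitute your dataset's actual value.

```python
# Effective pixel size and Nyquist limit after Fourier-crop downsampling.
# orig_psize is a hypothetical placeholder; use your dataset's pixel size.
orig_box, new_box = 800, 500
orig_psize = 1.0                              # Angstrom/px (assumed)
new_psize = orig_psize * orig_box / new_box   # 1.6 Angstrom/px
nyquist = 2 * new_psize                       # 3.2 Angstrom
print(f"downsampled pixel size: {new_psize:.2f} A, Nyquist limit: {nyquist:.2f} A")
```

If the downsampled refinement reports a resolution at or near this Nyquist limit, reconstructing at the full box size should recover additional resolution.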

Let me know if you have any questions!

Best,
Michael

1 Like