Error in 3D Flex Reconstruction

adtaheri · January 31, 2023, 7:06pm

Hi everyone,

I am running 3D Flex Refine on a relatively small dataset. It proceeds fine through the training and mesh prep steps; however, when I reach the reconstruction step I receive the following error:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "/media/raid/cryosparc/cryosparc_worker/cryosparc_compute/jobs/flex_refine/run_highres.py", line 113, in run
    ctfs_cpu = n.empty((N_D, N_highres, N_highres//2+1), n.float32)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 48.2 GiB for an array with shape (161000, 400, 201) and data type float32

I am unsure why the allocation error appears in this case, as I have run 3D Flex Refine successfully on other data. I am also unsure why it allocates an array with the uneven shape above. Any ideas on how to approach this issue?

jenchem · February 1, 2023, 7:01pm

I had a similar error where the memory for the array could not be allocated. I figured it was due to my box size, so I downsampled the particles and reprocessed, and it ran fine.
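The memory arithmetic behind this fix can be sketched in Python. This is only a rough estimate, assuming (as the traceback suggests) that the array has shape (N_D, N_highres, N_highres//2+1) with 4-byte float32 entries; it covers that single array, not the job's total memory use, and `ctf_array_gib` is a hypothetical helper.

```python
# Rough size of the CTF array from the traceback, assuming shape
# (n_particles, box, box // 2 + 1) with float32 (4-byte) entries.
# This is only that one array, not the job's total memory footprint.

BYTES_PER_FLOAT32 = 4


def ctf_array_gib(n_particles: int, box: int) -> float:
    """Size in GiB of an (n_particles, box, box // 2 + 1) float32 array."""
    n_bytes = n_particles * box * (box // 2 + 1) * BYTES_PER_FLOAT32
    return n_bytes / 2**30


if __name__ == "__main__":
    n_particles = 161_000  # particle count from the error message
    for box in (400, 300, 256, 200):
        print(f"box {box:>3}: {ctf_array_gib(n_particles, box):5.1f} GiB")
```

Because the array grows roughly with the square of the box size but only linearly with the particle count, halving the box from 400 to 200 shrinks it about fourfold (from about 48.2 GiB to about 12.1 GiB for 161,000 particles).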

adtaheri · February 1, 2023, 7:32pm

That totally makes sense. I will give it a try. Would you recommend a strategy such as breaking the particles into groups and running FlexRefine on each group, or do you think it would be better to downsample instead?

rbs_sci · February 2, 2023, 5:29am

Better to downsample, I think, as breaking the data into groups risks splitting the flexibility in unexpected ways.

mmclean · February 6, 2023, 3:31pm

Hi all,

The specific error encountered here occurs during allocation of an array that stores the CTFs of the entire particle dataset in one large array, hence the array shape. (If you're curious, the CTFs are a Fourier-space array, and the shape is the same as the one used by common FFT packages for real-to-complex Fourier transforms, where the last axis holds only N // 2 + 1 values, i.e. 201 for a 400-pixel box; see NumPy as an example.) 3D Flex (including 3D Flex Reconstruction) currently doesn't implement caching, so it consumes an amount of host memory proportional to the number of particles in the dataset.

There are essentially only two ways to work around this right now, as mentioned above: downsampling the particles, or using fewer particles. If you aren't reaching Nyquist resolution (as measured by the FSC calculated at the end of the reconstruction job), it likely makes more sense to downsample than to use fewer particles. If you still need to use fewer particles, you can use the Particle Sets Tool to split up the dataset.
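Both points can be sketched in Python with NumPy. The first lines show where the 201 comes from: `np.fft.rfft2` keeps only N // 2 + 1 values along the last axis. The helper `smallest_box_keeping_resolution` is hypothetical and assumes that downsampling keeps the field of view fixed (so the pixel size grows by box / new_box) and that Nyquist resolution is twice the pixel size; the 1.0 Å pixel size and 4.0 Å FSC resolution in the example are made-up values.

```python
import numpy as np

# Real-to-complex FFT: the last axis keeps only the non-redundant half,
# N // 2 + 1 values, so a 400 x 400 image gives a 400 x 201 array.
box = 400
print(np.fft.rfft2(np.zeros((box, box), dtype=np.float32)).shape)  # (400, 201)


def nyquist_angstrom(pixel_size: float) -> float:
    """Nyquist resolution (Angstrom) is twice the pixel size."""
    return 2.0 * pixel_size


def smallest_box_keeping_resolution(box: int, pixel_size: float,
                                    fsc_resolution: float) -> int:
    """Hypothetical helper: smallest even box whose Nyquist limit, after
    downsampling from `box`, is still at or finer than `fsc_resolution`.

    Assumes downsampling keeps the field of view fixed, so the new pixel
    size is pixel_size * box / new_box. No safety margin is applied.
    Returns `box` unchanged if no smaller box qualifies.
    """
    for new_box in range(2, box + 1, 2):
        new_pixel_size = pixel_size * box / new_box
        if nyquist_angstrom(new_pixel_size) <= fsc_resolution:
            return new_box
    return box


# Made-up example: 400-pixel box, 1.0 A pixels, 4.0 A FSC resolution.
print(smallest_box_keeping_resolution(400, 1.0, 4.0))  # 200
```

In this made-up case, a 400-pixel box could be downsampled to 200 before Nyquist (4.0 Å) would start to limit the reconstruction, which would also cut the CTF array to roughly a quarter of its size.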

PS, a slightly tangential note: you may need to increase the “Max BFGS iterations” parameter in the reconstruction job to ensure that the BFGS optimization has converged. This helps ensure that the final resolution estimate is accurate.

Best,
Michael