3dflex reconstruction GPU memory

Hi all,

I’ve been trying to run 3dflex reconstruct on a particle stack (after decent training and generation), and I keep running into the following error:

> **** handle exception rc
> Traceback (most recent call last):
>   File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
>   File "/opt/cryosparc2/cryosparc2_worker/cryosparc_compute/jobs/flex_refine/run_highres.py", line 150, in run
>     flexmod.do_hr_refinement_flex(numiter=params['flex_bfgs_num_iters'])
>   File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 1640, in cryosparc_compute.jobs.flex_refine.flexmod.do_hr_refinement_flex.lambda7
>   File "/opt/cryosparc2/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_lbfgsb_py.py", line 199, in fmin_l_bfgs_b
>     res = _minimize_lbfgsb(fun, x0, args=args, jac=jac, bounds=bounds,
>   File "/opt/cryosparc2/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_lbfgsb_py.py", line 308, in _minimize_lbfgsb
>     sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
>   File "/opt/cryosparc2/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 263, in _prepare_scalar_function
>     sf = ScalarFunction(fun, x0, args, grad, hess,
>   File "/opt/cryosparc2/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 158, in __init__
>     self._update_fun()
>   File "/opt/cryosparc2/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 251, in _update_fun
>     self._update_fun_impl()
>   File "/opt/cryosparc2/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 155, in update_fun
>     self.f = fun_wrapped(self.x)
>   File "/opt/cryosparc2/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 137, in fun_wrapped
>     fx = fun(np.copy(x), *args)
>   File "/opt/cryosparc2/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 76, in __call__
>     self._compute_if_needed(x, *args)
>   File "/opt/cryosparc2/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 70, in _compute_if_needed
>     fg = self.fun(x, *args)
>   File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 1640, in cryosparc_compute.jobs.flex_refine.flexmod.do_hr_refinement_flex.lambda7
>   File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 1621, in cryosparc_compute.jobs.flex_refine.flexmod.errfunc_flex
>   File "/opt/cryosparc2/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
>     torch.autograd.backward(
>   File "/opt/cryosparc2/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
>     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
> torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.17 GiB (GPU 0; 10.91 GiB total capacity; 8.56 GiB already allocated; 649.38 MiB free; 9.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
> set status to failed

Any ideas on how to address this? Our GPUs have 12 GB, so I'm not sure why it's running out of memory. I'm also not sure where to set the 'max_split_size_mb' argument mentioned in the error; does that go straight into the command line?

Any help much appreciated,

Fred

Hi @sheff_diamond_em,

Thanks for the question. Right now, the only ways to reduce the memory usage of Flex Reconstruction are to reduce the box size of the full-resolution (fullres) particles and/or to reduce the number of particles used. If your structure is not resolved to the Nyquist resolution at the given box size, reducing the box size is likely the preferred option. Since you already have a trained model, you can reduce the box size of the full-resolution (fullres) particles and re-run Flex Data Prep while keeping the same trained model.

To do this:

  1. Run “Downsample Particles” on the original particles (the ones input to the initial Flex Data Prep job) with:
    • the “Crop to box size (pix)” parameter set to the same value as that in the initial Flex Data Prep job. If this was empty, then the particles have not been cropped and the input box size is used.
    • the “Fourier crop to box size (pix)” parameter set to a value smaller than the current box size. To reduce the memory usage as much as possible, I would set this value to a number that ensures the resulting pixel size is half the resolution value that your model refines to. The resulting pixel size is given by pixel_size_input_particles * crop_box_size / fourier_crop_box_size (see the arithmetic sketch after this list).
  2. Take these downsampled particles and re-run Flex Data Prep with the crop parameter empty and the Training Box Size set to the same value as that used in the previous Flex Data Prep job.
  3. Take these prepped particles and connect them to Flex Reconstruct, along with the already trained flex model.
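
For a rough sense of the numbers in step 1, here is a minimal sketch of the Fourier-crop arithmetic; the pixel size, box size, and target resolution below are hypothetical placeholders, not values from your dataset:

# Sketch of the Fourier-crop arithmetic described in step 1 (hypothetical numbers).
psize_input = 1.05   # pixel size (A/pix) of the particles going into Downsample Particles
crop_box = 360       # "Crop to box size (pix)", same value as in the initial Flex Data Prep job
target_res = 3.4     # resolution (A) that the flex model refines to

# Resulting pixel size after downsampling:
#   psize_out = psize_input * crop_box / fourier_crop_box
# Choosing psize_out = target_res / 2 puts Nyquist exactly at the target resolution:
fourier_crop_box = 2 * psize_input * crop_box / target_res
fourier_crop_box = int(round(fourier_crop_box / 2)) * 2   # round to an even box size

psize_out = psize_input * crop_box / fourier_crop_box
print("Fourier crop to box size (pix):", fourier_crop_box)                      # 222 in this example
print("Resulting pixel size: %.3f A/pix (Nyquist %.2f A)" % (psize_out, 2 * psize_out))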

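As for the max_split_size_mb mentioned in the error message: as far as I know, that is not a command-line argument but part of PyTorch's PYTORCH_CUDA_ALLOC_CONF environment variable (e.g. exporting PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 in the worker's environment). It only mitigates allocator fragmentation, though, so it is unlikely to help when the job genuinely needs more memory than the GPU has; reducing the box size as above is the more reliable fix.
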
Let me know if you have any questions. I hope this reduces the memory sufficiently,
Michael

Hi Michael, thanks for the response, that seems to be working now!
Best wishes,

Fred


Hi guys, thank you for the tip. It ended up fixing this problem (3D Flex Reconstruc fails at iteration 0 - #5 by Flow) that I had and had attributed to an incompatibility with an old version of CentOS.

I managed to run this procedure on a couple of datasets without any problem and to run all the 3D Flex jobs with downsampled particles.
However, I have one particular dataset where I now encounter this error message when trying to run the Downsample Particles job:

[2023-04-19 9:28:06.43]
[CPU:  260.7 MB  Avail:1947.53 GB]
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "/scicore/home/engel0006/GROUP/pool-engel/soft/cryosparc/cryosparc_worker/cryosparc_compute/jobs/extract/run.py", line 893, in run_downsample_particles
    assert n.allclose(psize_align_2D, particles_dset['alignments2D/psize_A'], atol=1e-4), "Particles must all have the same alignment pixel size. If multiple particle stacks were input into the job, please instead run separate downsample jobs for each particle stack."
AssertionError: Particles must all have the same alignment pixel size. If multiple particle stacks were input into the job, please instead run separate downsample jobs for each particle stack.

The particles should all have the same alignment pixel size, so I don't really understand.
The particles originally come from two sets of imported movies, but from the same microscope with all the same parameters, and even from the same data acquisition; the movies were just sorted post-acquisition based on astigmatism.
On the other datasets where 3D Flex worked, the particles even come from different data collections, so I'm a bit clueless.
To fix this, I tried running a Restack job on the particle output from 3D Flex Data Prep and then running Downsample again, but that did not work either.

Any idea on how to fix that?

I believe you can bring all the particles to the same alignment pixel size by running a 2D or 3D alignment (you may need to re-extract all particles first), and then perform the 3D flexible reconstruction.


Hi @Flow,

Apologies for the delayed response. Let us know if trying @niejiawei 's suggestion helped alleviate the issue.

Alternatively, you may be able to work around this by manually disconnecting the alignments2D input from the particle stack connected to the Downsample Particles job when building it; this can be done by clicking the “x” under the particle inputs tab drop-down.
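
If it would help to confirm whether the pixel sizes really do differ, the distinct values in the particle metadata can be inspected directly. A minimal sketch, assuming an exported .cs particle metadata file (these are standard NumPy structured arrays; the file name below is just a placeholder):

import numpy as np

particles = np.load("J123_particles.cs")  # hypothetical path to the exported .cs file
for field in ("blob/psize_A", "alignments2D/psize_A", "alignments3D/psize_A"):
    if field in (particles.dtype.names or ()):
        # More than one unique value in alignments2D/psize_A would explain the assertion error
        print(field, np.unique(particles[field]))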

Best,
Michael
