3D Flex Reconstruct fails at iteration 0

Hi,

I’m trying 3D Flex on one of my datasets, a ribosome with extra floppy parts.
I managed to run all the steps up to Flex Reconstruct, which fails at iteration 0 with the following message:

[2022-12-21 8:57:44.94]
[CPU: 5.81 GB]
====== Load particle data =======

[2022-12-21 8:57:44.94]
[CPU: 5.81 GB]
  Reading in all particle data...

[2022-12-21 8:57:44.94]
[CPU: 5.81 GB]
  Reading file 44 of 44 (J361/J361_particles_fullres_batch_00043.mrc)

[2022-12-21 9:00:16.29]
[CPU: 273.80 GB]
  Reading in all particle CTF data...

[2022-12-21 9:00:16.29]
[CPU: 273.80 GB]
  Reading file 44 of 44 (J361/J361_particles_fullres_batch_00043_ctf.mrc)

[2022-12-21 9:01:58.77]
[CPU: 401.74 GB]
  Parameter "Force re-do GS split" was off. Using input split..

[2022-12-21 9:01:58.78]
[CPU: 401.74 GB]
    Split A contains 109500 particles

[2022-12-21 9:01:58.78]
[CPU: 401.74 GB]
    Split B contains 109500 particles

[2022-12-21 9:01:58.78]
[CPU: 401.74 GB]
  Setting up particle poses..

[2022-12-21 9:01:58.80]
[CPU: 401.74 GB]
====== High resolution flexible refinement =======

[2022-12-21 9:01:58.81]
[CPU: 401.74 GB]
  Max num L-BFGS iterations was set to 20

[2022-12-21 9:01:58.81]
[CPU: 401.74 GB]
  Starting L-BFGS.

[2022-12-21 9:01:58.81]
[CPU: 401.74 GB]
  Reconstructing half-map A

[2022-12-21 9:01:58.81]
[CPU: 401.74 GB]
    Iteration 0 : 109000 / 109500 particles

[2022-12-21 9:17:20.42]
[CPU: 426.01 GB]
ValueError: 0-th dimension must be fixed to 95433884 but got 4390401180


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
  File "/scicore/home/engel0006/GROUP/pool-engel/soft/cryosparc/cryosparc_worker/cryosparc_compute/jobs/flex_refine/run_highres.py", line 150, in run
    flexmod.do_hr_refinement_flex(numiter=params['flex_bfgs_num_iters'])
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 1640, in cryosparc_compute.jobs.flex_refine.flexmod.do_hr_refinement_flex
  File "/scicore/home/engel0006/GROUP/pool-engel/soft/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_lbfgsb_py.py", line 199, in fmin_l_bfgs_b
    res = _minimize_lbfgsb(fun, x0, args=args, jac=jac, bounds=bounds,
  File "/scicore/home/engel0006/GROUP/pool-engel/soft/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_lbfgsb_py.py", line 353, in _minimize_lbfgsb
    _lbfgsb.setulb(m, x, low_bnd, upper_bnd, nbd, f, g, factr,
ValueError: failed in converting 10th argument `wa' of _lbfgsb.setulb to C/Fortran array

I don’t think this error has been posted elsewhere.
I’m on v4.1.1 (updated yesterday).
I see that the job is using a lot of CPU memory; could that be the source of the problem? Reading the error, though, it doesn’t look like it.

Thanks for the help.

EDIT: Actually, it looks like something similar was happening here: https://discuss.cryosparc.com/t/3d-flex-reconstruction-stalls-at-large-box-sizes/10072, but the error message isn’t the same.

Edit 2: We run NVIDIA A40s with 48 GB of memory, and 2 TB of RAM.

To follow up on this: I tried again, restarting with a subset of 120k particles instead of the initial 220k.

I still get the same error:

[2022-12-21 18:24:09.15]
[CPU: 5.51 GB]
====== Load particle data =======

[2022-12-21 18:24:09.15]
[CPU: 5.51 GB]
  Reading in all particle data...

[2022-12-21 18:24:09.16]
[CPU: 5.51 GB]
  Reading file 24 of 24 (J376/J376_particles_fullres_batch_00023.mrc)

[2022-12-21 18:25:34.51]
[CPU: 155.05 GB]
  Reading in all particle CTF data...

[2022-12-21 18:25:34.51]
[CPU: 155.05 GB]
  Reading file 24 of 24 (J376/J376_particles_fullres_batch_00023_ctf.mrc)

[2022-12-21 18:26:26.83]
[CPU: 224.14 GB]
  Parameter "Force re-do GS split" was off. Using input split..

[2022-12-21 18:26:26.83]
[CPU: 224.14 GB]
    Split A contains 60000 particles

[2022-12-21 18:26:26.84]
[CPU: 224.14 GB]
    Split B contains 60000 particles

[2022-12-21 18:26:26.84]
[CPU: 224.14 GB]
  Setting up particle poses..

[2022-12-21 18:26:26.86]
[CPU: 224.14 GB]
====== High resolution flexible refinement =======

[2022-12-21 18:26:26.86]
[CPU: 224.14 GB]
  Max num L-BFGS iterations was set to 20

[2022-12-21 18:26:26.86]
[CPU: 224.14 GB]
  Starting L-BFGS.

[2022-12-21 18:26:26.87]
[CPU: 224.14 GB]
  Reconstructing half-map A

[2022-12-21 18:26:26.87]
[CPU: 224.14 GB]
    Iteration 0 : 59000 / 60000 particles

[2022-12-21 18:34:43.41]
[CPU: 248.42 GB]
ValueError: 0-th dimension must be fixed to 95433884 but got 4390401180


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
  File "/scicore/home/engel0006/GROUP/pool-engel/soft/cryosparc/cryosparc_worker/cryosparc_compute/jobs/flex_refine/run_highres.py", line 150, in run
    flexmod.do_hr_refinement_flex(numiter=params['flex_bfgs_num_iters'])
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 1640, in cryosparc_compute.jobs.flex_refine.flexmod.do_hr_refinement_flex
  File "/scicore/home/engel0006/GROUP/pool-engel/soft/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_lbfgsb_py.py", line 199, in fmin_l_bfgs_b
    res = _minimize_lbfgsb(fun, x0, args=args, jac=jac, bounds=bounds,
  File "/scicore/home/engel0006/GROUP/pool-engel/soft/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/scipy/optimize/_lbfgsb_py.py", line 353, in _minimize_lbfgsb
    _lbfgsb.setulb(m, x, low_bnd, upper_bnd, nbd, f, g, factr,
ValueError: failed in converting 10th argument `wa' of _lbfgsb.setulb to C/Fortran array

Any idea?

Hi @Flow, thanks for reporting.
Unfortunately this is not something we’ve seen before, and it’s quite cryptic. Could you report the following:

  • box size of the particles (i.e. the crop size and train size from 3D Flex Data Prep)
  • OS you’re running on
  • Version you’re running on (is it v4.1.0 or v4.1.1)
  • has the 3D Flex Reconstruct job worked for you on any other dataset so far?

Any other investigation or test you can do that narrows down the cause or the specific situation where this error happens would be very helpful. For example, you could run the Data Prep job again with a different crop size, and then use those particles in the Reconstruct job with the model you already have trained.

Sure,

  • the box sizes were as follows:

Cropping input particles to box size 560 for reconstruction
Downsampling cropped particles to box size 140 (pixel size 4.2320 A) for training
Nyquist resolution at training time will be 8.4640 A

  • Our cluster is running on CentOS
  • We run the latest cryoSPARC version, 4.1.1
  • So far, I have only tested 3D Flex Reconstruct on this specific dataset. I will give it a try on another one soon.

What puzzles me is that all the previous steps worked, and even though I thought it could be a memory problem, it doesn’t seem to be.

Thanks for the reply

Hey, here’s an update on the problem.

I have now tested 3D Flex on a different dataset.
Training was performed with 277,000 particles, and the box sizes were as follows:

[2022-12-26 23:53:12.79]
[CPU: 886.1 MB]
  Input particles have box size 512

[2022-12-26 23:53:12.80]
[CPU: 886.1 MB]
  Input particles have pixel size 1.0580 A

[2022-12-26 23:53:12.82]
[CPU: 886.1 MB]
  Cropping input particles to box size 448 for reconstruction

[2022-12-26 23:53:12.82]
[CPU: 886.1 MB]
  Downsampling cropped particles to box size 112 (pixel size 4.2320 A) for training 

[2022-12-26 23:53:12.82]
[CPU: 886.1 MB]
  Nyquist resolution at training time will be 8.4640 A

Flex Mesh and Flex Train worked nicely, as well as Flex Generate.

I then ran Flex Reconstruct, and it again failed at iteration 0, this time without the red error message, only this:

[2022-12-27 19:18:56.18]
[CPU: 323.13 GB]
====== High resolution flexible refinement =======

[2022-12-27 19:18:56.18]
[CPU: 323.13 GB]
  Max num L-BFGS iterations was set to 20

[2022-12-27 19:18:56.18]
[CPU: 323.13 GB]
  Starting L-BFGS.

[2022-12-27 19:18:56.18]
[CPU: 323.13 GB]
  Reconstructing half-map A

[2022-12-27 19:18:56.19]
[CPU: 323.13 GB]
    Iteration 0 : 138000 / 138500 particles

[2022-12-27 19:34:40.17]
[CPU: 172.0 MB]
====== Job process terminated abnormally.

To double-check, I ran Flex Data Prep again with the same parameters, but with 80,000 particles instead of the full 277,000. I ran Flex Reconstruct with the model trained on the 277,000 particles and again got the same problem:

[2022-12-28 0:51:39.23]
[CPU: 97.55 GB]
====== High resolution flexible refinement =======

[2022-12-28 0:51:39.24]
[CPU: 97.55 GB]
  Max num L-BFGS iterations was set to 20

[2022-12-28 0:51:39.24]
[CPU: 97.55 GB]
  Starting L-BFGS.

[2022-12-28 0:51:39.24]
[CPU: 97.55 GB]
  Reconstructing half-map A

[2022-12-28 0:51:39.25]
[CPU: 97.55 GB]
    Iteration 0 : 39000 / 40000 particles

[2022-12-28 0:57:47.78]
[CPU: 174.1 MB]
====== Job process terminated abnormally.

What I noticed:
When the particles are loaded in batches of 1,000 at this step:

Reconstructing half-map A
Iteration 0:

It first processes everything once properly (all 138,500 or 40,000 particles), then it restarts the same process (is that normal? shouldn’t it do half-map B next instead of restarting?), and when it reaches the last batch of 1,000 in the second pass, it fails.

Let me know if you need anything else from our side to understand what’s going on.

Thanks in advance

Any idea? We are still clueless about the error.

We have the same “Job process terminated abnormally.” error.
There is nothing about it in the logs.
All other 3D Flex jobs finish successfully.

Setup:
CS 4.1.1-230110
Tesla V100 SXM2 32 GB
Rocky Linux 8.6
Tested with up to 320 GB of memory.
Extensive Workflow dataset.


We have this problem too! “Process terminated abnormally”, with no clues about the cause.
v4.1.2
P100 GPUs
CentOS 7
256 GB RAM

@DaveBhella Could you please describe your dataset/processing parameters?

The particles are in a box of 450 pixels; I could possibly reduce this a little, maybe to 400. The aim is to achieve a map of sufficient resolution to model the flexible domain, which I have spectacularly failed to achieve using other approaches, so I am not keen to bin the data, though I suppose I could go down a little. I will try working with smaller-box-size particles.

Thanks, it worked well with the smaller-box-size data.


I can also confirm that a smaller box size helps.
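
For anyone landing on this thread later: the numbers in the first traceback look consistent with a signed 32-bit integer overflow in the workspace array that SciPy’s `fmin_l_bfgs_b` passes to the Fortran routine `setulb`. This is a speculative back-of-the-envelope check, not an official diagnosis: the `lbfgsb_workspace_len` helper below is my own, and the assumption that the number of optimized parameters scales as box³ is mine; only the workspace-length formula itself comes from SciPy’s `_lbfgsb_py.py` source.

```python
# Speculative check: SciPy's _minimize_lbfgsb allocates a workspace array
# 'wa' of length 2*m*n + 5*n + 11*m*m + 8*m before calling the Fortran
# routine setulb, where n is the number of parameters and m is the L-BFGS
# history size (default 10). ASSUMPTION (mine): 3D Flex Reconstruct
# optimizes roughly n = box**3 parameters per half-map.

def lbfgsb_workspace_len(n, m=10):
    """Length of the 'wa' workspace array handed to setulb."""
    return 2 * m * n + 5 * n + 11 * m * m + 8 * m

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit Fortran INTEGER holds

for box in (440, 448, 560):
    n = box**3
    wa = lbfgsb_workspace_len(n)
    print(f"box {box}: wa length = {wa:,} (overflows int32: {wa > INT32_MAX})")
```

Under this assumption, box 560 gives a workspace length of 4,390,401,180, exactly the number in the original error message, and both 448 and 560 land above 2³¹ while boxes of roughly 440 and below stay under it, which would explain why only the smaller crop sizes succeed.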