Hi, I’ve been trying to run a flex reconstruction on a dataset, but it keeps failing with a fairly nondescript message:
====== Job process terminated abnormally.
The box size is 512, so I tried downsampling the data to a box size of 256, with the same result. I also tried limiting the number of particles to 300k. I’m a bit stuck on where to troubleshoot from here and would welcome any suggestions.
Sorry it took me so long to get this to you, but here are the outputs:
[cryo1@ai-rmlcryoprd1 ~]$ cryosparcm joblog P46 J646 | tail -n 30
========= sending heartbeat at 2024-09-04 19:17:19.297183
========= sending heartbeat at 2024-09-04 19:17:29.312589
========= sending heartbeat at 2024-09-04 19:17:39.328624
========= sending heartbeat at 2024-09-04 19:17:49.344805
========= sending heartbeat at 2024-09-04 19:17:59.360472
========= sending heartbeat at 2024-09-04 19:18:09.376554
========= sending heartbeat at 2024-09-04 19:18:19.392352
========= sending heartbeat at 2024-09-04 19:18:29.408222
========= sending heartbeat at 2024-09-04 19:18:39.423971
========= sending heartbeat at 2024-09-04 19:18:49.439714
========= sending heartbeat at 2024-09-04 19:18:59.455397
========= sending heartbeat at 2024-09-04 19:19:09.472774
========= sending heartbeat at 2024-09-04 19:19:19.490515
========= sending heartbeat at 2024-09-04 19:19:29.506251
========= sending heartbeat at 2024-09-04 19:19:39.521974
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 134217728 M = 10
This problem is unconstrained.
At X0 0 variables are exactly at the bounds
At iterate 0 f= 3.93447D+10 |proj g|= 2.56484D+04
========= sending heartbeat at 2024-09-04 19:19:49.537588
========= sending heartbeat at 2024-09-04 19:19:59.554370
========= main process now complete at 2024-09-04 19:20:02.221192.
========= monitor process now complete at 2024-09-04 19:20:02.276084.
[cryo1@ai-rmlcryoprd1 ~]$ cryosparcm eventlog P46 J646 | tail -n 30
[CPU RAM used: 180 MB] GPU : [0]
[CPU RAM used: 180 MB] RAM : [0, 1, 2, 3, 4, 5, 6, 7]
[CPU RAM used: 180 MB] SSD : False
[CPU RAM used: 180 MB] --------------------------------------------------------------
[CPU RAM used: 180 MB] Importing job module for job type flex_highres…
[CPU RAM used: 446 MB] Job ready to run
[CPU RAM used: 446 MB] ***************************************************************
[CPU RAM used: 519 MB] ====== 3D Flex Load Checkpoint =======
[CPU RAM used: 519 MB] Loading checkpoint from J645/J645_train_checkpoint_017600.tar …
[CPU RAM used: 956 MB] Initializing torch…
[CPU RAM used: 956 MB] Initializing model from checkpoint…
Input tetramesh
[CPU RAM used: 1081 MB] Upscaling deformation model to match input volume size…
Upsampled mask
Upsampled tetramesh
[CPU RAM used: 4111 MB] ====== Load particle data =======
[CPU RAM used: 4214 MB] Reading in all particle data on the fly from files…
[CPU RAM used: 4214 MB] Loading a ParticleStack with 300000 items…
[CPU RAM used: 4359 MB] Done.
[CPU RAM used: 4359 MB] Preparing all particle CTF data…
[CPU RAM used: 4360 MB] Parameter “Force re-do GS split” was off. Using input split…
[CPU RAM used: 4360 MB] Split A contains 150000 particles
[CPU RAM used: 4360 MB] Split B contains 150000 particles
[CPU RAM used: 4360 MB] Setting up particle poses…
[CPU RAM used: 4360 MB] ====== High resolution flexible refinement =======
[CPU RAM used: 4360 MB] Max num L-BFGS iterations was set to 20
[CPU RAM used: 4360 MB] Starting L-BFGS.
[CPU RAM used: 4360 MB] Reconstructing half-map A
[CPU RAM used: 4360 MB] Iteration 0 : 149000 / 150000 particles
[CPU RAM used: 190 MB] ====== Job process terminated abnormally.
[******@ai-rmlcpu22 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           1.0T         68G        935G        472M        3.1G        936G
Swap:           15G         30M         15G
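
A quick sanity check on the job log above: the L-BFGS-B header reports N = 134217728 and M = 10, and 134217728 is exactly 512^3, so in that run the optimizer appears to be working on a full 512 box. The sketch below only does that arithmetic in Python. The memory figure assumes the textbook L-BFGS layout (2*M history vectors of N float64 values, which the double-precision "Machine precision" line suggests), and the 32-bit index comparison is purely a hypothesis about the implementation that the log does not confirm.

```python
# Values copied from the "N = 134217728 M = 10" line in the job log.
n_variables = 134217728
m_corrections = 10

# Check whether N is a perfect cube, i.e. one variable per voxel of a cubic box.
box = round(n_variables ** (1 / 3))
is_cube = box ** 3 == n_variables

# Size of one length-N vector, assuming 8-byte float64 values.
gib_per_vector = n_variables * 8 / 2**30

# Assumption: standard L-BFGS keeps M (s, y) pairs, i.e. 2*M vectors of length N.
history_elements = 2 * m_corrections * n_variables
history_gib = 2 * m_corrections * gib_per_vector

# Hypothesis only: would that element count fit in a signed 32-bit index?
int32_max = 2**31 - 1
exceeds_int32 = history_elements > int32_max

print(f"box size implied by N: {box} (perfect cube: {is_cube})")
print(f"one float64 vector: {gib_per_vector} GiB")
print(f"L-BFGS history (2*M vectors): {history_gib} GiB")
print(f"history elements: {history_elements}, exceeds int32: {exceeds_int32}")
```

If those assumptions hold, the history is about 20 GiB, far below the 935G reported as free on the node where free -h was run, so running out of system RAM looks unlikely from these numbers alone. For comparison, a 256 box would give N = 256^3 = 16777216.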