Flex Reconstruction Failing

Hi - I’ve been trying to run a flex reconstruction on a data set, but it keeps failing with a fairly nondescript message:

====== Job process terminated abnormally.

The box size is 512, so I tried resampling the data down to a 256 box size and got the same result. I also tried limiting the number of particles to 300k. I’m a bit stuck on where to troubleshoot from here and would welcome any suggestions.

Could you please post the outputs of these commands:

  1. on the CryoSPARC master host
    cryosparcm joblog P99 J199 | tail -n 30
    cryosparcm eventlog P99 J199 | tail -n 30
    
    where you substitute the failing job’s actual project and job IDs
  2. on the CryoSPARC worker where the job ran
    free -h
    sudo journalctl | grep -i oom
    

Sorry it took me so long to get this to you, but here are the outputs:

[cryo1@ai-rmlcryoprd1 ~]$ cryosparcm joblog P46 J646 | tail -n 30
========= sending heartbeat at 2024-09-04 19:17:19.297183
========= sending heartbeat at 2024-09-04 19:17:29.312589
========= sending heartbeat at 2024-09-04 19:17:39.328624
========= sending heartbeat at 2024-09-04 19:17:49.344805
========= sending heartbeat at 2024-09-04 19:17:59.360472
========= sending heartbeat at 2024-09-04 19:18:09.376554
========= sending heartbeat at 2024-09-04 19:18:19.392352
========= sending heartbeat at 2024-09-04 19:18:29.408222
========= sending heartbeat at 2024-09-04 19:18:39.423971
========= sending heartbeat at 2024-09-04 19:18:49.439714
========= sending heartbeat at 2024-09-04 19:18:59.455397
========= sending heartbeat at 2024-09-04 19:19:09.472774
========= sending heartbeat at 2024-09-04 19:19:19.490515
========= sending heartbeat at 2024-09-04 19:19:29.506251
========= sending heartbeat at 2024-09-04 19:19:39.521974
RUNNING THE L-BFGS-B CODE

       * * *

Machine precision = 2.220D-16
N = 134217728 M = 10
This problem is unconstrained.

At X0 0 variables are exactly at the bounds

At iterate 0 f= 3.93447D+10 |proj g|= 2.56484D+04
========= sending heartbeat at 2024-09-04 19:19:49.537588
========= sending heartbeat at 2024-09-04 19:19:59.554370
========= main process now complete at 2024-09-04 19:20:02.221192.
========= monitor process now complete at 2024-09-04 19:20:02.276084.
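
One thing the L-BFGS-B banner does show: N = 134217728 is exactly 512^3, so this run appears to be optimizing one value per voxel of a 512 box. The sketch below checks that and gives a rough size for the optimizer history. It assumes the textbook limited-memory layout of 2*M history vectors of length N in double precision (the "Machine precision = 2.220D-16" line is consistent with doubles). How the job actually allocates its workspace, and whether that memory lives on the host or the GPU, is not visible in the log, so the result is an order-of-magnitude estimate only. The function names are illustrative and not part of CryoSPARC.

```python
# Values copied from the L-BFGS-B banner in the job log above.
N = 134217728  # number of optimization variables
M = 10         # number of stored correction pairs


def cube_root_if_exact(n):
    """Return the integer b with b**3 == n, or None if n is not a perfect cube."""
    guess = round(n ** (1 / 3))
    # Check the neighbours too, in case of floating-point rounding.
    for candidate in (guess - 1, guess, guess + 1):
        if candidate ** 3 == n:
            return candidate
    return None


def lbfgs_history_gib(n, m, bytes_per_value=8):
    """Size in GiB of 2*m vectors of length n (ASSUMED layout, 8-byte values)."""
    return 2 * m * n * bytes_per_value / 2**30


box = cube_root_if_exact(N)
print(f"N = {box}^3" if box else "N is not a perfect cube")
print(f"Estimated history at box 512: {lbfgs_history_gib(N, M):.1f} GiB")
print(f"Estimated history at box 256: {lbfgs_history_gib(256**3, M):.1f} GiB")
```

Under the same assumption, a run that really used 256-box data would be expected to print N = 16777216 in this banner, so comparing the N value between the 512 and 256 attempts is a quick way to confirm which box size the reconstruction actually ran at.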


[cryo1@ai-rmlcryoprd1 ~]$ cryosparcm eventlog P46 J646 | tail -n 30
[CPU RAM used: 180 MB] GPU : [0]
[CPU RAM used: 180 MB] RAM : [0, 1, 2, 3, 4, 5, 6, 7]
[CPU RAM used: 180 MB] SSD : False
[CPU RAM used: 180 MB] --------------------------------------------------------------
[CPU RAM used: 180 MB] Importing job module for job type flex_highres…
[CPU RAM used: 446 MB] Job ready to run
[CPU RAM used: 446 MB] ***************************************************************
[CPU RAM used: 519 MB] ====== 3D Flex Load Checkpoint =======
[CPU RAM used: 519 MB] Loading checkpoint from J645/J645_train_checkpoint_017600.tar …
[CPU RAM used: 956 MB] Initializing torch…
[CPU RAM used: 956 MB] Initializing model from checkpoint…
Input tetramesh
[CPU RAM used: 1081 MB] Upscaling deformation model to match input volume size…
Upsampled mask
Upsampled tetramesh
[CPU RAM used: 4111 MB] ====== Load particle data =======
[CPU RAM used: 4214 MB] Reading in all particle data on the fly from files…
[CPU RAM used: 4214 MB] Loading a ParticleStack with 300000 items…
[CPU RAM used: 4359 MB] Done.
[CPU RAM used: 4359 MB] Preparing all particle CTF data…
[CPU RAM used: 4360 MB] Parameter “Force re-do GS split” was off. Using input split…
[CPU RAM used: 4360 MB] Split A contains 150000 particles
[CPU RAM used: 4360 MB] Split B contains 150000 particles
[CPU RAM used: 4360 MB] Setting up particle poses…
[CPU RAM used: 4360 MB] ====== High resolution flexible refinement =======
[CPU RAM used: 4360 MB] Max num L-BFGS iterations was set to 20
[CPU RAM used: 4360 MB] Starting L-BFGS.
[CPU RAM used: 4360 MB] Reconstructing half-map A
[CPU RAM used: 4360 MB] Iteration 0 : 149000 / 150000 particles
[CPU RAM used: 190 MB] ====== Job process terminated abnormally.
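
For anyone reading a longer event log, the "CPU RAM used" prefix can be pulled out to see the peak and the last message before the failure. This is a minimal sketch that assumes the line format shown above; the helper names are made up for illustration. As labelled, the prefix reports CPU (host) RAM only and says nothing about GPU memory.

```python
import re

# Matches lines such as "[CPU RAM used: 4360 MB] Starting L-BFGS."
RAM_LINE = re.compile(r"^\[CPU RAM used:\s*(\d+)\s*MB\]\s*(.*)$")


def ram_trace(lines):
    """Return (megabytes, message) for every line that carries a RAM prefix."""
    trace = []
    for line in lines:
        match = RAM_LINE.match(line.strip())
        if match:
            trace.append((int(match.group(1)), match.group(2)))
    return trace


def summarize(trace):
    """Return (peak MB, last entry before the 'terminated abnormally' line)."""
    if not trace:
        return None, None
    peak_mb = max(mb for mb, _ in trace)
    last_before_failure = None
    for previous, current in zip(trace, trace[1:]):
        if "terminated abnormally" in current[1]:
            last_before_failure = previous
    return peak_mb, last_before_failure


# Lines copied from the event log above.
sample = [
    "[CPU RAM used: 4360 MB] Starting L-BFGS.",
    "[CPU RAM used: 4360 MB] Reconstructing half-map A",
    "[CPU RAM used: 4360 MB] Iteration 0 : 149000 / 150000 particles",
    "[CPU RAM used: 190 MB] ====== Job process terminated abnormally.",
]
peak_mb, last = summarize(ram_trace(sample))
print(peak_mb, last)
```

In this log the reported host RAM stays near 4.3 GB up to the last particle batch of iteration 0, and the job log shows the L-BFGS-B banner and "At iterate 0" just before the main process ends, so the failure seems to fall around the first optimizer step rather than during particle loading.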


[******@ai-rmlcpu22 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           1.0T         68G        935G        472M        3.1G        936G
Swap:           15G         30M         15G


[*****@ai-rmlcpu22 ~]$ sudo journalctl | grep -i oom
[sudo] password for ****:
[*****@ai-rmlcpu22 ~]$
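
Reading these two outputs together: the empty grep means no journal line that is still retained contains "oom", and free reports most of the 1.0T as available. Note that free was run after the job had already ended, so it reflects the worker at that moment rather than during the reconstruction. The sketch below turns the Mem: row into an available fraction. It assumes the six-column layout shown above and treats the unit suffixes as powers of 1024; the function names are hypothetical.

```python
import re

# Unit suffixes are ASSUMED to be powers of 1024, as `free -h` reports them.
UNIT_FACTORS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}
SIZE = re.compile(r"^([0-9.]+)([KMGT]?)i?B?$")


def to_bytes(text):
    """Convert a human-readable size such as '936G' or '1.0T' to bytes."""
    match = SIZE.match(text)
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNIT_FACTORS[match.group(2)]


def available_fraction(free_output):
    """Return available / total from the 'Mem:' row of `free -h` output."""
    for line in free_output.splitlines():
        fields = line.split()
        # Columns: total used free shared buff/cache available
        if fields and fields[0] == "Mem:":
            return to_bytes(fields[6]) / to_bytes(fields[1])
    raise ValueError("no 'Mem:' row found")


# Output copied from the worker above.
free_output = """total used free shared buff/cache available
Mem: 1.0T 68G 935G 472M 3.1G 936G
Swap: 15G 30M 15G"""
print(f"{available_fraction(free_output):.0%} of host RAM available")
```

To see memory during the failure itself, the same free -h command would need to be run (or repeated in a loop) on the worker while the job is in its first L-BFGS iteration.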