Hi, a 3D Flex Reconstruction job fails consistently on several nodes that differ in OS, number of CPUs, GPUs, and RAM. From the job log file (attached), the problem appears to be in the L-BFGS-B library (SciPy's `_lbfgsb` module). I am running the job on CryoSPARC v4.7.1. The error in the job window just says ‘Job process terminated abnormally.’ The job crashes right at the start of iteration 0. Here is the last text in the job window before it crashes:
“Starting L-BFGS.
[2025-07-12 2:40:24.56]
[CPU: 3.00 GB Avail: 58.34 GB]
Reconstructing half-map A
[2025-07-12 2:40:24.57]
[CPU: 3.00 GB Avail: 58.34 GB]
Iteration 0 : 11000 / 11486 particles”
Jobs of this type have run successfully before, so I am not sure what has changed.
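One way to separate a broken SciPy install from a problem specific to this job is a small standalone L-BFGS-B run. The following is a minimal sketch, assuming it is executed with the Python interpreter of the `cryosparc_worker_env` environment seen in the log; the function names `quadratic` and `check_lbfgsb` are made up for illustration and are not part of CryoSPARC.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def quadratic(x, target):
    """Return the objective 0.5 * ||x - target||^2 and its gradient."""
    diff = x - target
    return 0.5 * float(diff @ diff), diff


def check_lbfgsb(n=1000):
    """Minimize a small quadratic with SciPy's L-BFGS-B wrapper.

    n is a hypothetical problem size, far smaller than a 3D Flex volume,
    so this only tests that the compiled _lbfgsb module loads and runs.
    """
    target = np.linspace(-1.0, 1.0, n)
    x0 = np.zeros(n)
    # func returns (value, gradient), so fprime is left unset.
    x_opt, f_opt, info = fmin_l_bfgs_b(quadratic, x0, args=(target,))
    return x_opt, f_opt, info


if __name__ == "__main__":
    x_opt, f_opt, info = check_lbfgsb()
    print("final objective:", f_opt)
    print("warnflag (0 means converged):", info["warnflag"])
    print("iterations:", info["nit"])
```

If this small case runs cleanly, the install itself is probably fine and the crash is more likely tied to the size or content of the actual problem.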
Thank you, Michael
================= CRYOSPARCW ======= 2025-07-12 02:36:57.453937 =========
Project PYYY Job J299
Master cryosparc.host.XXXX Port 39002
===========================================================================
MAIN PROCESS PID 2390072
========= now starting main process at 2025-07-12 02:36:57.454404
flex_refine.run_highres cryosparc_compute.jobs.jobregister
MONITOR PROCESS PID 2390074
========= monitor process now waiting for main process
========= sending heartbeat at 2025-07-12 02:36:59.822120
========= sending heartbeat at 2025-07-12 02:37:09.837006
<string>:1: DeprecationWarning: Please import `map_coordinates` from the `scipy.ndimage` namespace; the `scipy.ndimage.interpolation` namespace is deprecated and will be removed in SciPy 2.0.0.
========= sending heartbeat at 2025-07-12 02:37:19.851340
========= sending heartbeat at 2025-07-12 02:37:29.866546
***************************************************************
Transparent hugepages setting: [always] madvise never
Running job J299 of type flex_highres
Running job on hostname %s vds1-2.ZZZ.edu
Allocated Resources : {'fixed': {'SSD': False}, 'hostname': 'vds1-2.ZZZ.edu', 'lane': 'vds12', 'lane_type': 'node', 'license': True, 'licenses_acquired': 1, 'slots': {'CPU': [0, 1, 2, 3], 'GPU': [0], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7]}, 'target': {'cache_path': '/mnt/scratch/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25651445760, 'name': 'Quadro M6000 24GB'}], 'hostname': 'vds1-2.ZZZ.edu', 'lane': 'vds12', 'monitor_port': None, 'name': 'vds1-2.ZZZ.edu', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'GPU': [0], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7]}, 'ssh_str': 'cryosparc@vds1-2.ZZZ.edu', 'title': 'Worker node vds1-2.ZZZ.edu', 'type': 'node', 'worker_bin_path': '/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/bin/cryosparcw'}}
2025-07-12 02:37:30,741 run_with_executor INFO | Resolving 5 source path(s) for caching
2025-07-12 02:37:30,744 run_with_executor INFO | Resolved 5 sources in 0.00 seconds
2025-07-12 02:37:30,774 allocate INFO | Cache allocation start. Active run IDs: P324-J3128-1752288435, P356-J99-1752188587, P324-J3142-1752200292, P291-J299-1752315930, P369-J16-1752283829, P369-J17-1752283831, P369-J18-1752283965, P367-J112-1752313630
2025-07-12 02:37:30,921 refresh INFO | Refreshed cache drive in 0.15 seconds
2025-07-12 02:37:30,924 allocate INFO | Deleted 0 cached files, encountered 0 errors
2025-07-12 02:37:30,925 allocate INFO | Allocated 5 stub cache files; creating links
2025-07-12 02:37:30,925 allocate INFO | Cache allocation complete
2025-07-12 02:37:30,925 run_with_executor INFO | Cache allocation ran in 0.16 seconds
2025-07-12 02:37:30,925 run_with_executor INFO | Found 0 SSD hit(s)
2025-07-12 02:37:30,925 run_with_executor INFO | Transferring 5 file(s)...
========= sending heartbeat at 2025-07-12 02:37:39.881532
========= sending heartbeat at 2025-07-12 02:37:49.897564
========= sending heartbeat at 2025-07-12 02:37:59.912318
========= sending heartbeat at 2025-07-12 02:38:09.922795
========= sending heartbeat at 2025-07-12 02:38:19.937535
========= sending heartbeat at 2025-07-12 02:38:29.952531
========= sending heartbeat at 2025-07-12 02:38:39.967721
========= sending heartbeat at 2025-07-12 02:38:49.983529
========= sending heartbeat at 2025-07-12 02:38:59.999001
========= sending heartbeat at 2025-07-12 02:39:10.014530
========= sending heartbeat at 2025-07-12 02:39:20.029725
========= sending heartbeat at 2025-07-12 02:39:30.044572
========= sending heartbeat at 2025-07-12 02:39:40.059560
========= sending heartbeat at 2025-07-12 02:39:50.074556
========= sending heartbeat at 2025-07-12 02:40:00.089605
========= sending heartbeat at 2025-07-12 02:40:10.104571
========= sending heartbeat at 2025-07-12 02:40:20.118961
2025-07-12 02:40:24,024 run_with_executor INFO | Transferred /mnt/gimli/data1/CS-ryadel-thawed/J293/J293_particles_fullres_batch_00001.mrc to SSD key 5cd2792b66f7ab256ae52c477447385e66bb67ad
2025-07-12 02:40:24,033 run_with_executor INFO | Transferred /mnt/gimli/data1/CS-ryadel-thawed/J293/J293_particles_fullres_batch_00002.mrc to SSD key 024f0e3472f304fc4fe61419fa6b389472db4c4d
2025-07-12 02:40:24,034 run_with_executor INFO | Transferred /mnt/gimli/data1/CS-ryadel-thawed/J293/J293_particles_fullres_batch_00000.mrc to SSD key f421d75b434df829ea1f3924bedbdc2b6e724434
2025-07-12 02:40:24,036 run_with_executor INFO | Transferred /mnt/gimli/data1/CS-ryadel-thawed/J293/J293_particles_fullres_batch_00003.mrc to SSD key 9ff70eea41fce6f7703e500410233fa50206c9e6
2025-07-12 02:40:24,078 run_with_executor INFO | Transferred /mnt/gimli/data1/CS-ryadel-thawed/J293/J293_particles_fullres_batch_00004.mrc to SSD key 8dc9e1eb6ad0da477f4b34332622f99e7b49f429
2025-07-12 02:40:24,080 run_with_executor INFO | Unlocked 5 file(s)
2025-07-12 02:40:24,080 run_with_executor INFO | Requested files successfully cached to SSD
2025-07-12 02:40:24,086 run_with_executor INFO | SSD cache complete
<string>:1: DeprecationWarning: Please import `fmin_l_bfgs_b` from the `scipy.optimize` namespace; the `scipy.optimize.lbfgsb` namespace is deprecated and will be removed in SciPy 2.0.0.
========= sending heartbeat at 2025-07-12 02:40:30.126824
========= sending heartbeat at 2025-07-12 02:40:40.141551
WARNING: io_uring support disabled (not supported by kernel), I/O performance may degrade
========= sending heartbeat at 2025-07-12 02:40:50.157184
========= sending heartbeat at 2025-07-12 02:41:00.166910
========= sending heartbeat at 2025-07-12 02:41:10.182090
========= sending heartbeat at 2025-07-12 02:41:20.197354
========= sending heartbeat at 2025-07-12 02:41:30.212532
========= sending heartbeat at 2025-07-12 02:41:40.248945
========= sending heartbeat at 2025-07-12 02:41:50.264636
========= sending heartbeat at 2025-07-12 02:42:00.279625
========= sending heartbeat at 2025-07-12 02:42:10.294383
========= sending heartbeat at 2025-07-12 02:42:20.309303
========= sending heartbeat at 2025-07-12 02:42:30.322979
========= sending heartbeat at 2025-07-12 02:42:40.337645
========= sending heartbeat at 2025-07-12 02:42:50.343555
========= sending heartbeat at 2025-07-12 02:43:00.358514
========= sending heartbeat at 2025-07-12 02:43:10.373574
========= sending heartbeat at 2025-07-12 02:43:20.388796
========= sending heartbeat at 2025-07-12 02:43:30.400423
========= sending heartbeat at 2025-07-12 02:43:40.415534
========= sending heartbeat at 2025-07-12 02:43:50.457963
========= sending heartbeat at 2025-07-12 02:44:00.473288
========= sending heartbeat at 2025-07-12 02:44:10.488252
========= sending heartbeat at 2025-07-12 02:44:20.503232
========= sending heartbeat at 2025-07-12 02:44:30.517971
========= sending heartbeat at 2025-07-12 02:44:40.532586
========= sending heartbeat at 2025-07-12 02:44:50.547482
========= sending heartbeat at 2025-07-12 02:45:00.562315
========= sending heartbeat at 2025-07-12 02:45:10.576976
========= sending heartbeat at 2025-07-12 02:45:20.593652
========= sending heartbeat at 2025-07-12 02:45:30.608996
========= sending heartbeat at 2025-07-12 02:45:40.623177
========= sending heartbeat at 2025-07-12 02:45:50.638134
========= sending heartbeat at 2025-07-12 02:46:00.652536
========= sending heartbeat at 2025-07-12 02:46:10.664932
========= sending heartbeat at 2025-07-12 02:46:20.678182
========= sending heartbeat at 2025-07-12 02:46:30.693289
========= sending heartbeat at 2025-07-12 02:46:40.708165
========= sending heartbeat at 2025-07-12 02:46:50.719097
========= sending heartbeat at 2025-07-12 02:47:00.734049
========= sending heartbeat at 2025-07-12 02:47:10.749144
Received SIGSEGV (addr=00007f322db180b0)
/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/ioengine/core.so(traceback_signal_handler+0x113)[0x7f3da1838a03]
/lib64/libpthread.so.0(+0x12990)[0x7f3db8037990]
/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/scipy/optimize/_lbfgsb.cpython-310-x86_64-linux-gnu.so(+0x975b)[0x7f3da144e75b]
/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/scipy/optimize/_lbfgsb.cpython-310-x86_64-linux-gnu.so(+0xf822)[0x7f3da1454822]
/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/scipy/optimize/_lbfgsb.cpython-310-x86_64-linux-gnu.so(+0x10a7f)[0x7f3da1455a7f]
/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/scipy/optimize/_lbfgsb.cpython-310-x86_64-linux-gnu.so(+0x4794)[0x7f3da1449794]
python(_PyObject_MakeTpCall+0x26b)[0x560adba10a6b]
python(_PyEval_EvalFrameDefault+0x54a6)[0x560adba0c9d6]
python(_PyFunction_Vectorcall+0x6c)[0x560adba17a2c]
python(PyObject_Call+0xbc)[0x560adba23f1c]
python(_PyEval_EvalFrameDefault+0x2d83)[0x560adba0a2b3]
python(_PyFunction_Vectorcall+0x6c)[0x560adba17a2c]
python(PyVectorcall_Call+0xc5)[0x560adba24295]
/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/jobs/flex_refine/flexmod.cpython-310-x86_64-linux-gnu.so(+0x94e30)[0x7f3d87d70e30]
/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/run.cpython-310-x86_64-linux-gnu.so(+0xd224)[0x7f3db86a6224]
/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/jobs/flex_refine/run_highres.cpython-310-x86_64-linux-gnu.so(+0xc2fe)[0x7f3d9f5b72fe]
/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/jobs/flex_refine/run_highres.cpython-310-x86_64-linux-gnu.so(+0x2f717)[0x7f3d9f5da717]
/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/run.cpython-310-x86_64-linux-gnu.so(+0x2399a)[0x7f3db86bc99a]
/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/run.cpython-310-x86_64-linux-gnu.so(+0x15581)[0x7f3db86ae581]
python(_PyEval_EvalFrameDefault+0x4c12)[0x560adba0c142]
python(+0x1d7c60)[0x560adbaaac60]
python(PyEval_EvalCode+0x87)[0x560adbaaaba7]
python(+0x20812a)[0x560adbadb12a]
python(+0x203523)[0x560adbad6523]
python(PyRun_StringFlags+0x7d)[0x560adbace91d]
python(PyRun_SimpleStringFlags+0x3c)[0x560adbace75c]
python(Py_RunMain+0x26b)[0x560adbacd66b]
python(Py_BytesMain+0x37)[0x560adba9e1f7]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x7f3db74ff7e5]
python(+0x1cb0f1)[0x560adba9e0f1]
rax 0000000000000001 rbx 00007f397fe2dfb0 rcx 00000000056e7508 rdx 0000000000000000
rsi 00000000056e7508 rdi 00000000056e7507 rbp 00007f35ae387010 rsp 00007fff2e930dd0
r8 00000000056e7508 r9 00007f322db180b0 r10 00000000056e7508 r11 0000000000000001
r12 00007f35ef65f010 r13 00007f37a1ea8290 r14 fffffffffa918af7 r15 00007f322db180b8
0f af d6 66 0f 28 d1 4c 01 f2 0f 1f 00 48 63 7c 8d 00 f2 0f 10 04 cb 48 ff c1 48 01 d7
f2 41 0f 10 74 fd 00 f2 0f 59 f0 f2 41 0f 59 04 fc f2 0f 58 d6 f2 0f 58 c8 4c 39 c1 75
d2 f2 0f 59 cb 99
--> f2 41 0f 11 11 f7 3c 24 f2 43 0f 11 0c d9 49 83 c1 08 8d 42 01 4d 39 f9 75 93 8b
b4 24 a8 00 00 00 4c 8b 74 24 70 44 8b 14 24 8d 04 36 4c 8b 4c 24 08 4c 89 f1 48 8d 94
24 b0 00 00 00 4c 8d bc
========= main process now complete at 2025-07-12 02:47:20.760007.
========= monitor process now complete at 2025-07-12 02:47:20.789438.
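The register dump at the end of the log can be decoded with a few lines of Python. This is a reading aid only, a sketch under stated assumptions: the helper names are hypothetical, the interpretation of `rcx` as a problem size is a guess, the box size is not stated anywhere in this post, and the workspace formula is the one I understand SciPy's L-BFGS-B wrapper to use, with `m = 10` (the SciPy default) assumed because the value CryoSPARC passes is not known.

```python
import re

# Register lines copied from the SIGSEGV report in the job log.
REGISTER_DUMP = """\
rax 0000000000000001 rbx 00007f397fe2dfb0 rcx 00000000056e7508 rdx 0000000000000000
rsi 00000000056e7508 rdi 00000000056e7507 rbp 00007f35ae387010 rsp 00007fff2e930dd0
r8 00000000056e7508 r9 00007f322db180b0 r10 00000000056e7508 r11 0000000000000001
r12 00007f35ef65f010 r13 00007f37a1ea8290 r14 fffffffffa918af7 r15 00007f322db180b8
"""


def parse_registers(text):
    """Map register name -> unsigned 64-bit value."""
    pairs = re.findall(r"\b(r[a-z0-9]+)\s+([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    return value - (1 << 64) if value >= (1 << 63) else value


def integer_cube_root(n):
    """Return k if n == k**3 for some integer k, else None."""
    guess = round(n ** (1.0 / 3.0))
    for k in (guess - 1, guess, guess + 1):
        if k ** 3 == n:
            return k
    return None


def lbfgsb_workspace_len(n, m):
    """Length of the float64 work array for L-BFGS-B (assumed formula)."""
    return 2 * m * n + 5 * n + 11 * m * m + 8 * m


if __name__ == "__main__":
    regs = parse_registers(REGISTER_DUMP)
    n = regs["rcx"]
    print("rcx as decimal:", n)
    print("r14 as signed:", as_signed64(regs["r14"]))
    print("integer cube root of rcx:", integer_cube_root(n))
    print("r9 equals fault address:", regs["r9"] == 0x00007F322DB180B0)
    # Assumption: m = 10 corrections (SciPy default), not confirmed for CryoSPARC.
    wa = lbfgsb_workspace_len(n, 10)
    print("workspace length:", wa, "exceeds int32 max:", wa > 2**31 - 1)
```

The value in `rcx`, `rsi`, `r8` and `r10` is a perfect cube, which would be consistent with the optimizer working on a cubic volume of that many voxels, and `r9` matches the faulting address reported in the `Received SIGSEGV` line. If the assumed workspace formula and `m` are right, the work array would need more elements than a 32-bit index can address, which is one possible explanation for a crash inside `_lbfgsb` that depends on volume size rather than on the node. This is a hypothesis, not a confirmed diagnosis.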