Hi, I’m running an ab initio job and it failed after 2 minutes with the error below:
[2023-11-19 23:33:13.47]
[CPU: 987.9 MB Avail: 480.80 GB]
----------- Iteration 99 (epoch 0.043). radwn 10.00 resolution 27.65A minisize 90 beta 0.10
[2023-11-19 23:33:14.03]
[CPU: 987.9 MB Avail: 480.81 GB]
-- Class 0 -- lr: 0.40 eps: 10.02 step ratio : 0.0438 ESS R: 6566.561 S: 9.286 Class Size: 23.7% (Average: 29.2%)
[2023-11-19 23:33:14.05]
[CPU: 987.9 MB Avail: 480.81 GB]
-- Class 1 -- lr: 0.40 eps: 10.02 step ratio : 0.0783 ESS R: 8538.329 S: 12.072 Class Size: 24.6% (Average: 20.3%)
[2023-11-19 23:33:14.06]
[CPU: 987.9 MB Avail: 480.81 GB]
-- Class 2 -- lr: 0.40 eps: 10.02 step ratio : 0.2041 ESS R: 8922.562 S: 12.951 Class Size: 25.7% (Average: 29.2%)
[2023-11-19 23:33:14.07]
[CPU: 987.9 MB Avail: 480.81 GB]
-- Class 3 -- lr: 0.40 eps: 10.02 step ratio : inf ESS R: 8925.131 S: 12.998 Class Size: 26.0% (Average: 21.4%)
[2023-11-19 23:33:14.13]
[CPU: 987.9 MB Avail: 480.81 GB]
Done iteration 00099 of 01520 in 0.672s. Total time 98.5s.
[2023-11-19 23:33:16.37]
[CPU: 897.2 MB Avail: 480.90 GB]
Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
File "cryosparc_master/cryosparc_compute/jobs/abinit/run.py", line 230, in cryosparc_master.cryosparc_compute.jobs.abinit.run.run_homo_abinit
File "/app/cryosparc4/cryosparc_worker/cryosparc_compute/sigproc.py", line 464, in align_density
assert n.all(n.isfinite(mu))
AssertionError
This happened between checkpoints 1 and 2. I then increased the mini-batch size to 300 and reran the job; after 6 minutes it failed with a different error:
[2023-11-19 22:50:26.60]
[CPU: 1.07 GB Avail: 476.20 GB]
----------- Iteration 151 (epoch 0.219). radwn 10.00 resolution 27.65A minisize 300 beta 0.10
[2023-11-19 22:50:28.26]
[CPU: 980.1 MB Avail: 476.23 GB]
Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
File "cryosparc_master/cryosparc_compute/jobs/abinit/run.py", line 313, in cryosparc_master.cryosparc_compute.jobs.abinit.run.run_homo_abinit
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1183, in cryosparc_master.cryosparc_compute.engine.engine.process
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1184, in cryosparc_master.cryosparc_compute.engine.engine.process
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1142, in cryosparc_master.cryosparc_compute.engine.engine.process.work
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 432, in cryosparc_master.cryosparc_compute.engine.engine.EngineThread.find_and_set_best_pose_shift
File "<__array_function__ internals>", line 5, in unravel_index
ValueError: index -1036042677 is out of bounds for array with size 116025
This time it passed checkpoint 1 but still failed. I have already restarted CryoSPARC, and every job in this workspace was run under version 4.4.0.
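To show what the two checks in the tracebacks are complaining about, here is a minimal NumPy sketch. It is not CryoSPARC's code: `check_finite`, `try_unravel`, the sample arrays, and the flat shape `(116025,)` are hypothetical names and values chosen only for illustration. It mirrors the `n.all(n.isfinite(mu))` assertion from the first run and the `unravel_index` call from the second.

```python
import numpy as np

def check_finite(mu):
    """Same test as the failing assertion: True only if every value is finite."""
    return bool(np.all(np.isfinite(mu)))

def try_unravel(index, size):
    """Call np.unravel_index on a flat array of `size` elements (assumed shape).

    Returns the unravelled index, or the error message if NumPy rejects it.
    """
    try:
        return np.unravel_index(index, (size,))
    except ValueError as exc:
        return str(exc)

# Hypothetical inputs: one clean array, one containing an inf
mu_ok = np.array([0.1, 0.2, 0.3])
mu_bad = np.array([0.1, np.inf, 0.3])

print(check_finite(mu_ok))    # finite values pass the check
print(check_finite(mu_bad))   # a single inf fails it

# The index value reported in the second traceback is rejected by NumPy
print(try_unravel(-1036042677, 116025))
```

In the first log, Class 3 already reports `step ratio : inf` just before the assertion fails, so a non-finite value may be appearing in the data or the intermediate volumes. I don't know whether the second error has the same cause.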
Any advice on this would be appreciated!
Best,
Tom