We have Ab-Initio jobs that have been running for days but don't seem to be doing anything. They never progress past Iteration 0. The logs show some strange warnings that look concerning, and then nothing but heartbeats after that:
2026-03-30 10:57:25,193 core run_with_executo INFO | SSD cache complete
2026-03-30 10:57:26,301 core heartbeat INFO | ========= Updating heartbeat
2026-03-30 10:57:36,325 core heartbeat INFO | ========= Updating heartbeat
2026-03-30 10:57:46,350 core heartbeat INFO | ========= Updating heartbeat
2026-03-30 10:57:56,374 core heartbeat INFO | ========= Updating heartbeat
2026-03-30 10:58:06,398 core heartbeat INFO | ========= Updating heartbeat
2026-03-30 10:58:16,423 core heartbeat INFO | ========= Updating heartbeat
2026-03-30 10:58:26,447 core heartbeat INFO | ========= Updating heartbeat
WARNING: io_uring support disabled (not supported by kernel), I/O performance may degrade
gpufft: creating new cufft plan (plan id 0 pid 529260)
gpu_id 0
ndims 2
dims 256 256 0
inembed 256 256 0
istride 1
idist 65536
onembed 256 256 0
ostride 1
odist 65536
batch 10
type C2C
wkspc automatic
Python traceback:
HOST ALLOCATION FUNCTION: using numba.cuda.pinned_array
/standard/takcryoem/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/nvrtc.py:257: UserWarning: NVRTC log messages whilst compiling kernel:
kernel(35): warning #68-D: integer conversion resulted in a change of sign
my_nan_count += __shfl_xor_sync(-1, my_nan_count, x);
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
kernel(44): warning #68-D: integer conversion resulted in a change of sign
my_nan_count += __shfl_xor_sync(-1, my_nan_count, x);
^
kernel(17): warning #177-D: variable "N_I" was declared but never referenced
unsigned N_I = gridDim.x;
^
warnings.warn(msg)
/standard/takcryoem/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/threading.py:1075: RuntimeWarning: divide by zero encountered in scalar divide
my_nan_count += __shfl_xor_sync(-1, my_nan_count, x);
self.run()
/standard/takcryoem/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/threading.py:1075: RuntimeWarning: invalid value encountered in scalar divide
self.run()
/standard/takcryoem/cryosparc/cryosparc_worker/cli/cryosparcw.py:287: UserWarning: Cannot manually free CUDA array; will be freed when garbage collected
return run(conf)
/standard/takcryoem/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/threading.py:1075: RuntimeWarning: divide by zero encountered in scalar divide
self.run()
/standard/takcryoem/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/threading.py:1075: RuntimeWarning: invalid value encountered in scalar divide
self.run()
/standard/takcryoem/cryosparc/cryosparc_worker/cli/cryosparcw.py:287: UserWarning: Cannot manually free CUDA array; will be freed when garbage collected
return run(conf)
2026-03-30 10:58:36,472 core heartbeat INFO | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 1 pid 529260)
gpu_id 0
ndims 2
dims 256 256 0
inembed 256 256 0
istride 1
idist 65536
onembed 256 256 0
ostride 1
odist 65536
batch 90
type C2C
wkspc automatic
Python traceback:
/standard/takcryoem/cryosparc/cryosparc_worker/cli/cryosparcw.py:287: UserWarning: Cannot manually free CUDA array; will be freed when garbage collected
return run(conf)
2026-03-30 10:58:46,496 core heartbeat INFO | ========= Updating heartbeat
2026-03-30 10:58:56,521 core heartbeat INFO | ========= Updating heartbeat
2026-03-30 10:59:06,546 core heartbeat INFO | ========= Updating heartbeat
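In case it helps anyone reading a log like this, here is a minimal Python sketch for dropping the heartbeat entries so that only the warnings and other messages remain. It assumes heartbeat lines always end with `core heartbeat INFO | ========= Updating heartbeat`, as they do in the excerpt above; the function name `strip_heartbeats` is hypothetical and not part of CryoSPARC.

```python
import re

# Assumption: heartbeat entries end exactly like the ones in the log above.
HEARTBEAT = re.compile(r"core heartbeat INFO \| =+ Updating heartbeat\s*$")


def strip_heartbeats(log_text: str) -> list[str]:
    """Return the non-empty log lines that are not heartbeat entries."""
    return [
        line
        for line in log_text.splitlines()
        if line.strip() and not HEARTBEAT.search(line)
    ]


if __name__ == "__main__":
    sample = (
        "2026-03-30 10:57:36,325 core heartbeat INFO | ========= Updating heartbeat\n"
        "WARNING: io_uring support disabled (not supported by kernel), "
        "I/O performance may degrade\n"
        "2026-03-30 10:57:46,350 core heartbeat INFO | ========= Updating heartbeat\n"
    )
    for line in strip_heartbeats(sample):
        print(line)
```

Run against the full log text, this should leave just the cuFFT plan dumps and the compiler and runtime warnings, which makes it easier to see the last thing the job reported before it went quiet.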