512px local refinement no heartbeat error (v5)

Hi,

Large-box local refinements (512 px) are killed with a "no heartbeat" error in v5.03, even though matching NU-refine jobs complete. Switching on low-memory mode seems to fix the issue, but 512 px jobs previously ran fine on this GPU (A6000). Job log below:

================= CRYOSPARC =================
Project P4 Job J761
Master ANES002074D Port 39000
===========================================================================
MAIN PROCESS PID 2557195
2026-04-01 11:30:18,993 core                 monitor          INFO   | MONITOR PROCESS PID 2557204
2026-04-01 11:30:18,993 core                 monitor          INFO   | ========= monitor process now waiting for main process
2026-04-01 11:30:18,993 core                 heartbeat        INFO   | ========= Updating heartbeat
================= CRYOSPARC =================
Project P4 Job J761
Master ANES002074D Port 39000
===========================================================================
MAIN PROCESS PID 2557195
========= updating job startup information at 2026-04-01 11:30:19.325803
2026-04-01 11:30:19,359 core.webhook         post_to_webhook  ERROR  | Failed to send webhook notification
Traceback (most recent call last):
  File "/home/user/software/cryosparc/cryosparc_worker/core/webhook.py", line 20, in post_to_webhook
    core.webhook_client.post(core.settings.slack_webhook_url, json=data)
    ^^^^^^^^^^^^^^^^^^^
  File "/home/user/software/cryosparc/cryosparc_worker/core/core.py", line 94, in webhook_client
    assert self.mode == "master", "Cannot access webhook client when running in worker mode"
           ^^^^^^^^^^^^^^^^^^^^^
AssertionError: Cannot access webhook client when running in worker mode
========= now starting main process at 2026-04-01 11:30:19.765712
2026-04-01 11:30:19,870 core                 run              INFO   | Running job J761 of type new_local_refine
2026-04-01 11:30:19,870 core                 run              INFO   | Running job on hostname ANES002074D
2026-04-01 11:30:19,870 core                 run              INFO   | Allocated Resources: lane='default' lane_type='node' hostname='ANES002074D' target=SchedulerTarget(cache_path='/scratch/cryosparc_cache', cache_reserve_mb=10000, cache_quota_mb=None, lane='default', name='ANES002074D', title='Worker ANES002074D', desc=None, hostname='ANES002074D', worker_bin_path='/home/user/software/cryosparc/cryosparc_worker/bin/cryosparcw', config=Node(type='node', ssh_str='user@ANES002074D', resource_slots=ResourceSlots(CPU=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], GPU=[0, 1, 2, 3], RAM=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]), resource_fixed=FixedResourceSlots(SSD=True), monitor_port=None, gpus=[Gpu(id=0, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=1, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=2, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=3, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640)])) slots=ResourceSlots(CPU=[0, 1, 2, 3], GPU=[0], RAM=[0, 1, 2]) fixed=FixedResourceSlots(SSD=True) licenses_acquired=1
Transparent hugepages setting: always [madvise] never

2026-04-01 11:30:24,130 core.webhook         post_to_webhook  ERROR  | Failed to send webhook notification
Traceback (most recent call last):
  File "/home/user/software/cryosparc/cryosparc_worker/core/webhook.py", line 20, in post_to_webhook
    core.webhook_client.post(core.settings.slack_webhook_url, json=data)
    ^^^^^^^^^^^^^^^^^^^
  File "/home/user/software/cryosparc/cryosparc_worker/core/core.py", line 94, in webhook_client
    assert self.mode == "master", "Cannot access webhook client when running in worker mode"
           ^^^^^^^^^^^^^^^^^^^^^
AssertionError: Cannot access webhook client when running in worker mode
2026-04-01 11:30:29,327 core                 heartbeat        INFO   | ========= Updating heartbeat
Running job  J761  of type  new_local_refine
Running job on hostname %s ANES002074D
Allocated Resources :  lane='default' lane_type='node' hostname='ANES002074D' target=SchedulerTarget(cache_path='/scratch/cryosparc_cache', cache_reserve_mb=10000, cache_quota_mb=None, lane='default', name='ANES002074D', title='Worker ANES002074D', desc=None, hostname='ANES002074D', worker_bin_path='/home/user/software/cryosparc/cryosparc_worker/bin/cryosparcw', config=Node(type='node', ssh_str='user@ANES002074D', resource_slots=ResourceSlots(CPU=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], GPU=[0, 1, 2, 3], RAM=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]), resource_fixed=FixedResourceSlots(SSD=True), monitor_port=None, gpus=[Gpu(id=0, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=1, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=2, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=3, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640)])) slots=ResourceSlots(CPU=[0, 1, 2, 3], GPU=[0], RAM=[0, 1, 2]) fixed=FixedResourceSlots(SSD=True) licenses_acquired=1
2026-04-01 11:30:38,895 core                 run_with_executo INFO   | Resolving 9295 source path(s) for caching
2026-04-01 11:30:39,380 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:30:40,595 core                 run_with_executo INFO   | Resolved 9295 sources in 1.70 seconds
2026-04-01 11:30:40,621 core                 lock_job_run     INFO   | Lock ssd_cache acquired by P4-J761-1775057418
2026-04-01 11:30:40,622 core                 allocate         INFO   | Cache allocation start. Active run IDs: P4-J761-1775057418, P3-J279-1772686457
2026-04-01 11:30:41,337 core                 refresh          INFO   | Refreshed cache drive in 0.72 seconds
2026-04-01 11:30:41,432 core                 cleanup_junk_fil INFO   | Removed 1 invalid item(s) in the cache
2026-04-01 11:30:41,961 core                 refresh          INFO   | Refreshed cache drive in 0.53 seconds
2026-04-01 11:30:41,980 core                 allocate         INFO   | Deleted 0 cached files, encountered 0 errors
2026-04-01 11:30:41,981 core                 allocate         INFO   | Allocated 0 stub cache files; creating links
2026-04-01 11:30:42,142 core                 allocate         INFO   | Cache allocation complete
2026-04-01 11:30:42,142 core                 unlock_job_run   INFO   | Releasing lock ssd_cache from P4-J761-1775057418
2026-04-01 11:30:42,143 core                 run_with_executo INFO   | Cache allocation ran in 1.52 seconds
2026-04-01 11:30:42,143 core                 run_with_executo INFO   | Found 9295 SSD hit(s)
2026-04-01 11:30:42,143 core                 run_with_executo INFO   | Requested files successfully cached to SSD
2026-04-01 11:30:42,475 core                 run_with_executo INFO   | SSD cache complete
2026-04-01 11:30:49,386 core                 heartbeat        INFO   | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 0   pid 2557195) 
	gpu_id  0 
	ndims   2 
	dims    512 512 0 
	inembed 512 514 0 
	istride 1 
	idist   263168 
	onembed 512 257 0 
	ostride 1 
	odist   131584 
	batch   500 
	type    R2C 
	wkspc   automatic 
	Python traceback:

2026-04-01 11:30:59,393 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:31:09,399 core                 heartbeat        INFO   | ========= Updating heartbeat
/home/user/software/cryosparc/cryosparc_worker/compute/plotutil.py:749: RuntimeWarning: divide by zero encountered in log
  logabs = n.log(n.abs(fM))
2026-04-01 11:31:19,406 core                 heartbeat        INFO   | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 1   pid 2557195) 
	gpu_id  0 
	ndims   3 
	dims    512 512 512 
	inembed 512 512 514 
	istride 1 
	idist   134742016 
	onembed 512 512 257 
	ostride 1 
	odist   67371008 
	batch   1 
	type    R2C 
	wkspc   automatic 
	Python traceback:

gpufft: creating new cufft plan (plan id 2   pid 2557195) 
	gpu_id  0 
	ndims   2 
	dims    512 512 0 
	inembed 512 514 0 
	istride 1 
	idist   263168 
	onembed 512 257 0 
	ostride 1 
	odist   131584 
	batch   500 
	type    R2C 
	wkspc   automatic 
	Python traceback:

/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/nvrtc.py:257: UserWarning: NVRTC log messages whilst compiling kernel:

kernel(35): warning #68-D: integer conversion resulted in a change of sign
                  my_nan_count += __shfl_xor_sync(-1, my_nan_count, x);
                                                  ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

kernel(44): warning #68-D: integer conversion resulted in a change of sign
                      my_nan_count += __shfl_xor_sync(-1, my_nan_count, x);
                                                      ^

kernel(17): warning #177-D: variable "N_I" was declared but never referenced
              unsigned N_I = gridDim.x;
                       ^


  warnings.warn(msg)
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 125 will likely result in GPU under-utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 21 will likely result in GPU under-utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 1 will likely result in GPU under-utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))
2026-04-01 11:31:29,412 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:31:39,419 core                 heartbeat        INFO   | ========= Updating heartbeat
/home/user/software/cryosparc/cryosparc_worker/cli/cryosparcw.py:290: UserWarning: Cannot manually free CUDA array; will be freed when garbage collected
  return run(conf)
2026-04-01 11:31:49,425 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:31:59,433 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:32:09,440 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:32:19,447 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:32:29,453 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:32:39,460 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:32:49,466 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:32:59,473 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:33:09,480 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:33:19,486 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:33:29,492 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:33:39,499 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:33:49,505 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:33:59,512 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:34:09,518 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:34:19,525 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:34:29,531 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:34:39,536 core                 heartbeat        INFO   | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 3   pid 2557195) 
	gpu_id  0 
	ndims   2 
	dims    512 512 0 
	inembed 512 514 0 
	istride 1 
	idist   263168 
	onembed 512 257 0 
	ostride 1 
	odist   131584 
	batch   364 
	type    R2C 
	wkspc   automatic 
	Python traceback:

2026-04-01 11:34:49,543 core                 heartbeat        INFO   | ========= Updating heartbeat
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 91 will likely result in GPU under-utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 16 will likely result in GPU under-utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))
/home/user/software/cryosparc/cryosparc_worker/cli/cryosparcw.py:290: UserWarning: Cannot manually free CUDA array; will be freed when garbage collected
  return run(conf)
2026-04-01 11:34:59,549 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:35:09,557 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:35:19,563 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:35:29,570 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:35:39,576 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:35:49,583 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:35:59,589 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:36:09,596 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:36:19,602 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:36:29,609 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:36:39,616 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:36:49,622 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:36:59,629 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:37:09,635 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:37:19,642 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:37:29,649 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:37:39,655 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:37:49,662 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:37:59,668 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:38:09,675 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:38:19,681 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:38:29,688 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:38:39,695 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:38:49,702 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:38:59,708 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:39:09,715 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:39:19,721 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:39:29,728 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:39:39,735 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:39:49,741 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:39:59,748 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:40:09,754 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:40:19,761 core                 heartbeat        INFO   | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 4   pid 2557195) 
	gpu_id  0 
	ndims   2 
	dims    512 512 0 
	inembed 512 514 0 
	istride 1 
	idist   263168 
	onembed 512 257 0 
	ostride 1 
	odist   131584 
	batch   411 
	type    R2C 
	wkspc   automatic 
	Python traceback:

/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 103 will likely result in GPU under-utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 18 will likely result in GPU under-utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))
2026-04-01 11:40:29,767 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:40:39,774 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:40:49,780 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:40:59,788 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:41:09,795 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:41:19,802 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:41:29,808 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:41:39,816 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:41:49,863 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:41:59,870 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:42:09,876 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:42:19,883 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:42:29,890 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:42:39,896 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:42:49,903 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:42:59,910 core                 heartbeat        INFO   | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 5   pid 2557195) 
	gpu_id  0 
	ndims   2 
	dims    512 512 0 
	inembed 512 514 0 
	istride 1 
	idist   263168 
	onembed 512 257 0 
	ostride 1 
	odist   131584 
	batch   412 
	type    R2C 
	wkspc   automatic 
	Python traceback:

2026-04-01 11:43:09,916 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:43:19,923 core                 heartbeat        INFO   | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 6   pid 2557195) 
	gpu_id  0 
	ndims   3 
	dims    512 512 512 
	inembed 512 512 514 
	istride 1 
	idist   134742016 
	onembed 512 512 257 
	ostride 1 
	odist   67371008 
	batch   1 
	type    R2C 
	wkspc   manual 
	Python traceback:

/home/user/software/cryosparc/cryosparc_worker/cli/cryosparcw.py:290: RuntimeWarning: invalid value encountered in divide
  return run(conf)
2026-04-01 11:43:29,930 core                 heartbeat        INFO   | ========= Updating heartbeat

@olibclarke Please can you also post the final events of the job:

cryosparcm job events P4 J761 | tail -n 30

Here you go:

user@ANES002074D:~$ cryosparcm job events P4 J761 | tail -n 30
                      [asset file="J761_initial_real_space_slices.pdf" id="69cd3a40aa42b8b68cf76e2f"]
[2026-04-01 15:31:13] [FIGURE] Initial Fourier Space Slices
                      [asset file="J761_initial_fourier_space_slices.png" id="69cd3a41aa42b8b68cf76e32"]
                      [asset file="J761_initial_fourier_space_slices.pdf" id="69cd3a41aa42b8b68cf76e34"]
[2026-04-01 15:31:13] [7984 MB] ====== Starting Refinement Iterations ======
[2026-04-01 15:31:13] [7984 MB] ----------------------------- Start Iteration 0
[2026-04-01 15:31:13] [7984 MB]   Using Max Alignment Radius 150.000 (4.160A)
[2026-04-01 15:31:13] [7984 MB]   Using Full Dataset (split 89728 in A, 89823 in B)
[2026-04-01 15:31:19] [8312 MB]   Current alpha values  (  0.31 |  0.87 |  1.00 |  1.13 |  1.72 )
[2026-04-01 15:31:19] [8312 MB] -- THR 1 BATCH 500 NUM 22500 TOTAL 150.66515 ELAPSED 712.58241 --
[2026-04-01 15:31:20] [FIGURE] Alignment map A
                      [asset file="J761_alignment_map_a.png" id="69cd3a47aa42b8b68cf76e3d"]
                      [asset file="J761_alignment_map_a.pdf" id="69cd3a48aa42b8b68cf76e3f"]
[2026-04-01 15:31:20] [FIGURE] Alignment map B
                      [asset file="J761_alignment_map_b.png" id="69cd3a48aa42b8b68cf76e42"]
                      [asset file="J761_alignment_map_b.pdf" id="69cd3a48aa42b8b68cf76e44"]
[2026-04-01 15:31:20] [8343 MB]   Initializing noise model... (2/2)
[2026-04-01 15:32:05] [FIGURE] Noise Model Initialization (2/2)
                      [asset file="J761_noise_model_initialization_22.png" id="69cd3a75aa42b8b68cf76e48"]
                      [asset file="J761_noise_model_initialization_22.pdf" id="69cd3a75aa42b8b68cf76e4a"]
[2026-04-01 15:43:14] [14630 MB]   Processed 179551.000 images in 714.794s.
[2026-04-01 15:43:20] [15714 MB]   Computing FSCs...
[2026-04-01 15:43:22] [15715 MB]     Done in 1.650s
[2026-04-01 15:43:22] [15715 MB]   Computing cFSCs...
[2026-04-01 15:43:26] [15720 MB]     Done in 4.369s
[2026-04-01 15:43:26] [15720 MB]   Using Filter Radius 194.273 (3.212A) | Previous: 104.000 (6.000A)
[2026-04-01 15:43:36] [19820 MB]   Non-uniform regularization with compute option: GPU
[2026-04-01 15:43:36] [19820 MB]   Running local cross validation for A ...
[2026-04-01 15:49:37] [145 MB] **** Kill signal sent by unknown user ****
[2026-04-01 15:49:37] [146 MB] Job is unresponsive - no heartbeat received in 180 seconds
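
A note for anyone comparing the two outputs above: in the job log the heartbeats arrive roughly every 10 seconds up to 11:43:29, while the event log timestamps run four hours ahead of the job log (presumably a time zone difference). A minimal Python sketch for pulling the heartbeat timestamps out of a job log and reporting the last one and the largest gap is below. The function names are made up for illustration, and the regex only assumes the line format shown in this thread.

```python
import re
from datetime import datetime

# Matches job-log lines such as:
# 2026-04-01 11:43:29,930 core                 heartbeat        INFO   | ========= Updating heartbeat
HEARTBEAT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\s+core\s+heartbeat\s+INFO"
)


def heartbeat_times(log_text):
    """Return the timestamps of all heartbeat lines, in log order."""
    times = []
    for line in log_text.splitlines():
        m = HEARTBEAT_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            # The part after the comma is milliseconds.
            times.append(ts.replace(microsecond=int(m.group(2)) * 1000))
    return times


def summarize_heartbeats(log_text):
    """Return heartbeat count, last timestamp and largest gap (seconds), or None."""
    times = heartbeat_times(log_text)
    if not times:
        return None
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "count": len(times),
        "last": times[-1],
        "max_gap_s": max(gaps, default=0.0),
    }


# Three lines copied from the end of the job log in this thread.
sample = """\
2026-04-01 11:43:09,916 core                 heartbeat        INFO   | ========= Updating heartbeat
2026-04-01 11:43:19,923 core                 heartbeat        INFO   | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 6   pid 2557195)
2026-04-01 11:43:29,930 core                 heartbeat        INFO   | ========= Updating heartbeat
"""
summary = summarize_heartbeats(sample)
print(summary["count"], summary["last"], summary["max_gap_s"])
```

Running this over the full job log gives the time of the last heartbeat the main process wrote, which can then be lined up against the kill time in the event log.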

The worker host (rather than the GPU) may have run out of RAM, which could happen if total system RAM is small or if the system was processing other workloads at the same time.
How much RAM does the worker have? Was anything else running on that same worker when the job failed?
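
One way to see how much RAM the host reports as total and available is to read /proc/meminfo on the worker (a standard Linux interface; sized fields are in kB). A minimal Python sketch follows; the helper names are hypothetical and the sample values are made up purely for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for sized fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def kb_to_gib(kb):
    """Convert a kB value from /proc/meminfo to GiB."""
    return kb / (1024 * 1024)


# Made-up sample values, not taken from the workstation in this thread.
sample = (
    "MemTotal:        8388608 kB\n"
    "MemFree:         1048576 kB\n"
    "MemAvailable:    2097152 kB\n"
)
info = parse_meminfo(sample)
print(kb_to_gib(info["MemTotal"]), kb_to_gib(info["MemAvailable"]))
```

On the worker itself, passing the contents of /proc/meminfo to `parse_meminfo` would give the live values. `MemAvailable` is usually the more useful number than `MemFree`, since it also counts memory the kernel can reclaim from caches.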

It is a standalone GPU workstation with 512 GB of RAM. Two Bindcraft jobs were running at the time and are still running; with those plus a low-memory-mode local refinement, we currently have 291 GB free.

Please can you post the output of the command

sudo journalctl | grep 2557195

on that workstation.
Let’s see whether the system log has any information on the failed job’s main process.

That doesn’t generate any output

Thanks @olibclarke. Does /home/user/software/cryosparc/cryosparc_worker/config.sh already contain the following line?

export CRYOSPARC_NO_PAGELOCK=true

If it doesn’t, please can you try running a clone of the failed job with that setting.
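
For anyone wanting to check this across several workers, a small Python sketch is below. The helper name is hypothetical, the regex only assumes the plain `export NAME=value` form quoted above, and the sample text is invented for the example rather than copied from a real config.sh.

```python
import re

# Matches an active (uncommented) line of the form
#   export CRYOSPARC_NO_PAGELOCK=true
# optionally with the value in quotes and a trailing comment.
NO_PAGELOCK_RE = re.compile(
    r"""^[ \t]*export[ \t]+CRYOSPARC_NO_PAGELOCK=(["']?)true\1[ \t]*(#.*)?$""",
    re.MULTILINE,
)


def has_no_pagelock(config_text):
    """Return True if the config text sets CRYOSPARC_NO_PAGELOCK=true."""
    return NO_PAGELOCK_RE.search(config_text) is not None


# Invented sample contents; SOME_OTHER_SETTING is a placeholder, not a real variable.
sample_without = "export SOME_OTHER_SETTING=1\n# export CRYOSPARC_NO_PAGELOCK=true\n"
sample_with = sample_without + "export CRYOSPARC_NO_PAGELOCK=true\n"

print(has_no_pagelock(sample_without))
print(has_no_pagelock(sample_with))
```

A commented-out copy of the line is deliberately not counted, since it has no effect when the worker sources the file.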

It did not - trying now, will let you know the outcome. Do I need to restart CS for that setting to take effect?

A restart is not needed for changes in the worker config.sh.

Thought so - thanks for confirming! Will post once job completes or dies

Hi @wtempel, this seemed to fix the issue - thanks!
