Hi,
Large-box local refinements (512 px) are killed with a "no heartbeat" error in v5.03, even though matching NU-refine jobs complete. Switching on low-memory mode seems to fix the issue, but 512 px jobs previously ran fine on this GPU (A6000). Job log below:
================= CRYOSPARC =================
Project P4 Job J761
Master ANES002074D Port 39000
===========================================================================
MAIN PROCESS PID 2557195
2026-04-01 11:30:18,993 core monitor INFO | MONITOR PROCESS PID 2557204
2026-04-01 11:30:18,993 core monitor INFO | ========= monitor process now waiting for main process
2026-04-01 11:30:18,993 core heartbeat INFO | ========= Updating heartbeat
================= CRYOSPARC =================
Project P4 Job J761
Master ANES002074D Port 39000
===========================================================================
MAIN PROCESS PID 2557195
========= updating job startup information at 2026-04-01 11:30:19.325803
2026-04-01 11:30:19,359 core.webhook post_to_webhook ERROR | Failed to send webhook notification
Traceback (most recent call last):
File "/home/user/software/cryosparc/cryosparc_worker/core/webhook.py", line 20, in post_to_webhook
core.webhook_client.post(core.settings.slack_webhook_url, json=data)
^^^^^^^^^^^^^^^^^^^
File "/home/user/software/cryosparc/cryosparc_worker/core/core.py", line 94, in webhook_client
assert self.mode == "master", "Cannot access webhook client when running in worker mode"
^^^^^^^^^^^^^^^^^^^^^
AssertionError: Cannot access webhook client when running in worker mode
========= now starting main process at 2026-04-01 11:30:19.765712
2026-04-01 11:30:19,870 core run INFO | Running job J761 of type new_local_refine
2026-04-01 11:30:19,870 core run INFO | Running job on hostname ANES002074D
2026-04-01 11:30:19,870 core run INFO | Allocated Resources: lane='default' lane_type='node' hostname='ANES002074D' target=SchedulerTarget(cache_path='/scratch/cryosparc_cache', cache_reserve_mb=10000, cache_quota_mb=None, lane='default', name='ANES002074D', title='Worker ANES002074D', desc=None, hostname='ANES002074D', worker_bin_path='/home/user/software/cryosparc/cryosparc_worker/bin/cryosparcw', config=Node(type='node', ssh_str='user@ANES002074D', resource_slots=ResourceSlots(CPU=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], GPU=[0, 1, 2, 3], RAM=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]), resource_fixed=FixedResourceSlots(SSD=True), monitor_port=None, gpus=[Gpu(id=0, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=1, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=2, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=3, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640)])) slots=ResourceSlots(CPU=[0, 1, 2, 3], GPU=[0], RAM=[0, 1, 2]) fixed=FixedResourceSlots(SSD=True) licenses_acquired=1
Transparent hugepages setting: always [madvise] never
2026-04-01 11:30:24,130 core.webhook post_to_webhook ERROR | Failed to send webhook notification
Traceback (most recent call last):
File "/home/user/software/cryosparc/cryosparc_worker/core/webhook.py", line 20, in post_to_webhook
core.webhook_client.post(core.settings.slack_webhook_url, json=data)
^^^^^^^^^^^^^^^^^^^
File "/home/user/software/cryosparc/cryosparc_worker/core/core.py", line 94, in webhook_client
assert self.mode == "master", "Cannot access webhook client when running in worker mode"
^^^^^^^^^^^^^^^^^^^^^
AssertionError: Cannot access webhook client when running in worker mode
2026-04-01 11:30:29,327 core heartbeat INFO | ========= Updating heartbeat
Running job J761 of type new_local_refine
Running job on hostname %s ANES002074D
Allocated Resources : lane='default' lane_type='node' hostname='ANES002074D' target=SchedulerTarget(cache_path='/scratch/cryosparc_cache', cache_reserve_mb=10000, cache_quota_mb=None, lane='default', name='ANES002074D', title='Worker ANES002074D', desc=None, hostname='ANES002074D', worker_bin_path='/home/user/software/cryosparc/cryosparc_worker/bin/cryosparcw', config=Node(type='node', ssh_str='user@ANES002074D', resource_slots=ResourceSlots(CPU=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], GPU=[0, 1, 2, 3], RAM=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]), resource_fixed=FixedResourceSlots(SSD=True), monitor_port=None, gpus=[Gpu(id=0, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=1, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=2, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640), Gpu(id=3, name='NVIDIA RTX 6000 Ada Generation', mem=51527024640)])) slots=ResourceSlots(CPU=[0, 1, 2, 3], GPU=[0], RAM=[0, 1, 2]) fixed=FixedResourceSlots(SSD=True) licenses_acquired=1
2026-04-01 11:30:38,895 core run_with_executo INFO | Resolving 9295 source path(s) for caching
2026-04-01 11:30:39,380 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:30:40,595 core run_with_executo INFO | Resolved 9295 sources in 1.70 seconds
2026-04-01 11:30:40,621 core lock_job_run INFO | Lock ssd_cache acquired by P4-J761-1775057418
2026-04-01 11:30:40,622 core allocate INFO | Cache allocation start. Active run IDs: P4-J761-1775057418, P3-J279-1772686457
2026-04-01 11:30:41,337 core refresh INFO | Refreshed cache drive in 0.72 seconds
2026-04-01 11:30:41,432 core cleanup_junk_fil INFO | Removed 1 invalid item(s) in the cache
2026-04-01 11:30:41,961 core refresh INFO | Refreshed cache drive in 0.53 seconds
2026-04-01 11:30:41,980 core allocate INFO | Deleted 0 cached files, encountered 0 errors
2026-04-01 11:30:41,981 core allocate INFO | Allocated 0 stub cache files; creating links
2026-04-01 11:30:42,142 core allocate INFO | Cache allocation complete
2026-04-01 11:30:42,142 core unlock_job_run INFO | Releasing lock ssd_cache from P4-J761-1775057418
2026-04-01 11:30:42,143 core run_with_executo INFO | Cache allocation ran in 1.52 seconds
2026-04-01 11:30:42,143 core run_with_executo INFO | Found 9295 SSD hit(s)
2026-04-01 11:30:42,143 core run_with_executo INFO | Requested files successfully cached to SSD
2026-04-01 11:30:42,475 core run_with_executo INFO | SSD cache complete
2026-04-01 11:30:49,386 core heartbeat INFO | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 0 pid 2557195)
gpu_id 0
ndims 2
dims 512 512 0
inembed 512 514 0
istride 1
idist 263168
onembed 512 257 0
ostride 1
odist 131584
batch 500
type R2C
wkspc automatic
Python traceback:
2026-04-01 11:30:59,393 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:31:09,399 core heartbeat INFO | ========= Updating heartbeat
/home/user/software/cryosparc/cryosparc_worker/compute/plotutil.py:749: RuntimeWarning: divide by zero encountered in log
logabs = n.log(n.abs(fM))
2026-04-01 11:31:19,406 core heartbeat INFO | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 1 pid 2557195)
gpu_id 0
ndims 3
dims 512 512 512
inembed 512 512 514
istride 1
idist 134742016
onembed 512 512 257
ostride 1
odist 67371008
batch 1
type R2C
wkspc automatic
Python traceback:
gpufft: creating new cufft plan (plan id 2 pid 2557195)
gpu_id 0
ndims 2
dims 512 512 0
inembed 512 514 0
istride 1
idist 263168
onembed 512 257 0
ostride 1
odist 131584
batch 500
type R2C
wkspc automatic
Python traceback:
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/cudadrv/nvrtc.py:257: UserWarning: NVRTC log messages whilst compiling kernel:
kernel(35): warning #68-D: integer conversion resulted in a change of sign
my_nan_count += __shfl_xor_sync(-1, my_nan_count, x);
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
kernel(44): warning #68-D: integer conversion resulted in a change of sign
my_nan_count += __shfl_xor_sync(-1, my_nan_count, x);
^
kernel(17): warning #177-D: variable "N_I" was declared but never referenced
unsigned N_I = gridDim.x;
^
warnings.warn(msg)
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 125 will likely result in GPU under-utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 21 will likely result in GPU under-utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 1 will likely result in GPU under-utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
2026-04-01 11:31:29,412 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:31:39,419 core heartbeat INFO | ========= Updating heartbeat
/home/user/software/cryosparc/cryosparc_worker/cli/cryosparcw.py:290: UserWarning: Cannot manually free CUDA array; will be freed when garbage collected
return run(conf)
2026-04-01 11:31:49,425 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:31:59,433 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:32:09,440 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:32:19,447 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:32:29,453 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:32:39,460 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:32:49,466 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:32:59,473 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:33:09,480 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:33:19,486 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:33:29,492 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:33:39,499 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:33:49,505 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:33:59,512 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:34:09,518 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:34:19,525 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:34:29,531 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:34:39,536 core heartbeat INFO | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 3 pid 2557195)
gpu_id 0
ndims 2
dims 512 512 0
inembed 512 514 0
istride 1
idist 263168
onembed 512 257 0
ostride 1
odist 131584
batch 364
type R2C
wkspc automatic
Python traceback:
2026-04-01 11:34:49,543 core heartbeat INFO | ========= Updating heartbeat
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 91 will likely result in GPU under-utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 16 will likely result in GPU under-utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
/home/user/software/cryosparc/cryosparc_worker/cli/cryosparcw.py:290: UserWarning: Cannot manually free CUDA array; will be freed when garbage collected
return run(conf)
2026-04-01 11:34:59,549 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:35:09,557 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:35:19,563 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:35:29,570 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:35:39,576 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:35:49,583 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:35:59,589 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:36:09,596 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:36:19,602 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:36:29,609 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:36:39,616 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:36:49,622 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:36:59,629 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:37:09,635 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:37:19,642 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:37:29,649 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:37:39,655 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:37:49,662 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:37:59,668 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:38:09,675 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:38:19,681 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:38:29,688 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:38:39,695 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:38:49,702 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:38:59,708 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:39:09,715 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:39:19,721 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:39:29,728 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:39:39,735 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:39:49,741 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:39:59,748 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:40:09,754 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:40:19,761 core heartbeat INFO | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 4 pid 2557195)
gpu_id 0
ndims 2
dims 512 512 0
inembed 512 514 0
istride 1
idist 263168
onembed 512 257 0
ostride 1
odist 131584
batch 411
type R2C
wkspc automatic
Python traceback:
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 103 will likely result in GPU under-utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
/home/user/software/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 18 will likely result in GPU under-utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
2026-04-01 11:40:29,767 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:40:39,774 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:40:49,780 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:40:59,788 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:41:09,795 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:41:19,802 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:41:29,808 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:41:39,816 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:41:49,863 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:41:59,870 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:42:09,876 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:42:19,883 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:42:29,890 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:42:39,896 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:42:49,903 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:42:59,910 core heartbeat INFO | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 5 pid 2557195)
gpu_id 0
ndims 2
dims 512 512 0
inembed 512 514 0
istride 1
idist 263168
onembed 512 257 0
ostride 1
odist 131584
batch 412
type R2C
wkspc automatic
Python traceback:
2026-04-01 11:43:09,916 core heartbeat INFO | ========= Updating heartbeat
2026-04-01 11:43:19,923 core heartbeat INFO | ========= Updating heartbeat
gpufft: creating new cufft plan (plan id 6 pid 2557195)
gpu_id 0
ndims 3
dims 512 512 512
inembed 512 512 514
istride 1
idist 134742016
onembed 512 512 257
ostride 1
odist 67371008
batch 1
type R2C
wkspc manual
Python traceback:
/home/user/software/cryosparc/cryosparc_worker/cli/cryosparcw.py:290: RuntimeWarning: invalid value encountered in divide
return run(conf)
2026-04-01 11:43:29,930 core heartbeat INFO | ========= Updating heartbeat
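
For anyone triaging this: the heartbeat lines in the log above are spaced roughly 10 seconds apart right up to the last line, so it may be worth confirming that across the full job log. Below is a minimal sketch for doing that. The function name `heartbeat_gaps` and the sample lines are only for illustration, and the regex assumes the heartbeat line format exactly as it appears in this log; it is not part of CryoSPARC.

```python
import re
from datetime import datetime

# Matches heartbeat lines in the format seen in the log above, e.g.
# "2026-04-01 11:30:18,993 core heartbeat INFO | ========= Updating heartbeat"
HEARTBEAT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) core heartbeat INFO"
)


def heartbeat_gaps(lines):
    """Return the gaps (in seconds) between consecutive heartbeat lines."""
    stamps = []
    for line in lines:
        match = HEARTBEAT_RE.match(line.strip())
        if match:
            # %f accepts the three-digit millisecond field after the comma
            stamps.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Illustrative input: three heartbeat lines and one unrelated line from the log
sample = [
    "2026-04-01 11:30:18,993 core heartbeat INFO | ========= Updating heartbeat",
    "2026-04-01 11:30:29,327 core heartbeat INFO | ========= Updating heartbeat",
    "gpufft: creating new cufft plan (plan id 0 pid 2557195)",
    "2026-04-01 11:30:39,380 core heartbeat INFO | ========= Updating heartbeat",
]
gaps = heartbeat_gaps(sample)
print("largest gap (s):", max(gaps))
```

To run it on a real log, pass the lines of the saved job log file to `heartbeat_gaps` instead of `sample`. A largest gap that stays near 10 seconds would suggest the worker kept writing heartbeats locally until the job was killed.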