Patch CTF - no heartbeat received error

Hi,

I have tried running Patch CTF on only 828 imported movies, but every time it runs for about 1 h and then the job is killed. I have restarted the job a couple of times and even ran cryosparcm restart, but it didn't help.


Below is the job log:

================= CRYOSPARCW ======= 2024-04-19 07:06:54.528237 =========
Project P8 Job J2
Master bun081 Port 31298

========= monitor process now starting main process at 2024-04-19 07:06:54.528311
MAINPROCESS PID 2262182
MAIN PID 2262182
motioncorrection.run_patch cryosparc_compute.jobs.jobregister
========= monitor process now waiting for main process


Running job on hostname %s A100.10gb.MIG
Allocated Resources : {'fixed': {'SSD': False}, 'hostname': 'A100.10gb.MIG', 'lane': 'A100.10gb.MIG', 'lane_type': 'cluster', 'license': True, 'licenses_acquired': 1, 'slots': {'CPU': [0, 1, 2, 3, 4, 5], 'GPU': [0], 'RAM': [0, 1]}, 'target': {'cache_path': '/scratch/user/s4483594/cryosparc/ssd', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': , 'custom_vars': {}, 'desc': None, 'hostname': 'A100.10gb.MIG', 'lane': 'A100.10gb.MIG', 'name': 'A100.10gb.MIG', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': 'squeue -j {{ cluster_job_id }} --format=%T -h', 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/usr/bin/env bash\n#### cryoSPARC cluster submission script template for SLURM\n## Available variables:\n## {{ run_cmd }} - the complete command string to run the job\n## {{ num_cpu }} - the number of CPUs needed\n## {{ num_gpu }} - the number of GPUs needed. \n## Note: the code will use this many GPUs starting from dev id 0\n## the cluster scheduler or this script have the responsibility\n## of setting CUDA_VISIBLE_DEVICES so that the job code ends up\n## using the correct cluster-allocated GPUs.\n## {{ ram_gb }} - the amount of RAM needed in GB\n## {{ job_dir_abs }} - absolute path to the job directory\n## {{ project_dir_abs }} - absolute path to the project dir\n## {{ job_log_path_abs }} - absolute path to the log file for the job\n## {{ worker_bin_path }} - absolute path to the cryosparc worker command\n## {{ run_args }} - arguments to be passed to cryosparcw run\n## {{ project_uid }} - uid of the project\n## {{ job_uid }} - uid of the job\n## {{ job_creator }} - name of the user that created the job (may contain spaces)\n## {{ cryosparc_username }} - cryosparc username of the user that created the job (usually an email)\n##\n## What follows is a simple SLURM script:\n\n#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH -n 1\n#SBATCH -c {{ num_cpu }}\n#SBATCH --account=a_landsberg\n#SBATCH --gres=gpu:nvidia_a100_80gb_pcie_1g.10gb:{{ num_gpu }}\n#SBATCH --partition=gpu_cuda\n#SBATCH --mem={{ (ram_gb*1000)|int }}MB\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n\n{{ run_cmd }}\n\n', 'send_cmd_tpl': '{{ command }}', 'title': 'A100.10gb.MIG', 'tpl_vars': ['run_args', 'project_uid', 'num_cpu', 'job_dir_abs', 'project_dir_abs', 'worker_bin_path', 'run_cmd', 'ram_gb', 'job_uid', 'cryosparc_username', 'command', 'cluster_job_id', 'num_gpu', 'job_log_path_abs', 'job_creator'], 'type': 'cluster', 'worker_bin_path': '/home/s4483594/cryosparc/cryosparc_worker/bin/cryosparcw'}}
/home/s4483594/cryosparc/cryosparc_worker/cryosparc_compute/jobs/motioncorrection/mic_utils.py:95: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See Deprecation Notices — Numba 0+untagged.2155.g9ce83ef.dirty documentation for details.
@jit(nogil=True)
/home/s4483594/cryosparc/cryosparc_worker/cryosparc_compute/micrographs.py:563: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See Deprecation Notices — Numba 0+untagged.2155.g9ce83ef.dirty documentation for details.
def contrast_normalization(arr_bin, tile_size = 128):
gpufft: creating new cufft plan (plan id 0 pid 2262211)
gpu_id 0
ndims 2
dims 1152 1152 0
inembed 1152 1154 0
istride 1
idist 1329408
onembed 1152 577 0
ostride 1
odist 664704
batch 35
type R2C
wkspc automatic
Python traceback:

gpufft: creating new cufft plan (plan id 1 pid 2262211)
gpu_id 0
ndims 2
dims 5832 5832 0
inembed 5832 5834 0
istride 1
idist 34023888
onembed 5832 2917 0
ostride 1
odist 17011944
batch 1
type R2C
wkspc manual
Python traceback:

gpufft: creating new cufft plan (plan id 2 pid 2262211)
gpu_id 0
ndims 2
dims 11664 11664 0
inembed 11664 5833 0
istride 1
idist 68036112
onembed 11664 11666 0
ostride 1
odist 136072224
batch 1
type C2R
wkspc manual
Python traceback:

gpufft: creating new cufft plan (plan id 3 pid 2262211)
gpu_id 0
ndims 2
dims 11664 11664 0
inembed 11664 11666 0
istride 1
idist 136072224
onembed 11664 5833 0
ostride 1
odist 68036112
batch 1
type R2C
wkspc manual
Python traceback:

========= sending heartbeat at 2024-04-19 07:07:10.093618
gpufft: creating new cufft plan (plan id 4 pid 2262211)
gpu_id 0
ndims 2
dims 5832 5832 0
inembed 5832 2917 0
istride 1
idist 17011944
onembed 5832 5834 0
ostride 1
odist 34023888
batch 1
type C2R
wkspc manual
Python traceback:

========= sending heartbeat at 2024-04-19 07:07:20.116430
========= sending heartbeat at 2024-04-19 07:07:30.137995
========= sending heartbeat at 2024-04-19 07:07:40.156659
========= sending heartbeat at 2024-04-19 07:07:50.176652
========= sending heartbeat at 2024-04-19 07:08:00.224975
========= sending heartbeat at 2024-04-19 07:08:10.244952
========= sending heartbeat at 2024-04-19 07:08:20.265805
========= sending heartbeat at 2024-04-19 07:08:30.284138
========= sending heartbeat at 2024-04-19 07:08:40.302799
========= sending heartbeat at 2024-04-19 07:08:50.319592
========= sending heartbeat at 2024-04-19 07:09:00.338062
========= sending heartbeat at 2024-04-19 07:09:10.362053
========= sending heartbeat at 2024-04-19 07:09:20.383968
========= sending heartbeat at 2024-04-19 07:09:30.403990
========= sending heartbeat at 2024-04-19 07:09:40.423508
========= sending heartbeat at 2024-04-19 07:09:50.437455
========= sending heartbeat at 2024-04-19 07:10:00.457067
========= sending heartbeat at 2024-04-19 07:10:10.477646
========= sending heartbeat at 2024-04-19 07:10:20.498671
========= sending heartbeat at 2024-04-19 07:10:30.519111
========= sending heartbeat at 2024-04-19 07:10:40.538768
========= sending heartbeat at 2024-04-19 07:11:34.334020
========= sending heartbeat at 2024-04-19 07:11:44.355895
========= sending heartbeat at 2024-04-19 07:11:54.376335
========= sending heartbeat at 2024-04-19 07:12:04.396667
========= sending heartbeat at 2024-04-19 07:12:14.450976
========= sending heartbeat at 2024-04-19 07:12:24.471107
========= sending heartbeat at 2024-04-19 07:12:34.485342
========= sending heartbeat at 2024-04-19 07:12:44.549971
========= sending heartbeat at 2024-04-19 07:12:54.569405
========= sending heartbeat at 2024-04-19 07:13:04.587143
========= sending heartbeat at 2024-04-19 07:13:14.604669
========= sending heartbeat at 2024-04-19 07:13:24.624521
========= sending heartbeat at 2024-04-19 07:13:34.647319
========= sending heartbeat at 2024-04-19 07:13:44.664164
========= sending heartbeat at 2024-04-19 07:13:54.684438
========= sending heartbeat at 2024-04-19 07:14:04.704927
========= sending heartbeat at 2024-04-19 07:14:14.724334
========= sending heartbeat at 2024-04-19 07:14:24.743971
========= sending heartbeat at 2024-04-19 07:14:34.766749
========= sending heartbeat at 2024-04-19 07:14:44.788676
========= sending heartbeat at 2024-04-19 07:14:54.843028
========= sending heartbeat at 2024-04-19 07:15:04.862893
========= sending heartbeat at 2024-04-19 07:15:14.882272
========= sending heartbeat at 2024-04-19 07:15:24.901359
========= sending heartbeat at 2024-04-19 07:15:34.920924
========= sending heartbeat at 2024-04-19 07:15:44.940978
========= sending heartbeat at 2024-04-19 07:15:54.960357
========= sending heartbeat at 2024-04-19 07:16:04.980344
========= sending heartbeat at 2024-04-19 07:16:15.000483
========= sending heartbeat at 2024-04-19 07:16:25.010878
========= sending heartbeat at 2024-04-19 07:16:35.031345
========= sending heartbeat at 2024-04-19 07:16:45.052139
========= sending heartbeat at 2024-04-19 07:16:55.069858
========= sending heartbeat at 2024-04-19 07:17:05.087934
========= sending heartbeat at 2024-04-19 07:17:15.107930
========= sending heartbeat at 2024-04-19 07:17:25.128790
========= sending heartbeat at 2024-04-19 07:17:35.148239
========= sending heartbeat at 2024-04-19 07:17:45.169070
========= sending heartbeat at 2024-04-19 07:17:55.188679
========= sending heartbeat at 2024-04-19 07:18:05.209056
========= sending heartbeat at 2024-04-19 07:18:15.227364
========= sending heartbeat at 2024-04-19 07:18:25.243414
========= sending heartbeat at 2024-04-19 07:18:35.261546
========= sending heartbeat at 2024-04-19 07:18:45.293585
========= sending heartbeat at 2024-04-19 07:18:55.311865
========= sending heartbeat at 2024-04-19 07:19:05.329722
========= sending heartbeat at 2024-04-19 07:19:15.349566
========= sending heartbeat at 2024-04-19 07:19:25.368679
========= sending heartbeat at 2024-04-19 07:19:35.387079
========= sending heartbeat at 2024-04-19 07:19:45.406148
========= sending heartbeat at 2024-04-19 07:19:55.423671
========= sending heartbeat at 2024-04-19 07:20:05.443055
========= sending heartbeat at 2024-04-19 07:20:15.453378
========= sending heartbeat at 2024-04-19 07:20:25.471598
========= sending heartbeat at 2024-04-19 07:20:35.491593
========= sending heartbeat at 2024-04-19 07:20:45.515650
========= sending heartbeat at 2024-04-19 07:20:55.538863
========= sending heartbeat at 2024-04-19 07:21:05.557702
========= sending heartbeat at 2024-04-19 07:21:15.579672
========= sending heartbeat at 2024-04-19 07:21:25.625655
========= sending heartbeat at 2024-04-19 07:21:35.653672
========= sending heartbeat at 2024-04-19 07:21:45.703687
========= sending heartbeat at 2024-04-19 07:21:55.724005
========= sending heartbeat at 2024-04-19 07:22:05.742972
========= sending heartbeat at 2024-04-19 07:22:15.762083
========= sending heartbeat at 2024-04-19 07:22:25.781849
========= sending heartbeat at 2024-04-19 07:22:35.800539
========= sending heartbeat at 2024-04-19 07:22:45.871673
========= sending heartbeat at 2024-04-19 07:22:55.890623
========= sending heartbeat at 2024-04-19 07:23:05.910626
========= sending heartbeat at 2024-04-19 07:23:15.930143
========= sending heartbeat at 2024-04-19 07:23:25.948669
========= sending heartbeat at 2024-04-19 07:23:35.969156
========= sending heartbeat at 2024-04-19 07:23:45.992808
========= sending heartbeat at 2024-04-19 07:23:56.013876
========= sending heartbeat at 2024-04-19 07:24:06.033675
========= sending heartbeat at 2024-04-19 07:24:16.088180
========= sending heartbeat at 2024-04-19 07:24:26.108724
========= sending heartbeat at 2024-04-19 07:24:36.129566
========= sending heartbeat at 2024-04-19 07:24:46.149673
========= sending heartbeat at 2024-04-19 07:24:56.168576
========= sending heartbeat at 2024-04-19 07:25:06.185672
========= sending heartbeat at 2024-04-19 07:25:16.210671
========= sending heartbeat at 2024-04-19 07:25:26.227671
========= sending heartbeat at 2024-04-19 07:25:36.246229
========= sending heartbeat at 2024-04-19 07:25:46.270668
========= sending heartbeat at 2024-04-19 07:25:56.287681
========= sending heartbeat at 2024-04-19 07:26:06.305283
========= sending heartbeat at 2024-04-19 07:26:16.322751
========= sending heartbeat at 2024-04-19 07:26:26.340861
========= sending heartbeat at 2024-04-19 07:26:36.360675
========= sending heartbeat at 2024-04-19 07:26:46.415013
========= sending heartbeat at 2024-04-19 07:26:56.435998
========= sending heartbeat at 2024-04-19 07:27:06.456782
========= sending heartbeat at 2024-04-19 07:27:16.477638
========= sending heartbeat at 2024-04-19 07:27:26.494211
========= sending heartbeat at 2024-04-19 07:27:36.514711
========= sending heartbeat at 2024-04-19 07:27:46.536023
========= sending heartbeat at 2024-04-19 07:27:56.558103
========= sending heartbeat at 2024-04-19 07:28:06.574359
========= sending heartbeat at 2024-04-19 07:28:16.594218
========= sending heartbeat at 2024-04-19 07:28:26.614903
========= sending heartbeat at 2024-04-19 07:28:36.628537
========= sending heartbeat at 2024-04-19 07:28:46.650959
========= sending heartbeat at 2024-04-19 07:28:56.672312
========= sending heartbeat at 2024-04-19 07:29:06.696677
========= sending heartbeat at 2024-04-19 07:29:16.763427
========= sending heartbeat at 2024-04-19 07:29:26.782500
========= sending heartbeat at 2024-04-19 07:29:36.801624
========= sending heartbeat at 2024-04-19 07:29:46.819389
========= sending heartbeat at 2024-04-19 07:29:56.838869
========= sending heartbeat at 2024-04-19 07:30:06.861448
========= sending heartbeat at 2024-04-19 07:30:16.881875
========= sending heartbeat at 2024-04-19 07:30:26.901675
========= sending heartbeat at 2024-04-19 07:30:36.911884
========= sending heartbeat at 2024-04-19 07:30:46.933822
========= sending heartbeat at 2024-04-19 07:30:56.957668
========= sending heartbeat at 2024-04-19 07:31:06.977167
========= sending heartbeat at 2024-04-19 07:31:16.999581
========= sending heartbeat at 2024-04-19 07:31:27.012035
========= sending heartbeat at 2024-04-19 07:31:37.044669
========= sending heartbeat at 2024-04-19 07:31:47.109000
========= sending heartbeat at 2024-04-19 07:31:57.128215
========= sending heartbeat at 2024-04-19 07:32:07.141767
========= sending heartbeat at 2024-04-19 07:32:17.161765
========= sending heartbeat at 2024-04-19 07:32:27.180869
========= sending heartbeat at 2024-04-19 07:32:49.308582
========= sending heartbeat at 2024-04-19 07:32:59.375681
========= sending heartbeat at 2024-04-19 07:33:09.426020
========= sending heartbeat at 2024-04-19 07:33:19.445142
========= sending heartbeat at 2024-04-19 07:33:29.464815
========= sending heartbeat at 2024-04-19 07:33:39.483268
========= sending heartbeat at 2024-04-19 07:33:49.502420
========= sending heartbeat at 2024-04-19 07:33:59.524142
========= sending heartbeat at 2024-04-19 07:34:09.544669
========= sending heartbeat at 2024-04-19 07:34:19.565671
========= sending heartbeat at 2024-04-19 07:34:29.586671
========= sending heartbeat at 2024-04-19 07:34:39.607704
========= sending heartbeat at 2024-04-19 07:34:49.627900
========= sending heartbeat at 2024-04-19 07:34:59.656655
========= sending heartbeat at 2024-04-19 07:35:09.677566
========= sending heartbeat at 2024-04-19 07:35:19.696330
========= sending heartbeat at 2024-04-19 07:35:29.798127
========= sending heartbeat at 2024-04-19 07:35:39.819214
========= sending heartbeat at 2024-04-19 07:35:49.837031
========= sending heartbeat at 2024-04-19 07:35:59.856974
========= sending heartbeat at 2024-04-19 07:36:09.875883
========= sending heartbeat at 2024-04-19 07:36:19.894136
========= sending heartbeat at 2024-04-19 07:36:29.912533
========= sending heartbeat at 2024-04-19 07:36:39.932303
========= sending heartbeat at 2024-04-19 07:36:49.953714
========= sending heartbeat at 2024-04-19 07:36:59.974942
========= sending heartbeat at 2024-04-19 07:37:09.995481
========= sending heartbeat at 2024-04-19 07:37:20.014864
========= sending heartbeat at 2024-04-19 07:37:30.033078
========= sending heartbeat at 2024-04-19 07:37:40.052600
========= sending heartbeat at 2024-04-19 07:37:50.073312
========= sending heartbeat at 2024-04-19 07:38:00.095681
========= sending heartbeat at 2024-04-19 07:38:10.150970
========= sending heartbeat at 2024-04-19 07:38:20.169516
========= sending heartbeat at 2024-04-19 07:38:30.188302
========= sending heartbeat at 2024-04-19 07:38:40.207655
========= sending heartbeat at 2024-04-19 07:38:50.218773
========= sending heartbeat at 2024-04-19 07:39:00.240722
========= sending heartbeat at 2024-04-19 07:39:10.259671
========= sending heartbeat at 2024-04-19 07:39:20.280173
========= sending heartbeat at 2024-04-19 07:39:30.302026
========= sending heartbeat at 2024-04-19 07:39:40.321950
========= sending heartbeat at 2024-04-19 07:39:50.331888
========= sending heartbeat at 2024-04-19 07:40:00.350672
========= sending heartbeat at 2024-04-19 07:40:10.370364
========= sending heartbeat at 2024-04-19 07:40:20.389615
========= sending heartbeat at 2024-04-19 07:40:30.407684
========= sending heartbeat at 2024-04-19 07:40:40.431673
========= sending heartbeat at 2024-04-19 07:40:50.494965
========= sending heartbeat at 2024-04-19 07:41:00.514705
========= sending heartbeat at 2024-04-19 07:41:10.532092
========= sending heartbeat at 2024-04-19 07:41:20.551055
========= sending heartbeat at 2024-04-19 07:41:30.568784
========= sending heartbeat at 2024-04-19 07:41:55.079878
========= sending heartbeat at 2024-04-19 07:42:05.099793
========= sending heartbeat at 2024-04-19 07:42:15.117841
========= sending heartbeat at 2024-04-19 07:42:25.137679
========= sending heartbeat at 2024-04-19 07:42:35.157420
========= sending heartbeat at 2024-04-19 07:42:45.176324
========= sending heartbeat at 2024-04-19 07:42:55.196058
========= sending heartbeat at 2024-04-19 07:43:05.261703
========= sending heartbeat at 2024-04-19 07:43:15.280871
========= sending heartbeat at 2024-04-19 07:43:25.299642
========= sending heartbeat at 2024-04-19 07:43:35.319673
========= sending heartbeat at 2024-04-19 07:43:45.330023
========= sending heartbeat at 2024-04-19 07:43:55.350668
========= sending heartbeat at 2024-04-19 07:44:05.375474
========= sending heartbeat at 2024-04-19 07:44:15.395239
========= sending heartbeat at 2024-04-19 07:44:25.412908
========= sending heartbeat at 2024-04-19 07:44:35.438024
========= sending heartbeat at 2024-04-19 07:44:45.447770
========= sending heartbeat at 2024-04-19 07:44:55.466686
========= sending heartbeat at 2024-04-19 07:45:05.485665
========= sending heartbeat at 2024-04-19 07:45:15.504199
========= sending heartbeat at 2024-04-19 07:45:25.522732
========= sending heartbeat at 2024-04-19 07:45:35.540865
========= sending heartbeat at 2024-04-19 07:45:45.558979
========= sending heartbeat at 2024-04-19 07:45:55.597394
========= sending heartbeat at 2024-04-19 07:46:05.618303
========= sending heartbeat at 2024-04-19 07:46:15.636898
========= sending heartbeat at 2024-04-19 07:46:25.651582
========= sending heartbeat at 2024-04-19 07:46:35.670796
========= sending heartbeat at 2024-04-19 07:46:45.689343
========= sending heartbeat at 2024-04-19 07:46:55.709008
========= sending heartbeat at 2024-04-19 07:47:05.730671
========= sending heartbeat at 2024-04-19 07:47:15.785052
========= sending heartbeat at 2024-04-19 07:47:25.804204
========= sending heartbeat at 2024-04-19 07:47:35.820744
========= sending heartbeat at 2024-04-19 07:47:45.839721
========= sending heartbeat at 2024-04-19 07:47:55.859224
========= sending heartbeat at 2024-04-19 07:48:05.881464
========= sending heartbeat at 2024-04-19 07:48:15.900243
========= sending heartbeat at 2024-04-19 07:48:25.917672
========= sending heartbeat at 2024-04-19 07:48:35.936514
========= sending heartbeat at 2024-04-19 07:48:45.955367
========= sending heartbeat at 2024-04-19 07:48:55.974278
========= sending heartbeat at 2024-04-19 07:49:05.995301
========= sending heartbeat at 2024-04-19 07:49:16.014518
========= sending heartbeat at 2024-04-19 07:49:26.033934
========= sending heartbeat at 2024-04-19 07:49:36.053677
========= sending heartbeat at 2024-04-19 07:49:46.107685
========= sending heartbeat at 2024-04-19 07:49:56.125678
========= sending heartbeat at 2024-04-19 07:50:06.146256
========= sending heartbeat at 2024-04-19 07:50:16.160232
========= sending heartbeat at 2024-04-19 07:50:26.179628
========= sending heartbeat at 2024-04-19 07:50:36.200087
========= sending heartbeat at 2024-04-19 07:50:46.218674
========= sending heartbeat at 2024-04-19 07:50:56.238067
========= sending heartbeat at 2024-04-19 07:51:06.257642
========= sending heartbeat at 2024-04-19 07:51:16.275670
========= sending heartbeat at 2024-04-19 07:51:26.296752
========= sending heartbeat at 2024-04-19 07:51:36.314667
========= sending heartbeat at 2024-04-19 07:51:46.328102
========= sending heartbeat at 2024-04-19 07:51:56.371387
========= sending heartbeat at 2024-04-19 07:52:06.391520
========= sending heartbeat at 2024-04-19 07:52:16.409197
========= sending heartbeat at 2024-04-19 07:52:26.427761
========= sending heartbeat at 2024-04-19 07:52:36.446540
========= sending heartbeat at 2024-04-19 07:52:46.464773
========= sending heartbeat at 2024-04-19 07:52:56.484212
========= sending heartbeat at 2024-04-19 07:53:06.560117
========= sending heartbeat at 2024-04-19 07:53:16.580450
========= sending heartbeat at 2024-04-19 07:53:26.597667
========= sending heartbeat at 2024-04-19 07:53:36.616228
========= sending heartbeat at 2024-04-19 07:53:46.636765
========= sending heartbeat at 2024-04-19 07:53:56.657260
========= sending heartbeat at 2024-04-19 07:54:06.677719
========= sending heartbeat at 2024-04-19 07:54:16.697119
========= sending heartbeat at 2024-04-19 07:54:26.716854
========= sending heartbeat at 2024-04-19 07:54:36.728329
========= sending heartbeat at 2024-04-19 07:54:46.746676
========= sending heartbeat at 2024-04-19 07:54:56.765446
========= sending heartbeat at 2024-04-19 07:55:06.788679
========= sending heartbeat at 2024-04-19 07:55:16.807389
========= sending heartbeat at 2024-04-19 07:55:26.825537
========= sending heartbeat at 2024-04-19 07:55:36.853849
========= sending heartbeat at 2024-04-19 07:55:46.873958
========= sending heartbeat at 2024-04-19 07:55:56.893213
========= sending heartbeat at 2024-04-19 07:56:06.916821
========= sending heartbeat at 2024-04-19 07:56:16.936625
========= sending heartbeat at 2024-04-19 07:56:26.955890
========= sending heartbeat at 2024-04-19 07:56:36.975621
========= sending heartbeat at 2024-04-19 07:56:46.993934
========= sending heartbeat at 2024-04-19 07:56:57.013811
========= sending heartbeat at 2024-04-19 07:57:07.041382
========= sending heartbeat at 2024-04-19 07:57:17.060287
========= sending heartbeat at 2024-04-19 07:57:27.079713
========= sending heartbeat at 2024-04-19 07:57:37.099627
========= sending heartbeat at 2024-04-19 07:57:47.119395
========= sending heartbeat at 2024-04-19 07:57:57.138747
========= sending heartbeat at 2024-04-19 07:58:07.158642
========= sending heartbeat at 2024-04-19 07:58:17.178673
========= sending heartbeat at 2024-04-19 07:58:27.197513
========= sending heartbeat at 2024-04-19 07:58:37.216564
========= sending heartbeat at 2024-04-19 07:58:47.236174
========= sending heartbeat at 2024-04-19 07:58:57.255716
========= sending heartbeat at 2024-04-19 07:59:07.280802
========= sending heartbeat at 2024-04-19 07:59:17.300290
========= sending heartbeat at 2024-04-19 07:59:27.319261
========= sending heartbeat at 2024-04-19 07:59:37.339088
========= sending heartbeat at 2024-04-19 07:59:47.357222
========= sending heartbeat at 2024-04-19 07:59:57.378620
========= sending heartbeat at 2024-04-19 08:00:07.402619
========= sending heartbeat at 2024-04-19 08:00:17.422346
========= sending heartbeat at 2024-04-19 08:00:27.442437
========= sending heartbeat at 2024-04-19 08:00:37.461685
========= sending heartbeat at 2024-04-19 08:00:47.480399
========= sending heartbeat at 2024-04-19 08:00:57.500182
========= sending heartbeat at 2024-04-19 08:01:07.521675
========= sending heartbeat at 2024-04-19 08:01:17.576017
========= sending heartbeat at 2024-04-19 08:01:27.596334
========= sending heartbeat at 2024-04-19 08:01:37.615205
========= sending heartbeat at 2024-04-19 08:01:47.634448
========= sending heartbeat at 2024-04-19 08:01:57.691597
========= sending heartbeat at 2024-04-19 08:02:07.711084
========= sending heartbeat at 2024-04-19 08:02:17.730686
========= sending heartbeat at 2024-04-19 08:02:27.750252
========= sending heartbeat at 2024-04-19 08:02:37.768204
========= sending heartbeat at 2024-04-19 08:02:47.787638
========= sending heartbeat at 2024-04-19 08:03:09.461814
========= sending heartbeat at 2024-04-19 08:03:19.480134
========= sending heartbeat at 2024-04-19 08:03:29.500153
========= sending heartbeat at 2024-04-19 08:03:39.522326
========= sending heartbeat at 2024-04-19 08:03:49.544481
========= sending heartbeat at 2024-04-19 08:03:59.565873
========= sending heartbeat at 2024-04-19 08:04:09.586466
========= sending heartbeat at 2024-04-19 08:04:19.607576
========= sending heartbeat at 2024-04-19 08:04:29.628882
========= sending heartbeat at 2024-04-19 08:04:39.651411
========= sending heartbeat at 2024-04-19 08:04:49.672352
========= sending heartbeat at 2024-04-19 08:04:59.691662
========= sending heartbeat at 2024-04-19 08:05:09.757019
========= sending heartbeat at 2024-04-19 08:05:19.775977
========= sending heartbeat at 2024-04-19 08:05:29.796471
========= sending heartbeat at 2024-04-19 08:05:39.814658
========= sending heartbeat at 2024-04-19 08:05:49.833308
========= sending heartbeat at 2024-04-19 08:05:59.851138
========= sending heartbeat at 2024-04-19 08:06:09.871001
========= sending heartbeat at 2024-04-19 08:06:19.883223
========= sending heartbeat at 2024-04-19 08:06:29.902788
========= sending heartbeat at 2024-04-19 08:06:39.921279
========= sending heartbeat at 2024-04-19 08:06:49.940874
========= sending heartbeat at 2024-04-19 08:06:59.959592
========= sending heartbeat at 2024-04-19 08:07:09.981211
slurmstepd: error: *** JOB 8915891 ON bun005 CANCELLED AT 2024-04-19T08:07:10 DUE TO TIME LIMIT ***
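
Comparing the first and last timestamps in this log shows that the run lasted almost exactly one hour before SLURM cancelled it, which fits a scheduler time limit rather than a failure inside the job itself. A minimal sketch of that arithmetic, assuming GNU date (which parses "YYYY-MM-DD HH:MM:SS" strings with -d); the helper name elapsed_seconds is hypothetical, and fractional seconds are dropped:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of whole seconds between two timestamps.
# Assumes GNU date; -u makes both times UTC so no DST shift affects the result.
elapsed_seconds() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# Timestamps copied from the log: monitor start and the SLURM cancellation.
secs=$(elapsed_seconds "2024-04-19 07:06:54" "2024-04-19 08:07:10")
echo "ran for ${secs} s ($(( secs / 60 )) min)"
```

This prints "ran for 3616 s (60 min)", i.e. one hour plus the few seconds of job startup.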

Did the cluster admins set a 1-hour MaxTime or DefaultTime on the gpu_cuda partition? You can check with:

scontrol show partition gpu_cuda
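
A minimal sketch of that check, assuming scontrol prints the partition settings as whitespace-separated Key=Value pairs (its usual layout); the function name partition_time_limits is hypothetical. It keeps only the MaxTime and DefaultTime fields:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `scontrol show partition` output on stdin and
# print only the MaxTime and DefaultTime fields, one per line.
partition_time_limits() {
  # Split every space/tab-separated token onto its own line, squeezing blanks,
  # then keep the two time-limit keys.
  tr -s ' \t' '\n\n' | grep -E '^(MaxTime|DefaultTime)='
}

# Usage on the cluster (requires SLURM):
#   scontrol show partition gpu_cuda | partition_time_limits
```

In SLURM, DefaultTime is the limit applied to jobs that do not request one themselves, and the submission template in the log above has no #SBATCH --time line. If DefaultTime turns out to be 01:00:00, adding an explicit --time request (no larger than MaxTime) to the lane's script template is likely what is needed.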