Here is the job log:
========= sending heartbeat at 2024-08-06 15:09:42.627306
========= sending heartbeat at 2024-08-06 15:09:52.634183
========= sending heartbeat at 2024-08-06 15:10:02.649174
Running job on hostname %s lilac
Allocated Resources : {'fixed': {'SSD': False}, 'hostname': 'lilac', 'lane': 'lilac', 'lane_type': 'cluster', 'license': False, 'licenses_acquired': 0, 'slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7], 'GPU': [], 'RAM': [0]}, 'target': {'cache_path': '/scratch', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': ['ram_gb_multiplier'], 'custom_vars': {'ram_gb_multiplier': '1'}, 'desc': None, 'hostname': 'lilac', 'lane': 'lilac', 'name': 'lilac', 'qdel_cmd_tpl': '/admin/lsflilac/lsf/10.1/linux3.10-glibc2.17-x86_64/bin/bkill {{ cluster_job_id }}', 'qinfo_cmd_tpl': '/admin/lsflilac/lsf/10.1/linux3.10-glibc2.17-x86_64/bin/bqueues', 'qstat_cmd_tpl': '/admin/lsflilac/lsf/10.1/linux3.10-glibc2.17-x86_64/bin/bjobs -l {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': '/admin/lsflilac/lsf/10.1/linux3.10-glibc2.17-x86_64/bin/bsub < {{ script_path_abs }}', 'script_tpl': '#!/bin/bash\n#BSUB -J cryosparc_{{ project_uid }}{{ job_uid }}{{ cryosparc_username }}\n#BSUB -m lj-gpu\n#BSUB -e {{ job_dir_abs }}/%J.err\n#BSUB -o {{ job_dir_abs }}/%J.out\n#BSUB -n {{ num_cpu }}\n#BSUB -R "span[ptile={{ num_cpu }}]"\n#BSUB -R "rusage[mem={{(ram_gb|float * ram_gb_multiplier|float)|int }}]"\n#BSUB -W 167:00\n{%- if num_gpu == 0 %}\n ###BSUB -q cpuqueue\n #BSUB -q gpuqueue\n\t#BSUB -R "A100||A40" -sla llSC\n{%- else %}\n #BSUB -q gpuqueue\n #BSUB -gpu "num={{ num_gpu }}:gmem=20G:j_exclusive=no:mode=shared"\n\t#BSUB -R "A100||A40" -sla llSC\n{%- endif %}\n\n#BSUB -W 167:00\n##Load modules\n\n{{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': 'lilac', 'tpl_vars': ['num_gpu', 'cryosparc_username', 'run_cmd', 'job_dir_abs', 'command', 'ram_gb', 'cluster_job_id', 'project_uid', 'job_uid', 'ram_gb_multiplier', 'num_cpu'], 'type': 'cluster', 'worker_bin_path': '/opt/common/cryosparc/patel/CS-4.3.1/cryosparc_worker/bin/cryosparcw'}}
**** handle exception rc
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_compute.run.main
  File "/admin/opt/common/cryosparc/patel/CS-4.3.1/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 792, in run_topaz_wrapper_cross_validation
    assert len(tables) > 0, "All subsidiary training jobs failed or were killed."
AssertionError: All subsidiary training jobs failed or were killed.
set status to failed
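The cluster submission template is hard to read in the log because it is printed as a single Python dict with `\n` escapes. The minimal sketch below parses the "Allocated Resources" line with `ast.literal_eval`, which makes the `slots` and the `script_tpl` template easy to inspect. It is only a sketch. `LOG_LINE` is a shortened, hand-made excerpt with straight quotes restored, and the empty `GPU` slot list is assumed to be `[]`. `parse_allocated_resources` is a hypothetical helper name, not part of cryoSPARC.

```python
import ast

# Hypothetical, shortened excerpt of the "Allocated Resources" log line, with
# straight quotes restored and the empty GPU slot list assumed to be [].
# In practice, paste the full line from the job log here.
LOG_LINE = (
    "Allocated Resources : {'hostname': 'lilac', 'lane': 'lilac', "
    "'slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7], 'GPU': [], 'RAM': [0]}, "
    "'target': {'custom_vars': {'ram_gb_multiplier': '1'}, "
    "'script_tpl': '#!/bin/bash\\n#BSUB -n {{ num_cpu }}\\n"
    "{%- if num_gpu == 0 %}\\n #BSUB -q gpuqueue\\n{%- endif %}\\n'}}"
)


def parse_allocated_resources(line: str) -> dict:
    """Parse the Python-literal dict printed after 'Allocated Resources :'."""
    _, _, payload = line.partition(":")
    # literal_eval accepts only literals (dicts, lists, str, None, False, ...),
    # so nothing from the log is executed as code.
    return ast.literal_eval(payload.strip())


resources = parse_allocated_resources(LOG_LINE)
print("slots:", resources["slots"])
# The \n escapes become real newlines, so each #BSUB directive prints on its own line.
print(resources["target"]["script_tpl"])
```

Running this on the full line from the log prints the template one directive per line. That layout makes details easier to spot, such as the `num_gpu == 0` branch still selecting `gpuqueue` without a `-gpu` request, and `#BSUB -W 167:00` appearing twice.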