Local refinement fails to output results

Refinement ran to the end but failed to output results:

No output result match for particles.blob in job J118
[CPU:  12.12 GB  Avail: 238.33 GB]

No output result match for particles.alignments2D in job J118
[CPU:  12.12 GB  Avail: 238.33 GB]

No output result match for particles.location in job J118
[CPU:  12.12 GB  Avail: 238.33 GB]

No output result match for particles.pick_stats in job J118
[CPU:  12.12 GB  Avail: 238.33 GB]

No output result match for particles.sym_expand in job J118
[CPU:  12.12 GB  Avail: 238.33 GB]

  Loaded passthrough dset with 0 items
[CPU:  11.03 GB  Avail: 239.40 GB]

Checking outputs for output group particles
[CPU:  12.16 GB  Avail: 238.22 GB]

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 134, in cryosparc_master.cryosparc_compute.run.main
  File "/mnt/.../cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1243, in check_outputs
    assert output_len == target_len, "Result %s had the wrong length (%d, should be %d). Note that outputs are still usable." % (resname, output_len, target_len)
AssertionError: Result blob had the wrong length (0, should be 4558200). Note that outputs are still usable.

No upstream jobs failed/were terminated, box size was 256, plenty of memory available.

The tail of job log:

**** handle exception rc
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 134, in cryosparc_master.cryosparc_compute.run.main
  File "/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1243, in check_outputs
    assert output_len == target_len, "Result %s had the wrong length (%d, should be %d). Note that outputs are still usable." % (resname, output_len, target_len)
AssertionError: Result blob had the wrong length (0, should be 4558200). Note that outputs are still usable.
set status to failed
========= sending heartbeat at 2025-02-27 06:42:39.416742
  ========= heartbeat failed at 2025-02-27 06:42:39.422773:
========= main process now complete at 2025-02-27 06:42:49.432905.
========= monitor process now complete at 2025-02-27 06:42:49.436204.

Thanks, Michael

@mbs Please can you post

  1. the outputs of these commands
    csprojectid=P99 # replace with actual project ID
    csjobid=J199 # replace with id of the failed job
    cryosparcm cli "get_job('$csprojectid', '$csjobid', 'job_type', 'version', 'instance_information', 'status',  'params_spec', 'errors_run', 'input_slot_groups')"
    cryosparcm eventlog $csprojectid $csjobid | tail -n 40
    
  2. A screenshot(s) of the workspace in tree view, to illustrate the history of the failed job’s inputs. Please choose a zoom level that shows the job types and job IDs.

@wtempel, Here you go. If you need a different snapshot, please let me know.

Michael

cryosparcm cli “get_job(‘$csprojectid’, ‘$csjobid’, ‘job_type’, ‘version’, ‘instance_information’, ‘status’, ‘params_spec’, ‘errors_run’, ‘input_slot_groups’)”

{‘_id’: ‘67bf3cf3a2ce6c8593969c7d’, ‘errors_run’: [{‘message’: ‘No output result match for particles.blob in job J118’, ‘warning’: True}, {‘message’: ‘No output result match for particles.alignments2D in job J118’, ‘warning’: True}, {‘message’: ‘No output result match for particles.location in job J118’, ‘warning’: True}, {‘message’: ‘No output result match for particles.pick_stats in job J118’, ‘warning’: True}, {‘message’: ‘No output result match for particles.sym_expand in job J118’, ‘warning’: True}, {‘message’: ‘Result blob had the wrong length (0, should be 4558200). Note that outputs are still usable.’, ‘warning’: False}], ‘input_slot_groups’: [{‘connections’: [{‘group_name’: ‘particles’, ‘job_uid’: ‘J103’, ‘slots’: [{‘group_name’: ‘particles’, ‘job_uid’: ‘J103’, ‘result_name’: ‘blob’, ‘result_type’: ‘particle.blob’, ‘slot_name’: ‘blob’, ‘version’: ‘F’}, {‘group_name’: ‘particles’, ‘job_uid’: ‘J103’, ‘result_name’: ‘ctf’, ‘result_type’: ‘particle.ctf’, ‘slot_name’: ‘ctf’, ‘version’: ‘F’}, {‘group_name’: ‘particles’, ‘job_uid’: ‘J103’, ‘result_name’: ‘alignments3D’, ‘result_type’: ‘particle.alignments3D’, ‘slot_name’: ‘alignments3D’, ‘version’: ‘F’}, {‘group_name’: ‘particles’, ‘job_uid’: ‘J103’, ‘result_name’: ‘alignments2D’, ‘result_type’: ‘particle.alignments2D’, ‘slot_name’: None, ‘version’: ‘F’}, {‘group_name’: ‘particles’, ‘job_uid’: ‘J103’, ‘result_name’: ‘location’, ‘result_type’: ‘particle.location’, ‘slot_name’: None, ‘version’: ‘F’}, {‘group_name’: ‘particles’, ‘job_uid’: ‘J103’, ‘result_name’: ‘pick_stats’, ‘result_type’: ‘particle.pick_stats’, ‘slot_name’: None, ‘version’: ‘F’}, {‘group_name’: ‘particles’, ‘job_uid’: ‘J103’, ‘result_name’: ‘sym_expand’, ‘result_type’: ‘particle.sym_expand’, ‘slot_name’: None, ‘version’: ‘F’}]}], ‘count_max’: inf, ‘count_min’: 1, ‘description’: ‘Particle stacks to use. Multiple stacks will be concatenated.’, ‘name’: ‘particles’, ‘repeat_allowed’: False, ‘slots’: [{‘description’: ‘’, ‘name’: ‘blob’, ‘optional’: False, ‘title’: ‘Particle data blobs’, ‘type’: ‘particle.blob’}, {‘description’: ‘’, ‘name’: ‘ctf’, ‘optional’: False, ‘title’: ‘Particle ctf parameters’, ‘type’: ‘particle.ctf’}, {‘description’: ‘’, ‘name’: ‘alignments3D’, ‘optional’: False, ‘title’: ‘Particle 3D alignments’, ‘type’: ‘particle.alignments3D’}], ‘title’: ‘Particle stacks’, ‘type’: ‘particle’}, {‘connections’: [{‘group_name’: ‘volume’, ‘job_uid’: ‘J115’, ‘slots’: [{‘group_name’: ‘volume’, ‘job_uid’: ‘J115’, ‘result_name’: ‘map’, ‘result_type’: ‘volume.blob’, ‘slot_name’: ‘map’, ‘version’: ‘F’}]}], ‘count_max’: 1, ‘count_min’: 1, ‘description’: ‘’, ‘name’: ‘volume’, ‘repeat_allowed’: False, ‘slots’: [{‘description’: ‘’, ‘name’: ‘map’, ‘optional’: False, ‘title’: ‘Initial volume raw data’, ‘type’: ‘volume.blob’}], ‘title’: ‘Initial volume’, ‘type’: ‘volume’}, {‘connections’: [{‘group_name’: ‘mask’, ‘job_uid’: ‘J116’, ‘slots’: [{‘group_name’: ‘mask’, ‘job_uid’: ‘J116’, ‘result_name’: ‘mask’, ‘result_type’: ‘volume.blob’, ‘slot_name’: ‘mask’, ‘version’: ‘F’}]}], ‘count_max’: 1, ‘count_min’: 1, ‘description’: ‘’, ‘name’: ‘mask’, ‘repeat_allowed’: False, ‘slots’: [{‘description’: ‘’, ‘name’: ‘mask’, ‘optional’: False, ‘title’: ‘Static mask’, ‘type’: ‘volume.blob’}], ‘title’: ‘Static mask’, ‘type’: ‘mask’}], ‘instance_information’: {‘CUDA_version’: ‘11.8’, ‘available_memory’: ‘249.75GB’, ‘cpu_model’: ‘Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz’, ‘driver_version’: ‘12.4’, ‘gpu_info’: [{‘id’: 0, ‘mem’: 11539054592, ‘name’: ‘NVIDIA GeForce RTX 2080 Ti’, ‘pcie’: ‘0000:18:00’}, {‘id’: 1, ‘mem’: 11539054592, ‘name’: ‘NVIDIA GeForce RTX 2080 Ti’, ‘pcie’: ‘0000:3b:00’}, {‘id’: 2, ‘mem’: 11539054592, ‘name’: ‘NVIDIA GeForce RTX 2080 Ti’, ‘pcie’: ‘0000:86:00’}, {‘id’: 3, ‘mem’: 11539054592, ‘name’: ‘NVIDIA GeForce RTX 2080 Ti’, ‘pcie’: ‘0000:af:00’}], ‘ofd_hard_limit’: 262144, ‘ofd_soft_limit’: 1024, ‘physical_cores’: 20, ‘platform_architecture’: ‘x86_64’, ‘platform_node’: ‘gimli.utmb.edu’, ‘platform_release’: ‘6.12.11-1.el8.elrepo.x86_64’, ‘platform_version’: ‘#1 SMP PREEMPT_DYNAMIC Thu Jan 23 17:59:11 EST 2025’, ‘total_memory’: ‘376.57GB’, ‘used_memory’: ‘119.28GB’}, ‘job_type’: ‘new_local_refine’, ‘params_spec’: {‘refine_res_init’: {‘value’: 10}, ‘use_alignment_prior’: {‘value’: True}}, ‘project_uid’: ‘P305’, ‘status’: ‘failed’, ‘uid’: ‘J118’, ‘version’: ‘v4.6.2’}

cryosparcm eventlog $csprojectid $csjobid | tail -n 40

[Thu, 27 Feb 2025 12:40:42 GMT] Fourier Space Slices Iteration 009
[Thu, 27 Feb 2025 12:40:43 GMT] Real Space Mask Slices Iteration 009
[Thu, 27 Feb 2025 12:40:43 GMT] FSC Iteration 009
[Thu, 27 Feb 2025 12:40:44 GMT] FSC Iteration 009, after FSC-mask auto-tightening
[Thu, 27 Feb 2025 12:40:44 GMT] cFSCs (Half-angle: 20°) Iteration 009, with autotight mask
[Thu, 27 Feb 2025 12:40:46 GMT] Guinier Plot Iteration 009
[Thu, 27 Feb 2025 12:40:46 GMT] Noise Model Iteration 009
[Thu, 27 Feb 2025 12:40:56 GMT] Viewing Direction Distribution Iteration 009
[Thu, 27 Feb 2025 12:40:57 GMT] Posterior Precision Directional Distribution Iteration 009
[Thu, 27 Feb 2025 12:41:46 GMT] Magnitudes of alignment changes Iteration 009
[Thu, 27 Feb 2025 12:41:59 GMT] Per particle scale factors 009
[Thu, 27 Feb 2025 12:41:59 GMT] [CPU RAM used: 14929 MB] Done in 80.465s.
[Thu, 27 Feb 2025 12:41:59 GMT] [CPU RAM used: 14929 MB] Outputting files…
[Thu, 27 Feb 2025 12:42:26 GMT] [CPU RAM used: 14883 MB] Done in 26.215s.
[Thu, 27 Feb 2025 12:42:26 GMT] [CPU RAM used: 14883 MB] Done iteration 9 in 8578.650s. Total time so far 73574.205s
[Thu, 27 Feb 2025 12:42:26 GMT] [CPU RAM used: 14883 MB] ====== Done Refinement ======
[Thu, 27 Feb 2025 12:42:26 GMT] [CPU RAM used: 14883 MB] Note that the output structure from refinement has been
lowpass filtered to the gold-standard FSC resolution estimated
above. However, the structure probably needs sharpening in
order to best visualize high resolution features. This can be
done with the Sharpening task (as a new experiment).
[Thu, 27 Feb 2025 12:42:26 GMT] [CPU RAM used: 14883 MB] Full run took 73574.906s
[Thu, 27 Feb 2025 12:42:27 GMT] [CPU RAM used: 9773 MB] --------------------------------------------------------------
[Thu, 27 Feb 2025 12:42:27 GMT] [CPU RAM used: 9773 MB] Compiling job outputs…
[Thu, 27 Feb 2025 12:42:27 GMT] [CPU RAM used: 9773 MB] Passing through outputs for output group particles from input group particles
[Thu, 27 Feb 2025 12:42:36 GMT] [CPU RAM used: 12122 MB] This job outputted results [‘alignments3D’, ‘ctf’]
[Thu, 27 Feb 2025 12:42:36 GMT] [CPU RAM used: 12122 MB] Loaded output dset with 4558200 items
[Thu, 27 Feb 2025 12:42:36 GMT] [CPU RAM used: 12122 MB] Passthrough results [‘blob’, ‘alignments2D’, ‘location’, ‘pick_stats’, ‘sym_expand’]
[Thu, 27 Feb 2025 12:42:36 GMT] [CPU RAM used: 12122 MB] No output result match for particles.blob in job J118
[Thu, 27 Feb 2025 12:42:36 GMT] [CPU RAM used: 12122 MB] No output result match for particles.alignments2D in job J118
[Thu, 27 Feb 2025 12:42:36 GMT] [CPU RAM used: 12122 MB] No output result match for particles.location in job J118
[Thu, 27 Feb 2025 12:42:36 GMT] [CPU RAM used: 12122 MB] No output result match for particles.pick_stats in job J118
[Thu, 27 Feb 2025 12:42:36 GMT] [CPU RAM used: 12122 MB] No output result match for particles.sym_expand in job J118
[Thu, 27 Feb 2025 12:42:36 GMT] [CPU RAM used: 12122 MB] Loaded passthrough dset with 0 items
[Thu, 27 Feb 2025 12:42:36 GMT] [CPU RAM used: 11026 MB] Checking outputs for output group particles
[Thu, 27 Feb 2025 12:42:39 GMT] [CPU RAM used: 12156 MB] Traceback (most recent call last):
File “cryosparc_master/cryosparc_compute/run.py”, line 134, in cryosparc_master.cryosparc_compute.run.main
File “/mnt/ape2/cryosparc/software/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py”, line 1243, in check_outputs
assert output_len == target_len, “Result %s had the wrong length (%d, should be %d). Note that outputs are still usable.” % (resname, output_len, target_len)
AssertionError: Result blob had the wrong length (0, should be 4558200). Note that outputs are still usable.


@wtempel, read your email one more time, here is the snapshot in tree mode.
Thanks, Michael

@mbs Job J118 appears to use outputs of J103. The deletion of the parent job may explain the failure of its child. What was the motivation for deleting the Reconstruction Only jobs?

Hi @wtempel, not sure, need to check. There might have multiple jobs of that kind and job103 was accidentally deleted as a cleanup. WIll check again and get back to you on that. Thanks, Michael

Hi @wtempel, The failed job was cloned from another job (J111) and that probably the reason J103 was deleted without realizing it contained some relevant data. Several people worked on the project unfortunately.
Thanks for findings! Michael