Custom job dependencies

Normally, a job will not start until the results for all of its inputs have been calculated. But is there a way to delay the start of a job B until another job A (or several jobs) has finished, even if A does not provide inputs to B? Either in regular jobs or in workflows?

The reason for the question is that a colleague of mine is developing “batch live” data processing, since he wants to use the full toolkit of offline CryoSPARC, such as the denoiser and the Topaz picker.

Hi @daniel.s.d.larsson! You’ll have to use CryoSPARC Tools for this for now. The script below is a toy example of one way you might do this. It creates three jobs:

  • Job A: A particle sets job
  • Job B: A Select 2D job
  • Job C: A particle sets job with only Job A as input

Job C is not queued until job B completes even though job B is not a direct parent of job C.

import json
from pathlib import Path

from cryosparc.tools import CryoSPARC

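# Connection details are read from a local JSON file and passed to CryoSPARC() as keyword arguments
with open(Path('~/instance-info.json').expanduser(), 'r') as f: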
    instance_info = json.load(f)

cs = CryoSPARC(**instance_info)
assert cs.test_connection()

project_uid = "P13"
workspace_uid = "W4"
project = cs.find_project(project_uid)

# Upstream job and the name of its particle output group
particle_src_uid = "J28"
particle_src_title = "particles"

# Scheduler lane used for jobs A and C
lane = "cryoem6"

job_a = project.create_job(
    workspace_uid = workspace_uid,
    type = "particle_sets",
    connections = {
        "particles_A": (particle_src_uid, particle_src_title)
    },
    params = {
        "set_operation": "split",
        "set_split_num": 1,
        "set_split_size": 10000,
        "set_split_random": True
    }
)

job_b = project.create_job(
    workspace_uid = workspace_uid,
    type = "select_2D",
    connections = {
        "particles": (particle_src_uid, particle_src_title),
        "templates": (particle_src_uid, "class_averages")
    }
)

job_c = project.create_job(
    workspace_uid = workspace_uid,
    type = "particle_sets",
    connections = {
        "particles_A": (job_a.uid, "split_0")
    },
    params = {
        "set_operation": "split",
        "set_split_num": 1,
        "set_split_size": 1000,
        "set_split_random": True
    }
)

job_a.queue(lane)
# select2D has to be queued on the master lane
job_b.queue()

# this halts the script until job_b is done
job_b.wait_for_done()
job_c.queue(lane)
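
If you need to hold a job back until several unrelated jobs have finished (the "or jobs" part of the question), the same pattern can be wrapped in a small helper. This is only a sketch: `queue_after` is a hypothetical name, not part of CryoSPARC Tools, and it assumes that `wait_for_done()` returns the job's final status as a string (for example "completed", "failed" or "killed"). Please check that against the cryosparc-tools version you have installed.

```python
def queue_after(blockers, dependents, lane=None):
    """Queue `dependents` only after every job in `blockers` has completed.

    Hypothetical helper, not part of cryosparc-tools. It only relies on the
    job objects having `uid`, `wait_for_done()` and `queue()`.
    """
    # Block until each prerequisite job has reached a final state.
    for job in blockers:
        # Assumption: wait_for_done() returns the final status string.
        status = job.wait_for_done()
        if status != "completed":
            raise RuntimeError(
                f"{job.uid} ended with status {status!r}; dependents were not queued"
            )
    # Every prerequisite completed, so release the held-back jobs.
    for job in dependents:
        job.queue(lane)
```

With the jobs from the script above, this would be called as `queue_after([job_a, job_b], [job_c], lane)` after A and B have been queued. As with the original script, the Python process has to keep running while it waits, since the dependency lives in the script and not in the CryoSPARC scheduler.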