Normally, a job will not start until the results for all of its inputs have been calculated. But is there a way to delay the start of a job B until another job A (or several jobs) has completed, even if A does not provide inputs for job B? Either in regular jobs or in workflows?
The reason for the question is that a colleague of mine is developing “batch live” data processing, since he wants to use the full toolkit of offline CryoSPARC, such as the denoiser and the Topaz picker.
Hi @daniel.s.d.larsson! You’ll have to use CryoSPARC Tools for this for now. The script below is a toy example of one way you might do this. It creates three jobs:
- Job A: a Particle Sets job
- Job B: a Select 2D job
- Job C: a Particle Sets job with only Job A as input
Job C is not queued until job B completes even though job B is not a direct parent of job C.
from cryosparc.tools import CryoSPARC
import json
from pathlib import Path

# Load the CryoSPARC connection settings from a local JSON file
with open(Path('~/instance-info.json').expanduser(), 'r') as f:
    instance_info = json.load(f)

cs = CryoSPARC(**instance_info)
assert cs.test_connection()

project_uid = "P13"
workspace_uid = "W4"
project = cs.find_project(project_uid)
particle_src_uid = "J28"
particle_src_title = "particles"
lane = "cryoem6"

# Job A: split off a random subset of the source particles
job_a = project.create_job(
    workspace_uid = workspace_uid,
    type = "particle_sets",
    connections = {
        "particles_A": (particle_src_uid, particle_src_title)
    },
    params = {
        "set_operation": "split",
        "set_split_num": 1,
        "set_split_size": 10000,
        "set_split_random": True
    }
)

# Job B: Select 2D on the source job; it provides no inputs to job C
job_b = project.create_job(
    workspace_uid = workspace_uid,
    type = "select_2D",
    connections = {
        "particles": (particle_src_uid, particle_src_title),
        "templates": (particle_src_uid, "class_averages")
    }
)

# Job C: takes its only input from job A
job_c = project.create_job(
    workspace_uid = workspace_uid,
    type = "particle_sets",
    connections = {
        "particles_A": (job_a.uid, "split_0")
    },
    params = {
        "set_operation": "split",
        "set_split_num": 1,
        "set_split_size": 1000,
        "set_split_random": True
    }
)

job_a.queue(lane)

# Select 2D has to be queued on the master lane, so no lane is passed
job_b.queue()

# This halts the script until job_b is done
job_b.wait_for_done()

# Job C is queued only now, after job B has finished
job_c.queue(lane)
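
Since the original question also asks about waiting on several jobs, here is a minimal sketch of how the same pattern might be generalized. The helper `queue_after` and the `JobLike` interface are hypothetical names, not part of CryoSPARC Tools. The sketch assumes that `wait_for_done()` returns the final job status as a string (such as "completed", "failed" or "killed") and that `queue()` accepts the lane as its first argument, which is worth checking against the version of cryosparc-tools you have installed.

```python
from typing import Iterable, Optional, Protocol


class JobLike(Protocol):
    """Minimal interface this sketch relies on (assumed to match a cryosparc-tools job)."""

    uid: str

    def wait_for_done(self) -> str: ...

    def queue(self, lane: Optional[str] = None) -> None: ...


def queue_after(
    blockers: Iterable[JobLike],
    dependent: JobLike,
    lane: Optional[str] = None,
) -> bool:
    """Queue `dependent` only once every job in `blockers` has ended as "completed".

    Hypothetical helper. Returns True if `dependent` was queued, False otherwise.
    """
    for job in blockers:
        # Blocks until this job reaches a final status (assumed return value).
        status = job.wait_for_done()
        if status != "completed":
            print(f"{job.uid} ended with status {status!r}; not queuing {dependent.uid}")
            return False
    dependent.queue(lane)
    return True
```

With the jobs from the script above, `queue_after([job_b], job_c, lane)` would stand in for the final `wait_for_done()` and `queue()` calls, and more blocking jobs can be added to the list. Whether a failed or killed blocker should prevent job C from being queued is a design choice; this sketch takes the cautious route and skips queuing.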