When I load workflows to process data downstream of a live session, I get failed jobs because it tells me they are missing “non-optional inputs”, although the inputs are there and completely fine.
I suspect this is a bug because if I delete the inputs and re-link the exact same inputs, or clone a job and connect the same inputs, it goes through.
Sometimes this happens upon loading a workflow and sometimes it doesn’t. When it does, it is annoying because it forces us to re-link the inputs of several jobs, if not all of them.
We have not yet encountered the issue you describe in this particular use case internally. I would like to ask a few follow-up questions to get a better sense of it.
Are you using a Workflow that has parent connections (Workflows with Parent Connections), or are you connecting to a self-contained Workflow manually?
What job types and inputs are you connecting to from the live session, and which inputs are being lost? (e.g. are you connecting particles from a Live Particle Export job to a 2D Class job, or perhaps connecting directly to live jobs, such as connecting volume outputs from a Homo Refine (S) job to a Vol Tools job?)
Are you connecting inputs from a currently running live job, or a completed job?
Similar behaviour has been encountered when connecting to jobs that generate their outputs at runtime; this is noted in the “Limitations” section of the Workflows documentation here: Workflow Limitations. This could well be the issue you have run into, and the questions above should help us validate whether or not that is the case.
The way we do it is: we have a live session and we export exposures. The workflow is then supposed to plug in and take accepted_exposures as input to do the initial data processing.
The parent jobs that I import manually are the deep picker model and the reference volumes, which plug into the deep picker inference, heterogeneous refinement, and NU refinement jobs.
It just failed again at the Extract from Micrographs stage (see screenshot below), right after particle picking. So it fails at a very early stage, and in what seems to me to be a very self-contained part of the workflow. None of the parent jobs are needed yet at this point, so I don’t understand how to remedy this issue.
Thank you for the additional information. We have identified and recorded this as a bug, which does stem from the same situation noted in the Workflow Limitations section of the documentation.
The “Deep Picker/Inference” jobs generate some of their outputs (including passthrough group slots) while running, meaning those outputs will not be reliably available until the job has completed, and will not be available at all while the job is building.
Workflows make the input/output connections between jobs immediately after creating them, while the jobs are still in building status. This means the workflow will fail to connect inputs in these cases, because the corresponding outputs do not exist when it attempts to make the connection.
Our current recommendation in this case is to create the “Deep Picker/Inference” jobs separately, allow them to run to completion, and then connect your workflow to them, rather than including them in the workflow itself.
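For reference, if you script this with cryosparc-tools, a minimal sketch of that order of operations could look like the following. The job type string (deep_picker_inference) and the input/output names (micrographs, model, accepted_exposures) are assumptions for illustration; check the job builder in your CryoSPARC version for the exact names.

```python
from cryosparc.tools import CryoSPARC

# Connect to the instance (all credentials here are placeholders).
cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    host="localhost",
    base_port=39000,
    email="user@example.com",
    password="password",
)
project = cs.find_project("P3")

# 1. Create the Deep Picker Inference job on its own, outside the workflow.
#    Job type and input names are assumptions; confirm them for your version.
picker = project.create_job(
    "W1",
    "deep_picker_inference",
    connections={
        "micrographs": ("J10", "accepted_exposures"),  # live session exposure export
        "model": ("J11", "model"),                     # imported deep picker model
    },
)

# 2. Queue it and wait for completion, so that all of its output groups
#    and passthrough slots actually exist before anything connects to them.
picker.queue()
picker.wait_for_done()

# 3. Only now apply the workflow and connect it to the completed job.
```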
If I understand correctly, can we get away with loading the whole workflow but only building and queuing the Deep Inference job, leaving all the other jobs in building mode? Or should I load it as a separate workflow?
We do not advise the former, i.e. building the entire workflow with the Deep Inference job included. The Deep Inference job should be created and connected separately: from the job builder, from a separate workflow, or from a blueprint.
This recommendation is because even if you queue and run the Deep Inference job separately, but create it in the same workflow, it will still not generate the outputs required by its child jobs until it has finished running, while the workflow attempts to make the connections immediately. This means that missing output groups will not be connected at all; more problematically, some output groups can be connected but have missing group slots and passthroughs. This can cause jobs to fail at different points along the workflow if they require any of the expected slots and passthroughs, and these issues can be very hard to diagnose and fix manually.
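To illustrate why, here is a hypothetical diagnostic, assuming the job document exposes its output groups under an output_result_groups key and that your cryosparc-tools version returns the document as a dict, that compares what a job reports while building versus after completion:

```python
# Inspect which output groups a job has actually produced so far.
# "output_result_groups" is an assumption about the job document layout.
picker = project.find_job("J15")  # the Deep Inference job
print(picker.status)              # e.g. "building", "running", "completed"

for group in picker.doc["output_result_groups"]:
    print(group["name"], group["type"])

# While the job is building or running, expected groups (and passthrough
# slots within them) can be absent from this list, which is exactly when
# the workflow tries, and fails, to make its connections.
```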
The workflows still have jobs that fail with the same “non-optional input missing” error. I make sure to do all the preprocessing before loading the workflow, and yet it still happens. It happened with Extract From Micrographs and Class-P filter jobs.
This limits the usefulness of workflows, as it stalls the processing overnight and I have to rerun it in the morning.
This behaviour is expected based on the way your workflow has been constructed. The workflow is being applied without any parent connection for the Extract from Micrographs job, which causes essentially the same problem as connecting the workflow to a job that generates its outputs at runtime: the downstream jobs will be missing passthroughs from the initial job, because it was not connected when they were created.
If your workflow needs inputs from jobs that are not included in the workflow itself, you must create it with a “parent connection”, which becomes a “dependency” of the workflow. In this case, your Extract from Micrographs job would have a “parent connection” to your Deep Picker Inference job, which would then be a “dependency” of your workflow but not included in it.
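If you ever need to repair a workflow that was applied without the parent connection, the scripted analogue is to connect the workflow’s entry job to the external, completed dependency by hand. Again a sketch only; the particles input/output names are assumptions:

```python
# Wire the workflow's entry job to a dependency outside the workflow,
# i.e. the scripted analogue of a "parent connection".
extract = project.find_job("J20")  # Extract from Micrographs, inside the workflow
extract.connect("particles", "J15", "particles")  # J15: completed Deep Picker Inference
extract.queue()
```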