Usually, the Topaz extract preprocessing takes significantly more time than the extraction. Since I almost always use the same preprocessing parameters for all Topaz jobs, it would be nice to be able to use the already preprocessed micrographs from a previous Extract job when extracting with a new Topaz model. However, when I connect the micrograph output from a previous Extract job, it still does the preprocessing again.
I have noticed that the Topaz Extract job saves the preprocessed micrographs in the “preprocessed” subdirectory of the connected Topaz Model job directory. To circumvent the issue I created a soft link between an old Topaz model “preprocessed” folder containing all preprocessed micrographs and the directory of the new model I wanted to use, which in my case saved me several hours of preprocessing.
Is re-running the preprocessing by default the expected behavior of the Topaz Extract job, to make sure that any changes to the preprocessing parameters are applied? Would it be possible to add an option to pass through the preprocessed micrographs from an earlier Topaz Extract job?
Can you give some more details about how this symlink works, i.e. exactly which directories you symlinked? Did you have to modify any code to get it to work?
The current behavior is intended to make sure that models use the same preprocessing during inference that was applied when they were trained. However, we would be happy to look into modifying this behavior in a future release.
Well, it is not the most elegant solution, but it works.
These are the jobs I will refer to:
J1 - curated exposures
J2 - old Topaz Train job
J3 - old Topaz Extract job where I have extracted from all my micrographs from J1
J4 - new Topaz Train job
J5 - new Topaz Extract job
Let’s say I have just trained a new model (J4) and I want to extract from micrographs (J1) that I have extracted from in a previous job (J3). I then remove the preprocessed training micrographs from the job directory belonging to the new model (J4): rm -r /cryosparc/P1/J4/preprocessed
Next, I create a soft link between the “preprocessed” directory of the old model (J2) and the new one (J4): ln -s /cryosparc/P1/J2/preprocessed /cryosparc/P1/J4/preprocessed
I can then finally start a new Topaz Extract job (J5) with J4 and J1 as input, which skips the preprocessing stage and saves me a ton of time.
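For anyone repeating this, the two commands above could be wrapped in a small bash function. This is only a sketch: the function name reuse_preprocessed is made up for illustration, and unlike the rm -r above it moves the new job's existing "preprocessed" directory aside to "preprocessed.bak" instead of deleting it, so the change can be undone. It assumes both jobs used the same preprocessing parameters.

```shell
# Hypothetical helper (not part of cryoSPARC): make NEW_JOB/preprocessed
# a soft link to OLD_JOB/preprocessed.
reuse_preprocessed() {
    local old_job="$1" new_job="$2"
    local src dst="$new_job/preprocessed"

    # The old job must actually contain preprocessed micrographs.
    if [ ! -d "$old_job/preprocessed" ]; then
        echo "no preprocessed directory in $old_job" >&2
        return 1
    fi
    # Resolve to an absolute path so the link does not depend on the
    # directory the function is called from.
    src="$(cd "$old_job/preprocessed" && pwd)"

    if [ -L "$dst" ]; then
        # A link from an earlier run: just replace it.
        rm "$dst"
    elif [ -e "$dst" ]; then
        # Keep the new job's own preprocessed data as a backup.
        if [ -e "$dst.bak" ]; then
            echo "backup already exists: $dst.bak" >&2
            return 1
        fi
        mv "$dst" "$dst.bak"
    fi

    ln -s "$src" "$dst"
}

# Example call with the job directories used in this post:
# reuse_preprocessed /cryosparc/P1/J2 /cryosparc/P1/J4
```

To undo it, remove the link and rename "preprocessed.bak" back to "preprocessed".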
The feature you describe has been added in patch v3.3.1+211214. Now in Topaz Train you can specify a preprocessed micrograph directory in the job builder. You can do the same in the Topaz Extract job.
Best,
Louis
@lprimeau Is this applicable to reusing preprocessed mics from a previous Topaz Extract job (built-in pretrained model) in a subsequent Topaz Extract job (again pretrained built-in model)? No Topaz Train involved.
I tried providing the absolute path from the last Topaz Extract job, but it went ahead and started preprocessing from scratch.
Yes, this should work. The path to use in Topaz Extract is project_path/JXXX/preprocessed where JXXX is the job directory of a Topaz Extract / Topaz Train job where preprocessing was performed. From your description it sounds like you did not put the preprocessed part at the end. Sorry for the confusion!
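If it helps, the path can be sanity-checked before pasting it into the job builder. The snippet below is a minimal bash sketch, not anything cryoSPARC provides; the function name check_preprocessed_path is hypothetical. It only verifies that the path ends in "preprocessed" and that the directory exists.

```shell
# Hypothetical helper: check that a path looks like
# project_path/JXXX/preprocessed and points at an existing directory.
check_preprocessed_path() {
    local path="${1%/}"   # drop a single trailing slash, if present

    case "$path" in
        */preprocessed) ;;   # ends in "preprocessed": fine
        *)
            echo "path should end in /preprocessed: $path" >&2
            return 1
            ;;
    esac

    if [ ! -d "$path" ]; then
        echo "not a directory: $path" >&2
        return 1
    fi

    # Print the cleaned-up path for copying into the job builder.
    echo "$path"
}

# Example call, with JXXX standing in for the real job directory:
# check_preprocessed_path project_path/JXXX/preprocessed
```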