Topaz Extract preprocessing

Usually, the Topaz Extract preprocessing takes significantly more time than the extraction itself. Since I almost always use the same preprocessing parameters for all Topaz jobs, it would be nice to be able to reuse the already preprocessed micrographs from a previous Extract job when extracting with a new Topaz model. However, when I connect the micrograph output from a previous Extract job, the new job still repeats the preprocessing.

I have noticed that the Topaz Extract job saves the preprocessed micrographs in the “preprocessed” subdirectory of the connected Topaz model job directory. To circumvent the issue, I created a soft link in the directory of the new model I wanted to use, pointing to the “preprocessed” folder of an old Topaz model that already contained all the preprocessed micrographs. In my case this saved several hours of preprocessing.

Is the default re-preprocessing the expected behavior of the Topaz Extract job, to make sure that any changes to the preprocessing parameters are applied? Would it be possible to implement an option to pass through the preprocessed micrographs from an earlier Topaz Extract job?

Hi,

Can you give some more details about how this symlink works, i.e. exactly which directories you symlinked? Did you have to modify any code to get it to work?

The current behavior is meant to ensure that models use the same preprocessing during inference that was applied when they were trained. However, we would be happy to look into modifying this behavior in a future release.

Louis

Hi Louis,

Well, it is not the most elegant solution, but it works. :)

These are the jobs I will refer to:

  • J1 - curated exposures
  • J2 - old Topaz Train job
  • J3 - old Topaz Extract job where I have extracted from all my micrographs from J1
  • J4 - new Topaz Train job
  • J5 - new Topaz Extract job

Let’s say I have just trained a new model (J4) and I want to extract from micrographs (J1) that I have already extracted from in a previous job (J3). I first remove the preprocessed training micrographs from the job directory belonging to the new model (J4):
rm -r /cryosparc/P1/J4/preprocessed

Next, I create a soft link between the “preprocessed” directory of the old model (J2) and the new one (J4):
ln -s /cryosparc/P1/J2/preprocessed /cryosparc/P1/J4/preprocessed

I can then finally start a new Topaz Extract job (J5) with J4 and J1 as input, which just skips the preprocessing stage and saves me a ton of time.
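
The two commands above could also be wrapped in a small shell function that moves the new job's own “preprocessed” directory aside instead of deleting it, so the step can be undone. This is only a rough sketch: reuse_preprocessed is a hypothetical helper name, not part of cryoSPARC or Topaz, and it assumes both job directories are given as absolute paths and that the old and new models were trained with the same preprocessing parameters.

```shell
# Hypothetical helper (not part of cryoSPARC or Topaz).
# Usage: reuse_preprocessed OLD_TRAIN_JOB_DIR NEW_TRAIN_JOB_DIR
# Both arguments are assumed to be absolute paths, e.g. /cryosparc/P1/J2.
reuse_preprocessed() {
    src="$1/preprocessed"   # directory that already holds the preprocessed micrographs
    dst="$2/preprocessed"   # location the new Extract job will look in

    # Refuse to continue if the old job has no preprocessed directory.
    if [ ! -d "$src" ]; then
        echo "error: $src is not a directory" >&2
        return 1
    fi

    if [ -L "$dst" ]; then
        # An earlier link is simply replaced.
        rm "$dst" || return 1
    elif [ -e "$dst" ]; then
        # Keep the new job's own files as a backup rather than deleting them.
        if [ -e "$dst.bak" ]; then
            echo "error: $dst.bak already exists" >&2
            return 1
        fi
        mv "$dst" "$dst.bak" || return 1
    fi

    # Point the new job's "preprocessed" directory at the old one.
    ln -s "$src" "$dst"
}
```

With the job numbers from the list above, the call would be reuse_preprocessed /cryosparc/P1/J2 /cryosparc/P1/J4. To undo it, remove the link and rename preprocessed.bak back to preprocessed.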

Emil