Topaz Extract preprocessing

Usually, the Topaz Extract preprocessing takes significantly more time than the extraction itself. Since I almost always use the same preprocessing parameters for all Topaz jobs, it would be nice to be able to reuse the already-preprocessed micrographs from a previous Extract job when extracting with a new Topaz model. However, when I connect the micrograph output from a previous Extract job, the preprocessing still runs again.

I have noticed that the Topaz Extract job saves the preprocessed micrographs in the “preprocessed” subdirectory of the connected Topaz Model job directory. To circumvent the issue I created a soft link between an old Topaz model “preprocessed” folder containing all preprocessed micrographs and the directory of the new model I wanted to use, which in my case saved me several hours of preprocessing.

Is re-preprocessing by default the expected behavior of the Topaz Extract job, to ensure that any changes to the preprocessing parameters are applied? Would it be possible to implement an option to pass through the preprocessed micrographs from an earlier Topaz Extract job?


Can you give some more details about how this symlink works, i.e. exactly which directories you symlinked? Did you have to modify any code to get it to work?

The current behavior is meant to ensure that models use the same preprocessing during inference that was applied when they were trained. However, we would be happy to look into modifying this behavior in a future release.


Hi Louis,

Well, it is not the most elegant solution, but it works. :)

These are the jobs I will refer to:

  • J1 - curated exposures
  • J2 - old Topaz Train job
  • J3 - old Topaz Extract job in which I extracted from all my micrographs from J1
  • J4 - new Topaz Train job
  • J5 - new Topaz Extract job

Let’s say I have just trained a new model (J4) and I want to extract from micrographs (J1) that I have extracted from in a previous job (J3). I then remove the preprocessed training micrographs from the job directory belonging to the new model (J4):
rm -r /cryosparc/P1/J4/preprocessed

Next, I create a soft link between the “preprocessed” directory of the old model (J2) and the new one (J4):
ln -s /cryosparc/P1/J2/preprocessed /cryosparc/P1/J4/preprocessed

I can then finally start a new Topaz Extract job (J5) with J4 and J1 as input, which simply skips the preprocessing stage and saves me a ton of time.
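For anyone who wants to repeat this, the rm and ln steps above can be wrapped in a small shell function with a couple of safety checks. This is only a sketch: link_preprocessed is a hypothetical helper name, not part of cryoSPARC or Topaz, and it assumes both job directories follow the layout described above (a "preprocessed" subdirectory directly inside the job directory).

```shell
# Hypothetical helper (not a cryoSPARC command): point a new Topaz job's
# "preprocessed" directory at the one from an older job.
link_preprocessed() {
    local old_job="$1"   # e.g. /cryosparc/P1/J2
    local new_job="$2"   # e.g. /cryosparc/P1/J4
    local src="$old_job/preprocessed"
    local dst="$new_job/preprocessed"

    # Refuse to touch anything unless the old preprocessed directory exists.
    if [ ! -d "$src" ]; then
        echo "error: $src does not exist" >&2
        return 1
    fi
    if [ ! -d "$new_job" ]; then
        echo "error: $new_job does not exist" >&2
        return 1
    fi

    # Remove the new job's own preprocessed directory (or a stale link),
    # mirroring the manual "rm -r" step.
    if [ -L "$dst" ]; then
        rm "$dst"
    elif [ -e "$dst" ]; then
        rm -r "$dst"
    fi

    # Create the soft link, mirroring the manual "ln -s" step.
    ln -s "$src" "$dst"
}
```

With the job numbering used here, the call would be: link_preprocessed /cryosparc/P1/J2 /cryosparc/P1/J4. Checking the source first means a mistyped job number does not delete anything in the new job directory.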


Hi Emil,

The feature you describe has been added in patch v3.3.1+211214. Now in Topaz Train you can specify a preprocessed micrograph directory in the job builder. You can do the same in the Topaz Extract job.


@lprimeau Is this applicable to reusing preprocessed mics from a previous Topaz Extract job (built-in pretrained model) in a subsequent Topaz Extract job (again pretrained built-in model)? No Topaz Train involved.

I tried providing the absolute path from the last Topaz Extract job, but it went ahead and started preprocessing from scratch.


Hi Yang,

Yes, this should work. The path to use in Topaz Extract is project_path/JXXX/preprocessed where JXXX is the job directory of a Topaz Extract / Topaz Train job where preprocessing was performed. From your description it sounds like you did not put the preprocessed part at the end. Sorry for the confusion!
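If it helps, the path can be sanity-checked before it is entered in the job builder. The following is a rough sketch only: check_preprocessed_dir is a hypothetical helper, not a cryoSPARC command, and it tests nothing beyond the path ending in "preprocessed", being a directory, and not being empty. It makes no assumption about which files the job expects inside.

```shell
# Hypothetical helper (not a cryoSPARC command): basic checks on a
# project_path/JXXX/preprocessed directory.
check_preprocessed_dir() {
    local dir="${1%/}"   # drop a single trailing slash, if present

    # The last path component should be "preprocessed".
    if [ "$(basename "$dir")" != "preprocessed" ]; then
        echo "path does not end in 'preprocessed': $dir" >&2
        return 1
    fi
    # The path should exist and be a directory (symlinks are followed).
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # The directory should contain at least one entry.
    if [ -z "$(ls -A "$dir")" ]; then
        echo "directory is empty: $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}
```

For example, check_preprocessed_dir project_path/JXXX/preprocessed, with project_path and JXXX replaced by the real project directory and job number.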


Hi Louis,

That does check out on my end. I did specify the preprocessed folder.

Below are the exact details. Perhaps you could help shed some light on what I’m doing wrong?

Project path /cephfs/ylee/EMP/P19/

Initial Topaz Extract job P19/J12
ResNet16 (64 units)

Second Topaz Extract job P19/J39
Cloned job from P19/J12
Specified /cephfs/ylee/EMP/P19/J12/preprocessed directory

I am deactivating the cryoSPARC conda environment and activating a Topaz-specific one as part of the executable. Would that mess things up?


This has been fixed in Patch 211214, which is available for cryoSPARC v3.3.1.
