Topaz Extract fails on entire dataset

DavidF · May 9, 2022, 9:52am

Hi all,

I have been training Topaz on a subset of 20 micrographs from my dataset, and it has worked fine. Running a Topaz Extract job with the resulting model on that same subset also works.
However, it fails on the entire dataset (about 7.5k images) with exactly the same parameters.

Here is the error output; it is the same no matter what I change:

[CPU: 252.2 MB]  Starting extraction by running command /home/upiv/.conda/envs/topaz/bin/topaz extract --radius 38 --threshold -6 --up-scale 4 --assignment-radius -1 --min-radius 5 --max-radius 100 --step-radius 5 --num-workers 8 --device 0 --model /raid0/cryoem/David/P3/J51/models/model_epoch07.sav -o /raid0/cryoem/David/P3/J65/topaz_particles_prediction.txt [MICROGRAPH PATHS EXCLUDED FOR LEGIBILITY]
[CPU: 252.4 MB]  Traceback (most recent call last):
[CPU: 252.4 MB]  File "/home/.conda/envs/topaz/bin/topaz", line 33, in <module>
[CPU: 252.4 MB]  sys.exit(load_entry_point('topaz-em==0.2.5', 'console_scripts', 'topaz')())
[CPU: 252.4 MB]  File "/home/.conda/envs/topaz/lib/python3.8/site-packages/topaz/main.py", line 148, in main
[CPU: 252.4 MB]  args.func(args)
[CPU: 252.4 MB]  File "/home/.conda/envs/topaz/lib/python3.8/site-packages/topaz/commands/extract.py", line 288, in main
[CPU: 252.4 MB]  for path,score,coords in nms_iterator(stream, radius, threshold, pool=pool):
[CPU: 252.4 MB]  File "/home/.conda/envs/topaz/lib/python3.8/site-packages/topaz/commands/extract.py", line 79, in nms_iterator
[CPU: 252.4 MB]  for name,score,coords in pool.imap_unordered(process, scores):
[CPU: 252.4 MB]  File "/home/.conda/envs/topaz/lib/python3.8/multiprocessing/pool.py", line 868, in next
[CPU: 252.4 MB]  raise value
[CPU: 252.4 MB]  struct.error: unpack requires a buffer of 1024 bytes

Regarding the red part of the error, I get the following:

[CPU: 252.4 MB]  Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
  File "/home/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 1109, in run_topaz_wrapper_extract
    utils.run_process(extract_command)
  File "/home/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 98, in run_process
    assert process.returncode == 0, f"Subprocess exited with status {process.returncode} ({str_command})"
AssertionError: Subprocess exited with status 1 (/home/.conda/envs/topaz/bin/topaz extract --radius 38 --threshold -6 --up-scale 4 --assignment-radius -1 --min-radius 5 --max-radius 100 --step-radius 5 --num-workers 8 --device 0 --model /raid0/cryoem/David/P3/J51/models/model_epoch07.sav -o /raid0/c…)

Any help would be more than welcome. Thank you very much in advance.

vamsee · May 9, 2022, 2:06pm

Hi David,

I’d suggest splitting your micrographs into 2 sets (~3750 each) and running your topaz extract on them separately. Topaz extract tends to fail if you have more than 5k micrographs.
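If you want to do the split outside the GUI, here is a minimal Python sketch of the idea. The function name `split_into_batches` and the batch size are illustrative choices, not part of Topaz or cryoSPARC, and the ~5k limit is a rule of thumb rather than a documented threshold.

```python
from pathlib import Path
from typing import List, Sequence


def split_into_batches(paths: Sequence[Path], max_per_batch: int) -> List[List[Path]]:
    """Split paths into consecutive batches of at most max_per_batch items.

    Hypothetical helper for illustration; it only groups paths and does
    not call Topaz or cryoSPARC.
    """
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    # Sort so the batches are reproducible from run to run.
    ordered = sorted(paths)
    return [ordered[i:i + max_per_batch]
            for i in range(0, len(ordered), max_per_batch)]
```

For example, `split_into_batches(list(Path("micrographs").glob("*.mrc")), 3750)` would turn ~7.5k micrographs into two lists that can each be passed to a separate extract run (the `micrographs` directory name is a placeholder).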

Best,

Vamsee

DavidF · May 11, 2022, 8:11am

Hi Vamsee,

Thanks for the suggestion! It did not work for me, but I found a solution (or at least a workaround): I ran Topaz preprocess on the command line and then ran Topaz extract on the preprocessed directory, entering its absolute path in the corresponding GUI option.
I am putting it here so people in the future can refer to it if they run into the same issue.
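
For anyone trying to reproduce this, below is a rough sketch of how such a preprocess command could be assembled. It is not the exact command used here. It follows the form shown in the Topaz tutorial (`topaz preprocess -v -s <scale> -o <destdir> <micrographs>`); the helper name `build_preprocess_command`, the paths, and the scale value are placeholders, so check `topaz preprocess --help` for your installed version.

```python
import shlex
from typing import List, Sequence


def build_preprocess_command(micrographs: Sequence[str], dest_dir: str,
                             scale: int, topaz_bin: str = "topaz") -> List[str]:
    """Build an argument list for `topaz preprocess`.

    Hypothetical helper: it only assembles the command and runs nothing.
    -s is the downsampling factor, -o the output directory, -v verbose.
    """
    if scale < 1:
        raise ValueError("scale must be at least 1")
    if not micrographs:
        raise ValueError("no micrographs given")
    return [topaz_bin, "preprocess", "-v", "-s", str(scale),
            "-o", dest_dir, *micrographs]


def as_shell_string(command: Sequence[str]) -> str:
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in command)
```

The resulting list can be passed to `subprocess.run(command, check=True)`. The scale should match whatever downsampling the model was trained with; a value of 4 would mirror the `--up-scale 4` in the extract command above, but that is a guess.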

Best
David

CleoShen · May 25, 2022, 10:35pm

Hi David,

Could you share your Topaz preprocess command line? I ran Topaz train in cryoSPARC and it failed with the error “Subprocess exited with status 1”.

TedJ · August 3, 2022, 10:47pm

Are there other solutions to this problem? We are seeing Topaz preprocessing jobs launched in cryoSPARC make it partway through a dataset and then just hang. This is a problem even with <5000 images. It looks like many subprocesses are launched, but they end up in a wait status indefinitely.

Ted

olibclarke · August 4, 2022, 12:25am

Use fewer threads and CPUs. The defaults (8x8) make my system hang too, because they launch 64 subprocesses. I use 2x2 and that seems to work fine.
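
Going by the 8x8 = 64 figure, the two settings appear to multiply. A tiny sketch of that arithmetic, assuming the subprocess count really is threads times CPUs (the function name is made up for illustration):

```python
def total_subprocesses(threads: int, cpus: int) -> int:
    """Subprocess count, assuming the two settings simply multiply."""
    if threads < 1 or cpus < 1:
        raise ValueError("threads and cpus must both be at least 1")
    return threads * cpus


for threads, cpus in [(8, 8), (2, 2)]:
    count = total_subprocesses(threads, cpus)
    print(f"{threads} threads x {cpus} CPUs -> {count} subprocesses")
```

Comparing the result with `os.cpu_count()` on the worker node gives a quick sense of whether a setting oversubscribes the machine.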

TedJ · August 4, 2022, 10:47pm

Thanks - that seems to be working well even with >5000 images. The threads/CPUs input is a bit unclear as many more subprocesses are being launched.

CleoShen · November 13, 2023, 1:11am

Using 2 threads and 2 CPUs on 1,431 micrographs failed for the same reason: struct.error: unpack requires a buffer of 1024 bytes
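
The 1024 in that message matches the fixed 1024-byte header of the MRC format. One possibility is that a micrograph file is empty, truncated, or a broken link, so the header read comes up short. This is an assumption and is not confirmed anywhere in this thread. Below is a minimal sketch for finding such files; `find_short_files` is a made-up helper, not a Topaz function.

```python
from pathlib import Path
from typing import Iterable, List

MRC_HEADER_BYTES = 1024  # fixed header size in the MRC format


def find_short_files(paths: Iterable[Path],
                     min_bytes: int = MRC_HEADER_BYTES) -> List[Path]:
    """Return paths that are missing or smaller than min_bytes."""
    suspects = []
    for path in paths:
        # A missing file, a broken symlink, or a file shorter than the
        # header cannot yield a full 1024-byte header.
        if not path.is_file() or path.stat().st_size < min_bytes:
            suspects.append(path)
    return suspects
```

Running `find_short_files(Path("/path/to/micrographs").glob("*.mrc"))` (placeholder path) and excluding anything it reports would test this hypothesis. An empty result means the cause lies elsewhere.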