Problems with Topaz Extract

Hi,

I’ve been having a problem recently with Topaz Extract which I’m not able to resolve.

I have a dataset with 10k micrographs. I trained two models on the same selected particles: model 1 uses only 100 micrographs from the full dataset for training, while model 2 uses 1000 micrographs.

When I run Topaz Extract with model 2, the job always works, but with model 1 it sometimes works and sometimes fails. With model 1 I have tried running Topaz Extract on all micrographs as well as on split subsets of the data. On all micrographs the job fails; on subsets of 100 micrographs it works for some subsets and fails for others. For the failing subsets I tried filtering out micrographs with bad CTF parameters, but it made no difference.

When the jobs fail, I get the following error:
[CPU: 223.5 MB]
Traceback (most recent call last):
  File "/software/apps/topaz/0.2.5/bin/topaz", line 8, in <module>
    sys.exit(main())
  File "/software/apps/topaz/0.2.5/lib/python3.6/site-packages/topaz/main.py", line 148, in main
    args.func(args)
  File "/software/apps/topaz/0.2.5/lib/python3.6/site-packages/topaz/commands/extract.py", line 288, in main
    for path,score,coords in nms_iterator(stream, radius, threshold, pool=pool):
  File "/software/apps/topaz/0.2.5/lib/python3.6/site-packages/topaz/commands/extract.py", line 79, in nms_iterator
    for name,score,coords in pool.imap_unordered(process, scores):
  File "/software/apps/topaz/0.2.5/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
ValueError: cannot reshape array of size 0 into shape (1,1023,1440)
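
For context, this is the error NumPy raises when an array of size 0 is reshaped, which can happen if a micrograph file is empty or truncated at the moment Topaz reads it. A minimal sketch that reproduces the error and scans a directory for suspiciously small .mrc files (the 1024-byte figure is the standard MRC2014 header size; treat the size check as a heuristic only):

```python
import numpy as np
from pathlib import Path

# Reproduce the error: reshaping a size-0 array raises the same ValueError
# seen in the Topaz traceback.
try:
    np.empty(0, dtype=np.float32).reshape(1, 1023, 1440)
except ValueError as e:
    print(e)  # -> cannot reshape array of size 0 into shape (1,1023,1440)

# Heuristic scan: flag .mrc files too small to hold even the 1024-byte
# MRC2014 header plus any image data.
def suspicious_mrcs(directory):
    return [p for p in Path(directory).glob("*.mrc") if p.stat().st_size <= 1024]
```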

I’m not sure if it’s relevant, but I should note that I use CryoSPARC 4.2.1 with both Topaz 0.2.4 and 0.2.5.
When I was using CryoSPARC 3.3.2, this problem did not occur.

Does anyone know what the problem is and how I can solve it?

Thanks!
Shifra

Does Topaz Extract from all micrographs also fail when topaz is run on the command line, outside CryoSPARC?

We tried running the job through the command line but did not succeed, so I cannot tell.
I’m not sure why; perhaps we don’t know how to run it from the command line, since I usually run it through the GUI.

Hi wtempel,

I am one of the sysadmins helping Shifra figure out this issue on our system. We did try running extract directly on the command line, and here is what I see:

> /software/apps/topaz/0.2.4/bin/topaz extract --radius 25 --threshold -6 --up-scale 4 --assignment-radius -1 --min-radius 5 --max-radius 100 --step-radius 5 --num-workers 1 --device 0 --model ./model.sav -o ./out.txt  ../J19/imported/0*mrc
Traceback (most recent call last):
  File "/software/apps/topaz/0.2.4/bin/topaz", line 11, in <module>
    load_entry_point('topaz-em==0.2.4', 'console_scripts', 'topaz')()
  File "/software/apps/topaz/0.2.4/lib/python3.6/site-packages/topaz/main.py", line 148, in main
    args.func(args)
  File "/software/apps/topaz/0.2.4/lib/python3.6/site-packages/topaz/commands/extract.py", line 270, in main
    for name,score,coords in nms_iterator(stream, radius, threshold, pool=pool):
  File "/software/apps/topaz/0.2.4/lib/python3.6/site-packages/topaz/commands/extract.py", line 73, in nms_iterator
    for name,score,coords in pool.imap_unordered(process, scores):
  File "/software/apps/topaz/0.2.4/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
RuntimeError: CUDA out of memory. Tried to allocate 5.71 GiB (GPU 0; 23.65 GiB total capacity; 17.34 GiB already allocated; 5.26 GiB free; 17.38 GiB reserved in total by PyTorch)

This is a very different error, but if you could help with this, that would be great! It looks like we are over-allocating GPU memory, but I am not sure how to control this.
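
In case it helps, one workaround for the out-of-memory failure is to drive `topaz extract` from a wrapper that feeds it the micrographs in small chunks, mirroring the subset runs Shifra described. A sketch under those assumptions (the flags are abridged from the failing command above, and the model/output paths are placeholders):

```python
import shutil
import subprocess
from pathlib import Path

def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def extract_in_chunks(micrographs, chunk_size=100, topaz="topaz"):
    """Invoke `topaz extract` once per chunk of micrographs.

    Flags are abridged from the command in this thread; adjust to match
    your full invocation. Output files are written per chunk.
    """
    for n, batch in enumerate(chunks(sorted(str(m) for m in micrographs), chunk_size)):
        cmd = [topaz, "extract",
               "--radius", "25", "--threshold", "-6",
               "--num-workers", "1", "--device", "0",
               "--model", "./model.sav",
               "-o", f"./out_chunk{n:03d}.txt"] + batch
        subprocess.run(cmd, check=True)

# Only attempt a real run if topaz is actually on PATH.
if __name__ == "__main__" and shutil.which("topaz"):
    extract_in_chunks(Path("../J19/imported").glob("0*mrc"))
```

With `check=True`, the wrapper stops at the first failing chunk, which also narrows down which group of micrographs is problematic.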

I’ve also tried exactly the same command on a non-GPU node, but the CPU-only run segfaults almost immediately:

> /software/apps/topaz/0.2.4/bin/topaz extract --radius 25 --threshold -6 --up-scale 4 --assignment-radius -1 --min-radius 5 --max-radius 100 --step-radius 5 --num-workers 1 --device 0 --model ./model.sav -o ./out.txt  ../J19/imported/0*mrc
CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Falling back to CPU.
Segmentation fault

At the moment I know almost nothing about CryoSPARC and am only trying to reproduce the problems that Shifra reports. But I’ll be happy to run any test or experiment you suggest to nail this down.

Thanks,
Eugene.

Welcome to the forum @eugene1.
The CryoSPARC Guide includes some sections on the application’s architecture and system requirements that may help you with this and other issues that CryoSPARC users whom you support may experience.
Getting Topaz to run on the command line is a good first step toward eventually performing Topaz tasks through the CryoSPARC UI.
We, like the Topaz developers, recommend installing Topaz inside a dedicated conda environment and wrapping it to isolate its environment from any other, possibly interfering conda environments. You may want to focus on v0.2.5, which appears to be the current version (as of July 2023).
How to avoid RuntimeError: CUDA out of memory. depends on the circumstances.
For testing on the command line, you may want to explore Topaz command line parameters (I am not familiar with those) or cluster workload manager options (for slurm: see Slurm, GPU, CGroups, ConstrainDevices - #3 by dchin - Discussion Zone - ask.CI).
For Topaz use through the CryoSPARC interface, once Topaz has been confirmed to work on the command line, either the CryoSPARC built-in scheduler or an external cluster workload manager should ensure that Topaz tasks are not “landing” on already busy GPUs.
Running tasks on the CPU that could run on the GPU is likely to be very slow.
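
Separately, for the intermittent per-subset failures: bisecting the micrograph list can isolate a single file that triggers the error. A minimal sketch, where `fails` is a placeholder predicate that would in practice run `topaz extract` on the given subset and report whether it crashed:

```python
def find_failing_micrograph(micrographs, fails):
    """Binary-search for one micrograph on which `fails` returns True.

    Assumes the failure is per-file (a bad micrograph fails in any
    subset that contains it), not dependent on total load.
    """
    assert fails(micrographs), "the full set must fail to begin with"
    while len(micrographs) > 1:
        mid = len(micrographs) // 2
        left, right = micrographs[:mid], micrographs[mid:]
        micrographs = left if fails(left) else right
    return micrographs[0]

# Example with a fake predicate: pretend 'mic_0042.mrc' is the bad file.
mics = [f"mic_{i:04d}.mrc" for i in range(100)]
culprit = find_failing_micrograph(mics, lambda subset: "mic_0042.mrc" in subset)
print(culprit)  # -> mic_0042.mrc
```

This needs only log2(N) Topaz runs instead of one run per subset, which may be faster than filtering by CTF quality when the bad file is not obviously abnormal.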
