I’m trying to run a Topaz extract job and ran into the following error. I used about 9k particles for Topaz train and it ran fine. I have a pretty large dataset (>13k images), and I’m wondering if that’s what is causing Topaz to fail. Any ideas? Maybe @alexjamesnoble has some input on this?
[ CPU: 226.7 MB] Traceback (most recent call last):
File "cryosparc2_worker/cryosparc2_compute/run.py", line 85, in cryosparc2_compute.run.main
File "cryosparc2_compute/jobs/topaz/run_topaz.py", line 1090, in run_topaz_wrapper_extract
File "cryosparc2_compute/jobs/topaz/topaz_utils.py", line 37, in run_process
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, close_fds=True, universal_newlines=newlines)
File "/home/vamsee/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/subprocess.py", line 394, in __init__
File "/home/vamsee/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/subprocess.py", line 1047, in _execute_child
OSError: [Errno 7] Argument list too long
I repeated the steps and was able to reproduce the error. I’m still unsure why this is happening. The Topaz extract job starts fine and runs for a little while, then fails after a certain amount of time.
So, apparently, splitting the micrographs into 4 sets (~3500 each) is okay for Topaz to handle. Not sure what the upper limit is but it definitely fails after 13k images, probably much sooner.
Sorry for the delay. Judging by the Cryosparc traceback, this looks like a Cryosparc file handling issue, not a Topaz issue. If you run the Topaz command shown in the Cryosparc run (with proper changes to the micrographs list), does Topaz work?
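This is caused by the command that calls Topaz becoming too long, because of the number of micrographs assigned to each thread. The limit is hit when the subprocess module hands the command to the operating system, which rejects argument lists above a fixed size (hence Errno 7). There are a few ways to circumvent this issue: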
- Split the dataset using the Exposure Sets Tool job and then infer from each of the splits.
- Create more threads to decrease the number of micrographs per thread. This can be done by increasing the Number of parallel threads parameter. This may cause many threads to be created, so if performance issues begin to arise, it is recommended to decrease the Number of CPUs parameter accordingly.
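Both workarounds come down to putting fewer micrograph paths on each command line. As a rough illustration of that idea (not how cryoSPARC itself splits the work), here is a minimal Python sketch that groups paths into batches under a byte budget. The function name, the example paths, and the 100,000-byte budget are all hypothetical.

```python
import os


def split_by_arg_budget(paths, budget_bytes):
    """Group paths into batches whose combined argument size stays under budget_bytes.

    A single path larger than the budget still gets a batch of its own.
    """
    batches, current, used = [], [], 0
    for p in paths:
        cost = len(os.fsencode(p)) + 1  # +1 for the NUL byte that terminates each argument
        if current and used + cost > budget_bytes:
            batches.append(current)
            current, used = [], 0
        current.append(p)
        used += cost
    if current:
        batches.append(current)
    return batches


# Hypothetical micrograph paths, for illustration only.
paths = ["/data/session/motioncorrected/mic_%05d.mrc" % i for i in range(13000)]
batches = split_by_arg_budget(paths, budget_bytes=100000)
print(len(batches), "batches; first batch holds", len(batches[0]), "paths")
```

Batching by bytes rather than by a fixed micrograph count matters because the limit depends on path length: 5k micrographs with short paths may fit where 5k with long paths do not.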
No, I haven’t tried what you suggested yet. I’ll give that a shot too and report back. As @jyoo suggested, it is a known limitation of the subprocess module. I was, however, able to split the dataset into 4, and Topaz extract worked like a charm.
Hi @jyoo - this is a frustrating error to encounter after running Topaz extract for a few hours. It should be possible for cryosparc to detect the number of input micrographs and split the dataset accordingly - or at least run a pre-check to determine the number of micrographs and fail before starting the job, no?
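A pre-check along these lines could look roughly like the Python sketch below. It estimates the size of the argument list plus environment and compares it with the system limit reported by `os.sysconf("SC_ARG_MAX")` (Unix only). The function names and the 50% safety margin are assumptions for illustration, the estimate ignores some per-argument overhead, and Linux additionally caps the length of each single argument, so this is a conservative guess rather than an exact test.

```python
import os


def argv_size(command, env=None):
    """Estimate the bytes the OS must copy for an exec call: arguments plus environment."""
    env = os.environ if env is None else env
    arg_bytes = sum(len(os.fsencode(a)) + 1 for a in command)
    # Each environment entry is stored as "KEY=VALUE" followed by a NUL byte.
    env_bytes = sum(len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items())
    return arg_bytes + env_bytes


def fits_in_arg_limit(command, env=None, limit=None, safety_margin=0.5):
    """Return True if the command is estimated to fit within the argument-size limit."""
    if limit is None:
        limit = os.sysconf("SC_ARG_MAX")
    if limit <= 0:  # limit could not be determined, so nothing to check against
        return True
    return argv_size(command, env) <= limit * safety_margin


# Hypothetical command: the program name followed by many micrograph paths.
command = ["topaz", "extract"] + ["/data/mic_%05d.mrc" % i for i in range(13000)]
print("estimated size:", argv_size(command), "bytes; fits:", fits_in_arg_limit(command))
```

Running a check like this before launching would let a job fail immediately, or fall back to splitting, instead of failing hours in.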
Hi @olibclarke, if it is of any help, I have generally had luck splitting into sets of fewer than 5k micrographs. Anything above that seems iffy, but 5k has worked every time.
This still happens in cryoSPARC 4.2.1.
Topaz can read paths to micrographs from a text file, which makes the argument list to the command much simpler (point topaz to the text file containing micrograph paths). This seems like something fixable in cryosparc, and it would be much more user-friendly than having to split the dataset.
Thanks @Guillaume for this suggestion. Do you have a link to documentation with details for this input mode?
Actually, according to the commands’ help messages, it seems that this feature only exists for the training part, not for picking, in topaz version 0.2.4:
$ topaz train --help
path to file listing the training images. also accepts
directory path from which all images are loaded.
$ topaz extract --help
paths paths to image files for processing
But in version 0.2.5 this help line says:
paths paths to image files for processing, can also be streamed from stdin
I’m not sure exactly what this means, and I don’t have version 0.2.5 installed to test it (I read the help strings from the GitHub repo). But it suggests there is a way other than passing all paths as command-line arguments (and hitting the argument-length limit).
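To show why streaming over stdin would sidestep the limit, here is a minimal Python sketch of the general technique. The child process is a stand-in (a Python one-liner that counts the lines it receives), not Topaz, because the exact `topaz extract` invocation for stdin input is not confirmed here. The function name and paths are hypothetical.

```python
import subprocess
import sys

# Stand-in child process: counts the non-empty lines it receives on stdin.
# It takes the place of a real tool that accepts paths on stdin.
CHILD = [
    sys.executable,
    "-c",
    "import sys; print(sum(1 for line in sys.stdin if line.strip()))",
]


def run_with_paths_on_stdin(command, paths):
    """Run command with one path per line on its stdin instead of in its argument list."""
    payload = "\n".join(paths) + "\n"
    result = subprocess.run(
        command,
        input=payload,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


# Hypothetical micrograph paths, for illustration only.
paths = ["/data/session/motioncorrected/mic_%05d.mrc" % i for i in range(13000)]
print(run_with_paths_on_stdin(CHILD, paths).strip())
```

Data written to a pipe is not subject to the argument-list limit, so the command line stays the same length no matter how many micrographs are processed.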
Ah, I just found out that topaz now has a lot more documentation (than back when I first used it). This might be helpful to you: check the documentation for the extract command in Topaz Commands — Topaz 0.2.5 documentation.