Setup info: Cluster managed by Slurm
Software: CryoSPARC 4.1.2, Topaz 0.2.5
GPU: Tesla V100
Hi all,
I am currently attempting to run some Topaz training on a subset of my single particle cryo-EM dataset. The workflow I follow is:
- Prepare motion corrected, CTF estimated micrographs
- Run manual particle picking (using the Manual Picker job)
- Extract chosen particles (using the Extract From Micrographs job)
- Feed particles and micrographs into a Topaz train job
- All parameters are left at default, except for expected number of particles
This works very well in some cases (e.g. training finished in ~2 hours on a dataset of 328 micrographs), but in others (e.g. a dataset of 248 micrographs) the job seems to get stuck in preprocessing, with no apparent link to the number of input micrographs. The last message in the output log is ‘Preprocessing over 8 processes…’, after which the job hangs for at least 12 hours (I cancelled it at that point). The job’s ‘preprocessed’ folder shows that 242 of 248 micrographs were produced before the hang. A subsequent re-run hung at 214/248 micrographs. Throughout, the job still shows as ‘running’ and does not report any errors.
Has anyone encountered this before? Is there any way this issue could be fixed or avoided?
Thanks for your help!
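For anyone wanting to spot this kind of hang without waiting 12 hours, here is a minimal Python sketch that counts the files in the job's ‘preprocessed’ folder and reports how long it has been since anything was written. It assumes one output file per micrograph in that folder, which matches the 242/248 count above but is not confirmed for every version. The function names and the 30-minute idle limit are made up for illustration.

```python
import time
from pathlib import Path


def preprocessing_progress(preprocessed_dir, expected_total, now=None):
    """Return (files_done, expected_total, seconds_since_last_write).

    Assumption: the folder holds one output file per preprocessed micrograph.
    """
    files = [p for p in Path(preprocessed_dir).iterdir() if p.is_file()]
    now = time.time() if now is None else now
    if not files:
        return 0, expected_total, None
    newest = max(p.stat().st_mtime for p in files)
    return len(files), expected_total, now - newest


def looks_stalled(done, total, idle_seconds, idle_limit=1800):
    """Heuristic only: work remains and nothing was written for idle_limit seconds."""
    return done < total and idle_seconds is not None and idle_seconds > idle_limit
```

Calling `preprocessing_progress("<job directory>/preprocessed", 248)` and passing the result to `looks_stalled` gives a quick yes/no. The path is a placeholder for wherever your job directory lives.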
There’s a really good chance you’re running out of CPU memory (RAM). Set CPUs to 1 or 2, and processes to 1 or 2. I also usually request about 8x the normal RAM for the job. CryoSPARC chronically under-reserves memory for all jobs on clusters, but it’s especially egregious with Topaz.
I have also seen the same inconsistency with micrograph count. Below a certain threshold (say, 100) it works every time. Above that, it may or may not run out of memory, with no rhyme or reason: sometimes 250 works but 200 runs out of memory, for instance.
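As a rough illustration of the "8x the normal RAM" rule of thumb, the Python sketch below turns a job's default memory reservation into a Slurm `--mem` directive. The helper name is hypothetical and the 24 GB figure in the usage note is only an example value, not something CryoSPARC is known to request. How the directive gets into your submission script depends on your own cluster setup.

```python
import math


def scaled_mem_directive(default_ram_gb, factor=8):
    """Build an #SBATCH --mem line with the default request multiplied by factor.

    The factor of 8 is the rule of thumb from this thread, not an official value.
    """
    mem_gb = math.ceil(default_ram_gb * factor)
    return f"#SBATCH --mem={mem_gb}G"
```

For example, `scaled_mem_directive(24)` returns `#SBATCH --mem=192G`.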
Hi,
I would suggest using fewer processes, e.g. 2 processes and 2 threads. In our hands the defaults end up spawning many processes, which often seem to spin their wheels without making progress.
Cheers
Oli
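To sanity-check a processes/threads setting against what the scheduler actually gave the job, something like the Python sketch below may help. It treats processes × threads as a rough upper bound on concurrent threads. That is an assumption, since the thread does not confirm how Topaz combines the two parameters. `SLURM_CPUS_PER_TASK` is only set when the job requests `--cpus-per-task`, so the sketch falls back to the node's CPU count otherwise.

```python
import os


def allocated_cpus():
    """CPUs allocated to this job, falling back to the machine's CPU count."""
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    if value:
        return int(value)
    return os.cpu_count() or 1


def is_oversubscribed(processes, threads, cpus):
    """True if the assumed worst case (processes * threads) exceeds the CPUs available."""
    return processes * threads > cpus
```

With the old defaults, `is_oversubscribed(8, 8, 8)` is true, while `is_oversubscribed(2, 2, 4)` is false.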
Increasing RAM and decreasing processes/threads to 2 has done the trick. Thanks both! These kinds of bugs are incredibly difficult to debug!
I think the defaults should really be changed. On all of our systems 8 & 8 causes issues, not always, but often enough that I think 2 & 2 would be much better as default values (@wtempel?)
Thanks for the feedback. We will reduce the defaults for these parameters.
Hello all, we’ve changed the Topaz parameter defaults in the latest CryoSPARC v4.5 to 2 processes and 4 workers. We’ve also made sure the Topaz jobs request the correct CPU resources based on these parameters. Thanks again for your feedback!