When I run topaz train, it keeps showing “running” without any sign of completion.
If you run nvidia-smi in a terminal, is there any activity on the GPU? Any CPU activity on the process? Disk I/O? Any errors in the job log? Any errors in dmesg?
There are no errors in the job log and no errors in dmesg, but the job has kept showing “running” for more than 20 hours.
I think you should unmark this as solved, as it isn’t…
Next thing to check is does Topaz run correctly outside of CryoSPARC? (Should have asked that at the start as well…)
When I’ve had Topaz fail inside CryoSPARC, it’s always failed outside of it as well (although I don’t use Topaz all that often, so further help will likely come from others jumping in to help…)
And as a side note… wow, those GPUs are hot for zero load…
@pandagxp Were you able to resolve this issue? If you still experience the problem, please:
- confirm that Topaz functions outside CryoSPARC
- then run another Training job using the CryoSPARC wrapper job type for Topaz
- then post the output of this command:
  cryosparcm eventlog P12 J34
  where you would replace P12 and J34 with the project and job IDs, respectively, of the newly failed Topaz Train (via CryoSPARC wrapper) job
Hi, wtempel!
I’m running into the same issue: the job has been running for 25 hours, and the output of cryosparcm eventlog P12 J34 is as follows:
[Sat, 13 Sep 2025 09:22:00 GMT] License is valid.
[Sat, 13 Sep 2025 09:22:00 GMT] Launching job on lane 2080 target room-2080 ...
[Sat, 13 Sep 2025 09:22:00 GMT] Running job on master node hostname room-2080
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] Job J132 Started
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] Master running v4.7.1, worker running v4.7.1
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] Working in directory: /data3/kxj/CS-lxj/J132
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] Running on lane 2080
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] Resources allocated:
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] Worker: room-2080
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] CPU : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] GPU : [0]
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] RAM : [0]
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] SSD : False
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] --------------------------------------------------------------
[Sat, 13 Sep 2025 09:22:02 GMT] [CPU RAM used: 90 MB] Importing job module for job type topaz_train...
[Sat, 13 Sep 2025 09:22:06 GMT] [CPU RAM used: 255 MB] Job ready to run
[Sat, 13 Sep 2025 09:22:06 GMT] [CPU RAM used: 255 MB] ***************************************************************
[Sat, 13 Sep 2025 09:22:06 GMT] [CPU RAM used: 255 MB] Topaz is a particle detection tool created by Tristan Bepler and Alex J. Noble.
Citations:
- Bepler, T., Morin, A., Rapp, M. et al. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nat Methods 16, 1153-1160 (2019) doi:10.1038/s41592-019-0575-8
- Bepler, T., Noble, A.J., Berger, B. Topaz-Denoise: general deep denoising models for cryoEM. bioRxiv 838920 (2019) doi: https://doi.org/10.1101/838920
Structura Biotechnology Inc. and cryoSPARC do not license Topaz nor distribute Topaz binaries. Please ensure you have your own copy of Topaz licensed and installed under the terms of its GNU General Public License v3.0, available for review at: https://github.com/tbepler/topaz/blob/master/LICENSE.
***************************************************************
[Sat, 13 Sep 2025 09:22:08 GMT] [CPU RAM used: 274 MB] Starting Topaz process using version 0.2.5a...
[Sat, 13 Sep 2025 09:22:08 GMT] [CPU RAM used: 274 MB] Random seed used is 1221755056
[Sat, 13 Sep 2025 09:22:09 GMT] [CPU RAM used: 274 MB] --------------------------------------------------------------
[Sat, 13 Sep 2025 09:22:09 GMT] [CPU RAM used: 274 MB] Starting preprocessing...
[Sat, 13 Sep 2025 09:22:09 GMT] [CPU RAM used: 274 MB] Using a downsampling factor of 8
[Sat, 13 Sep 2025 09:22:09 GMT] [CPU RAM used: 274 MB] Starting micrograph preprocessing by running command /home/kxj/miniconda3/envs/topaz/bin/topaz preprocess --scale 8 --niters 200 --num-workers 8 -o /data3/kxj/CS-lxj/J132/preprocessed [7146 MICROGRAPH PATHS EXCLUDED FOR LEGIBILITY]
[Sat, 13 Sep 2025 09:22:09 GMT] [CPU RAM used: 274 MB] Preprocessing over 2 processes...
The first Topaz training job finished without issues, but the three additional training jobs and one extraction job I later launched all appear to run indefinitely. The four remaining jobs were launched only after the first Topaz training job had finished.
Here is my GPU usage:
The next one is my CPU usage:
And followed by my project status:
Edit:
All my Topaz processes appear to have been killed due to OOM.
room@room-2080:~$ sudo dmesg -T | grep -i 'killed process.*topaz'
Out of memory: Killed process 11843 (topaz) total-vm:9169448kB, anon-rss:5186988kB, file-rss:7512kB, shmem-rss:0kB, UID:1024 pgtables:10728kB oom_score_adj:0
.
.
.
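For anyone who wants to tally these kills from a longer dmesg dump, here is a minimal Python sketch that pulls the PID, process name, and resident memory out of lines shaped like the one above. The regular expression and the `parse_oom_line` helper are my own illustration, not part of Topaz or CryoSPARC, and they assume the kernel message format shown in this post (it can differ between kernel versions).

```python
import re

# Hypothetical helper: matches the "Killed process ..." message format
# shown in the dmesg output above. Other kernels may word this differently.
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)

def parse_oom_line(line):
    """Return pid, process name, and anonymous RSS in GiB, or None if no match."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match["pid"]),
        "name": match["name"],
        # dmesg reports kB (KiB); convert to GiB for readability.
        "anon_rss_gib": int(match["anon_rss_kb"]) / 1024**2,
    }

if __name__ == "__main__":
    sample = (
        "Out of memory: Killed process 11843 (topaz) total-vm:9169448kB, "
        "anon-rss:5186988kB, file-rss:7512kB, shmem-rss:0kB, UID:1024 "
        "pgtables:10728kB oom_score_adj:0"
    )
    print(parse_oom_line(sample))
```

For the line above, this reports roughly 4.9 GiB of anonymous memory held by the killed topaz process, which can be compared with the RAM available on the worker.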
CryoSPARC monitors the topaz “main” process.
If a child of the topaz “main” process is terminated, but the “main” process itself keeps running, the CryoSPARC job wrapping that topaz “main” process may continue to “run”, possibly without any actual/meaningful progress in topaz training.
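To illustrate the general mechanism (this is not how Topaz or CryoSPARC are actually implemented), the following Python sketch starts a child process, kills it with SIGKILL as the OOM killer would, and shows that the parent carries on. The function name `kill_child_and_report` is hypothetical, and the example assumes a POSIX system such as Linux.

```python
import signal
import subprocess
import sys

def kill_child_and_report():
    """Start a sleeping child, SIGKILL it, and return its exit status.

    The parent (this process) is unaffected by the child's death; it only
    learns about it if it checks the child's return code.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # Simulate what the kernel OOM killer does to the victim process.
    child.send_signal(signal.SIGKILL)
    child.wait()
    return child.returncode

if __name__ == "__main__":
    code = kill_child_and_report()
    # On POSIX, a negative return code -N means "terminated by signal N".
    print("child return code:", code)
    print("parent is still running and could keep waiting for results")
```

A parent that never inspects the worker's exit status, or that blocks waiting for results the dead worker will never deliver, looks exactly like a job that is still “running”.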



