Topaz Train OOM on A100 80 GB with very small dataset (cryoSPARC 4.5.3, newly installed Topaz 0.2.5, A100 80 GB GPU)

Hi everyone,

I installed Topaz following the tutorial. However, when I try to run a Topaz Train job in cryoSPARC, I encounter the following problem. I have attached the parameters I used and a screenshot of the error.

Given the small dataset (only 7 micrographs) and the very low training settings, the request for ~140 GB of GPU memory seems abnormal.

Is there a known issue in the Topaz wrapper or dataset tiling that could cause over-aggressive pre-allocation on the GPU?

Any suggestions or insights would be greatly appreciated.

Best,
Junqing

Just to add — I followed the installation steps exactly from the official cryoSPARC tutorial:
https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/deep-picking/topaz#python-environment

Hi @jkang3,

This is definitely abnormal, as 7 micrographs and the particles for those micrographs shouldn’t require anywhere near 140 GB. Have you used Topaz before on other datasets or on another lane/node? Are you using SLURM?

Best,

Kye

Hi Kye,

Thank you for your reply.
I have not used Topaz before — this is my first time setting it up following the cryoSPARC tutorial.
I am running it on my own server, not using SLURM.

Best,
Junqing

Hi @jkang3,

Can you try to execute the Topaz train command from the command line? The job log should contain the full command to use.
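For reference, a typical standalone Topaz training invocation looks something like the sketch below (adapted from the Topaz tutorial; the paths, particle count, and output locations are placeholder assumptions — the actual command and arguments for your job are printed in the cryoSPARC job log):

```shell
# Hypothetical standalone Topaz training run (all paths and values are placeholders).
# Copy the real command from the cryoSPARC job log instead of these values.
topaz train \
    --train-images processed/micrographs/ \
    --train-targets topaz/particles.txt \
    --num-particles 400 \
    --num-workers 8 \
    --device 0 \
    --save-prefix saved_models/model \
    -o saved_models/model_training.txt
```

Running it outside cryoSPARC helps determine whether the OOM comes from Topaz itself or from the wrapper.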

Thanks,

Kye

Hi Kye,
Here is the log from when I tried to execute the Topaz train command from the command line.

Thank you
Best,
Junqing

Hi @jkang3,

Since this error also occurs when Topaz is run directly from the command line, we would kindly direct you to the Topaz GitHub page for further support, as we do not support failures in external software packages.

Best,

Kye