Topaz Train job is always running

When I run Topaz Train, the job keeps showing “running” without any sign of completion.

If you run nvidia-smi in a terminal, is there any activity on the GPU? Any CPU activity on the process? Disk I/O? Any errors in the job log? Any errors in dmesg?
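The GPU part of that check can be scripted. What follows is a minimal sketch, not a CryoSPARC tool. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` and `--format=csv,noheader,nounits` options to read each GPU's utilization, temperature, and memory use. The helper names (`parse_gpu_csv`, `idle_gpus`) and the 5% "idle" threshold are hypothetical choices for illustration.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from typing import Optional

# Fields requested from nvidia-smi; the order must match the unpacking in parse_gpu_csv.
QUERY = "index,utilization.gpu,temperature.gpu,memory.used"


@dataclass
class GpuStatus:
    index: int
    utilization_pct: Optional[int]  # None when nvidia-smi reports "[N/A]"
    temperature_c: Optional[int]
    memory_used_mib: Optional[int]


def _to_int(field: str) -> Optional[int]:
    """Convert a CSV field to int, or None for non-numeric values like "[N/A]"."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_csv(text: str) -> list[GpuStatus]:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output, one GPU per line."""
    statuses = []
    for line in text.strip().splitlines():
        idx, util, temp, mem = line.split(",")
        statuses.append(GpuStatus(int(idx), _to_int(util), _to_int(temp), _to_int(mem)))
    return statuses


def query_gpus() -> list[GpuStatus]:
    """Run nvidia-smi once and parse its output."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


def idle_gpus(statuses: list[GpuStatus], threshold_pct: int = 5) -> list[int]:
    """Indices of GPUs whose reported utilization is at or below threshold_pct."""
    return [
        s.index for s in statuses
        if s.utilization_pct is not None and s.utilization_pct <= threshold_pct
    ]


if __name__ == "__main__":
    gpus = query_gpus()
    for s in gpus:
        print(f"GPU {s.index}: {s.utilization_pct}% util, "
              f"{s.temperature_c} C, {s.memory_used_mib} MiB used")
    print("Idle GPUs:", idle_gpus(gpus))
```

If the GPU assigned to the job stays idle over several runs while the job still reports "running", the process is probably stalled rather than slowly training.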

There are no errors in the job log and none in dmesg. The job has been showing “running” for more than 20 hours.

I think you should unmark this as solved, as it isn’t… :wink:

The next thing to check: does Topaz run correctly outside of CryoSPARC? (I should have asked that at the start as well…)

Whenever Topaz has failed for me inside CryoSPARC, it has also failed outside of it (although I don’t use Topaz all that often, so further help will likely come from others jumping in… :wink: )
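A quick first step for checking Topaz outside CryoSPARC is confirming that its command-line entry point starts at all in the environment the wrapper job uses. This is a minimal sketch under assumptions: the `topaz` executable is on the PATH of that environment, and `topaz --help` exits with status 0. The function name `cli_responds` and the 60-second timeout are hypothetical. Passing this check only shows that the CLI launches. A real test still means running a short `topaz train` on your own data by hand.

```python
import shutil
import subprocess


def cli_responds(executable: str = "topaz", timeout_s: float = 60.0) -> bool:
    """Return True if `executable --help` is found on PATH and exits with status 0."""
    path = shutil.which(executable)
    if path is None:
        # Not on PATH: the wrapper may be pointing at a different environment.
        return False
    try:
        result = subprocess.run(
            [path, "--help"], capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # A hang even on --help suggests an environment or import problem.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("topaz CLI responds:", cli_responds())
```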

And as a side note… wow, those GPUs are hot for zero load… :astonished:

@pandagxp Were you able to resolve this issue? If you still experience the problem, please:

  1. confirm that Topaz works outside CryoSPARC
  2. then run another Topaz Train job using the CryoSPARC wrapper job type for Topaz
  3. then post the output of this command
    cryosparcm eventlog P12 J34
    
    replacing P12 and J34 with the project and job IDs, respectively, of the new Topaz Train (via CryoSPARC wrapper) job
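If the event log is long, a rough filter can help spot the relevant lines before posting. This minimal sketch assumes the output was saved first, for example with `cryosparcm eventlog P12 J34 > eventlog.txt`. The keyword list is a hypothetical heuristic and does not describe CryoSPARC's log format. For a job that hangs instead of failing, the last few lines show where it stopped.

```python
import re
import sys
from typing import Iterable, List

# Hypothetical heuristic keywords; adjust to what your logs actually contain.
ERROR_PATTERN = re.compile(r"traceback|error|exception|failed|killed", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return lines (without trailing newline) that match ERROR_PATTERN."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


def summarize(path: str, last_n: int = 10) -> None:
    """Print keyword matches, then the last `last_n` lines of the saved log."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        lines = fh.readlines()
    hits = suspicious_lines(lines)
    print(f"{len(hits)} suspicious line(s) in {path}")
    for line in hits:
        print("  " + line)
    print(f"--- last {last_n} line(s) ---")
    for line in lines[-last_n:]:
        print("  " + line.rstrip("\n"))


if __name__ == "__main__":
    summarize(sys.argv[1] if len(sys.argv) > 1 else "eventlog.txt")
```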