I am running standard 2D classification on multiple GPUs using v2.13.2. The job runs fine but always gets stuck in the last iteration.
In my most recent attempt, I used 2 GPUs and 40 iterations. The final output before it got stuck was:
“[CPU: 27.72 GB] Start of Iteration 40”
“[CPU: 27.72 GB] – DEV 1 THR 0 NUM 11000 TOTAL 97.084745 ELAPSED 98.070966”
(I tried multiple times, and it always got stuck at the same place, NUM 11000.)
If I switch to a single GPU, the run completes.
Any suggestions as to what the problem might be?
Thanks,
Pei