We recently purchased two workstations, each with a Ryzen Threadripper 3960X processor. Both have two Nvidia GeForce RTX 3090 cards and 256 GB of RAM on an ASRock TRX40D8-2N2T board. The power supply unit has a capacity of 1500 W.
When we run GPU-demanding jobs, such as refinement, on both GPUs in cryoSPARC, the workstation crashes every time. When we use only one of the two GPUs, in other words when we run only one refinement job in cryoSPARC, the workstation works fine and finishes the job.
The problem is consistent in both of the workstations we have. It would be a great help if someone can suggest a possible solution.
Please let me know if you want me to include any specific detail about the workstation we have.
When you say crashed, what do you mean exactly - the jobs crash, the workstation restarts…?
The reason I ask is we have a very similarly configured workstation (2x3090, threadripper, 1500W PSU) that was exhibiting similar problems. When under high load (often two local refinement jobs would trigger it), it was totally powering down.
In our case, it seems (crossing fingers here!) to have been an issue with the placement of the SSDs. We have two SSDs, one for scratch and one for the OS. The OS SSD was located directly under one of the GPUs, and I guess it was getting too hot. The manufacturer suggested moving it down one slot, and it has now been running smoothly over the weekend with no crashes, which is the first time in a while. Hope that helps!
Thank you very much for your response. We are having the same problem: our workstations go completely dead. Sometimes we need to unplug the power cord and hold the power button for 30 seconds or so before the workstation will start again. I am not sure about the placement of the disks. We will check it and try your suggestion.
Could you easily measure the power draw from the GPUs when things crash? The upper-end 30-series cards are known to have very large power spikes that have caused crashes, so maybe that's it? I think even 1000 W PSUs in single-GPU configurations have sometimes had problems. Example post of many:
It sounds exactly like the issue we had. In our case it did not seem to be power draw that was the issue (both cards maxed out at ~350 W); moving the OS SSD down one slot seems to have fixed it. Good luck!!
We also had such an issue with a very similar setup. Swapping SSD locations helped temporarily, but the issue recurred. We tried a lot of things while shipping the computer back and forth to the manufacturer, so it's hard to say what the key remedy was, but I suspect it was upgrading to a 2000 W PSU.
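For anyone who wants to check GPU power draw and temperature around a crash, as discussed above, here is a minimal Python sketch that polls `nvidia-smi` and appends the readings to a log file. The function names, log path, and one-second interval are my own choices, not anything from cryoSPARC. It assumes `nvidia-smi` is on the PATH and that the cards report `power.draw`. Software polling like this is unlikely to catch very short transient spikes, so treat the numbers as a lower bound.

```python
import os
import subprocess
import time

# Query per-GPU index, power draw (W) and core temperature (C) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,power.draw,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line):
    """Parse one CSV line such as '0, 348.52, 74' into (index, watts, temp_c)."""
    index, watts, temp_c = (field.strip() for field in line.split(","))
    return int(index), float(watts), int(temp_c)


def sample():
    """Run nvidia-smi once and return one (index, watts, temp_c) tuple per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]


def log_forever(path="gpu_power.log", interval_s=1.0):
    """Append timestamped readings until interrupted (hypothetical log path)."""
    with open(path, "a") as log:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            for index, watts, temp_c in sample():
                log.write(f"{stamp} gpu{index} {watts:.1f}W {temp_c}C\n")
            # Push each sample to disk so the last lines survive a hard power-off.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)
```

Calling `log_forever()` in a separate terminal before launching the two refinement jobs should leave the last readings before the shutdown at the end of `gpu_power.log`.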
Check the back of the power supply for a sticker showing the 12 V rating. GPUs run on 12 V power, and a 1500 W rating doesn't always mean 1500 W is available on the 12 V rail. (If watts aren't listed directly, amps * volts = watts.) The 3090s, depending on model, can hit close to 400 W if I remember correctly.
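To make the sticker arithmetic concrete, here is a small Python sketch of the amps * volts = watts check. The 100 A sticker value is made up for illustration, and the 280 W figure for everything other than the GPUs is only a rough stand-in (it is the rated TDP of the 3960X; real draw will differ), so substitute the numbers from your own PSU label.

```python
def rail_watts(amps, volts=12.0):
    """Power available on a rail: amps * volts = watts."""
    return amps * volts


def headroom_watts(rail_amps, gpu_watts, n_gpus, other_watts):
    """12 V capacity minus an estimated sustained load (all inputs in W or A)."""
    return rail_watts(rail_amps) - (gpu_watts * n_gpus + other_watts)


# Hypothetical sticker reading: 100 A on the 12 V rail.
sticker_amps = 100.0
print(rail_watts(sticker_amps))                       # 1200.0 W on 12 V
# Two GPUs at ~400 W each plus an assumed 280 W for the rest of the system.
print(headroom_watts(sticker_amps, 400.0, 2, 280.0))  # 120.0 W of headroom
```

A small or negative headroom in a check like this leaves little margin for short spikes above the sustained figures.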