How to use all 4 GPUs?

xzhang2017 · September 22, 2018, 1:07pm

My workstation has 4 GPUs, but when I launch multiple jobs, only 2 jobs run and the third is queued and waiting. “nvidia-smi” shows only two GPUs in use, while “cryosparcw gpulist” shows that all 4 GPUs are detected. How can I use the two idle GPUs? Thanks a lot.

Bests,
Xing

jucastil · September 24, 2018, 9:38am

Hello Xing,
This looks like a driver problem. “nvidia-smi” is lower level than cryoSPARC, so I guess something is wrong with two of your GPUs. Are all your GPUs of the same type? Which driver version are you running?

xzhang2017 · September 24, 2018, 10:48am

The four GPUs are identical, and they are all working fine on cryoSPARC_v1 and relion.

The problem may be related to an insufficient number of CPUs: we only have 8 cores, but each GPU job needs 4 CPUs, so only 2 GPU jobs can run at once. Is there any solution for this issue? Thanks a lot.
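The arithmetic behind this guess can be sketched as follows. This is only an illustration of the reasoning in the post, not cryoSPARC's actual scheduler; the function name is hypothetical, and the 4-CPUs-per-job figure is the one quoted above.

```python
def max_concurrent_gpu_jobs(cpu_cores, gpus, cpus_per_job=4):
    """Upper bound on simultaneous single-GPU jobs.

    Assumes every job reserves `cpus_per_job` CPU cores and one GPU
    (an assumption taken from the post, not from cryoSPARC itself).
    """
    return min(gpus, cpu_cores // cpus_per_job)


# 8 cores, 4 GPUs: the CPUs run out before the GPUs do.
print(max_concurrent_gpu_jobs(cpu_cores=8, gpus=4))   # 2
# If 16 cores were visible, all 4 GPUs could be used.
print(max_concurrent_gpu_jobs(cpu_cores=16, gpus=4))  # 4
```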

Bests,
Xing

jucastil · September 24, 2018, 11:16am

Super, it is good you have the same GPUs.
I had trouble with drivers when running different GPU models together… I wanted one for graphics and one for calculations, but it didn’t work.

Can you maybe switch on hyperthreading on your server?
If you reboot your server, there may be an option in your BIOS to do that.
That way you may end up with 16 logical cores ==> problem solved, if that was the problem.
If you can run relion and cryoSPARC v1 with the 4 GPUs but not cryoSPARC v2, the bug may be somewhere else.
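After the reboot, a quick way to see whether the OS now reports more logical CPUs than physical cores is sketched below. Python's `os.cpu_count()` returns the number of logical CPUs; the physical core count (8 in this thread) has to be supplied by hand, and the helper name is hypothetical.

```python
import os


def hyperthreading_looks_enabled(physical_cores, logical_cpus=None):
    """Return True if the OS sees more logical CPUs than physical cores.

    `physical_cores` must be provided by the user (e.g. from the CPU spec);
    `logical_cpus` defaults to what the OS reports via os.cpu_count().
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 0
    return logical_cpus > physical_cores


print("Logical CPUs visible to the OS:", os.cpu_count())
print("Hyperthreading looks enabled:", hyperthreading_looks_enabled(physical_cores=8))
```

On an 8-core machine, seeing 16 logical CPUs would suggest the BIOS setting took effect.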

xzhang2017 · September 27, 2018, 12:48pm

YES, turning on hyperthreading in the BIOS solved the problem, and I can now run 4 jobs on 4 GPUs. Thanks.

Bests,
Xing

jucastil · September 27, 2018, 1:12pm

I’m happy I could help. Don’t forget to close the topic.

jmh · November 5, 2018, 6:06pm

I’m having a similar problem, but in my case hyperthreading is already turned on and doesn’t solve the problem.

The system is an 8-core i7-6900K with 4 GTX-1080’s running RHEL7. Hyperthreading is on for all 8 cores. For cryoSPARC, I’m using cuda-8.0, since the installation instructions say that’s the latest officially supported. (The system actually has all versions up to 10.0 installed). The Nvidia driver version is 410.73. CryoSPARC master and worker are V2.4.0.

I can use 2 GPUs with no problem, but trying to use 3 or 4 in a multi-GPU job results in the job remaining permanently queued, waiting for CPU (not GPU) resources. Frequently, the GUI also locks up and you can’t even clear the job until you restart the browser. Note that I’m trying to use multiple GPUs from a single job, not multiple single-GPU jobs.

Any suggestions?

Thanks in advance!

jucastil · November 6, 2018, 10:07am

I have no real fix to suggest here.
We have a system with a similar configuration (1080’s + Intel) but with 16 cores. I would say that you indeed don’t have enough CPUs. I think you need 2 CPUs per GPU, plus one to manage the process = 3. And don’t forget you need to leave some room for OS tasks as well. That’s why you get a freeze.
So on an 8-core machine I could place only 3 GPUs.
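This estimate can be written out as a small calculation. The sketch below reads the rule of thumb as 2 CPUs per GPU plus one manager CPU for the whole job, which is an assumed interpretation that happens to reproduce the 3-GPU figure; it is not how cryoSPARC itself allocates resources, and the function name is hypothetical.

```python
def max_gpus_for_one_job(cpu_cores, cpus_per_gpu=2, manager_cpus=1):
    """GPUs a single multi-GPU job could use under the rule of thumb.

    Assumed reading: `cpus_per_gpu` cores per GPU plus `manager_cpus`
    for the managing process. Nothing is reserved here for OS tasks.
    """
    usable = cpu_cores - manager_cpus
    return max(0, usable // cpus_per_gpu)


print(max_gpus_for_one_job(8))   # 3 (uses 7 cores, leaving 1 for the OS)
print(max_gpus_for_one_job(16))  # 7 (more than the 4 GPUs installed)
```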
It may work for separate jobs because they are “asynchronous”, and we are not really talking about CPUs and GPUs but about “processing loads”. We have had a job crash because of “external load” from other EM programs…
I hope this helps!