My workstation has 4 GPUs, but when I launch multiple jobs, only 2 of them run and the third stays queued and waiting. “nvidia-smi” shows only two GPUs in use, while “cryosparcw gpulist” shows that 4 GPUs are detected. How can I use the two idle GPUs? Thanks a lot.
This looks like a driver problem. “nvidia-smi” is lower level than cryoSPARC, so I guess something is wrong with two of your GPUs. Are all your GPUs of the same type? Which driver version are you running?
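To answer the two questions above in one go, something like the following could help. It is only a sketch: it assumes `nvidia-smi` is on the PATH, and the helper names `parse_gpu_csv` and `query_gpus` are made up for illustration. The query fields used (`index`, `name`, `driver_version`, `utilization.gpu`) are standard `nvidia-smi --query-gpu` properties.

```python
import csv
import io
import subprocess

# Properties requested from nvidia-smi, in output order.
QUERY = "index,name,driver_version,utilization.gpu"


def parse_gpu_csv(text):
    """Turn `nvidia-smi --query-gpu=... --format=csv,noheader` output into dicts."""
    fields = QUERY.split(",")
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if record:  # skip blank lines
            rows.append(dict(zip(fields, (value.strip() for value in record))))
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (hypothetical helper)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` on the workstation and printing the result would show at a glance whether all four cards report the same model and driver version, and which ones are actually busy.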
Great, it is good that all your GPUs are the same.
I had trouble with drivers when running different GPUs… I wanted one for graphics and one for calculations, but it didn’t work.
Can you maybe switch on hyperthreading on your server?
If you reboot your server, there may be an option in the BIOS to do that.
That way, you may end up with 16 cores, which would solve the problem, if that was the problem.
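Before rebooting into the BIOS, it may be worth checking whether hyperthreading is already active. Here is a rough sketch that compares logical and physical core counts. It assumes an x86 Linux machine where `/proc/cpuinfo` lists `processor`, `physical id` and `core id` fields, and `count_cores` is just an illustrative name.

```python
def count_cores(cpuinfo_text):
    """Return (logical, physical) core counts from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()
    # Each logical CPU is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            logical += 1
            # A physical core is identified by its socket and core id.
            physical.add((fields.get("physical id"), fields.get("core id")))
    return logical, len(physical)
```

On the server itself you would run `count_cores(open("/proc/cpuinfo").read())`. If the logical count is twice the physical count, hyperthreading is already on and the BIOS change will not add anything.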
If you can run RELION and cryoSPARC v1 with all 4 GPUs but not cryoSPARC v2, the bug may be somewhere else.
I’m having a similar problem, but in my case hyperthreading is already turned on and doesn’t solve the problem.
The system is an 8-core i7-6900K with 4 GTX 1080s running RHEL7. Hyperthreading is on for all 8 cores. For cryoSPARC, I’m using cuda-8.0, since the installation instructions say that’s the latest officially supported version. (The system actually has all versions up to 10.0 installed.) The Nvidia driver version is 410.73. The cryoSPARC master and worker are v2.4.0.
I can use 2 GPUs with no problem, but trying to use 3 or 4 in a multi-GPU job results in the job remaining permanently queued, waiting for CPU (not GPU) resources. Frequently, the GUI also locks up, and you can’t even clear the job until you restart the browser. Note that I’m trying to use multiple GPUs from a single job, not multiple single-GPU jobs.
I have no suggestion here
We have a system with a similar configuration (1080s + Intel) but with 16 cores. I’d say you indeed don’t have enough CPUs. I think you need 2 CPUs per GPU, plus one to manage the process = 3. And don’t forget you also need to leave some room for OS tasks. That’s why you get a freeze.
So on an 8-core machine I could use only 3 GPUs.
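The arithmetic behind that estimate can be written out as a small sketch. This is only one reading of the rule of thumb above (2 CPUs per GPU, one CPU to manage the process, and some room for the OS), not how the cryoSPARC scheduler actually allocates resources. The function name and the choice of reserving exactly 1 core for the OS are assumptions.

```python
def max_gpus(total_cores, cores_per_gpu=2, manager_cores=1, os_reserve=1):
    """Estimate how many GPUs fit in one job under the rule of thumb.

    Assumed reading: each GPU needs `cores_per_gpu` CPUs, the job needs
    `manager_cores` CPUs to manage the process, and `os_reserve` CPUs are
    left free for OS tasks (the value 1 is a guess, not from cryoSPARC).
    """
    usable = total_cores - manager_cores - os_reserve
    return max(usable // cores_per_gpu, 0)


print(max_gpus(8))   # 8-core machine
print(max_gpus(16))  # 16-core machine
```

Under these assumptions an 8-core machine comes out at 3 GPUs, which matches the estimate above, while 16 cores leave plenty of headroom for 4.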
It may work for separate jobs because they are “asynchronous”, and we are not really talking about CPUs and GPUs but about “processing loads”. We have had a job crash due to “external load” from other EM programs…
I hope this helps!