Hopefully I’ve chosen the right topic. My understanding is that while we can use multiple GPUs on a single node, we cannot use multiple GPUs on different nodes. Could someone please confirm or deny this statement?
We have several GPU nodes with just 2 GPUs each, so we were interested in spreading cryoSPARC jobs across multiple GPU nodes.
Multi-node operation requires more than just the GPUs: the processes on different nodes also have to be synchronised on the CPU side, using MPI or some similar solution. Implementing something like this would take a huge development effort, with a high probability of failing to deliver the expected performance. Endeavours like this are usually only viable when you can scale to many (say, dozens of) nodes, which is certainly not the case in cryoSPARC. I don’t think the devs would try to do that, and that’s probably a good decision.
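To put rough numbers on the scaling argument, here is a toy model in Python. It is only a sketch: the function name, the cost formula, and every number in it are made up for illustration, and none of it is measured on or taken from cryoSPARC. It assumes the compute work divides evenly across nodes while the synchronisation cost grows with each extra node.

```python
def speedup(nodes, compute_s=100.0, sync_s_per_node=2.0):
    """Toy speedup estimate for a job split across `nodes` nodes.

    Assumptions (hypothetical, for illustration only):
    - compute_s: single-node compute time, which divides evenly across nodes
    - sync_s_per_node: synchronisation cost added for each extra node
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # Parallel compute time plus a sync cost that grows with node count.
    total_s = compute_s / nodes + sync_s_per_node * (nodes - 1)
    return compute_s / total_s


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} nodes: speedup {speedup(n):.2f}x")
```

With these made-up costs the speedup flattens out and then drops as nodes are added, which is the kind of outcome that makes the development effort hard to justify unless the synchronisation cost can be kept very small.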