Last I looked into it, it was possible to spoof multiple occurrences of the same workstation/node by way of unique hostname aliases in sshd_config. Each can be assigned non-overlapping GPUs at connection time to avoid the most obvious conflict, but there didn’t seem to be a way to effect similar CPU/RAM accounting: the cryosparcw script auto-detected everything onboard, with no obvious avenue for user control. Thinking it through, however, if the ratio of resources happens to be appropriately provisioned for most use cases, then outside of RBMC, this may not be a problem in practice?
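To make the "appropriately provisioned ratio" point concrete, here is a minimal bookkeeping sketch, assuming hypothetical alias names (node1-a, node1-b) for one physical node. It hands each alias a contiguous, non-overlapping block of GPUs and works out the CPU/RAM share that block would justify if the node were split in proportion to GPU count. It does not change what the worker sees: each instance will still auto-detect the whole node's CPUs and RAM, so this only tells you whether the per-GPU ratio is comfortable.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AliasShare:
    alias: str       # hypothetical hostname alias pointing at the same physical node
    gpus: list[int]  # GPU indices this alias would be connected with
    cpus: float      # cores it "should" get if the node is split in proportion to GPUs
    ram_gb: float    # RAM (GB) it "should" get on the same basis


def split_node(aliases: list[str], gpu_ids: list[int],
               total_cpus: int, total_ram_gb: float) -> list[AliasShare]:
    """Give each alias a contiguous, non-overlapping block of GPUs and compute
    the CPU/RAM share that block would justify. Bookkeeping only: the worker
    itself still auto-detects the whole node's CPUs and RAM."""
    n = len(aliases)
    if n == 0 or len(gpu_ids) < n:
        raise ValueError("need at least one GPU per alias")
    base, extra = divmod(len(gpu_ids), n)  # spread any remainder over the first aliases
    shares: list[AliasShare] = []
    start = 0
    for i, alias in enumerate(aliases):
        count = base + (1 if i < extra else 0)
        group = gpu_ids[start:start + count]
        start += count
        frac = count / len(gpu_ids)
        shares.append(AliasShare(alias, group, total_cpus * frac, total_ram_gb * frac))
    return shares


if __name__ == "__main__":
    for s in split_node(["node1-a", "node1-b"], [0, 1, 2, 3], 64, 512):
        print(f"{s.alias}: GPUs {s.gpus}, ~{s.cpus:g} cores, ~{s.ram_gb:g} GB RAM")
```

On a hypothetical 4-GPU, 64-core, 512 GB box split two ways, each alias gets two GPUs and "should" be using about 32 cores and 256 GB; if typical jobs stay under that per-GPU budget, the double-counted CPU/RAM probably never bites.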
Worth mentioning that I never got as far as testing cache handling: cryoSPARC will treat both instances as unique resources, which could pose a problem under certain conditions.
All of this was experimental and unsanctioned, of course.
I’d say “I’ll give it a go too and report back” but I’ve just done a big update run on our main processing servers and I’m not about to take one down again given the queue of things to run right now.
Cache collisions shouldn’t be a huge issue, since each worker can be assigned a different cache directory (if using the same mount point) or even its own SSD. Or just run with --nossd if running an all-SSD system…
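Something like this could generate the per-alias connect lines, assuming hypothetical aliases (node1-a, node1-b), a hypothetical master hostname and cache root, and the cryosparcw connect flags as I remember them (--worker, --master, --port, --gpus, --ssdpath, --nossd); check `bin/cryosparcw connect --help` on your version before trusting any of it. The only real idea here is the one above: same mount point, but one cache subdirectory per alias, or --nossd on an all-SSD box.

```python
from __future__ import annotations

import shlex
from pathlib import PurePosixPath


def connect_command(alias: str, master: str, port: int, gpus: list[int],
                    cache_root: str | None = None) -> str:
    """Build a cryosparcw connect command line for one hostname alias.

    Flag names are assumptions from memory; verify them against
    `bin/cryosparcw connect --help` on your install.
    """
    args = [
        "bin/cryosparcw", "connect",
        "--worker", alias,
        "--master", master,
        "--port", str(port),
        "--gpus", ",".join(str(g) for g in gpus),
    ]
    if cache_root is None:
        # all-SSD box: skip the particle cache entirely
        args.append("--nossd")
    else:
        # same mount point, but one subdirectory per alias, so the two
        # "workers" never share a cache directory
        args += ["--ssdpath", str(PurePosixPath(cache_root) / alias)]
    return shlex.join(args)


if __name__ == "__main__":
    # hypothetical aliases, GPU split, master hostname and cache root
    for alias, gpus in [("node1-a", [0, 1]), ("node1-b", [2, 3])]:
        print(connect_command(alias, "master.example.org", 39000, gpus,
                              "/scratch/cryosparc_cache"))
```

shlex.join is only there so anything odd in a path gets quoted properly if you paste the output into a shell.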
Might be possible to do it just via /etc/hosts as well, rather than anything exotic with sshd*.
Hm. I have another system sitting on my desk which needs its big update as well; I could temporarily play musical chairs with another system’s GPUs to experiment… OK, I think I’ll try that next chance I get.
* edit: Using dummy NICs if necessary should prevent any weirdness. OK, definitely going to try this.
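For the /etc/hosts route, a rough sketch of the entries, assuming hypothetical alias names and addresses. Either every alias shares the node's real IP on a single line, or, following the dummy-NIC idea, each alias gets its own address. Creating the dummy interfaces themselves (e.g. `ip link add dummy0 type dummy` with iproute2, plus assigning an address) isn't shown, and the entries would need to go wherever the aliases have to resolve, at least on the master.

```python
import ipaddress


def hosts_lines(aliases: list, ips: list) -> list:
    """Build /etc/hosts lines ("address name [name ...]") for node aliases.

    One IP for all aliases -> a single line listing every alias.
    One IP per alias (e.g. addresses on dummy NICs) -> one line each.
    """
    # ipaddress.ip_address raises ValueError on anything that isn't a valid address
    addrs = [str(ipaddress.ip_address(ip)) for ip in ips]
    if len(addrs) == 1:
        return [" ".join([addrs[0], *aliases])]
    if len(addrs) != len(aliases):
        raise ValueError("give one IP total, or one IP per alias")
    if len(set(addrs)) != len(addrs):
        raise ValueError("per-alias IPs must be distinct")
    return [f"{ip} {alias}" for ip, alias in zip(addrs, aliases)]


if __name__ == "__main__":
    # hypothetical: shared address vs. one (dummy NIC) address per alias
    print("\n".join(hosts_lines(["node1-a", "node1-b"], ["10.0.0.5"])))
    print("\n".join(hosts_lines(["node1-a", "node1-b"], ["10.0.1.1", "10.0.1.2"])))
```

The per-alias variant is the one that lines up with the dummy-NIC suggestion, since each "worker" then has a distinct address as well as a distinct name.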