Hi,
I’m upgrading our lab server with a larger SSD scratch drive (lsscratch) and 8 more GPUs (RTX A4000) in addition to the existing 8 GeForce GTX 1080 units.
What I’m wondering is whether it’s possible to add those new GPUs and the new SSD scratch to the currently installed CS instance.
If not, I will probably have to set up a separate server for the new GPUs, which would be less efficient and more scattered for me.
Please advise.
Am I understanding correctly that you will add additional GPUs to an existing host, such that a single host will have 16 GPUs?
With “currently installed CS instance”, do you mean “define an additional target” for the cryoSPARC scheduler?
If I understand all this correctly, you would need to ensure that the additional SSD is combined with the existing cache storage (RAID0?), because a cryoSPARC scheduler target is limited to a single “--ssdpath”.
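One common way to combine two SSDs into a single cache volume is a software RAID0 built with mdadm. This is only a sketch: the device names (/dev/nvme0n1, /dev/nvme1n1) and the mount point are assumptions for illustration, not values from your system, so adapt them before running anything.

```shell
# Sketch only: stripe two SSDs into one cache volume.
# Device names and mount point are assumptions -- verify against your system.
# Requires root and DESTROYS existing data on the member devices.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
mkfs.ext4 /dev/md0                       # format the striped array
mkdir -p /scratch/cryosparc_cache        # example cache mount point
mount /dev/md0 /scratch/cryosparc_cache
# Then point the worker's single --ssdpath at the combined volume, e.g.:
# cryosparcw connect ... --ssdpath /scratch/cryosparc_cache
```

Remember to add matching entries to /etc/mdadm.conf and /etc/fstab if the array should survive a reboot.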
Yes, the upgraded build will have 16 GPUs (8 GeForce GTX 1080 and 8 RTX A4000).
Currently I have two RAID 6 arrays (the old one is where CS is already installed; the new one is being configured right now). I’m connecting them with a 10 GbE network switch.
My mention of “--ssdpath” referred to your particle cache/scratch implementation, which is probably separate from any RAID6 array(s) your system may also have.
I have no experience with GPU servers that have 16 GPUs. But in principle, it should be possible to expand the GPU configuration of an existing scheduler target (of type “node”, not “cluster”) like this:
1. Obtain the 'hostname' value for the relevant worker node using cryosparcm cli "get_scheduler_targets()".
2. Run on the relevant worker node (single command):
   /path/to/cryosparc_worker/bin/cryosparcw connect \
     --update --master <master_hostname> --worker <worker_hostname> \
     --gpus 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
I have not tested this on a server with this many GPUs.
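After reconnecting, you could sanity-check the updated target from the output of cryosparcm cli "get_scheduler_targets()". The exact shape of the returned records may differ between cryoSPARC versions; the snippet below assumes each target is a dict with a 'hostname' key and a 'gpus' list, which is an assumption to verify against your own output.

```python
# Hedged sketch: count the GPUs registered for one worker in the list
# returned by cryosparcm cli "get_scheduler_targets()". The field names
# ('hostname', 'gpus') are assumptions -- check them against real output.
def gpu_count(targets, worker_hostname):
    """Return the number of GPUs listed for the named worker, or 0 if absent."""
    for target in targets:
        if target.get("hostname") == worker_hostname:
            return len(target.get("gpus", []))
    return 0

# Example with a mocked, abbreviated targets list:
targets = [{"hostname": "worker1", "gpus": [{"id": i} for i in range(16)]}]
print(gpu_count(targets, "worker1"))  # expect 16 after a successful --update
```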