Hardware recommendations for a high-density server

We want to purchase a system that would ideally let us run 16 jobs in parallel and would appreciate some advice. We are considering a server with 4 NVIDIA A16 GPUs; one A16 is essentially 4x 16 GB GPUs on a single card. We are a bit limited in terms of server space, so we are looking for a dense setup, and with these GPUs we would potentially only need a single server. Does anybody have experience with this card? Would there be any drawbacks?

Our current setup:

  • cryosparc worker on a dedicated server with
    • 3x RTX 8000 GPUs
    • 384 GB RAM
    • 16 TB RAID0 NVMe SSD scratch
    • 2x Xeon Gold 6244 CPUs
  • The master runs on a separate VM
  • We use SLURM for scheduling jobs

With this setup, using v3.2 we were able to run up to 9 jobs in parallel (depending on RAM requirements). Since upgrading to CS 4.1.2 we can only run 3 jobs in parallel; beyond that, CS processes seem to go into a deadlock. This might be an NVIDIA issue in combination with GPU sharing that only comes up in CS4, but we are not completely sure what is causing it (I will probably open a second thread for this). Even though we would not need to share GPUs between jobs on a 4x A16 system, we are a bit concerned about this.

Do others have (good or bad) experience running many parallel CS4 jobs on servers with more than 4 GPUs?

I would greatly appreciate any further advice regarding RAM size, number of CPU cores, etc. The hardware and system requirements (CryoSPARC Architecture and System Requirements | CryoSPARC Guide) list 64 GB RAM per GPU. Is this still adequate for jobs like 3D Flex Refinement? On our current system these sometimes produce much higher memory peaks. Generally, we allocate 3x the amount of RAM requested by CryoSPARC in the SLURM scheduler to prevent jobs from frequently running out of memory.
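
For context, this is roughly how we apply that 3x factor. It is a trimmed sketch of a cluster_script.sh SLURM template rather than our exact script; the partition name is a placeholder, and the {{ ... }} variables are the standard CryoSPARC cluster template variables:

```
#!/usr/bin/env bash
## Trimmed sketch of a CryoSPARC cluster_script.sh for SLURM.
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH -n {{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --partition=cryosparc        ## placeholder partition name
## Request 3x the RAM CryoSPARC estimates, to absorb memory peaks
#SBATCH --mem={{ (ram_gb*3)|int }}G
#SBATCH --output={{ job_log_path_abs }}
#SBATCH --error={{ job_log_path_abs }}

{{ run_cmd }}
```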

I haven’t used A16s, but they could work.

If you’re aiming for that many jobs in parallel, Gigabyte and some others sell 2U servers that take 8-16 GPUs. A4000s (actively cooled, but they struggle a bit) or L4s (passively cooled) seem like reasonable single-slot cards. I would be nervous about 16 GPUs, but 8 of these more power-efficient cards in 2U could be doable.

Master Node:
With heavy use on multiple workers, I had issues with 32 GB RAM. However, that was with older versions of CryoSPARC. I use 256 GB RAM now.

Worker Nodes:
Memory:
64 GB per GPU will work for a lot of jobs, but it is definitely not enough for all of them.

On the hardware side, I aim for 128 GB RAM per GPU. By default I have CryoSPARC make a SLURM request of ~64 GB of RAM per GPU; if the job dies, the user is instructed to request more through the CryoSPARC GUI. Sometimes a GPU has to sit idle because someone needs its memory for another GPU’s job, but a lot of jobs can still run in parallel.
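
One way to express that per-GPU default directly in the SLURM template, instead of scaling CryoSPARC’s {{ ram_gb }} estimate, looks roughly like the lines below; the 64G figure and flag choices are illustrative, not a drop-in recommendation:

```
## Relevant lines from a cluster_script.sh that sizes memory per allocated GPU.
#SBATCH --gres=gpu:{{ num_gpu }}
## Default memory budget per GPU (illustrative value)
#SBATCH --mem-per-gpu=64G
#SBATCH --cpus-per-task={{ num_cpu }}
```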

CPU:
4 cores per GPU sounds right. Higher CPU clock speed will make a difference and is probably more worth it than higher-TFLOPS GPUs.

Hope this helps!