Hardware recommendations for a high-density server

We want to purchase a system that would ideally let us run 16 jobs in parallel, and we would appreciate some advice. We are considering a server with 4 NVIDIA A16 GPUs; one A16 is essentially four 16 GB GPUs on a single card. We are a bit limited in terms of server space, so we are looking for a dense setup, and with these GPUs we would potentially only need a single server. Does anybody have experience with this card? Would there be any drawbacks?
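
As far as we understand, each A16 board enumerates as four independent GPUs to the driver, so a 4x A16 server should expose 16 devices that CryoSPARC could schedule individually. A quick sanity check would be something like the following (we do not have the hardware yet, so the output shown is our assumption):

```bash
# List all GPUs visible to the driver; on a 4x A16 server we would expect
# 16 entries, one per GPU die (illustrative output, not measured):
nvidia-smi -L
# GPU 0: NVIDIA A16 (UUID: GPU-...)
# ...
# GPU 15: NVIDIA A16 (UUID: GPU-...)
```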

Our current setup:

  • CryoSPARC worker on a dedicated server with
    • 3x RTX 8000 GPUs
    • 384 GB RAM
    • 16 TB RAID0 NVMe SSD scratch
    • 2x Intel Xeon Gold 6244 CPUs
  • The master runs on a separate VM
  • We use SLURM for scheduling jobs (a sketch of our cluster script follows below)
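
For reference, our SLURM lane is the standard CryoSPARC cluster integration. A simplified sketch of our cluster_script.sh is below; the partition name and the exact set of #SBATCH lines are placeholders, and the {{ ... }} variables are the ones CryoSPARC substitutes at submission time:

```bash
#!/usr/bin/env bash
# Simplified cluster_script.sh sketch; CryoSPARC fills in the {{ ... }}
# template variables when it submits a job to SLURM.
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --partition=gpu          # placeholder partition name
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --mem={{ ram_gb }}G
#SBATCH --output={{ job_log_path_abs }}

{{ run_cmd }}
```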

With this setup on v3.2 we were able to run up to 9 jobs in parallel (depending on RAM requirements). Since upgrading to CryoSPARC 4.1.2 we can only run 3 jobs in parallel; otherwise CS processes seem to go into a deadlock. This might be an NVIDIA issue in combination with GPU sharing that only comes up in CS4, but we are not completely sure what is causing it (I will probably open a second thread for this; a rough diagnostic sketch is below). Even though we would not need GPU sharing on a 4x A16 system, we are still a bit concerned about this.
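
When the deadlock occurs, the jobs appear to hold GPU memory without making progress. This is roughly how we check it (standard nvidia-smi and ps usage; the interpretation of "stuck" is ours):

```bash
# Which PIDs currently hold GPU memory, and how much?
nvidia-smi --query-compute-apps=pid,gpu_uuid,used_memory --format=csv

# Is a given CryoSPARC worker process stuck in uninterruptible sleep ("D")
# rather than running? Substitute a PID from the query above.
ps -o pid,stat,wchan:32,cmd -p <pid>
```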

Do others have (good or bad) experience running many parallel CS4 jobs on servers with more than 4 GPUs?

I would highly appreciate any further advice regarding RAM size, number of CPU cores, etc. The hardware and system requirements (CryoSPARC Architecture and System Requirements - CryoSPARC Guide) list 64 GB RAM per GPU. Is this still adequate for jobs like 3D Flex Refinement? On our current system these sometimes produce much higher memory peaks. Generally, we allocate 3x the amount of RAM requested by CryoSPARC in the SLURM scheduler to prevent jobs from frequently running out of memory (a snippet of how we do this is below).
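
For concreteness, the 3x headroom is just a multiplier applied in our cluster script template, assuming the script is Jinja2-rendered as in the CryoSPARC reference SLURM example:

```bash
# From our cluster_script.sh: request 3x the RAM CryoSPARC estimates, so
# jobs (e.g. 3D Flex) that peak above the estimate are not OOM-killed.
#SBATCH --mem={{ (ram_gb*3)|int }}G
```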