Dear Developers,
We are building a cluster with 20-30 worker nodes (200-300 GPUs; Slurm). What would be the optimal hardware specs for the CryoSPARC master node? (We expect to have 50+ users, and the cluster will be heavily used.)
Thank you,
Sergei
I’m not a dev, but I admin a cluster setup of CS. Do you plan to use a single master instance for all users?
We run a separate instance for each user or research group (usually fewer than 5 users), and for each instance we reserve what is stated as the minimum requirement for the CS master in terms of CPU and RAM (4 cores and 16 GB), which is usually more than enough. I haven’t seen an OOM issue for a long time. If you multiply that by 50, you get 200 cores and 800 GB of RAM, but you can certainly oversubscribe on CPU and probably also on RAM.
So I guess a single two-socket server with 768 GB or 1 TB of RAM would serve you very well with plenty of headroom, and I would risk the statement that you can get by with a single modern 64C/128T CPU in most cases.
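To make the arithmetic above concrete, here is a minimal sketch in Python. The function name `master_sizing` and the oversubscription ratios are hypothetical, chosen only for illustration; the 4 cores and 16 GB per instance are the minimum figures mentioned above. Treat it as a back-of-the-envelope estimate under those assumptions, not a sizing rule from CS.

```python
import math


def master_sizing(n_instances, cores_per_instance=4, ram_gb_per_instance=16,
                  cpu_oversub=1.0, ram_oversub=1.0):
    """Estimate (cores, ram_gb) to provision for n_instances.

    cpu_oversub / ram_oversub are assumed oversubscription ratios
    (1.0 = reserve the full amount, 2.0 = provision half of it).
    They are illustrative knobs, not measured values.
    """
    if cpu_oversub < 1.0 or ram_oversub < 1.0:
        raise ValueError("oversubscription ratios must be >= 1.0")
    cores = math.ceil(n_instances * cores_per_instance / cpu_oversub)
    ram_gb = math.ceil(n_instances * ram_gb_per_instance / ram_oversub)
    return cores, ram_gb


# Full reservation for 50 instances: 50 * 4 cores, 50 * 16 GB.
print(master_sizing(50))                   # (200, 800)

# Assumed 2x CPU oversubscription, RAM fully reserved.
print(master_sizing(50, cpu_oversub=2.0))  # (100, 800)
```

With no oversubscription this reproduces the 200 cores and 800 GB figure; how far you can safely push the ratios depends on your actual workload, so measure before committing to hardware.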
I’m not sure how well CS scales to 50 users on a single instance, especially in terms of the DB and its performance.
In both cases you need high-I/O NVMe storage for the database(s), which can reach tens of GB per user.
Thank you for the guidance! At present, we plan to use a single instance for all users, as it is much easier to manage. We have never run a 200-GPU cluster before, so I am unsure about DB performance. We will have to try.
One problem to keep in mind, if you manage an environment for multiple groups of researchers (I don’t know if that is your case), is updates. Groups often need to keep using a certain version of the software until the end of a project to ensure consistent results. If you put them on a single large instance, you lose that flexibility.
We are one large group of researchers, and we use the same software versions for consistency. We also sometimes assign more than one person to a project. So, having one CS instance is convenient for us.