Cluster Architecture

wspatl46 · March 13, 2024, 4:09pm

I’m new to CryoSPARC but certainly not to HPC bioinformatics, so feel free to get as technical as you would like in a response. I have some basic questions about a suitable architecture, so please bear with me.

In perusing the CryoSPARC Guide, Section 5 - Shared Filesystem, I see that there is a master node with supporting worker nodes, all of which have access to shared storage at a peer level. Is this architecture required? Do the nodes need general internet access?

I ask because, in my experience, HPC clusters are typically set up with a master node whose direct-attached storage is made available to headless nodes over NFS, with the nodes located behind a network switch so that the workers are on a private network.

Is there a reason CryoSPARC would not support this architecture? Do the worker nodes need general internet access? Is it common for people to use their workstations as worker nodes in a CryoSPARC cluster? That would seem odd, since anything they do interactively might compete with jobs. Thanks for your comments.

wtempel · March 13, 2024, 8:18pm

Welcome to the forum @wspatl46.
To answer just a few of your questions:

The master node needs to be able to establish an outgoing connection to the CryoSPARC license server at https://get.cryosparc.com.

CryoSPARC workers do not need internet access.
One should not expose CryoSPARC master or worker nodes to incoming connections from the internet.
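
The outgoing-connection requirement for the master can be checked with a small probe. The following is a minimal sketch, not an official CryoSPARC tool: the function name `can_reach` is hypothetical, and it only tests whether a TCP connection to port 443 can be opened directly, so it may report failure on networks where outbound HTTPS is only possible through a proxy.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it does not validate the
    license itself, only basic outbound reachability.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run on the master node, `can_reach("get.cryosparc.com")` should return `True` if the outgoing connection described above is possible; the same call on a worker on a private network can legitimately return `False`.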

There are various ways of satisfying the shared project storage requirement. For example, one could have an NFS server that exports project directories to the CryoSPARC master and all worker nodes.
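
To confirm that a project directory is backed by the same export on every node, one could inspect the mount table on each machine. The sketch below assumes Linux, where `/proc/mounts` lists one mount per line as device, mount point, and filesystem type; the function name `find_mount` and the path `/data/projects` are hypothetical, and mount points containing escaped spaces are not handled.

```python
import os


def find_mount(path, mounts_text):
    """Return (device, mount_point, fs_type) for the mount containing path.

    mounts_text is the content of /proc/mounts (passed in as a string so
    the function can be exercised without touching the real system).
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[:3]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest matching mount point; on ties, the later
            # entry wins because later mounts shadow earlier ones.
            if best is None or len(mount_point) >= len(best[1]):
                best = (device, mount_point, fs_type)
    return best
```

On each node, something like `find_mount("/data/projects", open("/proc/mounts").read())` should then report the same NFS device and the same mount point, since the project directories need to be visible at the same location on the master and the workers.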

wspatl46 · March 14, 2024, 12:10am

There are various ways of satisfying the shared project storage requirement. For example, one could have an NFS server that exports project directories to the CryoSPARC master and all worker nodes.

Thanks. I’ll look into that. My general pattern is to avoid that scenario if the network is a commodity network. I tend to prefer an NFS mount exported from the master to the nodes via a fast switch, because performance is generally better, or at least I have greater control over it. Obviously this assumes that appropriate network cards and storage connections are configured.

That said, you are simply telling me that the scenario described in the doc is fine. I’m just a bit wary of it: if the NFS export is offered over a commodity network, I/O to the project directories could be slow or subject to contention for network bandwidth. Thanks,
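
One way to put a number on that concern is to time a sequential write into the shared directory from a worker node. This is a rough sketch under stated assumptions, not a substitute for a proper benchmark: the function name `write_throughput_mib_s` is hypothetical, it measures only a single sequential write stream, and it says nothing about read performance or contention from concurrent jobs.

```python
import os
import tempfile
import time


def write_throughput_mib_s(directory, size_mib=256, block_mib=4):
    """Write about size_mib MiB to a temp file in directory; return MiB/s."""
    block = os.urandom(block_mib * 1024 * 1024)
    n_blocks = max(1, size_mib // block_mib)
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="io_probe_")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            # Ask the OS to push the data to the storage backend before
            # stopping the clock, so the page cache does not hide latency.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the probe file, even if the write failed.
        os.remove(tmp_path)
    return (n_blocks * block_mib) / max(elapsed, 1e-9)
```

Running this against the NFS-mounted project directory from a worker, and again against the same storage locally on the server, gives a first impression of how much the network path costs.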