Simplifying TCP port assignments in shared environments

mhucsf · April 2, 2024, 5:22pm

We run Cryosparc on a shared HPC cluster with hundreds of total users, only some of whom run Cryosparc. Each user runs their own copy of Cryosparc.

We can’t guarantee that a given port range will be available each time a user starts Cryosparc. For example, a port in the default range 39000-39009 might already be in use by a non-Cryosparc user.

To partially address this, I keep a shared spreadsheet in which each user claims a range of 10 ports. This still relies on the honor system, though: a non-Cryosparc user who doesn’t know about the spreadsheet can still take a port in a range that a Cryosparc user has claimed.

There’s an interesting utility called port4me which addresses this problem on shared clusters in the data science world. It works well with RStudio and Jupyter Notebook. When a user runs port4me, they’re returned a port that’s not in use. Example usage:

$ jupyter notebook --port "$(port4me)"

port4me can also assign more than one port, e.g.:

$ port4me --list=5
54242
4930
42139
14723
55707

I’m aware of the CRYOSPARC_BASE_PORT environment variable, and I assume I can set it at runtime before starting Cryosparc. Using port4me to pick CRYOSPARC_BASE_PORT at runtime would be a good first step, but port4me returns individual ports, and Cryosparc needs 10 sequential ports starting at the base port.
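To make the sequential-port problem concrete, here is a rough sketch of one possible workaround: ask port4me for several candidate ports, then keep the first candidate whose next 10 ports can all be bound. This is only an illustration, not anything Cryosparc provides. The function names (port_is_free, find_base_port, port4me_candidates) are hypothetical, the only port4me option used is the --list option shown above, and the check is racy, since another user could grab a port between the probe and Cryosparc starting.

```python
import socket
import subprocess


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Port is in use (or we are not allowed to bind it).
            return False
    return True


def find_base_port(candidates, block_size=10):
    """Return the first candidate base port whose block of `block_size`
    consecutive ports is entirely free, or None if no candidate qualifies."""
    for base in candidates:
        if base + block_size - 1 > 65535:
            continue  # block would run past the last valid TCP port
        if all(port_is_free(p) for p in range(base, base + block_size)):
            return base
    return None


def port4me_candidates(count=50):
    """Ask port4me for `count` candidate ports (assumes port4me is on PATH)."""
    result = subprocess.run(
        ["port4me", f"--list={count}"],
        capture_output=True, text=True, check=True,
    )
    return [int(line) for line in result.stdout.split()]
```

Calling find_base_port(port4me_candidates()) would return a base port, or None if none of the candidates starts a free block of 10. Whether that value can then simply be exported as CRYOSPARC_BASE_PORT before startup is exactly the assumption I’m asking about.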

Can the Cryosparc team think of a way to use port4me with Cryosparc?
Can all 10 Cryosparc ports be specified at runtime?
What are all 10 Cryosparc ports used for? They don’t appear to be open all the time.

wtempel · April 3, 2024, 9:34pm

Thanks @mhucsf for your suggestion, which we have noted.
The use of the various ports is described in the guide.