Cluster setup - queue job to a specific NODE

TLDR:

  • Am I understanding this correctly?
  • Can I at least create a way to submit a job to a specific NODE?

We have an older cluster system where people can submit jobs either by using “queue to lane” or “run on specific GPU”. We just got a new cluster where the only option is “queue to lane”, and people are begging me to give them the option to “run on specific GPU”.

Based on the explanation given here, it would seem that when our older cluster was set up, it was set up using BOTH “cryosparcm cluster connect” to create the “queue to lane” option, AND the individual nodes were also separately “connected” using “cryosparcw connect”. In addition, the “cryosparcuser” account must have the ability to SSH into the individual nodes. HOWEVER, by doing this, it would seem this might cause issues, as the scheduler (Slurm) does not know about the jobs submitted to a “specific GPU” and could possibly schedule something to run when the resources are not available. Am I understanding that correctly?
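If I have that right, the original setup would have involved something along these lines (hostnames and paths below are just placeholders, not our actual values):

# run once from a directory containing cluster_info.json + cluster_script.sh to create the “queue to lane” option
cryosparcm cluster connect

# run on each worker node as cryosparcuser to register it individually for “run on specific GPU”
/path/to/cryosparc_worker/bin/cryosparcw connect --worker node001 --master master-hostname --port 39000 --ssdpath /scratch/cryosparc_cache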

IF that is correct, could I appease them by at least giving them the option to pick a specific NODE to run jobs on? From the comment given here, it might seem like I can:

------------------8<—Begin-Cut-Here—8<------------------

-------------------8<—End-Cut-Here—8<-------------------

If so, would I do this by following “Add additional Cluster Lanes”? ALTHOUGH, looking at my existing cluster_info.json & cluster_script.sh files, I don’t see a way to specify a specific node in there… So this just looks like it would create new lanes, all with the same specs but different names…

Hi,

Yes, this sounds correct to me.

A simple solution is perhaps to bake a -w/--nodelist argument into cluster_script.sh, using a custom variable that lets your users optionally define a node at the point of job submission.

e.g.

...
{%- if custom_node %}
#SBATCH --nodelist={{ custom_node }}
{%- endif %}
...

If node hostnames are prohibitively cumbersome to enter, you could also include a dictionary that maps them to more user-friendly aliases. See this thread for an example.
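As a rough sketch (the aliases and hostnames below are made up), the mapping could also live directly in cluster_script.sh:

{%- set node_aliases = {"gpu1": "node001", "gpu2": "node002", "gpu3": "node003"} %}
{%- if custom_node %}
#SBATCH --nodelist={{ node_aliases.get(custom_node, custom_node) }}
{%- endif %}

Anything not found in the dictionary falls through unchanged, so users can enter either an alias or a full hostname.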

The downside to this implementation is that cluster usage will be relatively opaque from within cryoSPARC. Users won’t know which nodes already have jobs running on them. An alternative would be to have one lane for each node, this time using the -w/--nodelist argument explicitly.

Cheers,
Yang

Thank you for the quick reply Yang.

We have a very small cluster (4 nodes), so I ended up making a lane for each node, as mentioned here: “Add additional Cluster Lanes”. I made 4 separate directories and copied cluster_script.sh & cluster_info.json into each of them. I then modified each cluster_script.sh to point to the appropriate node:
#SBATCH --nodelist=node001
I also modified the corresponding cluster_info.json to update the “name” and “title” with the appropriate node name, and then of course ran “cryosparcm cluster connect” within each directory.
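In case it helps anyone else, each lane directory ended up with a cluster_info.json roughly along these lines (the paths shown are placeholders for our actual install and scratch locations), plus a cluster_script.sh whose only per-lane difference is the --nodelist line:

{
    "name": "node001",
    "title": "node001",
    "worker_bin_path": "/path/to/cryosparc_worker/bin/cryosparcw",
    "cache_path": "/path/to/local/scratch",
    "send_cmd_tpl": "{{ command }}",
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "qinfo_cmd_tpl": "sinfo"
}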

So far, they seem to be happy with this setup. Thanks again for the help and for pointing out that I could put “-w/--nodelist” in the cluster_script.sh.
