Adding a worker

Hello,

I added a new worker node:

cryosparc_worker/bin/cryosparcw connect --master grizzly.mskcc.org --port 39000 --worker $(hostname -f) --ssdpath /scratch/cryosparc_cache --gpus 0,1,2,3
 ---------------------------------------------------------------
  CRYOSPARC CONNECT --------------------------------------------
 ---------------------------------------------------------------
  Attempting to register worker polar.mskcc.org to command grizzly.mskcc.org:39002
  Connecting as unix user sparc
  Will register using ssh string: sparc@polar.mskcc.org
  If this is incorrect, you should re-run this command with the flag --sshstr <ssh string> 
 ---------------------------------------------------------------
  Connected to master.
 ---------------------------------------------------------------
  Current connected workers:
    grizzly.mskcc.org
 ---------------------------------------------------------------
  Worker will be registered with 56 CPUs.
  Autodetecting available GPUs...
  Detected 8 CUDA devices.

   id           pci-bus  name
   ---------------------------------------------------------------
       0                27  NVIDIA GeForce RTX 2080 Ti                                                                
       1                28  NVIDIA GeForce RTX 2080 Ti                                                                
       2                29  NVIDIA GeForce RTX 2080 Ti                                                                
       3                30  NVIDIA GeForce RTX 2080 Ti                                                                
       4                61  NVIDIA GeForce RTX 2080 Ti                                                                
       5                63  NVIDIA GeForce RTX 2080 Ti                                                                
       6                64  NVIDIA GeForce RTX 2080 Ti                                                                
       7                65  NVIDIA GeForce RTX 2080 Ti                                                                
   ---------------------------------------------------------------
   Devices specified: 0, 1, 2, 3
   Devices 0, 1, 2, 3 will be enabled now.
   This can be changed later using --update
 ---------------------------------------------------------------
  Worker will be registered with SSD cache location /scratch/cryosparc_cache 
 ---------------------------------------------------------------
  Autodetecting the amount of RAM available...
  This machine has 385.60GB RAM .
 ---------------------------------------------------------------
 ---------------------------------------------------------------
  Registering worker...
  Done.

  You can now launch jobs on the master node and they will be scheduled
  on to this worker node if resource requirements are met.
 ---------------------------------------------------------------
  Final configuration for polar.mskcc.org
               cache_path :  /scratch/cryosparc_cache
           cache_quota_mb :  None
         cache_reserve_mb :  10000
                     desc :  None
                     gpus :  [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}]
                 hostname :  polar.mskcc.org
                     lane :  default
             monitor_port :  None
                     name :  polar.mskcc.org
           resource_fixed :  {'SSD': True}
           resource_slots :  {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]}
                  ssh_str :  sparc@polar.mskcc.org
                    title :  Worker node polar.mskcc.org
                     type :  node
          worker_bin_path :  /home/sparc/cryosparc_worker/bin/cryosparcw
 ---------------------------------------------------------------

It shows up in the scheduler:

sparc@grizzly:~$ cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/scratch/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 4, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 5, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 6, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 7, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'grizzly.mskcc.org', 'lane': 'default', 'monitor_port': None, 'name': 'grizzly.mskcc.org', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]}, 'ssh_str': 'sparc@grizzly.mskcc.org', 'title': 'Worker node grizzly.mskcc.org', 'type': 'node', 'worker_bin_path': '/home/sparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/scratch/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'polar.mskcc.org', 'lane': 'default', 'monitor_port': None, 'name': 'polar.mskcc.org', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]}, 'ssh_str': 'sparc@polar.mskcc.org', 'title': 'Worker node polar.mskcc.org', 'type': 'node', 'worker_bin_path': '/home/sparc/cryosparc_worker/bin/cryosparcw'}]
However, all jobs on it fail:
[2024-01-10 16:30:51.55]

License is valid.

[2024-01-10 16:30:51.56]

Launching job on lane default target polar.mskcc.org ...

[2024-01-10 16:30:51.60]

Running job on remote worker node hostname polar.mskcc.org

[2024-01-10 16:30:51.70]

Failed to launch! 255 Host key verification failed.

Please advise.
Thank you,
Yehuda

@Yehuda Have you already checked if the host key of the polar server needs to be manually confirmed?
What happens when you run on grizzly:

ssh sparc@polar.mskcc.org

?

Thank you. I confirmed that I can ssh as the sparc user. Additional detail: I installed CryoSPARC with the --standalone flag.

Thanks for trying this. I wonder what exactly is shown on the terminal when you run on grizzly:

ssh sparc@polar.mskcc.org hostname -f

Could you please post the output, starting with the command prompt?

Hmm… It looks like polar, but not polar.mskcc.org, was in known_hosts. Thank you, the topic can be closed.
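
For anyone who lands here with the same "Host key verification failed" error: the worker was registered with the ssh string sparc@polar.mskcc.org, so the master connects using the fully qualified name, and ssh looks up known_hosts by the exact name used on the command line. An entry for the short name alone does not cover the long one. Below is a rough sketch of a check for this. The helper name known_hosts_status is made up for illustration (it is not part of CryoSPARC or OpenSSH), and it assumes plain-text host names in known_hosts; if your entries are hashed (HashKnownHosts yes), use ssh-keygen -F <name> instead.

```shell
# Hypothetical helper (not part of CryoSPARC or OpenSSH): report which of the
# given host names appear as plain-text entries in a known_hosts file.
# Assumption: entries are NOT hashed; hashed entries need `ssh-keygen -F`.
known_hosts_status() {
    local file="$1"; shift
    local name
    for name in "$@"; do
        if awk -v host="$name" '
            /^[ \t]*(#|$)/ { next }             # skip comments and blank lines
            {
                # the host list is field 1, or field 2 after an @marker
                field = ($1 ~ /^@/) ? $2 : $1
                n = split(field, names, ",")
                for (i = 1; i <= n; i++)
                    if (names[i] == host) { found = 1; exit }
            }
            END { exit !found }                 # status 0 only if matched
        ' "$file"; then
            echo "$name: present"
        else
            echo "$name: missing"
        fi
    done
}
```

Run on the master as the CryoSPARC user, for example: known_hosts_status ~/.ssh/known_hosts polar polar.mskcc.org. If the long name is reported missing, connecting once interactively with ssh sparc@polar.mskcc.org and accepting the key (after checking the fingerprint) adds the entry. Exact matching is deliberate here; wildcard patterns and [host]:port entries are not handled in this sketch.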
