Hello,
I added a new worker node with the following command:
cryosparc_worker/bin/cryosparcw connect --master grizzly.mskcc.org --port 39000 --worker $(hostname -f) --ssdpath /scratch/cryosparc_cache --gpus 0,1,2,3
---------------------------------------------------------------
CRYOSPARC CONNECT --------------------------------------------
---------------------------------------------------------------
Attempting to register worker polar.mskcc.org to command grizzly.mskcc.org:39002
Connecting as unix user sparc
Will register using ssh string: sparc@polar.mskcc.org
If this is incorrect, you should re-run this command with the flag --sshstr <ssh string>
---------------------------------------------------------------
Connected to master.
---------------------------------------------------------------
Current connected workers:
grizzly.mskcc.org
---------------------------------------------------------------
Worker will be registered with 56 CPUs.
Autodetecting available GPUs...
Detected 8 CUDA devices.
id pci-bus name
---------------------------------------------------------------
0 27 NVIDIA GeForce RTX 2080 Ti
1 28 NVIDIA GeForce RTX 2080 Ti
2 29 NVIDIA GeForce RTX 2080 Ti
3 30 NVIDIA GeForce RTX 2080 Ti
4 61 NVIDIA GeForce RTX 2080 Ti
5 63 NVIDIA GeForce RTX 2080 Ti
6 64 NVIDIA GeForce RTX 2080 Ti
7 65 NVIDIA GeForce RTX 2080 Ti
---------------------------------------------------------------
Devices specified: 0, 1, 2, 3
Devices 0, 1, 2, 3 will be enabled now.
This can be changed later using --update
---------------------------------------------------------------
Worker will be registered with SSD cache location /scratch/cryosparc_cache
---------------------------------------------------------------
Autodetecting the amount of RAM available...
This machine has 385.60GB RAM .
---------------------------------------------------------------
---------------------------------------------------------------
Registering worker...
Done.
You can now launch jobs on the master node and they will be scheduled
on to this worker node if resource requirements are met.
---------------------------------------------------------------
Final configuration for polar.mskcc.org
cache_path : /scratch/cryosparc_cache
cache_quota_mb : None
cache_reserve_mb : 10000
desc : None
gpus : [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}]
hostname : polar.mskcc.org
lane : default
monitor_port : None
name : polar.mskcc.org
resource_fixed : {'SSD': True}
resource_slots : {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]}
ssh_str : sparc@polar.mskcc.org
title : Worker node polar.mskcc.org
type : node
worker_bin_path : /home/sparc/cryosparc_worker/bin/cryosparcw
---------------------------------------------------------------
The new node shows up in the scheduler targets:
sparc@grizzly:~$ cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/scratch/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 4, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 5, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 6, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 7, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'grizzly.mskcc.org', 'lane': 'default', 'monitor_port': None, 'name': 'grizzly.mskcc.org', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]}, 'ssh_str': 'sparc@grizzly.mskcc.org', 'title': 'Worker node grizzly.mskcc.org', 'type': 'node', 'worker_bin_path': '/home/sparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/scratch/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'polar.mskcc.org', 'lane': 'default', 'monitor_port': None, 'name': 'polar.mskcc.org', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]}, 'ssh_str': 'sparc@polar.mskcc.org', 'title': 'Worker node polar.mskcc.org', 'type': 'node', 'worker_bin_path': '/home/sparc/cryosparc_worker/bin/cryosparcw'}]
However, all jobs on it fail:
[2024-01-10 16:30:51.55]
License is valid.
[2024-01-10 16:30:51.56]
Launching job on lane default target polar.mskcc.org ...
[2024-01-10 16:30:51.60]
Running job on remote worker node hostname polar.mskcc.org
[2024-01-10 16:30:51.70]
Failed to launch! 255 Host key verification failed.
Please advise.
Thank you,
Yehuda
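
For anyone reproducing this: "Host key verification failed" with exit status 255 is an error from ssh itself, which suggests that the sparc account on the master has no stored (or no matching) host key for polar.mskcc.org. Below is a minimal sketch for checking that by hand on grizzly. It assumes the master launches remote jobs over non-interactive SSH using the registered ssh string, and the helper names host_key_known and check_worker_ssh are made up for illustration, not part of CryoSPARC.

```shell
#!/usr/bin/env bash
# Sketch only: run on the master (grizzly) as the unix user that runs CryoSPARC.
# Assumption: jobs are started over non-interactive SSH to the worker's ssh_str.

# Succeeds if the known_hosts file $2 holds an entry for host $1.
# ssh-keygen -F also matches hashed entries and needs no network access.
host_key_known() {
    ssh-keygen -F "$1" -f "$2" >/dev/null 2>&1
}

# Reproduces the launch step by hand. BatchMode=yes makes ssh fail
# instead of prompting, which is how a background launch would behave.
check_worker_ssh() {
    ssh -o BatchMode=yes "$1" true
}

# Example calls (not run automatically):
#   host_key_known polar.mskcc.org ~/.ssh/known_hosts || echo "no host key stored"
#   check_worker_ssh sparc@polar.mskcc.org && echo "ssh OK"
```

If the first check reports no entry, logging in once interactively from grizzly as sparc (ssh sparc@polar.mskcc.org) and accepting the key after verifying its fingerprint would record it in known_hosts.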