255 Host key verification failed - revisit

I had an earlier thread on this, but over time we have noticed that the error seems to pop up when more than one job is trying to access the GPUs. We have 2 GPUs in the system, and you can start a job on each one, but when you try to queue up another job we get the following:

License is valid.

Launching job on lane default target localhost …

Running job on remote worker node hostname localhost

Failed to launch! 255 Host key verification failed.

Not sure what is going on. Is this a problem with the queuing?

Len Thomas

Hi Len,

Please can you post the outputs of these commands on the CryoSPARC master host:

cryosparcm status | grep HOSTNAME
cryosparcm cli "get_scheduler_targets()"

Kind regards.
Wolfram

Hi Wolfram,

Here are the outputs.

export CRYOSPARC_MASTER_HOSTNAME="spgpu"

[{'cache_path': '/ssd/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25383469056, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25386352640, 'name': 'NVIDIA GeForce RTX 4090'}], 'hostname': 'spgpu', 'lane': 'default', 'monitor_port': None, 'name': 'spgpu', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'spuser@spgpu', 'title': 'Worker node spgpu', 'type': 'node', 'worker_bin_path': '/spshared/apps/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/ssd/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25383469056, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25386352640, 'name': 'NVIDIA GeForce RTX 4090'}], 'hostname': 'localhost', 'lane': 'default', 'monitor_port': None, 'name': 'localhost', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'spuser@localhost', 'title': 'Worker node localhost', 'type': 'node', 'worker_bin_path': '/spshared/apps/cryosparc/cryosparc_worker/bin/cryosparcw'}]

Thank you for your assistance.

Len

In case the targets with hostnames spgpu and localhost, respectively, refer to the same computer, you may be able to resolve the problem by removing the entry for localhost with the command:

cryosparcm cli "remove_scheduler_target_node('localhost')"
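
For anyone who wants to check a target list for this kind of double registration before removing anything, here is a rough Python sketch. Both entries in the output above list the same two GPUs, the same cache path, and the same worker path, which is what suggests they describe one machine. The function name `group_possible_duplicates` and the matching rule are my own assumptions for illustration, not part of CryoSPARC, and the `targets` list is a trimmed copy of the posted output that keeps only the fields the check uses.

```python
from collections import defaultdict


def group_possible_duplicates(targets):
    """Group scheduler targets that look like the same physical machine.

    Heuristic (an assumption, not CryoSPARC's own logic): two targets are
    suspected duplicates when they share worker_bin_path, cache_path, and
    an identical list of GPU (id, name, mem) entries.
    """
    groups = defaultdict(list)
    for target in targets:
        gpu_signature = tuple(
            (gpu["id"], gpu["name"], gpu["mem"]) for gpu in target.get("gpus", [])
        )
        key = (target.get("worker_bin_path"), target.get("cache_path"), gpu_signature)
        groups[key].append(target["hostname"])
    # Only groups with more than one hostname are worth a closer look.
    return [hostnames for hostnames in groups.values() if len(hostnames) > 1]


# Trimmed copy of the get_scheduler_targets() output posted in this thread.
GPUS = [
    {"id": 0, "mem": 25383469056, "name": "NVIDIA GeForce RTX 4090"},
    {"id": 1, "mem": 25386352640, "name": "NVIDIA GeForce RTX 4090"},
]
WORKER = "/spshared/apps/cryosparc/cryosparc_worker/bin/cryosparcw"
targets = [
    {"hostname": "spgpu", "cache_path": "/ssd/cryosparc_cache",
     "gpus": GPUS, "worker_bin_path": WORKER},
    {"hostname": "localhost", "cache_path": "/ssd/cryosparc_cache",
     "gpus": GPUS, "worker_bin_path": WORKER},
]

print(group_possible_duplicates(targets))  # [['spgpu', 'localhost']]
```

The full output of get_scheduler_targets() is a plain Python literal, so it can be loaded with `ast.literal_eval` instead of being retyped. A match only hints at a duplicate; two genuinely separate workers with identical hardware and shared paths would also be grouped, so please confirm by hand before removing a target.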

That appears to have worked.

Thank you again.

Len
