255 Host key verification failed-revisit

lmthomas · October 22, 2025, 10:26pm

So I had an earlier thread on this, but over time we have noticed that the error seems to pop up when more than one job is trying to access the GPUs. We have 2 in the system, and you can start a job on each one, but when you try to queue up another job we get the following:

License is valid.

Launching job on lane default target localhost …

Running job on remote worker node hostname localhost

Failed to launch! 255 Host key verification failed.

Not sure what is going on, problem with the queuing?

Len Thomas

wtempel · October 23, 2025, 2:44pm

Hi Len,

Please can you post the outputs of these commands on the CryoSPARC master host:

cryosparcm status | grep HOSTNAME
cryosparcm cli "get_scheduler_targets()"

Kind regards.
Wolfram

lmthomas · October 23, 2025, 3:37pm

Hi Wolfram,

Here are the outputs.

export CRYOSPARC_MASTER_HOSTNAME="spgpu"

[{'cache_path': '/ssd/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25383469056, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25386352640, 'name': 'NVIDIA GeForce RTX 4090'}], 'hostname': 'spgpu', 'lane': 'default', 'monitor_port': None, 'name': 'spgpu', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'spuser@spgpu', 'title': 'Worker node spgpu', 'type': 'node', 'worker_bin_path': '/spshared/apps/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/ssd/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25383469056, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25386352640, 'name': 'NVIDIA GeForce RTX 4090'}], 'hostname': 'localhost', 'lane': 'default', 'monitor_port': None, 'name': 'localhost', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'spuser@localhost', 'title': 'Worker node localhost', 'type': 'node', 'worker_bin_path': '/spshared/apps/cryosparc/cryosparc_worker/bin/cryosparcw'}]

Thank you for your assistance.

Len

wtempel · October 23, 2025, 4:34pm

If the targets with hostnames spgpu and localhost, respectively, refer to the same computer, you may be able to resolve the problem by removing the entry for localhost with the command:

cryosparcm cli "remove_scheduler_target_node('localhost')"
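
For anyone who wants to check their own target list for this kind of double registration, here is a minimal sketch. It assumes the get_scheduler_targets() output has been loaded as a Python list of dicts with the keys shown above. The function name find_duplicate_targets is hypothetical and not part of CryoSPARC, and the match is only a heuristic: two genuinely separate nodes with identical hardware and paths would also be grouped, so treat a hit as a hint to investigate, not as proof.

```python
from collections import defaultdict


def find_duplicate_targets(targets):
    """Group scheduler targets that look like the same machine.

    Heuristic (an assumption, not CryoSPARC logic): targets are grouped
    when they share worker_bin_path, cache_path, and an identical GPU list.
    Returns a list of hostname groups that contain more than one entry.
    """
    groups = defaultdict(list)
    for target in targets:
        gpus = tuple(
            sorted((g["id"], g["name"], g["mem"]) for g in target.get("gpus", []))
        )
        key = (target.get("worker_bin_path"), target.get("cache_path"), gpus)
        groups[key].append(target["hostname"])
    return [names for names in groups.values() if len(names) > 1]


# Trimmed copy of the two targets posted in this thread.
GPUS = [
    {"id": 0, "mem": 25383469056, "name": "NVIDIA GeForce RTX 4090"},
    {"id": 1, "mem": 25386352640, "name": "NVIDIA GeForce RTX 4090"},
]
WORKER = "/spshared/apps/cryosparc/cryosparc_worker/bin/cryosparcw"
targets = [
    {"hostname": "spgpu", "cache_path": "/ssd/cryosparc_cache",
     "gpus": GPUS, "worker_bin_path": WORKER},
    {"hostname": "localhost", "cache_path": "/ssd/cryosparc_cache",
     "gpus": GPUS, "worker_bin_path": WORKER},
]

print(find_duplicate_targets(targets))  # [['spgpu', 'localhost']]
```

With both entries in the default lane, the scheduler would presumably count four GPU slots on a two-GPU machine, which fits the observation that the error only shows up once a third job is queued and gets sent to the localhost target.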

lmthomas · October 23, 2025, 7:33pm

That appears to have worked.

Thank you again.

Len