Failed to launch! 255 Host key verification failed?

We’ve been running cryoSPARC with one master and one standalone worker. Yesterday we tried adding another standalone worker, but kept getting a “host key verification failed” message, with jobs refusing to run on the new worker. In an attempt to debug the problem, the PI deleted cryosparc_user’s .ssh/known_hosts file (and I then did the same on all three machines), and now neither standalone worker will take jobs: both give the “host key verification failed” error message, specifically:

"Command '['ssh', u'cryosparc_user@javelina.biosci.utexas.edu', u'bash -c "nohup /data1/local/home/cryosparc_user/cryosparc2_worker/bin/cryosparcw run --project P21 --job J256 --master_hostname kraken.biosci.utexas.edu --master_command_core_port 39002 > /EM/cryosparc/MorganGilman/P21/J256/job.log 2>&1 & "']' returned non-zero exit status 255"

and

Running job on remote worker node hostname javelina.biosci.utexas.edu
Failed to launch! 255
Host key verification failed.

As the cryosparc_user I can ssh freely between both standalone workers and the master, so apparently it’s referencing some other host key?

@spunjani @stephan Can you help us with this issue? We can’t run any jobs at the moment.

Hi @Jason, @pgoetz,

Can you confirm, if you log onto the master node kraken.biosci.utexas.edu as cryosparc_user and try to execute the command ssh cryosparc_user@javelina.biosci.utexas.edu bash -c "whoami", there is no error, or confirmation to verify the host?

Hi Stephan,

Many thanks for the response. Entering that command resulted in the following:

ssh cryosparc_user@javelina.biosci.utexas.edu bash -c "whoami"
The authenticity of host 'javelina.biosci.utexas.edu (129.116.159.26)' can't be established.
ECDSA key fingerprint is <>.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'javelina.biosci.utexas.edu' (ECDSA) to the list of known hosts.
cryosparc_user

Once javelina.biosci.utexas.edu was added to the list of known hosts, cryoSPARC jobs could run on javelina again. I ran a similar command for our new worker and it now works as well.
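For anyone else who has wiped known_hosts, a non-interactive way to restore the host key entries (instead of accepting each prompt manually) is ssh-keyscan. A minimal sketch, assuming the worker hostnames from this thread are replaced with your own:

```shell
# Run as cryosparc_user on the master node.
# Fetch each worker's host key and append it to known_hosts,
# so ssh (and hence cryoSPARC job launch) no longer prompts.
# Hostnames below are placeholders; substitute your worker nodes.
for host in javelina.biosci.utexas.edu worker2.example.edu; do
    ssh-keyscan -H "$host" >> ~/.ssh/known_hosts
done
```

Note that ssh-keyscan trusts whatever key the host presents at scan time, so it is best run on a network you trust.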

Thanks very much for helping us troubleshoot this issue. If it’s useful to others, perhaps this could be added to the installation instructions?

Best regards,
Jason


Hi @Jason,

Awesome! We will add this to our site.

Thanks,

Stephan


Hi,

I have tried this on our computer; however, I still get the same error when I try to run any jobs. Could you advise on what else could be done?

Kind regards,

Francesca

Please can you post outputs for the following commands:

cryosparcm status | grep MASTER_HOSTNAME
cryosparcm cli "get_scheduler_targets()"

Hello,

The outputs are

export CRYOSPARC_MASTER_HOSTNAME="uol-ws-200051.leeds.ac.uk"
[{'cache_path': '/not-backed-up/', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 47658106880, 'name': 'NVIDIA A40'}, {'id': 1, 'mem': 47658106880, 'name': 'NVIDIA A40'}], 'hostname': 'localhost', 'lane': 'newlane', 'monitor_port': None, 'name': 'localhost', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]}, 'ssh_str': 'fbsjfo@localhost', 'title': 'Worker node localhost', 'type': 'node', 'worker_bin_path': '/home/fbsjfo/cryosparc/cryosparc_worker/bin/cryosparcw'}]

In case

  • this CryoSPARC installation is intended as a “standalone”/“single workstation-type” instance and
  • there are no plans to connect an additional worker

you may try:

  1. cryosparcm stop
  2. edit /path/to/cryosparc_master/config.sh (/home/fbsjfo/cryosparc/cryosparc_master/config.sh?):
    • change
      export CRYOSPARC_MASTER_HOSTNAME="uol-ws-200051.leeds.ac.uk"
      
      to
      export CRYOSPARC_MASTER_HOSTNAME="localhost"
      
    • add (or change in case of an existing definition of CRYOSPARC_HOSTNAME_CHECK)
      export CRYOSPARC_HOSTNAME_CHECK="localhost"
      
  3. cryosparcm start
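
The step-2 edit can be previewed safely before touching the real file. A minimal sketch that applies both changes to a throwaway copy of config.sh (the real path, per this thread, would be /home/fbsjfo/cryosparc/cryosparc_master/config.sh):

```shell
# Hedged sketch: rehearse the config.sh edit on a temporary copy.
demo=$(mktemp)
cat > "$demo" <<'EOF'
export CRYOSPARC_MASTER_HOSTNAME="uol-ws-200051.leeds.ac.uk"
EOF

# Step 2a: point the master hostname at localhost.
sed -i 's|^export CRYOSPARC_MASTER_HOSTNAME=.*|export CRYOSPARC_MASTER_HOSTNAME="localhost"|' "$demo"

# Step 2b: set CRYOSPARC_HOSTNAME_CHECK, replacing an existing definition if present.
if grep -q '^export CRYOSPARC_HOSTNAME_CHECK=' "$demo"; then
    sed -i 's|^export CRYOSPARC_HOSTNAME_CHECK=.*|export CRYOSPARC_HOSTNAME_CHECK="localhost"|' "$demo"
else
    echo 'export CRYOSPARC_HOSTNAME_CHECK="localhost"' >> "$demo"
fi

cat "$demo"
```

Once the output looks right, apply the same two edits to the real config.sh between the cryosparcm stop and cryosparcm start steps.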

Does this resolve the problem?

Hi @wtempel this has solved the issue. Thank you!