Failed to launch! 255 - after changing cryosparc_master_hostname and then updating

I wanted to update cryosparc but cryosparcm backup gave:
ERROR: Re-run this command on the master node: oldname.
Alternatively, set CRYOSPARC_FORCE_HOSTNAME=true in cryosparc_master/config.sh to suppress this error.
If this error message is incorrect, set CRYOSPARC_HOSTNAME_CHECK to the correct hostname in cryosparc_master/config.sh.

We had moved the computer recently, so maybe it had a new name. I ran hostname -f and changed cryosparc_master_hostname in ~/software/cryosparc/cryosparc_master/config.sh to match the new name (newname).
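Concretely, the edit looked roughly like this (a sketch; the exact export line format is an assumption, so check what your config.sh actually contains before editing):

# check the current value
grep CRYOSPARC_MASTER_HOSTNAME ~/software/cryosparc/cryosparc_master/config.sh

# point it at the name reported by hostname -f (newname is a placeholder)
sed -i 's/^export CRYOSPARC_MASTER_HOSTNAME=.*/export CRYOSPARC_MASTER_HOSTNAME="newname"/' ~/software/cryosparc/cryosparc_master/config.sh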

Then cryosparcm backup gave a new error:
database: ERROR (spawn error)

I had seen this error before and applied the same fix: killing all leftover cryosparc processes found with
ps -ax | grep cryosparc

After that, cryosparcm backup worked.
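For reference, the cleanup was roughly the following (a sketch; the pkill pattern and the supervisor socket path are assumptions, so check what ps actually reports on your system first):

# stop CryoSPARC if it is still partially running
cryosparcm stop

# list any leftover CryoSPARC processes (supervisord, mongod, command_core, ...)
ps -ax | grep -i cryosparc | grep -v grep

# kill whatever is left over, then clear a stale supervisor socket if one remains
pkill -f cryosparc_
rm -f /tmp/cryosparc-supervisor-*.sock

# start again and retry the backup
cryosparcm start
cryosparcm backup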

Then I updated; the UI seemed OK, but the first job gave:
License is valid.

Launching job on lane default target oldname …

Running job on remote worker node hostname oldname

Failed to launch! 255
ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
Permission denied, please try again.
ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
Permission denied, please try again.
ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

Can I just change the cryosparc_master_hostname back to the original hostname?

The cryosparc job gives:
Running job on remote worker node hostname oldname

This is a good first step, as long as you ensure that the new hostname is “stable” in that

  • the hostname does not change from reboot to reboot
  • other computers, such as additional CryoSPARC workers on this CryoSPARC instance correctly resolve the new hostname

You may need help from your network admins to ensure the aforementioned conditions.
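For example, one quick way to check the second condition from a worker or any other machine on your network (getent and ping here are generic suggestions, not CryoSPARC commands; replace newname with the actual new hostname):

# on the master: what does the machine call itself?
hostname -f

# on another machine: does the new name resolve, and is the master reachable under it?
getent hosts newname
ping -c 3 newname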

I suspect the scheduler target records in your CryoSPARC database still include a record that refers to the old hostname. To help us propose a resolution, please let us know:

  1. Is the new hostname “stable” (as defined above)?
  2. What is the output of the command
    cryosparcm cli "get_scheduler_targets()"
    
  3. the old hostname
  4. the new hostname as shown by the command
    hostname -f

Hi @wtempel and thanks for the quick response! Here are the answers:

  1. Is the new hostname “stable” (as defined above)?

I will have to check with the sysadmin, but it has not changed since the initial check and update earlier today. The name was previously just asdf and now has .mc.institution.edu appended (asdf.mc.institution.edu).

  2. What is the output of the command cryosparcm cli "get_scheduler_targets()"

With cryosparc running
cryosparcm cli "get_scheduler_targets()"

[{'cache_path': '/mnt/ssd-scratch/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 12631212032, 'name': 'NVIDIA GeForce RTX 3080 Ti'}, {'id': 1, 'mem': 12639338496, 'name': 'NVIDIA GeForce RTX 3080 Ti'}], 'hostname': 'asdf', 'lane': 'default', 'monitor_port': None, 'name': 'asdf', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'user@asdf', 'title': 'Worker node asdf', 'type': 'node', 'worker_bin_path': '/home/user/software/cryosparc/cryosparc_worker/bin/cryosparcw'}]

  3. the old hostname

asdf

  4. the new hostname as shown by the command: hostname -f

asdf.mc.institution.edu

Thanks for posting this info.
If it turns out that asdf.mc.institution.edu is stable, you can avoid both the need for an ssh connection and CRYOSPARC_FORCE_HOSTNAME=true (which one would want to avoid in the absence of “special” circumstances) by having a three-way match between

  • hostname -f output
  • $CRYOSPARC_MASTER_HOSTNAME
  • the target "hostname" value

You can achieve this by:

  1. deleting the outdated target:
    cryosparcm cli "remove_scheduler_target_node('asdf')"
    
  2. adding a target with the new hostname, ensuring correct master and worker hostnames and the correct port number ($CRYOSPARC_BASE_PORT inside cryosparc_master/config.sh) (guide)
    /home/user/software/cryosparc/cryosparc_worker/bin/cryosparcw connect --master asdf.mc.institution.edu --worker asdf.mc.institution.edu --port 99999 --ssdpath /mnt/ssd-scratch/cryosparc_cache
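After both steps, the three values listed above should line up. A quick way to confirm that (a sketch; the config path is the one used earlier in this thread):

# 1. what the OS reports
hostname -f

# 2. what the master configuration uses
grep CRYOSPARC_MASTER_HOSTNAME ~/software/cryosparc/cryosparc_master/config.sh

# 3. what the scheduler target record uses
cryosparcm cli "get_scheduler_targets()" | grep -o "'hostname': '[^']*'"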
    

Thank you! I have an issue now: cryosparcm cli "get_scheduler_targets()" returns an empty result, and the worker connect command fails with "AssertionError: Nvidia driver version 470.74 is out of date".

Is this really just a CUDA update issue, or have I messed up something else? I cannot run the non-GPU job type “remove duplicates”.

Also, why is it connecting on port 39002 instead of 39000, and is this a problem?

I ran
cryosparcm cli "remove_scheduler_target_node('asdf')"

Then I checked the effect with
cryosparcm cli "get_scheduler_targets()"

I assume this means it worked, as the previously listed target is now gone.

Then
/home/user/software/cryosparc/cryosparc_worker/bin/cryosparcw connect --master asdf.institution.edu --worker asdf.institution.edu --port 39000 --ssdpath /mnt/ssd/cryosparc_cache

The output is below:


CRYOSPARC CONNECT --------------------------------------------

Attempting to register worker asdf.institution.edu to command asdf.institution.edu:39002
Connecting as unix user user
Will register using ssh string: user@asdf.institution.edu
If this is incorrect, you should re-run this command with the flag --sshstr

Connected to master.

Current connected workers:

Worker will be registered with 64 CPUs.
Autodetecting available GPUs…
Traceback (most recent call last):
  File "bin/connect.py", line 233, in <module>
    gpu_devidxs = check_gpus()
  File "bin/connect.py", line 97, in check_gpus
    assert correct_driver_version is None, (
AssertionError: Nvidia driver version 470.74 is out of date and will not work with this version of CryoSPARC. Please install version 520.61.05 or newer.

This is expected. The software adds 2 to CRYOSPARC_BASE_PORT to identify the command_core port of your CryoSPARC installation.
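With the default base port, for example:

# base port from the master configuration
grep CRYOSPARC_BASE_PORT ~/software/cryosparc/cryosparc_master/config.sh
# e.g. CRYOSPARC_BASE_PORT=39000 means command_core listens on 39000 + 2 = 39002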

Please update the nvidia driver of the asdf computer to version 520.61.05 or newer and reboot the computer after the update.
After the reboot, please record and post the outputs of these commands:

cryosparcm cli "get_scheduler_targets()"
/home/user/software/cryosparc/cryosparc_worker/bin/cryosparcw gpulist
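If helpful, the installed driver version can be checked before and after the update with a generic nvidia-smi query (not CryoSPARC-specific):

# report the installed NVIDIA driver version for each GPU
nvidia-smi --query-gpu=driver_version,name --format=csv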

I will downgrade so I can do some work and a backup before updating the nvidia drivers. Is there anything I should consider before running
cryosparcm update --version=v4.2.1

I have not tested such a downgrade, but I would expect that, if the CryoSPARC instance was never at a version below 4.4, you will need

  1. an independent installation of the CUDA toolkit version 11.x
  2. a definition inside cryosparc_worker/config.sh:
    export CRYOSPARC_CUDA_PATH=/your/path/to/cuda
    
    such that
    /your/path/to/cuda/bin/nvcc
    exists.
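For example, after installing the toolkit, a quick check along these lines should succeed (the path is the same placeholder as above):

# confirm nvcc exists at the configured CUDA path and reports an 11.x toolkit
export CRYOSPARC_CUDA_PATH=/your/path/to/cuda
$CRYOSPARC_CUDA_PATH/bin/nvcc --version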

It was previously at v4.2.1 with Nvidia 470.74 and running smoothly.

I expect I can downgrade, update the hostnames as outlined above, and hopefully get it back to running.