Job launch error

Hi,

We’re having an odd problem launching jobs on one of our workers after a recent OS update. We have a single-master, multiple-worker setup, and all machines are running CryoSPARC 4.2.1 on Ubuntu 22.04. When we try to launch a job on one specific machine, it hangs at the following stage:

License is valid.
Launching job on lane XXXXX target XXXXX.XXX.XXX
Running job on remote worker node hostname XXXXX.XXX.XXX

There is no further output. Looking at the metadata log, I see the following top-level error:

cryosparc_tools.cryosparc.command.Error: *** CommandClient: (http://XXXXXX:39002/api) URL Error [Errno -3] Temporary failure in name resolution

Running “cryosparcm log command_core” reveals nothing unusual. SSH connections from the master to the worker work fine with both short and fully specified addresses.

All other workers are configured in exactly the same way, yet jobs launch fine on them. Any help appreciated!

Please can you run these commands on the “failing” worker and on a “working” … :upside_down_face: worker

host XXXXXX
curl XXXXXX:39002

and compare their outputs between the workers?
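
If it helps to run the same two checks from Python (the language of the traceback above), here is a rough sketch. It assumes only the standard library; `check_master` is a made-up helper name, not part of CryoSPARC, and 39002 is simply the port from the error URL. It first resolves the name, as `host` does, and then opens a plain TCP connection, which is roughly what `curl` needs before it can talk HTTP.

```python
import socket


def check_master(host, port=39002, timeout=5.0):
    """Resolve `host`, then try a TCP connection to `port`.

    Returns (addresses, error). `error` is None when both steps succeed.
    Hypothetical helper for illustration; not a CryoSPARC function.
    """
    try:
        # Same lookup path the Python client uses; a failure here raises
        # socket.gaierror, e.g. "[Errno -3] Temporary failure in name resolution".
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [], f"name resolution failed: {exc}"

    addresses = sorted({info[4][0] for info in infos})

    try:
        # Name resolved; now check that the port is actually reachable.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return addresses, f"resolved, but TCP connect failed: {exc}"

    return addresses, None
```

Calling `check_master("XXXXXX")` with the master hostname exactly as it appears in the error message, once on each worker, should show whether the difference is in name resolution or in reaching the port.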

Hi,
Yes - these give different results. On the failing worker, host returns a “host not found” error and the curl command does not return a result. With a little further digging, it appears that the failing machine has picked up a different (incorrect) search domain. The search domains are supposed to be set automatically by DNS, but for some reason they are not consistent between workers. If I manually add the full search domain, the problem is fixed.
Thanks for the help!
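
For anyone who hits the same symptom, a small sketch for comparing search domains between workers might look like the following. It assumes the resolver configuration is readable as resolv.conf-style text (on Ubuntu 22.04 this is typically `/etc/resolv.conf`, though the exact file can depend on how the resolver is set up). The function names are hypothetical, and the parsing is deliberately minimal.

```python
def search_domains(resolv_conf_text):
    """Return the search domains listed in resolv.conf-style text.

    If several `search`/`domain` lines are present, the last one wins,
    which matches the behaviour described in the resolv.conf man page.
    """
    domains = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] in ("search", "domain"):
            domains = fields[1:]
    return domains


def candidate_names(short_name, domains):
    """List the fully qualified names a short hostname would be tried as."""
    return [f"{short_name}.{domain}" for domain in domains]
```

Running `search_domains(open("/etc/resolv.conf").read())` on a working and a failing worker and comparing the two lists should make a wrong or missing search domain obvious. `candidate_names` then shows which full names a short master hostname would expand to on each machine.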