CryoSPARC Live "URL Error [Errno 113] No route to host"

I have a CryoSPARC master/worker setup. The master is a Linux machine with no GPUs, and I have 5 GPU nodes set up as workers. I can run CryoSPARC normally/manually with no issues. When I try to run CryoSPARC Live, I get the following error:
Traceback (most recent call last):
  File "/home/emproc/cryosparc3-1/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 104, in func
    with make_json_request(self, "/api", data=data) as request:
  File "/home/emproc/cryosparc3-1/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/emproc/cryosparc3-1/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 191, in make_request
    raise CommandClient.Error(client, error_reason, url=url)
cryosparc_tools.cryosparc.command.CommandClient.Error: *** CommandClient: (http://krios.csb.vanderbilt.edu:45005/api) URL Error [Errno 113] No route to host

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/rtp_workers/run.py", line 313, in cryosparc_compute.jobs.rtp_workers.run.rtp_worker
  File "/home/emproc/cryosparc3-1/cryosparc_worker/cryosparc_compute/jobs/rtp_workers/rtp_common.py", line 23, in get_rtp_cli
    rtp = client.CommandClient(sysinfo['master_hostname'], int(sysinfo['port_command_rtp']), service="command_rtp")
  File "/home/emproc/cryosparc3-1/cryosparc_worker/cryosparc_compute/client.py", line 36, in __init__
    super().__init__(service, host, port, url, timeout, headers, cls=NumpyEncoder)
  File "/home/emproc/cryosparc3-1/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 91, in __init__
    self._reload()  # attempt connection immediately to gather methods
  File "/home/emproc/cryosparc3-1/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 118, in _reload
    system = self._get_callable("system.describe")()
  File "/home/emproc/cryosparc3-1/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 107, in func
    raise CommandClient.Error(
cryosparc_tools.cryosparc.command.CommandClient.Error: *** CommandClient: (http://krios.csb.vanderbilt.edu:45005) Did not receive a JSON response from method "system.describe" with params ()

When I set up CryoSPARC Live, the import shows the data are queued (i.e., CryoSPARC can find the data to import). The "live worker" jobs start but then fail almost immediately.

When I use CryoSPARC (non-Live), the import runs through the master. It looks like CryoSPARC Live runs the import through the worker GPU machine instead. So far this is the only difference I see, and it could explain why regular CryoSPARC works while CryoSPARC Live does not.
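For reference, the traceback suggests the worker's first action is to POST a JSON request for system.describe to the command_rtp server at krios.csb.vanderbilt.edu:45005/api. The failing call can be reproduced outside of CryoSPARC roughly like this (a sketch; the exact request body is an assumption based on the JSON-RPC-style requests cryosparc_tools' CommandClient appears to make):

# Hypothetical manual reproduction of the worker's first call to command_rtp
curl -s -X POST http://krios.csb.vanderbilt.edu:45005/api \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "method": "system.describe", "params": [], "id": 0}'

If this fails from a worker with "No route to host" while succeeding on the master itself, the problem is in the network path rather than in CryoSPARC.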

Things I have done to troubleshoot

  1. All machines can SSH to each other without a password
  2. Ports 45000 to 45009 are open on both ends (on the master and on the workers); a quick check is sketched below
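Here is a quick way to verify point 2 from a worker's side (a sketch; assumes nc (netcat) is installed and that the relevant range for a base port of 45000 is 45000-45009):

# Hypothetical port sweep from a worker against the master
for port in $(seq 45000 45009); do
  nc -z -w 2 krios.csb.vanderbilt.edu "$port" \
    && echo "port $port reachable" \
    || echo "port $port NOT reachable"
done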

What am I missing here?

-Scott

Please can you post the output of

cryosparcm status | grep -v LICENSE

cryosparcm status | grep -v LICENSE

CryoSPARC System master node installed at
/home/emproc/cryosparc3-1/cryosparc_master
Current cryoSPARC version: v4.2.1

CryoSPARC process status:

app RUNNING pid 14751, uptime 3:53:33
app_api RUNNING pid 14770, uptime 3:53:32
app_api_dev STOPPED Not started
app_legacy STOPPED Not started
app_legacy_dev STOPPED Not started
command_core RUNNING pid 14654, uptime 3:53:50
command_rtp RUNNING pid 14691, uptime 3:53:41
command_vis RUNNING pid 14684, uptime 3:53:44
database RUNNING pid 14417, uptime 3:54:06


License is valid

global config variables:
export CRYOSPARC_MASTER_HOSTNAME="krios.csb.vanderbilt.edu"
export CRYOSPARC_DB_PATH="/home/emproc/cryosparc3-1/cryosparc_database"
export CRYOSPARC_BASE_PORT=45000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true
export CRYOSPARC_FORCE_HOSTNAME=true

Hi @Scott
Please can you run these commands and report their outputs

  • on krios.csb.vanderbilt.edu
curl 127.0.0.1:45005
curl krios.csb.vanderbilt.edu:45005
  • on the worker that was configured for the Live session where this error occurred
host krios.csb.vanderbilt.edu
curl krios.csb.vanderbilt.edu:45005
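For context, 45005 is CRYOSPARC_BASE_PORT + 5, the port command_rtp listens on; Live workers connect to it directly, whereas, if I understand the architecture correctly, regular (non-Live) worker jobs only need command_core. The same probe can be scripted over the assumed command-server offsets (a sketch; +2 for command_core and +3 for command_vis are assumptions, only +5 for command_rtp is confirmed by the outputs below):

BASE=45000
for off in 2 3 5; do
  echo "--- port $((BASE + off)) ---"
  curl -s --max-time 2 "krios.csb.vanderbilt.edu:$((BASE + off))"
  echo
done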

From krios (master)
150 krios:/home/emproc% curl 127.0.0.1:45005
Hello World from cryosparc real-time processing manager.
151 krios:/home/emproc% curl krios.csb.vanderbilt.edu:45005
Hello World from cryosparc real-time processing manager.

From GPU (worker)
105 cryogpu4:/home/emproc% host krios.csb.vanderbilt.edu
krios.csb.vanderbilt.edu has address 10.16.201.11
106 cryogpu4:/home/emproc% curl krios.csb.vanderbilt.edu:45005
curl: (7) Failed connect to krios.csb.vanderbilt.edu:45005; No route to host

If you can
ping krios.csb.vanderbilt.edu
from the worker, then based on this I would next investigate firewall settings on krios.csb.vanderbilt.edu.
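To inspect the firewall state on the master, something along these lines (a sketch; assumes firewalld on a RHEL-family system, adjust for iptables/ufw as appropriate):

sudo firewall-cmd --state
sudo firewall-cmd --list-ports
sudo firewall-cmd --list-services
# Fall back to the raw rules if firewalld is not in use:
sudo iptables -S | grep 45005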

107 cryogpu4:/home/emproc% ping krios.csb.vanderbilt.edu
PING krios.csb.vanderbilt.edu (10.16.201.11) 56(84) bytes of data.
64 bytes from krios (10.16.201.11): icmp_seq=1 ttl=64 time=0.157 ms
64 bytes from krios (10.16.201.11): icmp_seq=2 ttl=64 time=0.184 ms
64 bytes from krios (10.16.201.11): icmp_seq=3 ttl=64 time=0.705 ms
64 bytes from krios (10.16.201.11): icmp_seq=4 ttl=64 time=0.170 ms
64 bytes from krios (10.16.201.11): icmp_seq=5 ttl=64 time=0.192 ms
64 bytes from krios (10.16.201.11): icmp_seq=6 ttl=64 time=0.199 ms
64 bytes from krios (10.16.201.11): icmp_seq=7 ttl=64 time=0.171 ms
64 bytes from krios (10.16.201.11): icmp_seq=8 ttl=64 time=0.150 ms
64 bytes from krios (10.16.201.11): icmp_seq=9 ttl=64 time=0.140 ms
64 bytes from krios (10.16.201.11): icmp_seq=10 ttl=64 time=0.147 ms
64 bytes from krios (10.16.201.11): icmp_seq=11 ttl=64 time=0.170 ms
^C
--- krios.csb.vanderbilt.edu ping statistics ---
11 packets transmitted, 11 received, 0% packet loss, time 10003ms
rtt min/avg/max/mdev = 0.140/0.216/0.705/0.156 ms

Do you have a firewall running on the master node? What does systemctl list-units | grep firewall output?

I reached out to my IT group and they determined there was a firewall issue: port 45005 was "listening" but had somehow become blocked by the firewall. They put in a rule to allow port 45005, and I am operational now.
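For anyone hitting the same thing, the fix presumably amounted to something like the following on the master (a sketch; assumes firewalld, and your IT group's actual rule may differ):

sudo firewall-cmd --permanent --add-port=45005/tcp
sudo firewall-cmd --reload
# Opening the whole CryoSPARC range may be preferable:
sudo firewall-cmd --permanent --add-port=45000-45009/tcp
sudo firewall-cmd --reload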

Thank you all for the help!
Scott
