Connecting worker to master

Hi,
I have a standalone installation on one workstation. I’d like to install a worker on another workstation, but I am having issues at the connection stage:
./bin/cryosparcw connect --worker Rostam --master atar --port 39000 --ssdpath /mnt/Data_00/cryosparc_cache

./bin/cryosparcw connect --worker $worker_hostname --master $master_hostname --port $port_number --ssdpath $ssd_path

CRYOSPARC CONNECT --------------------------------------------

Attempting to register worker Rostam to command atar:39002
Connecting as unix user cryosparc_user
Will register using ssh string: cryosparc_user@Rostam
If this is incorrect, you should re-run this command with the flag --sshstr

*** client.py: command (http://atar:39002/api) did not reply within timeout of 300 seconds, attempt 1 of 3
*** client.py: command (http://atar:39002/api) did not reply within timeout of 300 seconds, attempt 2 of 3
*** client.py: command (http://atar:39002/api) did not reply within timeout of 300 seconds, attempt 3 of 3
Traceback (most recent call last):
  File "bin/connect.py", line 89, in <module>
    cli = client.CommandClient(host=master_hostname, port=command_core_port)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/cryosparc2_compute/client.py", line 33, in __init__
    self._reload()
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/cryosparc2_compute/client.py", line 61, in _reload
    system = self._get_callable('system.describe')()
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/cryosparc2_compute/client.py", line 49, in func
    r = requests.post(self.url, data = json.dumps(data, cls=NumpyEncoder), headers = header, timeout=self.timeout)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='atar', port=39002): Max retries exceeded with url: /api (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbbfa8afe10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

Changing the master or worker in the above command to their respective IP addresses generates the same error. I have passwordless SSH set up properly between the two machines. Thanks for the help.
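The `[Errno -3] Temporary failure in name resolution` at the end of the traceback means the worker could not turn the name `atar` into an address, so no connection was attempted at all. A minimal sketch of the same lookup, using only Python's standard library, can be run on the worker to test a name before retrying `cryosparcw connect`; the helper name `resolve` is hypothetical and not part of cryoSPARC.

```python
import socket


def resolve(hostname):
    """Return the sorted addresses that hostname resolves to, or None if the
    lookup fails (the condition reported as [Errno -3] in the traceback)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        print(f"{hostname}: name resolution failed ({exc})")
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

For example, calling `resolve("atar")` on Rostam should return the master's address; a `None` result points at DNS or the local hosts file rather than at cryoSPARC itself.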

Hello @rkhayat,
Judging from this part of the error:

Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

it looks to me like you have a name resolution issue, not a cryoSPARC issue. Are you running the command

./bin/cryosparcw connect --worker Rostam --master atar --port 39000 --ssdpath /mnt/Data_00/cryosparc_cache

on “Rostam”? Does “Rostam” have a longer name (like rostam.domain.gov)? Does “Rostam” have a floating IP? Can you do passwordless ssh in a different way?

  1. Rostam does not have a longer name, or I don’t know how to find it

  2. Rostam does have a floating IP (e.g. 134.74.27.116). Using the name or the IP for the worker and the master, in any combination (name, name; name, IP; IP, name; IP, IP), makes no difference

  3. I can ssh with cryosparc_user@134.74.27.116

  4. I used the --standalone option when installing cryoSPARC on the master/worker node. Could this be the problem?

Hi @rkhayat

I did not install it with that option, so it could very well be the issue.

Can you reinstall? In principle it’s quite easy: just copy your database directory under another name (like cryosparc2_database_backup), install cryoSPARC anew without the --standalone option, stop it, rsync -av the backup onto the original database path, and start it again. Your current users and projects should be back on the new install after that.

Or simply pass the path to the backup at installation time, like this:

./install.sh --license $LICENSE_ID --hostname atar --dbpath /my/cryosparc2_database_backup --cudapath /my/cuda --port 39000

I hope you manage after this 🙂

Hi @jucastil

Still no luck. Here is what I have done:

  1. Reinstall cryosparc2_master on GPU workstation 1
    ./install.sh --license 973385d6-f6c4-11e9-a6c3-efe628544f40 --worker_path /home/cryosparc_user/Applications/cryoSPARC_2.14.2b/cryosparc2_worker --cudapath /usr/local/cuda-10.1/ --ssdpath /scr/cryosparc_cache --hostname 134.74.27.116 --initial_email email@emailaddress --initial_password somepassword --initial_name "username"
    For this post, I’ve replaced the username, password … with junk
  2. cryosparcm start
  3. Install cryosparc2_worker
    ./install.sh --license 973385d6-f6c4-11e9-a6c3-efe628544f40 --cudapath /usr/local/cuda-10.1/
  4. Connect worker to master
    ./bin/cryosparcw connect --worker 134.74.27.116 --master 134.74.27.116 --port 39000 --ssdpath /scr/cryosparc_cache
  5. Restore database
  6. Set up working passwordless SSH access
  7. Install worker on GPU workstation 2
    ./install.sh --license 973385d6-f6c4-11e9-a6c3-efe628544f40 --cudapath /usr/local/cuda-10.1/
  8. Try to connect worker to master
    ./bin/cryosparcw connect --worker 134.74.27.139 --master 134.74.27.116 --port 39000 --ssdpath /mnt/Data_00/cryosparc_cache
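The logs in this thread show that `--port 39000` makes the worker contact `command` on port 39002 (for example `atar:39002` in the first post), so the port the worker must reach on the master is the base port plus 2, not 39000 itself. A small sketch of that mapping follows; the offset of 2 is inferred from these logs rather than taken from cryoSPARC documentation, and `command_core_url` is a hypothetical helper.

```python
def command_core_url(master_host, base_port, offset=2):
    """Return the URL the worker posts to during `cryosparcw connect`.

    offset=2 is an assumption inferred from this thread's logs
    (--port 39000 -> requests to port 39002), not a documented constant.
    """
    return f"http://{master_host}:{base_port + offset}/api"


print(command_core_url("134.74.27.116", 39000))  # → http://134.74.27.116:39002/api
```

A firewall between the two machines therefore has to allow that port, not only the base port used by the web interface.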

This does not work. I get the following error:

Attempting to register worker 134.74.27.139 to command 134.74.27.116:39002
Connecting as unix user cryosparc_user
Will register using ssh string: cryosparc_user@134.74.27.139
If this is incorrect, you should re-run this command with the flag --sshstr

*** client.py: command (http://134.74.27.116:39002/api) did not reply within timeout of 300 seconds, attempt 1 of 3
*** client.py: command (http://134.74.27.116:39002/api) did not reply within timeout of 300 seconds, attempt 2 of 3
*** client.py: command (http://134.74.27.116:39002/api) did not reply within timeout of 300 seconds, attempt 3 of 3
Traceback (most recent call last):
  File "bin/connect.py", line 89, in <module>
    cli = client.CommandClient(host=master_hostname, port=command_core_port)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/cryosparc2_compute/client.py", line 33, in __init__
    self._reload()
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/cryosparc2_compute/client.py", line 61, in _reload
    system = self._get_callable('system.describe')()
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/cryosparc2_compute/client.py", line 49, in func
    r = requests.post(self.url, data = json.dumps(data, cls=NumpyEncoder), headers = header, timeout=self.timeout)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='134.74.27.116', port=39002): Max retries exceeded with url: /api (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2a25044ad0>: Failed to establish a new connection: [Errno 113] No route to host',))
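`[Errno 113] No route to host` is a different failure from the first post: an IP address needs no name lookup, but the TCP connection to port 39002 was turned away before it reached cryoSPARC. On Linux this errno is commonly caused by a firewall rule that rejects packets (firewalld's default reject answers with `icmp-host-prohibited`), not only by a genuinely missing route. A minimal sketch, standard library only, that repeats the connection attempt and classifies the outcome; `probe` is a hypothetical helper, and its messages are heuristics rather than definitive diagnoses.

```python
import errno
import socket


def probe(host, port, timeout=5.0):
    """Attempt a plain TCP connection to host:port and describe what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # Same class of failure as [Errno -3] in the first traceback.
        return "name resolution failed"
    except socket.timeout:
        return "timed out (packets possibly dropped by a firewall)"
    except ConnectionRefusedError:
        return "refused (host reachable, but nothing is listening on this port)"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host (network problem or a rejecting firewall)"
        return f"other error: {exc}"
```

Run on the worker as `probe("134.74.27.116", 39002)`: "refused" would mean the master is reachable but no service listens there, while "no route to host" points at the network or a firewall in between.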

  1. Start an SSH tunnel for port 39000 from the worker node to the master
    ssh -N -f -L localhost:39000:localhost:39000 134.74.27.116
    I can launch jobs from the web browser on the worker computer
  2. Try to connect the worker to the master again, but get this error:

Attempting to register worker 134.74.27.139 to command 134.74.27.116:39002
Connecting as unix user cryosparc_user
Will register using ssh string: cryosparc_user@134.74.27.139
If this is incorrect, you should re-run this command with the flag --sshstr
*** client.py: command (http://134.74.27.116:39002/api) did not reply within timeout of 300 seconds, attempt 1 of 3
*** client.py: command (http://134.74.27.116:39002/api) did not reply within timeout of 300 seconds, attempt 2 of 3
*** client.py: command (http://134.74.27.116:39002/api) did not reply within timeout of 300 seconds, attempt 3 of 3
Traceback (most recent call last):
  File "bin/connect.py", line 89, in <module>
    cli = client.CommandClient(host=master_hostname, port=command_core_port)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/cryosparc2_compute/client.py", line 33, in __init__
    self._reload()
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/cryosparc2_compute/client.py", line 61, in _reload
    system = self._get_callable('system.describe')()
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/cryosparc2_compute/client.py", line 49, in func
    r = requests.post(self.url, data = json.dumps(data, cls=NumpyEncoder), headers = header, timeout=self.timeout)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='134.74.27.116', port=39002): Max retries exceeded with url: /api (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5576235ad0>: Failed to establish a new connection: [Errno 113] No route to host',))

Hello there,

It looks to me like you still have a connection problem.

[Errno 113] No route to host

Attempting to register worker 134.74.27.139 to command 134.74.27.116:39002

It could be that your ports are not open, or are being used by something else.
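For the "used by something else" case, one way to check on the master (with cryoSPARC stopped) is to try binding each port of the block: a port that cannot be bound is held by another program, or is still in TIME_WAIT. A sketch assuming a block of 10 ports starting at the base port; that width is an assumption here, so check the cryoSPARC installation guide for the exact range your version uses. `ports_in_use` is a hypothetical helper.

```python
import socket


def ports_in_use(base_port, width=10, host="0.0.0.0"):
    """Return the ports in [base_port, base_port + width) that cannot be bound here.

    width=10 is an assumed block size, not a value taken from cryoSPARC.
    Run this on the master while cryoSPARC is stopped, otherwise its own
    processes will be reported as "in use".
    """
    busy = []
    for port in range(base_port, base_port + width):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # EADDRINUSE (or EACCES for privileged ports) ends up here.
                busy.append(port)
    return busy
```

An empty list for `ports_in_use(39000)` suggests nothing else is squatting on the block; otherwise, picking a different base port at install time is the alternative.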

You could check that your firewall is off on both machines (systemctl status firewalld shows its state; systemctl stop firewalld or service firewalld stop turns it off, depending on your Linux flavor), and that the user cryosparc_user exists on both 134.74.27.116 and 134.74.27.139, can perform passwordless ssh, and has exactly the same user IDs on both (id cryosparc_user should show the same on both computers).
You could also try omitting --port 39000, or changing it.
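The `id cryosparc_user` comparison can also be scripted. A minimal sketch using Python's `pwd` module (Unix only) that returns the numeric user and group IDs of the account; run it on both 134.74.27.116 and 134.74.27.139 and compare the output. `user_ids` is a hypothetical helper.

```python
import pwd


def user_ids(username="cryosparc_user"):
    """Return (uid, gid) for username on this machine, or None if the account does not exist."""
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        # getpwnam raises KeyError when the user is unknown on this host.
        return None
    return entry.pw_uid, entry.pw_gid


print(user_ids())
```

If the two machines print different tuples, or one prints `None`, the accounts need to be aligned before connecting the worker.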

I hope this helps!

Best,

Juan


Yes, there was a problem with the ports. The additional GPUs are recognized when I swap the worker and master. Thanks so much for the help, Juan.
