New worker node gets Encounted error from JSONRPC function "system.describe" with params ()

Similar to this thread, I’ve tried all the variations of the hostnames, made sure firewalld is off, and made sure I can ssh without a password, but I still get the same error. Any help would be greatly appreciated.

./cryosparcw connect --worker sn4622115580 --master cryoem8.ourdomain --port 39000 --nossd 
 ---------------------------------------------------------------
  CRYOSPARC CONNECT --------------------------------------------
 ---------------------------------------------------------------
  Attempting to register worker sn4622115580 to command cryoem8.ourdomain:39002
  Connecting as unix user myuser
  Will register using ssh string: myuser@sn4622115580
  If this is incorrect, you should re-run this command with the flag --sshstr <ssh string> 
 ---------------------------------------------------------------
/home/myuser/cryosparc_worker/cryosparc_tools/cryosparc/command.py:135: UserWarning: *** CommandClient: (http://cryoem8.ourdomain:39002/api) HTTP Error 500 Internal Server Error; please check cryosparcm log command_core for additional information.
Response from server: b'\n<html><head>\n<meta type="copyright" content="Copyright (C) 1996-2017 The Squid Software Foundation and contributors">\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n<title>ERROR: The requested URL could not be retrieved</title>\n<style type="text/css"><!-- \n /*\n * Copyright (C) 1996-2017 The Squid Software Foundation and contributors\n *\n * Squid software is distributed under GPLv2+ license and includes\n * contributions from numerous individuals and organizations.\n * Please see the COPYING and CONTRIBUTORS files for details.\n */\n\n/*\n Stylesheet for Squid Error pages\n Adapted from design by Free CSS Templates\n http://www.freecsstemplates.org\n Released for free under a Creative Commons Attribution 2.5 License\n*/\n\n/* Page basics */\n* {\n\tfont-family: verdana, sans-serif;\n}\n\nhtml body {\n\tmargin: 0;\n\tpadding: 0;\n\tbackground: #efefef;\n\tfont-size: 12px;\n\tcolor: #1e1e1e;\n}\n\n/* Page displayed title area */\n#titles {\n\tmargin-left: 15px;\n\tpadding: 10px;\n\tpadding-left: 100px;\n\tbackground: url(\'/squid-internal-static/icons/SN.png\') no-repeat left;\n}\n\n/* initial title */\n#titles h1 {\n\tcolor: #000000;\n}\n#titles h2 {\n\tcolor: #000000;\n}\n\n/* special event: FTP success page titles */\n#titles ftpsuccess {\n\tbackground-color:#00ff00;\n\twidth:100%;\n}\n\n/* Page displayed body content area */\n#content {\n\tpadding: 10px;\n\tbackground: #ffffff;\n}\n\n/* General text */\np {\n}\n\n/* error brief description */\n#error p {\n}\n\n/* some data which may have caused the problem */\n#data {\n}\n\n/* the error message received from the system or other software */\n#sysmsg {\n}\n\npre {\n    font-family:sans-serif;\n}\n\n/* special event: FTP / Gopher directory listing */\n#dirmsg {\n    font-family: courier;\n    color: black;\n    font-size: 10pt;\n}\n#dirlisting {\n    margin-left: 2%;\n    margin-right: 2%;\n}\n#dirlisting tr.entry td.icon,td.filename,td.size,td.date {\n    border-bottom: groove;\n}\n#dirlisting td.size {\n    width: 50px;\n    text-align: right;\n    padding-right: 5px;\n}\n\n/* horizontal lines */\nhr {\n\tmargin: 0;\n}\n\n/* page displayed footer area */\n#footer {\n\tfont-size: 9px;\n\tpadding-left: 10px;\n}\n\n\nbody\n:lang(fa) { direction: rtl; font-size: 100%; font-family: Tahoma, Roya, sans-serif; float: right; }\n:lang(he) { direction: rtl; }\n --></style>\n</head><body id="ERR_CANNOT_FORWARD">\n<div id="titles">\n<h1>ERROR</h1>\n<h2>The requested URL could not be retrieved</h2>\n</div>\n<hr>\n\n<div id="content">\n<p>The following error was encountered while trying to retrieve the URL: <a href="http://cryoem8.ourdomain:39002/api">http://cryoem8.ourdomain:39002/api</a></p>\n\n<blockquote id="error">\n<p><b>Unable to forward this request at this time.</b></p>\n</blockquote>\n\n<p>This request could not be forwarded to the origin server or to any parent caches.</p>\n\n<p>Some possible problems are:</p>\n<ul>\n<li id="network-down">An Internet connection needed to access this domains origin servers may be down.</li>\n<li id="no-peer">All configured parent caches may be currently unreachable.</li>\n<li id="permission-denied">The administrator may not allow this cache to make direct connections to origin servers.</li>\n</ul>\n\n<p>Your cache administrator is <a 
href="mailto:admin@localhost?subject=CacheErrorInfo%20-%20ERR_CANNOT_FORWARD&amp;body=CacheHost%3A%20localhost%0D%0AErrPage%3A%20ERR_CANNOT_FORWARD%0D%0AErr%3A%20%5Bnone%5D%0D%0ATimeStamp%3A%20Mon,%2002%20Dec%202024%2021%3A03%3A36%20GMT%0D%0A%0D%0AClientIP%3A%2010.198.24.97%0D%0A%0D%0AHTTP%20Request%3A%0D%0APOST%20%2Fapi%20HTTP%2F1.1%0AAccept-Encoding%3A%20identity%0D%0AContent-Length%3A%20107%0D%0AUser-Agent%3A%20Python-urllib%2F3.10%0D%0AOriginator%3A%20client%0D%0ALicense-Id%3A%2040c42380-913f-11e9-9dde-5fa0d611b478%0D%0AContent-Type%3A%20application%2Fjson%0D%0AConnection%3A%20close%0D%0AHost%3A%20cryoem8.ourdomain%3A39002%0D%0A%0D%0A%0D%0A">admin@localhost</a>.</p>\n\n<br>\n</div>\n\n<hr>\n<div id="footer">\n<p>Generated Mon, 02 Dec 2024 21:03:36 GMT by localhost (squid)</p>\n<!-- ERR_CANNOT_FORWARD -->\n</div>\n</body></html>\n'
  system = self._get_callable("system.describe")()
Traceback (most recent call last):
  File "/home/myuser/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 105, in func
    with make_json_request(self, "/api", data=data, _stacklevel=4) as request:
  File "/home/myuser/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/myuser/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 226, in make_request
    raise CommandError(error_reason, url=url, code=code, data=resdata)
cryosparc_tools.cryosparc.errors.CommandError: *** (http://cryoem8.ourdomain:39002/api, code 500) HTTP Error 500 Internal Server Error; please check cryosparcm log command_core for additional information.
Response from server: [same Squid ERR_CANNOT_FORWARD error page as quoted above]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/myuser/cryosparc_worker/bin/connect.py", line 78, in <module>
    cli = client.CommandClient(host=master_hostname, port=command_core_port, service="command_core")
  File "/home/myuser/cryosparc_worker/cryosparc_compute/client.py", line 38, in __init__
    super().__init__(service, host, port, url, timeout, headers, cls=NumpyEncoder)
  File "/home/myuser/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 97, in __init__
    self._reload()  # attempt connection immediately to gather methods
  File "/home/myuser/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 135, in _reload
    system = self._get_callable("system.describe")()
  File "/home/myuser/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 108, in func
    raise CommandError(
cryosparc_tools.cryosparc.errors.CommandError: *** (http://cryoem8.ourdomain:39002, code 500) Encounted error from JSONRPC function "system.describe" with params ()

Edit: when I use the IP address for the --master option instead, it connects. Why would that be?

./cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/scratch/cryosparc-cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 4, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 5, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 6, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 7, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'cryoem8.ourdomain.edu', 'lane': 'default', 'monitor_port': None, 'name': 'cryoem8.ourdomain.edu', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95], 'GPU': [0, 1, 2, 3, 4, 5, 6, 7], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192]}, 'ssh_str': 'myuser@cryoem8.ourdomain.edu', 'title': 'Worker node cryoem8.ourdomain.edu', 'type': 'node', 'worker_bin_path': '/home/myuser/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/scratch/cryosparc-cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 4, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 5, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 6, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 7, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'cryoem7', 'lane': 'cryoem7', 'monitor_port': None, 'name': 'cryoem7', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95], 'GPU': [0, 1, 2, 3, 4, 5, 6, 7], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192]}, 'ssh_str': 'myuser@cryoem7.ourdomain.edu', 'title': 'Worker node cryoem7', 'type': 'node', 'worker_bin_path': '/home/myuser/cryosparc_worker/bin/cryosparcw'}, {'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11538923520, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'spgpu2', 'lane': 'box', 'monitor_port': None, 'name': 'spgpu2', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], 'GPU': [0], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'myuser@spgpu2.ourdomain.edu', 'title': 'Worker node spgpu2', 'type': 'node', 'worker_bin_path': '/home/myuser/cryosparc_worker/bin/cryosparcw'}, {'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11538923520, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'spgpu3', 'lane': 'box', 'monitor_port': None, 'name': 'spgpu3', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], 'GPU': [0], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'myuser@spgpu3.ourdomain.edu', 'title': 'Worker node spgpu3', 'type': 'node', 'worker_bin_path': '/home/myuser/cryosparc_worker/bin/cryosparcw'}, {'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11538923520, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'spgpu4', 'lane': 'box', 'monitor_port': None, 'name': 'spgpu4', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], 'GPU': [0], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'myuser@spgpu4.ourdomain.edu', 'title': 'Worker node spgpu4', 'type': 'node', 'worker_bin_path': '/home/myuser/cryosparc_worker/bin/cryosparcw'}, {'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 3149856768, 'name': 'NVIDIA GeForce GTX 1060 3GB'}], 'hostname': 'exxgpu1', 'lane': 'slow', 'monitor_port': None, 
'name': 'exxgpu1', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 'GPU': [0], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7]}, 'ssh_str': 'myuser@exxgpu1.ourdomain.edu', 'title': 'Worker node exxgpu1', 'type': 'node', 'worker_bin_path': '/home/myuser/cryosparc_worker/bin/cryosparcw'}, {'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 3149856768, 'name': 'NVIDIA GeForce GTX 1060 3GB'}], 'hostname': 'spgpu1', 'lane': 'box', 'monitor_port': None, 'name': 'spgpu1', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], 'GPU': [0], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'myuser@spgpu1.ourdomain.edu', 'title': 'Worker node spgpu1', 'type': 'node', 'worker_bin_path': '/scratch/software/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 51041271808, 'name': 'NVIDIA RTX A6000'}, {'id': 1, 'mem': 51041271808, 'name': 'NVIDIA RTX A6000'}, {'id': 2, 'mem': 51041271808, 'name': 'NVIDIA RTX A6000'}, {'id': 3, 'mem': 51041271808, 'name': 'NVIDIA RTX A6000'}, {'id': 4, 'mem': 51041271808, 'name': 'NVIDIA RTX A6000'}, {'id': 5, 'mem': 51041271808, 'name': 'NVIDIA RTX A6000'}, {'id': 6, 'mem': 51041271808, 'name': 'NVIDIA RTX A6000'}, {'id': 7, 'mem': 51041271808, 'name': 'NVIDIA RTX A6000'}], 'hostname': 'cryoem9.ourdomain.edu', 'lane': 'cryoem9', 'monitor_port': None, 'name': 'cryoem9.ourdomain.edu', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71], 'GPU': [0, 1, 2, 3, 4, 5, 6, 7], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257]}, 'ssh_str': 'exx@cryoem9.ourdomain.edu', 'title': 'Worker node cryoem9.ourdomain.edu', 'type': 'node', 'worker_bin_path': '/home/myuser/cryosparc_worker/bin/cryosparcw'}]

cryoem9 is the new worker. The job status is stuck on “Running job on remote worker node hostname cryoem9”.

"-12-02 16:51:46,560 scheduler_run_core   INFO     | Now trying to schedule J2056
2024-12-02 16:51:46,561 scheduler_run_job    INFO     |    Scheduling job to cryoem9
2024-12-02 16:51:47,612 scheduler_run_job    INFO     | Not a commercial instance - heartbeat set to 12 hours.
2024-12-02 16:51:47,935 scheduler_run_job    INFO     |      Launchable! -- Launching.
2024-12-02 16:51:47,943 set_job_status       INFO     | Status changed for P1.J2056 from queued to launched
2024-12-02 16:51:47,944 app_stats_refresh    INFO     | Calling app stats refresh url http://cryoem8.ourdomain:39000/api/actions/stats/refresh_job for project_uid P1, workspace_uid None, job_uid J2056 with body {'projectUid': 'P1', 'jobUid': 'J2056'}
2024-12-02 16:51:47,949 app_stats_refresh    INFO     | code 200, text {"success":true}
2024-12-02 16:51:47,956 run_job              INFO     |       Running P1 J2056
2024-12-02 16:51:47,956 run_job              INFO     |         Running job using: /home/myuser/cryosparc_worker/bin/cryosparcw
2024-12-02 16:51:47,956 run_job              INFO     |         Running job on remote worker node hostname cryoem9
2024-12-02 16:51:47,957 run_job              INFO     |         cmd: bash -c "nohup /home/myuser/cryosparc_worker/bin/cryosparcw run --project P1 --job J2056 --master_hostname cryoem8.ourdomain --master_command_core_port 39002 > /home/workstation/Zuker/CS-zuker/J2056/job.log 2>&1 & "
2024-12-02 16:51:48,529 run_job              INFO     | 
2024-12-02 16:51:48,529 scheduler_run_core   INFO     | Finished

The job.log has the same errors:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "cryosparc_master/cryosparc_compute/run.py", line 255, in cryosparc_master.cryosparc_compute.run.run
  File "cryosparc_master/cryosparc_compute/run.py", line 50, in cryosparc_master.cryosparc_compute.run.main
  File "/home/ouruser/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 131, in connect
    cli = client.CommandClient(master_hostname, int(master_command_core_port), service="command_core")
  File "/home/ouruser/cryosparc_worker/cryosparc_compute/client.py", line 38, in __init__
    super().__init__(service, host, port, url, timeout, headers, cls=NumpyEncoder)
  File "/home/ouruser/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 97, in __init__
    self._reload()  # attempt connection immediately to gather methods
  File "/home/ouruser/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 135, in _reload
    system = self._get_callable("system.describe")()
  File "/home/ouruser/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 108, in func
    raise CommandError(
cryosparc_tools.cryosparc.errors.CommandError: *** (http://cryoem8.ourdomain.edu:39002, code 500) Encounted error from JSONRPC function "system.describe" with params ()

It is possible that, for some reason, the newly (to-be-)added worker node cannot resolve the cryoem8.oudomain hostname. Your network admins may be able to help ensure, via configuration of the DHCP and/or DNS servers and the CryoSPARC nodes, that CryoSPARC nodes are assigned stable and resolvable fully qualified domain names.
On second thought, your cryosparcw connect command’s --master value may have included a typo (a missing r in our and/or a missing top-level domain edu).
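A quick check that could be run on the prospective worker node to rule out name resolution; the hostnames below are simply the ones used in this thread:

getent hosts cryoem8.ourdomain.edu   # should print the master's IP if the FQDN resolves
getent hosts cryoem8.ourdomain       # also check the short form used in the connect command above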

Potential reasons (each can be spot-checked with the commands sketched after this list):

  • the ssh connection to the worker node failed. Ensure that you can connect from the CryoSPARC master host to the relevant worker, using the value of that worker’s "ssh_str": field (see get_scheduler_targets() output), without being prompted for a password or host key confirmation.
  • the cryosparcw script was not found on the worker (see the value of the worker’s "worker_bin_path": key in the get_scheduler_targets() output)
  • the project directory has not been shared with the worker, is mounted at a different path than on the master, or is mounted with insufficient permissions
  • there is a mismatch between the numeric user ids of the exx (or whichever) user that is supposed to own CryoSPARC processes (the “common unix user account”)
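A sketch of spot checks for the points above, run from the CryoSPARC master as the user that owns the CryoSPARC processes (the ssh string, binary path and project path below are placeholders taken from this thread; substitute your actual values):

# 1. passwordless ssh using the worker's ssh_str
ssh exx@cryoem9.ourdomain.edu "echo ssh ok"
# 2. cryosparcw present at the registered worker_bin_path
ssh exx@cryoem9.ourdomain.edu "ls -l /home/myuser/cryosparc_worker/bin/cryosparcw"
# 3. project directory visible at the same path as on the master
ssh exx@cryoem9.ourdomain.edu "ls -ld /path/to/project_dir"
# 4. numeric uid/gid of the CryoSPARC user on the master vs. the worker
id; ssh exx@cryoem9.ourdomain.edu "id"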

The FQDN resolves, which is why I wonder what is preventing the FQDN from working. Both servers’ /etc/hosts files have the IP and FQDN.

Typo on my part; I fixed my post. Poor obfuscation.

I can confirm the user on master (cryoem8) can ssh exx@cryoem9 without being prompted.

All good:

'worker_bin_path': '/home/exx/cryosparc_worker/bin/cryosparcw'}]

ls /home/exx/cryosparc_worker/bin/cryosparcw
/home/exx/cryosparc_worker/bin/cryosparcw

This has to be the issue. The master, cryoem8, is running under a different account that does not exist on the worker cryoem9. However, as indicated above, that user can still ssh without a password. I take it that does not matter?

I also do not know. Are you saying that sn4622115580 (which, by the way, looks more like a manufacturer-assigned serial than a “stable”, resolvable hostname) has an entry inside /etc/hosts of the form
W.X.Y.Z cryoem8.ourdomain.edu, where W.X.Y.Z is the same IP address that, when used for the
cryosparcw connect --master parameter, allowed a successful connection?

That other user may not have write access to, or may otherwise be unable to access, the shared project directory.

Absolutely correct. That sn is a serial number; it’s how Exxact does it.

RW access confirmed:
touch /path/to/J2059/testfile

Looking back at your error, this looks like interception of the request by an HTTP proxy on your network.
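One way to see how the worker will treat requests to the master is to ask Python’s urllib directly: the Squid page above reports User-Agent: Python-urllib/3.10, which is what the CryoSPARC command client uses under the hood. A minimal check, run on the worker as the CryoSPARC user (hostname as used in this thread):

python3 -c 'import urllib.request as u; print("proxies:", u.getproxies()); print("bypass master:", u.proxy_bypass("cryoem8.ourdomain.edu"))'

If proxies lists an http proxy and the bypass check prints 0, requests to the master will be routed through that proxy.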

  1. What is the output of the command (as myuser on sn4622115580):

    env | grep -i -e proxy -e http -e request
    

    ?

  2. Is sn4622115580 the same host as cryoem9?

  3. Did you test this by running on the CryoSPARC master host (as myuser, or whoever owns CryoSPARC processes on the CryoSPARC master), replacing P99 with the actual id of the project to which J2059 belongs:

    ssh exx@cryoem9 "ls -ld $(cryosparcm cli "get_project_dir_abs('P99')") && uname -a"
    

Ah good catch:

env | grep -i -e proxy -e http -e request

SELINUX_ROLE_REQUESTED=
http_proxy=http://gw-srv-01.ourdomain.edu:3128/
SELINUX_LEVEL_REQUESTED=

Yes.

*** (http://cryoem8.ourdomain.edu:39002, code 400) Encountered ServerError from JSONRPC function "get_project_dir_abs" with params ('J2056',):
ServerError: Error retrieving project dir for J2056 - project not found
Traceback (most recent call last):
  File "/home/exx/cryosparc_master/cryosparc_command/commandcommon.py", line 196, in wrapper
    res = func(*args, **kwargs)
  File "/home/exx/cryosparc_master/cryosparc_command/command_core/__init__.py", line 8149, in get_project_dir_abs
    assert project_doc, f"Error retrieving project dir for {project_uid} - project not found"
AssertionError: Error retrieving project dir for J2056 - project not found

drwx------. 32 exx exx 4096 Dec  2 14:53 .
Linux sn4622115580 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

But with the other user:

 ssh otheruser@cryoem9 "ls -ld $(/home/otheruser/cryosparc_master/bin/cryosparcm cli "get_project_dir_abs('J2056')") && uname -a"
*** (http://cryoem8.fitzpatrick.zi.columbia.edu:39002, code 400) Encountered ServerError from JSONRPC function "get_project_dir_abs" with params ('J2056',):
ServerError: Error retrieving project dir for J2056 - project not found
Traceback (most recent call last):
  File "/home/otheruser/cryosparc_master/cryosparc_command/commandcommon.py", line 196, in wrapper
    res = func(*args, **kwargs)
  File "/home/otheruser/cryosparc_master/cryosparc_command/command_core/__init__.py", line 8149, in get_project_dir_abs
    assert project_doc, f"Error retrieving project dir for {project_uid} - project not found"
AssertionError: Error retrieving project dir for J2056 - project not found

drwx------. 9 otheruser 1002 4096 Dec  3 10:15 .
Linux sn4622115580 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

A simple version works:

ssh otheruser@cryoem9 "ls -ld /path/to/J2056"
drwxrwx---. 3 otheruser otheruser 109 Dec  3 16:50 /path/to/J2056

Edit: I see this thread, which mentions that setting `export NO_PROXY="127.0.0.1,localhost,sn4622115580"` might work around this. Does this need to happen on cryoem8, the master server, or just the worker?

get_project_dir_abs() requires a project id (starting with P) instead of a job id. Please can you try again?
You may also want to set up your multi-host CryoSPARC instance to run under a consistent numeric userid, to avoid (a quick cross-host check is sketched after this list):

  • unnecessarily generous permissions on project directories
  • additional problems that inconsistent file ownership may cause down the road, such as during the management of backups, archives and data migrations.
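For example, a quick cross-host comparison (using the ssh string from this thread; it is the numeric ids, not just the user names, that need to match):

id -u; id -g                    # on the master, as the CryoSPARC-owning user
ssh exx@cryoem9 "id -u; id -g"  # the same user on the new worker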

Setting NO_PROXY may or may not help in your case, depending on the configuration of the network and/or computer. It seems you observed a proxy-related error when running a command on the worker.

Yes the project ID is J2056

 ssh exx@cryoem9 "ls -ld $(./cryosparcm cli "get_project_dir_abs('J2056')") && uname -a"
*** (http://cryoem8.ourdomain.edu:39002, code 400) Encountered ServerError from JSONRPC function "get_project_dir_abs" with params ('J2056',):
ServerError: Error retrieving project dir for J2056 - project not found
Traceback (most recent call last):
  File "/home/ouruser/cryosparc_master/cryosparc_command/commandcommon.py", line 196, in wrapper
    res = func(*args, **kwargs)
  File "/home/ouruser/cryosparc_master/cryosparc_command/command_core/__init__.py", line 8149, in get_project_dir_abs
    assert project_doc, f"Error retrieving project dir for {project_uid} - project not found"
AssertionError: Error retrieving project dir for J2056 - project not found

drwx------. 32 exx exx 4096 Dec  2 14:53 .
Linux sn4622115580 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
[ouruser@cryoem8 bin]$ ssh ouruser@cryoem9 "ls -ld $(./cryosparcm cli "get_project_dir_abs('J2056')") && uname -a"
*** (http://cryoem8.ourdomain.edu:39002, code 400) Encountered ServerError from JSONRPC function "get_project_dir_abs" with params ('J2056',):
ServerError: Error retrieving project dir for J2056 - project not found
Traceback (most recent call last):
  File "/home/ouruser/cryosparc_master/cryosparc_command/commandcommon.py", line 196, in wrapper
    res = func(*args, **kwargs)
  File "/home/ouruser/cryosparc_master/cryosparc_command/command_core/__init__.py", line 8149, in get_project_dir_abs
    assert project_doc, f"Error retrieving project dir for {project_uid} - project not found"
AssertionError: Error retrieving project dir for J2056 - project not found

drwx------. 9 ouruser 1002 4096 Dec  3 10:15 .
Linux sn4622115580 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

But just to show you both users can read/write to J2056

cryoem8 ~]$ ssh cryoem9 ls -l  /engram/workstation/Zuker/CS-zuker/J2056 
total 176
-rwxrwx---. 1 ouruser ouruser    18 Dec  3 19:20 events.bson
drwxrwx---. 2 ouruser ouruser     0 Dec  3 19:20 gridfs_data
-rwxrwx---. 1 ouruser ouruser 23357 Dec  3 19:20 job.json
-rwxrwx---. 1 ouruser ouruser 22008 Dec  3 19:20 job.log

I’m basing this on a colleague’s post, albeit theirs was a standalone instance.

The project ID for this job can be displayed with the command

grep \"project_uid\" /engram/workstation/Zuker/CS-zuker/J2056/job.json

Interesting. Do you have the same output as your colleague for the command (on cryoem8)

cryosparcm call env |grep -i proxy

?
What about the commands

/path/to/cryosparc_worker/bin/cryosparcw call env | grep -i proxy

on each of your workers?

grep \"project_uid\" /engram/workstation/Zuker/CS-zuker/J2056/job.json

**"project_uid"**: "P1",

OK now I get:

ssh ouruni@cryoem9 "ls -ld $(./cryosparcm cli "get_project_dir_abs('P1')") && uname -a"
drwxrwx---. 2254 awf2130 1002 59767 Dec  4 22:40 /engram/workstation/Zuker/CS-zuker
Linux sn4622115580 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
./cryosparcm call env | grep -i proxy
NO_PROXY=ouruni.edu
http_proxy=http://gw-srv-01.ouruni.edu:3128
https_proxy=http://gw-srv-01.ouruni.edu:3128
HTTPS_PROXY=http://gw-srv-01.ouruni.edu:3128
no_proxy=localhost,::1,127.0.0.1,cryoem8.ouruni.edu,.ouruni.edu
HTTP_PROXY=http://gw-srv-01.rc.ouruni.edu:3128
./cryosparcw call env | grep -i proxy
http_proxy=http://gw-srv-01.ouruni.edu:3128/
no_proxy=localhost,::1,127.0.0.1,ouruni.edu

Interesting. What about (using the same ssh string, but different command):

ssh ouruni@cryoem9 "id && uname -a"

If the cryosparcw connect --master parameter ended in ouruni.edu, I would have expected the proxy to be bypassed, but I may misunderstand the effect of the no_proxy variable. You might want to:

  1. ask your IT support for suggestions
  2. try quoting the definition:
export no_proxy="localhost,::1,127.0.0.1,ouruni.edu"
  3. try including the IP address of the master in the no_proxy definition
uid=xxx(ouruni) gid=500(ouruni) groups=500(ouruni) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Linux sn4622115580 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

So something like:
export no_proxy=localhost,::1,127.0.0.1,${CRYOSPARC_MASTER_HOSTNAME},.ouruni.edu

In both config.sh on master and worker?

FWIW, I was able to run cryosparcw connect with the FQDN after setting no_proxy.
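For reference, a sketch of what such an addition could look like; the file locations and the exact host list here are assumptions based on this thread, so verify them against your own setup:

# appended to cryosparc_worker/config.sh on the worker
# (and, if needed, to cryosparc_master/config.sh on the master)
export NO_PROXY="localhost,::1,127.0.0.1,cryoem8.ouruni.edu,.ouruni.edu"
export no_proxy="$NO_PROXY"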

I cannot be sure due to the obfuscation, but unless ouruni is awf2130 or a member of the 1002 group, ouruni cannot access the project directory on cryoem9, and jobs would fail to run there.

Inclusion of ${CRYOSPARC_MASTER_HOSTNAME} in the no_proxy definition would be effective only if CRYOSPARC_MASTER_HOSTNAME were also defined, and CRYOSPARC_MASTER_HOSTNAME might not be defined in the worker environment. In any case, because you were able to connect using the FQDN after setting no_proxy, I recommend no additional changes to the no_proxy definition.

Yes, awf2130 = ouruni. What showed that the user would not be able to run on cryoem9? I can add that user to the 1002 group. Here is the actual user, for full context:

[awf2130@cryoem8 cryosparc_master]$ id
uid=485959(awf2130) gid=500(user) groups=500(user),46004(habazi)
[awf2130@cryoem8 cryosparc_master]$ ssh awf2130@cryoem9 "id && uname -a"
uid=485959(awf2130) gid=500(awf2130) groups=500(awf2130) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Linux sn4622115580 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

I was missing confirmation that the fictional ouruni user owns the project directory, which you have just provided.

  1. Is the directory
    /home/workstation/Zuker/CS-zuker/J2056 also owned by awf2130?
  2. Please can you post the output of the command
ls -al /home/workstation/Zuker/CS-zuker/J2056/
  3. In the get_scheduler_targets() output, is the ssh_str now awf2130@cryoem9.ourdomain.edu for the cryoem9 worker? It is shown above as exx@cryoem9.ourdomain.edu.

It’s not /home, it’s /engram, and yes, it’s owned by awf2130:

ls -al  /engram/workstation/Zuker/CS-zuker/J2056
total 752
drwxrwx---    3 awf2130 user   109 Dec  3 19:20 .
drwxrwx--- 2281 awf2130 exx  60322 Dec  5 21:02 ..
-rwxrwx---    1 awf2130 user    18 Dec  3 19:20 events.bson
drwxrwx---    2 awf2130 user     0 Dec  3 19:20 gridfs_data
-rwxrwx---    1 awf2130 user 23357 Dec  3 19:20 job.json
-rwxrwx---    1 awf2130 user 22008 Dec  3 19:20 job.log

Yes, I was going to use the exx user, but all the installations on the other workers and the master were done by awf2130. That user did not exist on cryoem9, so I created it with the same UID/GID and installed the worker there under awf2130.

Thanks @RobK. Please can you describe what is currently not (yet) working as expected?

Well, I thought having the worker at version 4.6.2 and the master at 4.6.0 was causing this issue with jobs not running, but the issue persists. What other debug information can I provide?

The job.log shows:

Unable to forward this request at this time. This request could not be forwarded to the origin server or to any parent caches. Some possible problems are: an Internet connection needed to access this domain's origin servers may be down; all configured parent caches may be currently unreachable; the administrator may not allow this cache to make direct connections to origin servers.

cryosparc_tools.cryosparc.errors.CommandError: *** (http://cryoem8.ouruni.edu:39002, code 500) Encounted error from JSONRPC function "system.describe" with params ()

Could that job have been started on a worker node that did not have the necessary no_proxy setting? On that particular worker node, what is the output of the commands

uname -a
/path/to/cryosparc_worker/bin/cryosparcw call curl http://cryoem8.ouruni.edu:39002
/path/to/cryosparc_worker/bin/cryosparcw env | grep -i proxy
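If that curl call returns the Squid error page again, a plain curl comparison (ordinary curl, not a cryosparcw subcommand; hostname as used in this thread) can confirm whether bypassing the proxy is what makes the difference:

curl -sv http://cryoem8.ouruni.edu:39002 -o /dev/null 2>&1 | grep -i -e proxy -e '< HTTP'
curl -sv --noproxy '*' http://cryoem8.ouruni.edu:39002 -o /dev/null 2>&1 | grep -i -e proxy -e '< HTTP'

The first request honors any http_proxy environment variable; the second forces a direct connection.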