Failed to get GPU info: returned non-zero exit status 255

Similar to this other thread, we see the error below:

2024-09-26 13:17:58,266 get_gpu_info_run     ERROR    | Failed to get GPU info on ourcomaind.edu
2024-09-26 13:17:58,266 get_gpu_info_run     ERROR    | Traceback (most recent call last):
2024-09-26 13:17:58,266 get_gpu_info_run     ERROR    |   File "/home/me/cryosparc_master/cryosparc_command/command_core/__init__.py", line 1516, in get_gpu_info_run
2024-09-26 13:17:58,266 get_gpu_info_run     ERROR    |     value = subprocess.check_output(full_command, stderr=subprocess.STDOUT, shell=shell, timeout=JOB_LAUNCH_TIMEOUT_SECONDS).decode()
2024-09-26 13:17:58,266 get_gpu_info_run     ERROR    |   File "/home/me/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/subprocess.py", line 421, in check_output
2024-09-26 13:17:58,266 get_gpu_info_run     ERROR    |     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2024-09-26 13:17:58,266 get_gpu_info_run     ERROR    |   File "/home/me/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/subprocess.py", line 526, in run
2024-09-26 13:17:58,266 get_gpu_info_run     ERROR    |     raise CalledProcessError(retcode, process.args,
2024-09-26 13:17:58,266 get_gpu_info_run     ERROR    | subprocess.CalledProcessError: Command '['ssh', 'me@ourcomaind.edu', 'bash -c "eval $(/opt/cryosparc_worker/bin/cryosparcw env); python /opt/cryosparc_worker/cryosparc_compute/get_gpu_info.py"']' returned non-zero exit status 255.
2024-09-26 13:18:06,489 update_all_job_sizes_run INFO     | Finished updating all job sizes (0 jobs updated, 0 projects updated)
2024-09-26 13:18:11,024 wrapper              ERROR    | JSONRPC ERROR at set_user_viewed_workspace
2024-09-26 13:18:11,024 wrapper              ERROR    | Traceback (most recent call last):
2024-09-26 13:18:11,024 wrapper              ERROR    |   File "/home/me/cryosparc_master/cryosparc_command/commandcommon.py", line 196, in wrapper
2024-09-26 13:18:11,024 wrapper              ERROR    |     res = func(*args, **kwargs)
2024-09-26 13:18:11,024 wrapper              ERROR    |   File "/home/me/cryosparc_master/cryosparc_command/command_core/__init__.py", line 1230, in set_user_viewed_workspace
2024-09-26 13:18:11,024 wrapper              ERROR    |     update_workspace(project_uid, workspace_uid, {'last_accessed' : {'name' : get_username_by_id(user_id), 'accessed_at' : datetime.datetime.utcnow()}}, operation='$set', export=False)
2024-09-26 13:18:11,024 wrapper              ERROR    |   File "/home/me/cryosparc_master/cryosparc_command/commandcommon.py", line 187, in wrapper
2024-09-26 13:18:11,024 wrapper              ERROR    |     return func(*args, **kwargs)
2024-09-26 13:18:11,024 wrapper              ERROR    |   File "/home/me/cryosparc_master/cryosparc_command/commandcommon.py", line 250, in wrapper
2024-09-26 13:18:11,024 wrapper              ERROR    |     assert os.path.isfile(
2024-09-26 13:18:11,024 wrapper              ERROR    | AssertionError: validation error: lock file for P1 at /data/cryosparc/CS-andy-aug/cs.lock absent or otherwise inaccessible. 
2024-09-26 13:18:11,714 wrapper              ERROR    | JSONRPC ERROR at set_user_viewed_job
2024-09-26 13:18:11,714 wrapper              ERROR    | Traceback (most recent call last):
2024-09-26 13:18:11,714 wrapper              ERROR    |   File "/home/me/cryosparc_master/cryosparc_command/commandcommon.py", line 196, in wrapper
2024-09-26 13:18:11,714 wrapper              ERROR    |     res = func(*args, **kwargs)
2024-09-26 13:18:11,714 wrapper              ERROR    |   File "/home/me/cryosparc_master/cryosparc_command/command_core/__init__.py", line 1287, in set_user_viewed_job
2024-09-26 13:18:11,714 wrapper              ERROR    |     update_job(project_uid, job_uid, {'last_accessed' : {'name' : get_username_by_id(user_id), 'accessed_at' : datetime.datetime.utcnow()}})
2024-09-26 13:18:11,714 wrapper              ERROR    |   File "/home/me/cryosparc_master/cryosparc_command/commandcommon.py", line 187, in wrapper
2024-09-26 13:18:11,714 wrapper              ERROR    |     return func(*args, **kwargs)
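Exit status 255 comes from ssh itself (name resolution, connection, or authentication failure) rather than from get_gpu_info.py; otherwise ssh passes through the exit status of the remote command. Running the exact command from the traceback interactively on the master usually reveals the underlying problem. A minimal sketch, run as the user that runs CryoSPARC; the target is copied verbatim from the traceback and should match the worker's ssh_str reported by get_scheduler_targets():

ssh me@ourcomaind.edu 'bash -c "eval $(/opt/cryosparc_worker/bin/cryosparcw env); python /opt/cryosparc_worker/cryosparc_compute/get_gpu_info.py"'
echo "ssh exit status: $?"   # a host-key prompt, password prompt, or "Could not resolve hostname" message above explains the 255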

Here are the results of the commands requested in the other thread:

 cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/opt/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25425608704, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 25425608704, 'name': 'NVIDIA RTX A5000'}, {'id': 2, 'mem': 25417023488, 'name': 'NVIDIA RTX A5000'}, {'id': 3, 'mem': 25425608704, 'name': 'NVIDIA RTX A5000'}], 'hostname': 'sn4622115977', 'lane': 'default', 'monitor_port': None, 'name': 'sn4622115977', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'me@sn4622115977', 'title': 'Worker node sn4622115977', 'type': 'node', 'worker_bin_path': '/opt/cryosparc_worker/bin/cryosparcw'}, {'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25425608704, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 25425608704, 'name': 'NVIDIA RTX A5000'}, {'id': 2, 'mem': 25417023488, 'name': 'NVIDIA RTX A5000'}, {'id': 3, 'mem': 25425608704, 'name': 'NVIDIA RTX A5000'}], 'hostname': 'ourdomain.edu', 'lane': 'default', 'monitor_port': None, 'name': 'ourdomain.edu', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'me@ourdomain.edu', 'title': 'Worker node ourdomain.edu', 'type': 'node', 'worker_bin_path': '/opt/cryosparc_worker/bin/cryosparcw'}]
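For readability, the one-line output can be pretty-printed; a sketch, assuming python3 is on the PATH and the CLI prints the list as a plain Python literal as above:

cryosparcm cli "get_scheduler_targets()" | python3 -c "import sys, ast, json; print(json.dumps(ast.literal_eval(sys.stdin.read()), indent=2))"

The ssh_str values in these records are what the master uses for the remote GPU-info call shown in the traceback.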
(base) me@sn4622115977:~$ cryosparcm status | grep HOSTNAME
export CRYOSPARC_MASTER_HOSTNAME="sn4622115977"
(base) me@sn4622115977:~$ cryosparcm status
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/home/me/cryosparc_master
Current cryoSPARC version: v4.5.3
----------------------------------------------------------------------------

CryoSPARC process status:

app                              RUNNING   pid 5513, uptime 2:21:42
app_api                          RUNNING   pid 5533, uptime 2:21:41
app_api_dev                      STOPPED   Not started
command_core                     RUNNING   pid 5425, uptime 2:21:55
command_rtp                      RUNNING   pid 5489, uptime 2:21:48
command_vis                      RUNNING   pid 5460, uptime 2:21:49
database                         RUNNING   pid 5318, uptime 2:21:59

----------------------------------------------------------------------------
License is valid
----------------------------------------------------------------------------

global config variables:
export CRYOSPARC_LICENSE_ID="bae9edd6-54dd-11ef-93a3-7b0d1eadc7e2"
export CRYOSPARC_MASTER_HOSTNAME="sn4622115977"
export CRYOSPARC_DB_PATH="/home/me/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000
export CRYOSPARC_INSECURE=false
export CRYOSPARC_DB_ENABLE_AUTH=true
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000
export CRYOSPARC_PROJECT_DIR_PREFIX='CS-'
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_CLICK_WRAP=true
export CRYOSPARC_SSD_CACHE_LIFETIME_DAYS=10

What are the outputs of these commands on host sn4622115977:

connstr="me@ourdomain.edu" # define with actual username and remote worker hostname
hostname 
ssh "$connstr" nvidia-smi -L
ssh "$connstr" ls -al /data/cryosparc/CS-andy-aug/
ssh "$connstr" ls -l /opt/cryosparc_worker/bin/cryosparcw
ssh "$connstr" curl sn4622115977:39002
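The last line probes worker-to-master connectivity: with CRYOSPARC_BASE_PORT=39000, the command_core service listens at 39000 + 2 = 39002 on the master. Before testing from the worker, the listener can be confirmed locally on the master; a sketch, assuming ss is available:

ss -tlnp | grep 39002   # the command_core listener should show up here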

(base) exx@sn4622115977:~$ connstr="exx@sn4622115977"

ssh "$connstr" nvidia-smi -L
GPU 0: NVIDIA RTX A5000 (UUID: GPU-34ee3eac-c2ba-393b-4eb3-87cc9ba29410)
GPU 1: NVIDIA RTX A5000 (UUID: GPU-a9aedcfc-e888-1706-0734-597f4427473b)
GPU 2: NVIDIA RTX A5000 (UUID: GPU-8c02f879-b591-4368-394f-feecc8ab4845)
GPU 3: NVIDIA RTX A5000 (UUID: GPU-f05f9c00-42b5-2cd0-7979-e1be474ff113)
 ssh "$connstr" ls -al /data/cryosparc/CS-andy-aug/
drwxrwxr-x 37 exx exx 4096 Sep 26 13:57 .
drwxrwxr-x  3 exx exx 4096 Sep 26 13:50 ..
-rw-rw-r--  1 exx exx   83 Sep 26 13:52 cs.lock
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J127
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J144
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J145
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J146
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J147
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J149
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J153
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J164
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J166
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J172
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J174
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J182
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J279
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J287
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J292
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J61
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J62
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J63
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J69
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J70
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J71
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J72
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J74
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J76
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J77
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J81
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J84
drwxrwxr-x  3 exx exx 4096 Sep 26 13:57 J86
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J89
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J90
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J92
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J96
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J98
drwxrwxr-x  3 exx exx 4096 Sep 26 13:56 J99
-rw-rw-r--  1 exx exx 4586 Sep 26 13:57 job_manifest.json
drwxrwxr-x  2 exx exx 4096 Sep 26 13:56 S2
-rw-rw-r--  1 exx exx  437 Sep 26 14:03 workspaces.json
ssh "$connstr" ls -l /opt/cryosparc_worker/bin/cryosparcw

-rwxr-xr-x 1 exx exx 14475 Jun  4 20:19 /opt/cryosparc_worker/bin/cryosparcw
(base) exx@sn4622115977:~$ ssh curl sn4622115977:39002
ssh: Could not resolve hostname curl: Temporary failure in name resolution
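As typed, ssh treated "curl" as the destination host, hence the name-resolution error. The check presumably intended (an assumption, mirroring the other commands in the list) runs curl on the remote worker through the connection string, which per the comment at the top of the list should point at the remote worker:

ssh "$connstr" curl sn4622115977:39002   # run curl on the worker; it should reach command_core on the master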

Did jobs ever run on the separate worker under the current configuration?
The hostname defined for CRYOSPARC_MASTER_HOSTNAME inside cryosparc_master/config.sh must be resolvable by all CryoSPARC worker nodes so that workers can communicate with the master.
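A quick way to confirm this from the worker side; a sketch, assuming getent and curl exist on the worker and reusing the ssh_str from get_scheduler_targets():

ssh me@ourdomain.edu getent hosts sn4622115977   # no output means the worker cannot resolve the master hostname
ssh me@ourdomain.edu curl sn4622115977:39002     # should return a response from command_core on the master

If the first command prints nothing, the master hostname needs to be made resolvable on the worker (for example via DNS or an /etc/hosts entry) before worker-to-master communication can work.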