Hello,
after upgrading my setup to v4.7.1 (and restarting), I have a problem getting test jobs to run on the remote worker nodes.
My setup consists of:
- farcry (master and worker)
- farcry2 (worker only)
- farcry3 (worker only)
The job test works just fine on the host farcry (which is both master and worker), but it fails on the two remote workers:
(base) cryosparc@farcry:~$ cryosparcm test workers P27 --test gpu
Using project P27
Specifying gpu test
Running worker tests...
2025-11-12 08:50:49,173 log CRITICAL | Worker test results
2025-11-12 08:50:49,173 log CRITICAL | farcry
2025-11-12 08:50:49,173 log CRITICAL | ✓ GPU
2025-11-12 08:50:49,179 log CRITICAL | farcry2
2025-11-12 08:50:49,180 log CRITICAL | ✕ GPU
2025-11-12 08:50:49,180 log CRITICAL | Error:
2025-11-12 08:50:49,180 log CRITICAL | See P27 J33 for more information
2025-11-12 08:50:49,180 log CRITICAL | farcry3
2025-11-12 08:50:49,180 log CRITICAL | ✕ GPU
2025-11-12 08:50:49,180 log CRITICAL | Error:
2025-11-12 08:50:49,180 log CRITICAL | See P27 J32 for more information
The event logs for J33 and J32 look like this:
(base) cryosparc@farcry:~$ cryosparcm eventlog P27 J32
[Wed, 12 Nov 2025 08:40:44 GMT] License is valid.
[Wed, 12 Nov 2025 08:40:44 GMT] Launching job on lane default target farcry3 ...
[Wed, 12 Nov 2025 08:40:44 GMT] Running job on remote worker node hostname farcry3
[Wed, 12 Nov 2025 08:50:49 GMT] **** Kill signal sent by unknown user ****
(base) cryosparc@farcry:~$ cryosparcm eventlog P27 J33
[Wed, 12 Nov 2025 08:40:42 GMT] License is valid.
[Wed, 12 Nov 2025 08:40:42 GMT] Launching job on lane default target farcry2 ...
[Wed, 12 Nov 2025 08:40:42 GMT] Running job on remote worker node hostname farcry2
[Wed, 12 Nov 2025 08:50:49 GMT] **** Kill signal sent by unknown user ****
I also tried the “--test ssd” option, with the same result. So I suspect there is a communication problem with the remote workers. Name resolution and key-based SSH login as user “cryosparc” seem to work fine (back and forth between all nodes).
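Since interactive SSH works but the scheduler launches jobs non-interactively, I also checked batch-mode SSH from the master to each worker with a small sketch like this (hostnames are from my setup; a key that prompts for a passphrase would pass an interactive test but fail here):

```shell
#!/usr/bin/env bash
# Sketch: non-interactive SSH check from the master to each worker.
# BatchMode=yes forbids any password/passphrase prompt, which is closer
# to how the scheduler actually connects.

check_ssh() {  # usage: check_ssh <host>; prints "<host>: ssh OK" or "<host>: ssh FAILED"
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "cryosparc@$1" true 2>/dev/null; then
    echo "$1: ssh OK"
  else
    echo "$1: ssh FAILED"
  fi
}

for h in farcry2 farcry3; do
  check_ssh "$h"
done
```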
The output of “cryosparcm test i” is:
✓ Running as cryoSPARC owner
✓ Running on master node
✓ CryoSPARC is running
✓ Connected to command_core at http://farcry:61002
✓ CRYOSPARC_LICENSE_ID environment variable is set
✓ License has correct format
✓ Insecure mode is disabled
✓ License server set to "https://get.cryosparc.com"
✓ Connection to license server succeeded
✓ License server returned success status code 200
✓ License server returned valid JSON response
✓ License exists and is valid
✓ CryoSPARC is running v4.7.1+250814
✓ Running the latest version of CryoSPARC
✓ Patch update not required
✓ Admin user has been created
✓ GPU worker connected.
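The kill signal arrives roughly ten minutes after launch, which to me looks like the master timing out jobs that never report back, so perhaps the workers cannot reach the master's command ports. As a next step, I plan to probe command_core from each worker with something like the sketch below (port 61002 taken from the “cryosparcm test i” output above; assumes curl is available on the workers):

```shell
#!/usr/bin/env bash
# Sketch: from each worker, try to reach the master's command_core port.
# Prints the HTTP status code on success; "000" or "unreachable" both mean
# no connection could be made.

probe() {  # usage: probe <worker-host> <url>; runs curl on the worker via ssh
  local code
  code=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "cryosparc@$1" \
    "curl -s --max-time 5 -o /dev/null -w '%{http_code}' '$2'" 2>/dev/null) || true
  echo "$1 -> $2: ${code:-unreachable}"
}

for h in farcry2 farcry3; do
  probe "$h" http://farcry:61002
done
```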
I already successfully “force-reinstalled” the master and worker dependencies.
The workers show up correctly via “get_scheduler_targets()”:
(base) cryosparc@farcry:~/sparc/cryosparc_worker$ cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/cryocache/', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11714887680, 'name': 'NVIDIA GeForce GTX 1080 Ti'}, {'id': 1, 'mem': 11714887680, 'name': 'NVIDIA GeForce GTX 1080 Ti'}, {'id': 2, 'mem': 11714887680, 'name': 'NVIDIA GeForce GTX 1080 Ti'}], 'hostname': 'farcry', 'lane': 'default', 'monitor_port': None, 'name': 'farcry', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], 'GPU': [0, 1, 2], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'cryosparc@localhost', 'title': 'Worker node farcry', 'type': 'node', 'worker_bin_path': '/home/cryosparc/sparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/cryocache/', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11707809792, 'name': 'NVIDIA GeForce GTX 1080 Ti'}, {'id': 1, 'mem': 11707809792, 'name': 'NVIDIA GeForce GTX 1080 Ti'}, {'id': 2, 'mem': 11707809792, 'name': 'NVIDIA GeForce GTX 1080 Ti'}, {'id': 3, 'mem': 11707809792, 'name': 'NVIDIA GeForce GTX 1080 Ti'}], 'hostname': 'farcry2', 'lane': 'default', 'monitor_port': None, 'name': 'farcry2', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'cryosparc@farcry2', 'title': 'Worker node farcry2', 'type': 'node', 'worker_bin_path': '/home/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/cryocache/', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11707809792, 'name': 'NVIDIA GeForce GTX 1080 Ti'}, {'id': 1, 'mem': 11707809792, 'name': 'NVIDIA GeForce GTX 1080 Ti'}, {'id': 2, 'mem': 11707809792, 'name': 'NVIDIA GeForce GTX 1080 Ti'}, {'id': 3, 'mem': 11707809792, 'name': 'NVIDIA GeForce GTX 1080 Ti'}], 'hostname': 'farcry3', 
'lane': 'default', 'monitor_port': None, 'name': 'farcry3', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'cryosparc@farcry3', 'title': 'Worker node farcry3', 'type': 'node', 'worker_bin_path': '/home/cryosparc/cryosparc_worker/bin/cryosparcw'}]
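One thing I noticed in this output: the worker_bin_path on farcry is /home/cryosparc/sparc/cryosparc_worker/bin/cryosparcw, while farcry2 and farcry3 use /home/cryosparc/cryosparc_worker/bin/cryosparcw. That may just reflect different install locations, but as a sanity check I want to confirm the registered path actually exists and is executable on each remote worker, with a sketch like:

```shell
#!/usr/bin/env bash
# Sketch: verify the worker_bin_path registered in get_scheduler_targets()
# exists and is executable on each remote worker.

check_bin() {  # usage: check_bin <host> <path>
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "cryosparc@$1" "test -x '$2'" 2>/dev/null; then
    echo "$1: $2 is executable"
  else
    echo "$1: cannot find or execute $2"
  fi
}

check_bin farcry2 /home/cryosparc/cryosparc_worker/bin/cryosparcw
check_bin farcry3 /home/cryosparc/cryosparc_worker/bin/cryosparcw
```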
I would greatly appreciate any advice on how to further narrow down the problem.
Thank you very much
Andreas