Failed to launch! 255

I know this has popped up before, and I tried changing the host to localhost since we run CryoSPARC on a single machine. The thing is, CryoSPARC had been running fine until this appeared for one user. It also seems to occur in only one workspace; jobs I set up in other workspaces do run. It seems a bit hit or miss.

When I start CryoSPARC I get the following output, so I don't know whether something is being read incorrectly.

CryoSPARC master started.
From this machine, access CryoSPARC and CryoSPARC Live at
http://localhost:39000

From other machines on the network, access CryoSPARC and CryoSPARC Live at
http://spgpu:39000

Len
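
(As background: CryoSPARC launches jobs on "node"-type targets over ssh, and 255 is the exit status ssh itself returns when it cannot connect or authenticate, so a "Failed to launch! 255" message often points at the ssh hop rather than the job itself. A quick sanity check like the sketch below, using the hostnames from this thread, can confirm whether non-interactive ssh works; this is an editorial sketch, not something run in the original posts. BatchMode makes ssh fail instead of hanging on a password prompt.)

# Check non-interactive ssh to each candidate worker hostname.
# ssh exits with status 255 on its own errors, so a 255 here means
# ssh could not connect or authenticate to that host.
ssh -o BatchMode=yes spgpu true;     echo "spgpu exit: $?"
ssh -o BatchMode=yes localhost true; echo "localhost exit: $?"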

Hi Len,

Please can you post the outputs of these commands:

hostname -f
host $(hostname -f)
cryosparcm cli "get_scheduler_targets()"
cryosparcm status | grep -e HOSTNAME -e BASEPORT
ps -eo user,pid,ppid,cmd | grep _master

W.

Hello,

Here are the outputs from the commands.

[spuser@spgpu ~]$ hostname -f
spgpu
[spuser@spgpu ~]$ host $(hostname -f)
Host spgpu not found: 3(NXDOMAIN)
[spuser@spgpu ~]$ cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/ssd/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25383469056, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25386352640, 'name': 'NVIDIA GeForce RTX 4090'}], 'hostname': 'spgpu', 'lane': 'default', 'monitor_port': None, 'name': 'spgpu', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'spuser@spgpu', 'title': 'Worker node spgpu', 'type': 'node', 'worker_bin_path': '/spshared/apps/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/ssd/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25383469056, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25386352640, 'name': 'NVIDIA GeForce RTX 4090'}], 'hostname': 'localhost', 'lane': 'default', 'monitor_port': None, 'name': 'localhost', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'spuser@localhost', 'title': 'Worker node localhost', 'type': 'node', 'worker_bin_path': '/spshared/apps/cryosparc/cryosparc_worker/bin/cryosparcw'}]
[spuser@spgpu ~]$ cryosparcm status | grep -e HOSTNAME -e BASEPORT
export CRYOSPARC_MASTER_HOSTNAME='localhost'
export CRYOSPARC_HOSTNAME_CHECK='localhost'
[spuser@spgpu ~]$ ps -eo user,pid,ppid,cmd | grep _master
spuser    880048       1 python /spshared/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /spshared/apps/cryosparc/cryosparc_master/supervisord.conf
spuser    880439  880048 python /spshared/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39002 cryosparc_command.command_core:start() -c /spshared/apps/cryosparc/cryosparc_master/gunicorn.conf.py
spuser    880440  880439 python /spshared/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39002 cryosparc_command.command_core:start() -c /spshared/apps/cryosparc/cryosparc_master/gunicorn.conf.py
spuser    880644  880048 python3.10 /spshared/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/flask --app cryosparc_command.command_vis run -h 0.0.0.0 -p 39003 --with-threads
spuser    880725  880048 python /spshared/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39005 -c /spshared/apps/cryosparc/cryosparc_master/gunicorn.conf.py
spuser    880726  880725 python /spshared/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39005 -c /spshared/apps/cryosparc/cryosparc_master/gunicorn.conf.py
spuser    880835  880048 /spshared/apps/cryosparc/cryosparc_master/cryosparc_app/nodejs/bin/node ./bundle/main.js
spuser    977363  815995 grep --color=auto _master

One thing I notice is that the web address/IP is different from the hostname. But as I mentioned, everything was fine on Friday, and we have been using this machine for about a year.

Len
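
(The single-line get_scheduler_targets() output above is hard to read. Since the CLI prints a Python literal, as seen in the transcript, one way to pretty-print it is to pipe it through Python's ast and pprint modules. This is a convenience sketch, not part of the original thread.)

# Pretty-print the scheduler targets; the CLI output is a Python
# literal (single quotes, None, True), which ast.literal_eval parses.
cryosparcm cli "get_scheduler_targets()" | python3 -c "import ast, sys, pprint; pprint.pprint(ast.literal_eval(sys.stdin.read()))"

In the pretty-printed list it is easier to spot the two node entries, 'spgpu' and 'localhost', that point at the same physical machine, which is what the reply below picks up on.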

Len,

There seem to be scheduler target records for both spgpu and localhost, which may refer to the same physical computer.
You may want to check whether jobs can be submitted after removing the duplicate record with the command

cryosparcm cli "remove_scheduler_target_node('spgpu')"

[command corrected 2025-10-23]
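
(A hedged follow-up sketch: after removing the duplicate target, re-running the same query from earlier in the thread should show a single remaining entry, and a queued job then has only one candidate node to launch on.)

# Remove the duplicate 'spgpu' record, then confirm that only the
# 'localhost' target remains before re-submitting a job.
cryosparcm cli "remove_scheduler_target_node('spgpu')"
cryosparcm cli "get_scheduler_targets()"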