Exit status 255 error

I keep getting the following error when submitting 2D classification jobs:

Command '['ssh', u'dgl@dgl', 'nohup', u'/home/dgl/cryosparc2_worker/bin/cryosparcw run --project P1 --job J7 --master_hostname dgl-Precision-7920-Tower --master_command_core_port 39002 > /home/dgl/cryosparc2_projects/PhoPQ K19R SMA/P1/J7/job.log 2>&1 & ']' returned non-zero exit status 255

and the job overview states:

Launching job on lane default target dgl …
License is valid.
Running job on remote worker node hostname dgl
Failed to launch! 255
ssh: Could not resolve hostname dgl: Name or service not known

Here are the configurations:

CRYOSPARC CONNECT --------------------------------------------

Attempting to register worker dgl@localhost to command dgl@localhost:39002
Connecting as unix user dgl
Will register using ssh string: dgl@dgl-Precision-7920-Tower
If this is incorrect, you should re-run this command with the flag --sshstr

Connected to master.

Current connected workers:

Autodetecting available GPUs…
Detected 1 CUDA devices.

  id           pci-bus  name
   0      0000:73:00.0  Quadro P4000

All devices will be enabled now.
This can be changed later using --update

Worker will be registered without SSD.

Autodetecting the amount of RAM available…
This machine has 64.02GB RAM .

Registering worker…

You can now launch jobs on the master node and they will be scheduled
on to this worker node if resource requirements are met.

Final configuration for dgl@localhost
lane : default
name : dgl@localhost
title : Worker node dgl@localhost
resource_slots : {u'GPU': [0], u'RAM': [0, 1, 2, 3, 4, 5, 6, 7], u'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}
hostname : dgl@localhost
worker_bin_path : /home/dgl/cryosparc2_worker/bin/cryosparcw
cache_path : None
cache_quota_mb : None
resource_fixed : {u'SSD': False}
cache_reserve_mb : 10000
type : node
ssh_str : dgl@dgl-Precision-7920-Tower
desc : None

Hi @Bruk,

It looks like you have multiple workers registered in the default lane:

Current connected workers:

Although your dgl-Precision-7920-Tower worker is connected correctly, the other workers are misconfigured. When you launch a job, the scheduler tries to run it on one of those other registered workers and fails.
Try creating a new lane, assigning only the correct worker to it, and then queuing a job to that lane:
cryosparcw connect --master <master_hostname> --worker <worker_hostname> --update --newlane --lane "dgl_lane"
After this, the UI will show a second lane, in addition to default, to which you can queue jobs.
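
The job log above reports `ssh: Could not resolve hostname dgl`, while the worker was registered with the ssh string `dgl@dgl-Precision-7920-Tower`. A quick way to see which of these names the master can actually resolve is sketched below. This is only an illustration: `ssh_host`, `host_resolves`, and `check_target` are hypothetical helper names, not part of cryoSPARC, and the sketch assumes `getent` is available (as on most glibc-based Linux systems).

```shell
# Extract the host part of an ssh target such as "user@host".
ssh_host() {
    echo "${1##*@}"
}

# Succeed if the hostname resolves through the system resolver.
# Assumption: getent is installed (typical on glibc-based Linux).
host_resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Hypothetical helper: check one ssh target and print the outcome.
check_target() {
    host=$(ssh_host "$1")
    if host_resolves "$host"; then
        echo "$1: host '$host' resolves"
    else
        echo "$1: host '$host' does NOT resolve"
    fi
}

# Example calls (targets copied from the logs in this thread):
#   check_target dgl@dgl
#   check_target dgl@dgl-Precision-7920-Tower
```

If the first target does not resolve and the second does, that would be consistent with the scheduler picking a worker entry whose ssh string points at a name the master cannot reach.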

I don’t have a cryosparcw command available, just cryosparcm. What commands can I use to remove workers and create a new lane with cryosparcm?

Hi @Bruk,

The cryosparcw command is available on the node that hosts the cryosparc2_worker files.
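
If `cryosparcw` is not on the shell's `PATH`, it can still be run by its full path. The sketch below looks for a command on `PATH` first and falls back to an explicit location; the fallback used in the example is the `worker_bin_path` value shown in the configuration output earlier in this thread. `resolve_cmd` is a hypothetical helper name, not a cryoSPARC tool.

```shell
# Print the location of a command: prefer PATH, otherwise fall back
# to an explicit path if that file is executable.
# Usage: resolve_cmd <name> <fallback_path>
resolve_cmd() {
    name=$1
    fallback=$2
    if command -v "$name" >/dev/null 2>&1; then
        command -v "$name"
    elif [ -x "$fallback" ]; then
        echo "$fallback"
    else
        echo "not found: $name" >&2
        return 1
    fi
}

# Example (path taken from worker_bin_path in the output above):
#   CRYOSPARCW=$(resolve_cmd cryosparcw /home/dgl/cryosparc2_worker/bin/cryosparcw)
#   "$CRYOSPARCW" connect --master <master_hostname> --worker <worker_hostname> \
#       --update --newlane --lane "dgl_lane"
```

The placeholders in the example are the same ones used in the command suggested above and need to be filled in for the actual setup.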


Okay, the following command

cd /home/dgl/cryosparc2_worker
bin/cryosparcw connect --worker localhost --master localhost --port 39000 --ssdpath /scratch/cryosparc_cache

gives the following output:

CRYOSPARC CONNECT --------------------------------------------

Attempting to register worker localhost to command localhost:39002
Connecting as unix user dgl
Will register using ssh string: dgl@localhost
If this is incorrect, you should re-run this command with the flag --sshstr

Connected to master.

Current connected workers:

Autodetecting available GPUs…
Detected 1 CUDA devices.

  id           pci-bus  name
   0      0000:73:00.0  Quadro P4000

All devices will be enabled now.
This can be changed later using --update

Traceback (most recent call last):
  File "bin/connect.py", line 197, in <module>
    cache_path = check_ssd_path()
  File "bin/connect.py", line 88, in check_ssd_path
    assert os.path.isdir(cache_path_expand), "Path %s does not exist." % args.ssdpath
AssertionError: Path /scratch/cryosparc_cache does not exist.

I see that there are multiple connected workers, which I would like to disconnect. I also get a cache error even though /home/dgl/scratch/cryosparc_cache exists. Can you let me know how to disconnect the workers and fix the cache issue?
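
One thing worth checking here: the command passed `--ssdpath /scratch/cryosparc_cache`, while the directory reported to exist is `/home/dgl/scratch/cryosparc_cache`. These are two different absolute paths, and the traceback shows the script asserting `os.path.isdir` on the path it was given. A minimal sketch for comparing the two, where `dir_status` is a hypothetical helper name rather than anything shipped with cryoSPARC:

```shell
# Print whether a path is an existing directory, similar in spirit to
# the os.path.isdir check visible in the traceback.
dir_status() {
    if [ -d "$1" ]; then
        echo "exists"
    else
        echo "missing"
    fi
}

# The path passed to --ssdpath versus the path reported to exist.
for p in /scratch/cryosparc_cache /home/dgl/scratch/cryosparc_cache; do
    printf '%s: %s\n' "$p" "$(dir_status "$p")"
done
```

If only the second path is reported as existing, passing that path to `--ssdpath` would presumably satisfy the check. That is an inference from the traceback, not something confirmed in this thread.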

Trying cryosparcw connect --master localhost --worker localhost --update --newlane --lane "dgl_lane"
returns the "cryosparcw: command not found" error.

Never mind, I uninstalled and reinstalled everything.