Job halts in launch state and does not change to running

Hi,
I recently updated CryoSPARC to version 4.5.1. However, when I attempt to run any job, it remains stuck in the “launch” state and does not transition to “running.” I have reviewed the troubleshooting tips in the CryoSPARC discussion forum (Job Halts in Launched State), but I have not been able to resolve the issue. Could someone assist me in identifying and fixing the problem?

Thank you.
Rajiv

Welcome to the forum @rajivdbt.

Please can you let us know:

  1. Did jobs run properly before the update?
  2. Could the update of the cryosparc_worker/ directory have failed or been missed?
  3. What are the outputs of the following commands on the CryoSPARC master host:
cryosparcm cli "get_scheduler_targets()"
cryosparcm cli "get_project_dir_abs('P99')" # substitute actual project UID

Hi wtempel,

Thanks for the response.

  1. Yes, jobs were run successfully before the update.
  2. I am not entirely sure about that.
  3. Output of cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/ssd/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11546263552, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'spgpu', 'lane': 'default', 'monitor_port': None, 'name': 'spgpu', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'spuser@spgpu', 'title': 'Worker node spgpu', 'type': 'node', 'worker_bin_path': '/home/spuser/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/ssd/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11546263552, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'localhost', 'lane': 'default', 'monitor_port': None, 'name': 'localhost', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'spuser@localhost', 'title': 'Worker node localhost', 'type': 'node', 'worker_bin_path': '/home/spuser/cryosparc/cryosparc_worker/bin/cryosparcw'}]


Output of cryosparcm cli "get_project_dir_abs('P8')"

/run/media/spuser/hDrive2_Rajiv/CS-20220215-P8

There are two target nodes defined: localhost and spgpu.

  1. Do these targets refer to one and the same computer?
  2. What are the outputs of these commands on the CryoSPARC master?
    hostname -f
    host spgpu
    cryosparcm status | grep HOSTNAME
    host localhost
  3. What are the outputs of these commands on spgpu?
    ls -ld /run/media/spuser/hDrive2_Rajiv/CS-20220215-P8
    stat -f /run/media/spuser/hDrive2_Rajiv/CS-20220215-P8
    cat /home/spuser/cryosparc/cryosparc_worker/version

Thanks again.

  1. spuser is a user account on spgpu (the workstation).

cmd: hostname -f
Output: spgpu

cmd: host spgpu
Output: Host spgpu not found: 3(NXDOMAIN)

cmd: cryosparcm status | grep HOSTNAME
Output: export CRYOSPARC_MASTER_HOSTNAME="spgpu"

cmd: host localhost
Output: localhost has address 127.0.0.1

cmd: ls -ld /run/media/spuser/hDrive2_Rajiv/CS-20220215-P8
Output: drwxrwxrwx. 1 spuser spuser 196608 May 16 18:53 /run/media/spuser/hDrive2_Rajiv/CS-20220215-P8

cmd: stat -f /run/media/spuser/hDrive2_Rajiv/CS-20220215-P8
Output: File: "/run/media/spuser/hDrive2_Rajiv/CS-20220215-P8"
ID: 0 Namelen: 255 Type: fuseblk
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 3418095103 Free: 64081444 Available: 64081444
Inodes: Total: 257013904 Free: 256443284

cmd: cat /home/spuser/cryosparc/cryosparc_worker/version
Output: v4.2.1

The master and worker versions are not the same; the cryosparc_worker/ update must have failed or been skipped.

You may be able to update the cryosparc_worker/ software by adapting the steps described in Update from 4.2 to 4.5.1 failed - #4 by wtempel.
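A minimal sketch of how those steps might be adapted to this install (the cryosparc_master/ path is my assumption based on the worker path shown above; please verify the exact procedure against the linked post and the CryoSPARC guide before running anything):

cp /home/spuser/cryosparc/cryosparc_master/cryosparc_worker.tar.gz /home/spuser/cryosparc/cryosparc_worker/   # copy the v4.5.1 worker archive downloaded by the master update
cd /home/spuser/cryosparc/cryosparc_worker
bin/cryosparcw update   # re-run the worker update from the copied archive
cat version   # should now report v4.5.1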

There are additional potential problems:

  1. Can jobs on this configuration properly communicate with CryoSPARC master processes? What are the outputs of these commands (inside a fresh shell)?
    eval $(cryosparcm env) # load CryoSPARC environment
    curl ${CRYOSPARC_MASTER_HOSTNAME}:${CRYOSPARC_COMMAND_CORE_PORT}
    exit # exit the shell
    
  2. Is the fuseblk filesystem that holds the project directory compatible with CryoSPARC? Does it support symbolic links? (A quick check is sketched after this list.)
  3. The list of target nodes may need to be deduplicated. Targets can be removed with the remove_scheduler_target_node() cli function (example below).
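For question 2, a quick, non-destructive way to check symbolic link support on that filesystem (the file names here are arbitrary placeholders):

cd /run/media/spuser/hDrive2_Rajiv/CS-20220215-P8
touch symlink_target_test   # temporary plain file
ln -s symlink_target_test symlink_test && ls -l symlink_test   # should print symlink_test -> symlink_target_test
rm -f symlink_test symlink_target_test   # clean up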
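For question 3, if localhost and spgpu do turn out to be the same computer, removing the redundant entry might look like this (which hostname to keep is your decision; this is only a sketch):

cryosparcm cli "remove_scheduler_target_node('localhost')"
cryosparcm cli "get_scheduler_targets()"   # confirm the remaining target list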