Jobs halting in launched mode

Hi All,

I am reopening this old post.
I have the same problem of jobs halting in launched mode.
We need to run “cryosparcm restart” at least once a day, and this is becoming a problem.
I have updated to the latest version, v4.6.2, but this did not solve the problem.
CryoSPARC is installed on two machines in a master + worker configuration (the simple install).

What should I check?
Thanks,
GIA

@giax I moved your post to this new topic. Please can you provide additional information:

  1. output of the command
    cryosparcm cli "get_scheduler_targets()"
  2. Project and job IDs for the job that is halted in Launched state.

Hi,

here is the output of that command:

[{'cache_path': '/scratch/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25425608704, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 25417023488, 'name': 'NVIDIA RTX A5000'}], 'hostname': 'gbamod25', 'lane': 'default', 'monitor_port': None, 'name': 'gbamod25', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]}, 'ssh_str': 'cioci@gbamod25', 'title': 'Worker node gbamod25', 'type': 'node', 'worker_bin_path': '/data/software/cryosparc/cryosparc_worker/bin/cryosparcw'}]
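
(Side note for anyone reading this output later: it is a Python-style list of dictionaries, so it can be summarized with a few lines of Python. The sketch below is only an illustration. The helper name `summarize_targets` and the variable `raw` are made up for this example, and `raw` is a trimmed copy of the output above, not the full text. It also assumes the curly quotes seen in pasted forum text need to be converted back to straight quotes before parsing.)

```python
import ast

# Trimmed copy of the scheduler target output (only the fields inspected here).
raw = (
    "[{'hostname': 'gbamod25', 'lane': 'default', "
    "'ssh_str': 'cioci@gbamod25', 'type': 'node', "
    "'gpus': [{'id': 0, 'mem': 25425608704, 'name': 'NVIDIA RTX A5000'}, "
    "{'id': 1, 'mem': 25417023488, 'name': 'NVIDIA RTX A5000'}]}]"
)


def summarize_targets(text):
    """Parse pasted get_scheduler_targets() output into a short summary.

    Hypothetical helper for illustration; not part of CryoSPARC.
    """
    # Pasted text may contain curly quotes; convert them to straight quotes.
    cleaned = text.replace("\u2018", "'").replace("\u2019", "'")
    # literal_eval only accepts Python literals, so it is safe on pasted text.
    targets = ast.literal_eval(cleaned)
    return [
        {
            "hostname": t["hostname"],
            "lane": t["lane"],
            "ssh_str": t["ssh_str"],  # account and host used to reach the worker
            "gpu_count": len(t.get("gpus", [])),
        }
        for t in targets
    ]


for summary in summarize_targets(raw):
    print(summary)
```

The field worth noticing here is `ssh_str`: it shows which account and host name are used to reach the worker node.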

The halted job is P3-J121 (a class2D select job).

Sometimes (but not always) restarting the job does the trick.
Sometimes I need to run “cryosparcm restart” to wake it up.

Thanks,

GIA

Thanks @giax for this information. Can you recall for which job types, aside from Select 2D, you have observed this behavior?
Please can you also post the output of the command

cryosparcm joblog P3 J121 | tail -n 60

Hi,

Sorry for the (very) slow reply.
All jobs get stuck in launched mode after a night, or at least a few hours, of inactivity.
Sometimes killing and restarting the job does the trick, but most of the time I am forced to run “cryosparcm restart” to wake up the program.

When I run the command “cryosparcm joblog P3 J121 | tail -n 60”

I get /usr/bin/env: ‘bash’: Key has expired

Is there a problem with the license key being lost or disconnected?

Thanks,

GIA

I do not think this error refers to a CryoSPARC license key. Could the error be related to the authentication of a login or ssh session? Is the cioci user managed by a centralized identity manager?
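
For context on why this does not look like a CryoSPARC message: on Linux, "Key has expired" is the standard system error string for the errno value `EKEYEXPIRED`, which comes from the kernel's key management facility rather than from an application. A minimal sketch to confirm the mapping on a given machine follows. The function name `errno_names_for_message` is made up for this example, and the result depends on the operating system and C library, so treat it as a quick check rather than a diagnosis.

```python
import errno
import os


def errno_names_for_message(message):
    """Return the symbolic errno names whose OS error string equals `message`."""
    return sorted(
        name
        for code, name in errno.errorcode.items()
        if os.strerror(code) == message
    )


# On a typical Linux system this is expected to list EKEYEXPIRED;
# on other platforms the list may be empty.
print(errno_names_for_message("Key has expired"))
```

If the name is reported, the error most likely originates from an expired credential held for the user session, which fits the observation that the problem appears after several hours of inactivity.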