LSF Cluster: Unable to recognize job id

CryoSPARC is unable to recognize the job ID.

Our clusters are managed with LSF, which prints the job ID and the queue name for every job submitted, something like the line below:
Job <17909169> is submitted to queue .
CryoSPARC picks up the wrong part of this line as the job ID, so I am not able to query or kill the job.

Thanks
Neeraj

Hi @neeraj,

Could you share your cluster_info.json and cluster_submission.sh with us?
You can run the command cryosparcm cluster dump to get these files written to the current working directory.

cluster_info.json

{
    "qdel_cmd_tpl": " /admin/lsflilac/lsf/10.1/linux3.10-glibc2.17-x86_64/bin/bkill {{ cluster_job_id }}",
    "worker_bin_path": "/opt/common/cryosparc/cryosparc2_worker/bin/cryosparcw",
    "title": "lilac",
    "cache_path": "/scratch",
    "qinfo_cmd_tpl": "/admin/lsflilac/lsf/10.1/linux3.10-glibc2.17-x86_64/bin/bqueue",
    "qsub_cmd_tpl": "/admin/lsflilac/lsf/10.1/linux3.10-glibc2.17-x86_64/bin/bsub",
    "qstat_cmd_tpl": "/admin/lsflilac/lsf/10.1/linux3.10-glibc2.17-x86_64/bin/bjobs -l {{ cluster_job_id }}",
    "cache_quota_mb": null,
    "send_cmd_tpl": "{{ command }}",
    "cache_reserve_mb": 10000,
    "name": "lilac"
}

cluster_submit.sh

#!/bin/bash

#BSUB -J cryosparc_{{ project_uid }}_{{ job_uid }}
#BSUB -q gpuqueue
#BSUB -e {{ job_dir_abs }}/%J.err
#BSUB -o {{ job_dir_abs }}/%J.out
#BSUB -n 8
#BSUB -R "span[ptile=8]"
#BSUB -R "rusage[mem={{ ram_gb }}]"
#BSUB -gpu "num=2:j_exclusive=yes:mode=shared"
#BSUB -W 36:00
#BSUB -m lp-gpu ls-gpu lt-gpu

##Load modules
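
For illustration, here is a minimal sketch of how the {{ cluster_job_id }} placeholder in templates such as qstat_cmd_tpl and qdel_cmd_tpl ends up in the final command. The render_template helper is hypothetical and is not cryoSPARC's actual templating code; the command paths are shortened and the queue name "somequeue" is made up. It only shows why a wrongly parsed job ID leads to unusable bjobs and bkill commands.

```python
import re


def render_template(template, variables):
    """Replace each {{ name }} placeholder with its value from variables.

    Hypothetical helper for illustration only (not cryoSPARC's code).
    """
    def substitute(match):
        return str(variables[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Shortened versions of the templates from cluster_info.json (paths omitted).
qstat_cmd_tpl = "bjobs -l {{ cluster_job_id }}"
qdel_cmd_tpl = "bkill {{ cluster_job_id }}"

# With the numeric job ID, the rendered command is usable.
good = render_template(qstat_cmd_tpl, {"cluster_job_id": "17909169"})

# With a mis-parsed ID (a made-up queue token), the command cannot work.
bad = render_template(qdel_cmd_tpl, {"cluster_job_id": "<somequeue>."})

print(good)
print(bad)
```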

Hi @neeraj,

Thanks. This is a bug in cryoSPARC that will be fixed in the next update, which will be released soon. Sorry for any inconvenience. In the meantime, you will have to run these commands manually.

Thank you Stephan. Is there a timeline estimate for the release of the next update? I also want to bring to your attention that cryoSPARC does not seem to support the < (redirect operator), which is the usual LSF way of submitting jobs (bsub < script.sh).

Thank you
Neeraj

Hi @neeraj,

At the moment we don’t have an exact date, but we should be able to deploy in less than 3 weeks. Also, thanks for bringing this to our attention. Could you explain how you currently get around this?

That's great. For the redirect operator issue, my co-worker and I reviewed the command core error and, based on that, made a minor hack to get it to work in

cryosparc/cryosparc2_master/cryosparc2_command/command_core/__init__.py

The following change was made

try:
    # Pass the submission script on stdin, equivalent to "bsub < script.sh"
    with open(script_path_abs) as f:
        res = subprocess.check_output(shlex.split(cmd), stderr=subprocess.STDOUT, stdin=f)
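
A self-contained sketch of the same idea, for anyone who wants to try it outside cryoSPARC. The submit_via_stdin function name is hypothetical, and a small Python one-liner stands in for bsub (it just echoes whatever arrives on stdin), so none of this is cryoSPARC's actual code. It assumes a POSIX-style shell syntax for shlex and Python 3.8+ for shlex.join.

```python
import os
import shlex
import subprocess
import sys
import tempfile


def submit_via_stdin(cmd, script_path_abs):
    """Run cmd with the script file connected to its standard input.

    This mirrors "bsub < script.sh" without involving a shell.
    """
    with open(script_path_abs) as f:
        return subprocess.check_output(
            shlex.split(cmd), stderr=subprocess.STDOUT, stdin=f
        )


# Stand-in for bsub (assumption): echoes its standard input back to stdout.
fake_bsub = shlex.join(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
)

# Write a tiny submission script to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as tmp:
    tmp.write("#!/bin/bash\n#BSUB -q gpuqueue\n")
    script_path = tmp.name

try:
    output = submit_via_stdin(fake_bsub, script_path).decode()
finally:
    os.remove(script_path)

print(output)
```

Passing the open file as stdin avoids shell=True, so the command string never goes through shell parsing.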

Thanks
Neeraj

Hi @neeraj,

That’s exactly what the fix would be. The only difference is that we’d also now create new control blocks for different schedulers, since this part of the code assumed all schedulers supported specifying a bash script as an argument.

As for your original issue, since it seems like you’re comfortable with modifying the code, change the lines in the same file, same function, from:

res = res.strip().split()[-1] # take the last token (to support SLURM)
job_send_streamlog(project_uid, job_uid, "-------- Cluster Job ID: \n%s" % res)
job_send_streamlog(project_uid, job_uid, "-------- Queued on cluster at %s" % str(datetime.datetime.now()))
update_job(project_uid, job_uid, {'cluster_job_id' : res})

to

# Find numeric substrings that may represent the submitted job ID
cluster_job_matches = re.findall(r'\d+', res)  # needs "import re" at the top of the file
if len(cluster_job_matches) == 1:
    cluster_job_id = cluster_job_matches[0]  # take the only numeric substring
else:
    cluster_job_id = res.strip().split()[-1]  # take the last token

job_send_streamlog(project_uid, job_uid, "-------- Cluster Job ID: \n%s" % cluster_job_id)
job_send_streamlog(project_uid, job_uid, "-------- Queued on cluster at %s" % str(datetime.datetime.now()))

update_job(project_uid, job_uid, {'cluster_job_id' : cluster_job_id})

When we end up releasing an update to cryoSPARC, it will include all of these changes and you won’t have to modify anything in this file again.
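
For reference, the job ID logic from the patch above can be tried in isolation with the sketch below. The extract_cluster_job_id function name, the queue name "somequeue", and the SLURM sample line are assumptions for illustration; only the LSF job number comes from this thread.

```python
import re


def extract_cluster_job_id(res):
    """Pick the job ID out of the scheduler's submission output.

    If the output contains exactly one run of digits, use it;
    otherwise fall back to the last whitespace-separated token.
    """
    matches = re.findall(r"\d+", res)
    if len(matches) == 1:
        return matches[0]
    return res.strip().split()[-1]


# LSF-style output (queue name is a made-up placeholder).
lsf_output = "Job <17909169> is submitted to queue <somequeue>.\n"
# SLURM-style output (assumed sample).
slurm_output = "Submitted batch job 4242\n"

lsf_id = extract_cluster_job_id(lsf_output)
slurm_id = extract_cluster_job_id(slurm_output)

print(lsf_id)
print(slurm_id)
```

One caveat with this approach: if the queue name itself contains digits, there is more than one numeric substring and the code falls back to the last token, which for LSF output is the queue name rather than the job ID.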

Thank you Stephan. That worked for us. Again thanks for the patch. I will wait for the update.

Regards
Neeraj

Hi @neeraj,

Patch 201027 is out for v2.15.2-live_privatebeta
Release notes:

- Improved cluster submission script execution strategy
- Improved cluster submission job ID identification

To apply the patch to your instance, first update to cryoSPARC Live v2.15.2-live_privatebeta (if you have access), then follow the instructions below:
https://guide.cryosparc.com/setup-configuration-and-management/software-updates#apply-patches

If you get a chance to install the patch, please let me know whether everything is working as intended. You should be able to add the redirect input operator back to your cluster_info.json.

I am not sure if I have access to v2.15.2-live_privatebeta. When I try to pull the version I get "gzip: stdin: unexpected end of file". How can I request access to the private beta?

Thanks
neeraj

Hi @neeraj,

Send an email to [address redacted] requesting access and you should be good to go!
More info here
