Installation on SGE cluster

Hello,

We are installing cryoSPARC on an SGE cluster. When we try to submit jobs to the worker nodes we receive this error:

Unexpected "/"

In the associated log we find:

-----------------------------------------------------
[JSONRPC ERROR 2020-06-10 20:11:58.603443 at enqueue_job ]
-----------------------------------------------------
Traceback (most recent call last):
  File "cryosparc2_command/command_core/__init__.py", line 115, in wrapper
    res = func(*args, **kwargs)
  File "cryosparc2_command/command_core/__init__.py", line 4585, in enqueue_job
    scheduler_run()
  File "cryosparc2_command/command_core/__init__.py", line 124, in wrapper
    raise e
TemplateSyntaxError: unexpected '/'
-----------------------------------------------------

We think it is related to our configuration. These are our cluster_info.json and cluster_script.sh:

cluster_info.json:

{
    "name" : "gpu.q_2gpu",
    "worker_bin_path" : "/opt/cryosparc/cryosparc2/bin/cryosparcw",
    "cache_path" : "/scratch_local",
    "send_cmd_tpl" : "{{ command }}",
    "qsub_cmd_tpl" : "qsub {{ /cryosparc2_master/templates_cluster/oge/gpu.q_2gpu/cluster_script.sh }}",
    "qstat_cmd_tpl" : "qstat -j {{ cluster_job_id }}",
    "qdel_cmd_tpl" : "qdel {{ cluster_job_id }}",
    "qinfo_cmd_tpl" : "qstat"
}

cluster_script.sh (some unsuccessful tests are commented out with ##):

#!/bin/bash

### Shell
#$ -S /bin/bash

### Use the current working directory
#$ -cwd
##$ -cw {{ job_dir_abs }}
##$ -cw {{ project_dir_abs }}

## Job Name
#$ -N cryosparc_{{ project_uid }}_{{ job_uid }}

## Queue
#$ -q gpu.q

## Parallel Environment & Number of CPUs (select 1 CPU always, and oversubscribe as GPU is per core value)
#$ -pe 2gpu {{ num_cpu }}

## Memory per CPU core
##$ -l m_mem_free={{ (ram_gb)|int }}G

## GPUs 
#$ -l gpu=1
##$ -l gpu={{ num_gpu }}

## Merge stdin and stdout
#$ -j y

## Stdout
#$ -o {{ job_dir_abs }}/{{ project_uid }}.log


## Send mail
#$ -M {{ cryosparc_username }}
#$ -m esa

## Inherit all current environment variables
#$ -V

## Number of threads
export OMP_NUM_THREADS={{ num_cpu }}

echo "HOSTNAME: $HOSTNAME"

{{ run_cmd }}

What could we change or try?

Thanks in advance!

Daniel

Hi @dluque, I believe the issue is in your qsub_cmd_tpl. Remove the {{ }} from around the path:

"qsub_cmd_tpl" : "qsub /cryosparc2_master/templates_cluster/oge/gpu.q_2gpu/cluster_script.sh",

And try again. Let me know how that works.

Hi,

Thanks for your prompt answer.

We tried it but it didn’t work. :frowning:

However, we have solved the problem by using the cryoSPARC variable directly:

"qsub_cmd_tpl" : "qsub {{ script_path_abs }}",

After overcoming this issue, we have found a new problem. The job is launched properly, but the program picks up the qsub job ID incorrectly. We receive this error:

[JSONRPC ERROR  2020-06-11 17:19:02.318096  at  enqueue_job ]
-----------------------------------------------------
Traceback (most recent call last):
  File "cryosparc2_command/command_core/__init__.py", line 115, in wrapper
    res = func(*args, **kwargs)
  File "cryosparc2_command/command_core/__init__.py", line 4585, in enqueue_job
    scheduler_run()
  File "cryosparc2_command/command_core/__init__.py", line 124, in wrapper
    raise e
CalledProcessError: Command '['qstat', '-j', 'submitted']' returned non-zero exit status 1

We think that this "submitted" appearing instead of the job ID comes from the message our system returns when a qsub is launched, i.e.:

Your job 387340 (“sleep”) has been submitted

We think (though we are not sure) that the program picks up "submitted" instead of "387340".

Any suggestion?

Thanks!

Hi @dluque, apologies for the delay. You’re absolutely right, cryoSPARC does expect the cluster job ID to be at the very end of the command output. We may make this more flexible in future releases of cryoSPARC. For now, can you try adjusting your cluster submission script to filter for only the numeric job ID? I believe you can do so by changing the last line of cluster_script.sh to this:

{{ run_cmd }} | egrep -o '[0-9]+'
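
For example, running that pattern over the qsub message you quoted should pull out just the number:

 $ echo 'Your job 387340 ("sleep") has been submitted' | egrep -o '[0-9]+'
 387340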

Let me know how that goes.

Hi @nfrasser, thanks for your answer.

I have tried it and it does not work. I think the problem is that we are capturing, with a regular expression, a block of numbers from the output generated by the cryoSPARC command that is embedded in the script passed to qsub.

However, I believe that the variable "cluster_job_id" stores a value taken from the output of the qsub command itself. The last line of the script ends up expanding to:

/opt/cryosparc/cryosparc2/bin/cryosparcw run --project P6 --job J20 --master_hostname pepix.isciii.es --master_command_core_port 39002 > /processing_Data/microscopia_electronica/Pruebas_CryoSparcs/P6/J20/job.log 2>&1 | egrep -o '[0-9]+'

Hi @nfrasser,

We have fixed the problem using the -terse option of qsub.

It causes qsub to display only the job ID of the job being submitted, rather than the regular "Your job …" string.
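
In our case that means a qsub_cmd_tpl along the lines of:

"qsub_cmd_tpl" : "qsub -terse {{ script_path_abs }}",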

Thanks!


I ran into this on a Slurm cluster. We include a --cluster parameter, which unfortunately means that the submission command's output does not end with the job ID. Example:

 $ sbatch --cluster=gpu-cluster  test.sh
 Submitted batch job 3204 on cluster gpu-cluster

I’m not aware of any sbatch flags equivalent to qsub’s -terse.

I tried an approach similar to @nfrasser's. However, the job ID is parsed from the last word of the qsub_cmd_tpl standard output, so the filtering command needs to go inside that template rather than inside cluster_script.sh.

Pipes don’t seem to work directly in qsub_cmd_tpl, so I had to use this somewhat ugly submission command:

"qsub_cmd_tpl": "bash -c 'sbatch --cluster=gpu-cluster \"{{ script_path_abs }}\" | sed -r \"s/ on cluster.*$//\"'",

In the end it works, but it would be very nice if cryoSPARC implemented a cleaner way of parsing the job ID, perhaps a new variable in cluster_info.json that accepts the qsub output and returns the job ID. For example, the default behavior would be something like:

"jobid_cmd_tmp": "awk '{print $NF}'",

This would have the advantage of documenting the jobid parsing behavior, which I had to figure out via trial and error.
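
With a key like that (hypothetical, not currently supported), one could then override the default for output like ours, where the job ID is the fourth field rather than the last, with something like:

"jobid_cmd_tpl": "awk '{print $4}'",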

Another solution might be to support pipes in the templates.