Unexpected '/' or char u'@' when running job

Hi,

I came across a problem when running the second step, "patch_motion_correction_multi", on a SLURM cluster. When I select the lane and create the job, a prompt "unexpected '/'" appears on the screen, and nothing else. I guess it's a problem related to the cluster_info.json file, but I have checked it many times without finding any mistakes.

Here's an overview of the job below. No errors are shown at the end of this post. The job does launch, but then there is no further response.

Does anybody know where the mistake is?

Thank you.

License is valid.

Launching job on lane slurmcluster target slurmcluster …

Launching job on cluster slurmcluster

Cluster submission script:

#!/usr/bin/env bash
## What follows is a simple SLURM script:

#SBATCH --account=def-user
#SBATCH --job-name cryosparc_P1_J2
#SBATCH -n 6
#SBATCH --gres=gpu:1
#SBATCH -p gpu
#SBATCH --mem=16000MB
#SBATCH -o /home/user/cryosparc/project/P1/J2
#SBATCH -e /home/user/cryosparc/project/P1/J2

nvidia-smi

available_devs=""
for devidx in $(seq 0 15);
do
if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then
if [[ -z "$available_devs" ]] ; then
available_devs=$devidx
else
available_devs=$available_devs,$devidx
fi
fi
done
export CUDA_VISIBLE_DEVICES=$available_devs

/home/user/cryosparc/cryosparc2_worker/bin/cryosparcw run --project P1 --job J2 --master_hostname graham.computecanada.ca --master_command_core_port 22002 > /home/user/cryosparc/project/P1/J2/job.log 2>&1


And this is the cluster_info.json file:

{
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "worker_bin_path": "/home/user/cryosparc/cryosparc2_worker/bin/cryosparcw",
    "title": "slurmcluster",
    "cache_path": "/scratch/user/cryosparcsave/",
    "qinfo_cmd_tpl": "sinfo",
    "qsub_cmd_tpl": "sbatch {{ /home/user/cryosparc/cryosparc2_worker/cluster_script.sh }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "cache_quota_mb": null,
    "send_cmd_tpl": "ssh loginnode {{ user@graham.computecanada.ca }}",
    "cache_reserve_mb": 10000,
    "name": "slurmcluster"
}

Dear Kortal,

We have run into exactly the same problem configuring an SGE cluster.

Did you find a solution?

Thank you!

Hi @dluque, yes, we found that the problem was indeed in the cluster submission script and the cluster_info.json file for the SLURM cluster. Here is my cluster_script.sh; it should be helpful to you.

cluster_script.sh

#!/bin/bash

#SBATCH --account=def-supervisor
#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --mem={{ (ram_gb*1000)|int }}M             
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task={{num_cpu}}

available_devs=""
for devidx in $(seq 0 15);
do
    if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then
        if [[ -z "$available_devs" ]] ; then
            available_devs=$devidx
        else
            available_devs=$available_devs,$devidx
        fi
    fi
done
export CUDA_VISIBLE_DEVICES=$available_devs

{{ run_cmd }}
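
For reference, cryoSPARC substitutes the {{ }} variables per job when it renders this script. Assuming the same job as in my first post (P1/J2 with 1 GPU, 6 CPUs, and 16 GB RAM), the header above would render roughly as:

#SBATCH --account=def-supervisor
#SBATCH --job-name cryosparc_P1_J2
#SBATCH --gres=gpu:1
#SBATCH --mem=16000M
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=6

and {{ run_cmd }} expands to a cryosparcw run command like the one in my first post.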

This is the corresponding cluster_info.json file:

{
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}", 
    "worker_bin_path": "/home/username/cryosparc/cryosparc2_worker/bin/cryosparcw", 
    "title": "slurmcluster", 
    "cache_path": "/scratch/username/cryosparcsave", 
    "qinfo_cmd_tpl": "sinfo", 
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}", 
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}", 
    "cache_quota_mb": null, 
    "send_cmd_tpl": "{{ command }}", 
    "cache_reserve_mb": 20000, 
    "name": "multi_gpu"
}
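
The crucial change is in qsub_cmd_tpl and send_cmd_tpl: only variable names that cryoSPARC provides (script_path_abs, command, cluster_job_id, and so on) may appear inside the {{ }} delimiters. My original file had literal values in there instead:

Broken (my first post):

    "qsub_cmd_tpl": "sbatch {{ /home/user/cryosparc/cryosparc2_worker/cluster_script.sh }}",
    "send_cmd_tpl": "ssh loginnode {{ user@graham.computecanada.ca }}",

Working:

    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "send_cmd_tpl": "{{ command }}",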

The problem you have encountered should likewise be caused by an error in the cluster_info.json file.
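
If you want to check a template before handing it to cryoSPARC: the {{ }} placeholders are Jinja2 syntax, so (assuming the jinja2 Python package is available on your node) a rough one-liner reproduces both messages from my title:

# A minimal sketch, assuming jinja2 is installed; a literal path inside
# {{ }} fails to parse exactly like in my broken cluster_info.json:
python -c 'import jinja2; jinja2.Template("sbatch {{ /home/user/cryosparc/cryosparc2_worker/cluster_script.sh }}")'
# jinja2.exceptions.TemplateSyntaxError: unexpected '/'
python -c 'import jinja2; jinja2.Template("ssh loginnode {{ user@graham.computecanada.ca }}")'
# jinja2.exceptions.TemplateSyntaxError: unexpected char u'@'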

Thank you, Kortal!

I am sure it will be helpful!