Hi all,
Over the weekend I have been trying to get cryoSPARC working with our local cluster. Right now, jobs get submitted through the login node, but nothing happens after that.
The cluster nodes have a separate worker installation and are running CUDA 10.2.
Here is the current cluster configuration (cluster_info.json):
{
    "name": "nogales",
    "title": "nogales",
    "worker_bin_path": "/cryosparc/worker/cryosparc_cluster/cryosparc_worker/bin",
    "send_cmd_tpl": "ssh whale {{ command }}",
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "qinfo_cmd_tpl": "sinfo",
    "cache_path": "/volatile/cryosparc",
    "cache_quota_mb": null,
    "cache_reserve_mb": 10000
}
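For completeness, the lane was registered with cryosparcm from a directory containing this cluster_info.json together with the submission script template (cluster_script.sh) shown below; roughly:

cd /path/to/cluster_config   # placeholder: wherever the two files live
cryosparcm cluster connect

Here is the cluster_script.sh template: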
#!/usr/bin/env bash
#### cryoSPARC cluster submission script template for SLURM
## Available variables:
## {{ run_cmd }} - the complete command string to run the job
## {{ num_cpu }} - the number of CPUs needed
## {{ num_gpu }} - the number of GPUs needed.
## Note: the code will use this many GPUs starting from dev id 0
## the cluster scheduler or this script have the responsibility
## of setting CUDA_VISIBLE_DEVICES so that the job code ends up
## using the correct cluster-allocated GPUs.
## {{ ram_gb }} - the amount of RAM needed in GB
## {{ job_dir_abs }} - absolute path to the job directory
## {{ project_dir_abs }} - absolute path to the project dir
## {{ job_log_path_abs }} - absolute path to the log file for the job
## {{ worker_bin_path }} - absolute path to the cryosparc worker command
## {{ run_args }} - arguments to be passed to cryosparcw run
## {{ project_uid }} - uid of the project
## {{ job_uid }} - uid of the job
## {{ job_creator }} - name of the user that created the job (may contain spaces)
## {{ cryosparc_username }} - cryosparc username of the user that created the job (usually an email)
##
## What follows is a simple SLURM script:
#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH -n {{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH -p gpu
#SBATCH --mem={{ (ram_gb*1000)|int }}MB
#SBATCH -o {{ job_dir_abs }}
#SBATCH -e {{ job_dir_abs }}
available_devs=""
for devidx in $(seq 0 15);
do
    if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then
        if [[ -z "$available_devs" ]] ; then
            available_devs=$devidx
        else
            available_devs=$available_devs,$devidx
        fi
    fi
done
export CUDA_VISIBLE_DEVICES=$available_devs
srun {{ run_cmd }}
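(For what it's worth, the device-scan loop above only asks nvidia-smi whether each GPU index has compute processes running; a quick by-hand check of a single GPU on a node looks something like the following, where the node name is just an example:)

ssh gpu-node-01 'nvidia-smi -i 0 --query-compute-apps=pid --format=csv,noheader'   # empty output means GPU 0 is idle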
Here is the actual submission script that gets generated, along with the resulting output and error.
====================== Cluster submission script: ========================
==========================================================================
#!/usr/bin/env bash
#### cryoSPARC cluster submission script template for SLURM
## Available variables:
## /cryosparc/worker/cryosparc_cluster/cryosparc_worker/bin run --project P36 --job J100 --master_hostname albakor --master_command_core_port 39002 > /cryosparc/projects/abhiram/P36/J100/job.log 2>&1 - the complete command string to run the job
## 12 - the number of CPUs needed
## 2 - the number of GPUs needed.
## Note: the code will use this many GPUs starting from dev id 0
## the cluster scheduler or this script have the responsibility
## of setting CUDA_VISIBLE_DEVICES so that the job code ends up
## using the correct cluster-allocated GPUs.
## 32.0 - the amount of RAM needed in GB
## /cryosparc/projects/abhiram/P36/J100 - absolute path to the job directory
## /cryosparc/projects/abhiram/P36 - absolute path to the project dir
## /cryosparc/projects/abhiram/P36/J100/job.log - absolute path to the log file for the job
## /cryosparc/worker/cryosparc_cluster/cryosparc_worker/bin - absolute path to the cryosparc worker command
## --project P36 --job J100 --master_hostname albakor.qb3.berkeley.edu --master_command_core_port 39002 - arguments to be passed to cryosparcw run
## P36 - uid of the project
## J100 - uid of the job
## abhiram - name of the user that created the job (may contain spaces)
## achintangal@berkeley.edu - cryosparc username of the user that created the job (usually an email)
##
## What follows is a simple SLURM script:
#SBATCH --job-name cryosparc_P36_J100
#SBATCH -n 12
#SBATCH --gres=gpu:2
#SBATCH -p gpu
#SBATCH --mem=32000MB
#SBATCH -o /cryosparc/projects/abhiram/P36/J100
#SBATCH -e /cryosparc/projects/abhiram/P36/J100
available_devs=""
for devidx in $(seq 0 15);
do
    if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then
        if [[ -z "$available_devs" ]] ; then
            available_devs=$devidx
        else
            available_devs=$available_devs,$devidx
        fi
    fi
done
export CUDA_VISIBLE_DEVICES=$available_devs
srun /cryosparc/worker/cryosparc_cluster/cryosparc_worker/bin run --project P36 --job J100 --master_hostname albakor --master_command_core_port 39002 > /cryosparc/projects/abhiram/P36/J100/job.log 2>&1
==========================================================================
==========================================================================
-------- Submission command:
ssh whale.qb3.berkeley.edu sbatch /cryosparc/projects/abhiram/P36/J100/queue_sub_script.sh
-------- Cluster Job ID:
650
-------- Queued on cluster at 2021-04-07 17:32:08.466015
-------- Job status at 2021-04-07 17:32:08.824638
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
650 gpu cryospar cryospar PD 0:00 1 (None)
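The job just sits in the pending (PD) state. To find out why, my plan is to ask SLURM for the scheduling reason with something along these lines:

ssh whale 'scontrol show job 650 | grep -i reason'        # reason the job is being held back
ssh whale 'squeue -p gpu -o "%.8i %.10T %.20R"'           # state and reason for everything in the gpu partition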
Checking the job log with cryosparcm also fails:
cryosparcm joblog p36 j100
Traceback (most recent call last):
File "/opt/cryosparc-v2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/cryosparc-v2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/cryosparc-v2/cryosparc2_master/cryosparc_compute/client.py", line 86, in <module>
print(eval("cli."+command))
File "<string>", line 1, in <module>
File "/opt/cryosparc-v2/cryosparc2_master/cryosparc_compute/client.py", line 59, in func
assert False, res['error']
AssertionError: {'code': 500, 'data': None, 'message': "OtherError: argument of type 'NoneType' is not iterable", 'name': 'OtherError'}
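(One thing I notice while writing this up: the submission script uses the uppercase UIDs P36/J100, so the lowercase arguments above may simply not match anything; I will retry the log command with the uppercase UIDs in case that is all it is.)

cryosparcm joblog P36 J100   # retry with the uppercase project/job UIDs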
I’d appreciate any pointers on this. Once I am back home today, I will try manually submitting the script as the cryosparc user to see what’s going on (sketched below).
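The plan for that manual test (assuming the cluster account is called cryosparc) is roughly:

ssh whale.qb3.berkeley.edu                                        # log in to the submit host
sbatch /cryosparc/projects/abhiram/P36/J100/queue_sub_script.sh   # submit the generated script by hand
squeue -u cryosparc                                               # watch the job state
tail -f /cryosparc/projects/abhiram/P36/J100/job.log              # see whether the worker actually starts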
Thanks!