We are trying to use cryoSPARC Live in an HPC environment with SLURM as the queuing system. Right now we are getting the following errors:
Unable to start session: {u'message': u"OtherError: Command '['sbatch', '/gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J4/queue_sub_script.sh']' returned non-zero exit status 1", u'code': 500, u'data': None, u'name': u'OtherError'}
License is valid.
Launching job on lane bs2 target bs2 ...
Launching job on cluster bs2
====================== Cluster submission script: ========================
==========================================================================
#!/usr/bin/env bash
#### cryoSPARC cluster submission script template for SLURM
## Available variables:
## /gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw run --project P2 --job J2 --master_hostname ai-rmlcryoprd1.niaid.nih.gov --master_command_core_port 39002 > /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J2/job.log 2>&1 - the complete command string to run the job
## 0 - the number of CPUs needed
## 1 - the number of GPUs needed.
## Note: the code will use this many GPUs starting from dev id 0
## the cluster scheduler or this script have the responsibility
## of setting CUDA_VISIBLE_DEVICES so that the job code ends up
## using the correct cluster-allocated GPUs.
## 0.0 - the amount of RAM needed in GB
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J2 - absolute path to the job directory
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2 - absolute path to the project dir
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J2/job.log - absolute path to the log file for the job
## /gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw - absolute path to the cryosparc worker command
## --project P2 --job J2 --master_hostname ai-rmlcryoprd1.niaid.nih.gov --master_command_core_port 39002 - arguments to be passed to cryosparcw run
## P2 - uid of the project
## J2 - uid of the job
## Bryan Hansen - name of the user that created the job (may contain spaces)
## hansenbry@niaid.nih.gov - cryosparc username of the user that created the job (usually an email)
##
## What follows is a simple SLURM script:
#SBATCH --job-name cryosparc_P2_J2
#SBATCH -n 0
#SBATCH --gres=gpu:1
#SBATCH -p gpu
#SBATCH --mem=0MB
#SBATCH --constraint=v100
available_devs=""
for devidx in $(seq 0 15);
do
if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then
if [[ -z "$available_devs" ]] ; then
available_devs=$devidx
else
available_devs=$available_devs,$devidx
fi
fi
done
export CUDA_VISIBLE_DEVICES=$available_devs
/gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw run --project P2 --job J2 --master_hostname ai-rmlcryoprd1.niaid.nih.gov --master_command_core_port 39002 > /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J2/job.log 2>&1
==========================================================================
==========================================================================
-------- Submission command: sbatch /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J2/queue_sub_script.sh
Failed to launch! 1
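
Since cryoSPARC only surfaces the exit status, the same submission command can be re-run by hand to see SLURM's actual complaint. This is generic SLURM debugging, not anything cryoSPARC-specific; the path is the one from the log above:

# Re-run the failing submission outside cryoSPARC; sbatch prints the
# reason for a rejection to stderr instead of just "exit status 1".
sbatch /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J2/queue_sub_script.sh

# Optionally, validate the #SBATCH directives without queuing anything:
sbatch --test-only /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J2/queue_sub_script.sh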
We looked into queue_sub_script.sh and saw this:
#!/usr/bin/env bash
#### cryoSPARC cluster submission script template for SLURM
## Available variables:
## /gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw run --project P2 --job J2 --master_hostname ai-rmlcryoprd1.niaid.nih.gov --master_command_core_port 39002 > /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J2/job.log 2>&1 - the complete command string to run the job
## 0 - the number of CPUs needed
## 1 - the number of GPUs needed.
## Note: the code will use this many GPUs starting from dev id 0
## the cluster scheduler or this script have the responsibility
## of setting CUDA_VISIBLE_DEVICES so that the job code ends up
## using the correct cluster-allocated GPUs.
## 0.0 - the amount of RAM needed in GB
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J2 - absolute path to the job directory
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2 - absolute path to the project dir
## /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J2/job.log - absolute path to the log file for the job
## /gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw - absolute path to the cryosparc worker command
## --project P2 --job J2 --master_hostname ai-rmlcryoprd1.niaid.nih.gov --master_command_core_port 39002 - arguments to be passed to cryosparcw run
## P2 - uid of the project
## J2 - uid of the job
## Bryan Hansen - name of the user that created the job (may contain spaces)
## hansenbry@niaid.nih.gov - cryosparc username of the user that created the job (usually an email)
##
## What follows is a simple SLURM script:
#SBATCH --job-name cryosparc_P2_J2
#SBATCH -n 0
#SBATCH --gres=gpu:1
#SBATCH -p gpu
#SBATCH --mem=0MB
#SBATCH --constraint=v100
available_devs=""
for devidx in $(seq 0 15);
do
if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then
if [[ -z "$available_devs" ]] ; then
available_devs=$devidx
else
available_devs=$available_devs,$devidx
fi
fi
done
export CUDA_VISIBLE_DEVICES=$available_devs
/gs1/RTS/EM/Software/CryoSPARCv2/cryosparc2_worker/bin/cryosparcw run --project P2 --job J2 --master_hostname ai-rmlcryoprd1.niaid.nih.gov --master_command_core_port 39002 > /gs1/RTS/EM/Processing/marcotrigianoj2-2020/P2/J2/job.log 2>&1
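
The "#SBATCH -n 0" and "#SBATCH --mem=0MB" lines stood out to us. A minimal script carrying only those directives should confirm whether SLURM rejects them on their own; the sketch below is a generic test with nothing cryoSPARC-specific in it (the partition name is from our cluster):

# Hypothetical stand-alone test: same zero-valued directives as the
# generated script, with a trivial payload.
cat > /tmp/zero_test.sh <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name zero_test
#SBATCH -n 0
#SBATCH --mem=0MB
#SBATCH -p gpu
hostname
EOF
# If this submission fails the same way, the zero values are the cause.
sbatch /tmp/zero_test.sh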
Our question is: where in the GUI or config for cryoSPARC Live do we set the CPU, GPU, and memory values? The GPU value is always 1 no matter what is selected in the UI, and we suspect the values of 0 for CPU and memory are the source of the original error.
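
For reference, the rendered script above is produced from the cluster_script.sh template we registered with cryosparcm cluster connect. The stock SLURM example in the cryoSPARC documentation uses Jinja-style placeholders, roughly like the sketch below (our actual template may differ slightly):

#!/usr/bin/env bash
## Sketch of a cryoSPARC SLURM cluster_script.sh template; the
## {{ ... }} placeholders are substituted by cryoSPARC when it writes
## each job's queue_sub_script.sh.
#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH -n {{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH -p gpu
#SBATCH --mem={{ (ram_gb*1000)|int }}MB
#SBATCH --constraint=v100

{{ run_cmd }}

If that is how the rendering works, then num_cpu, num_gpu, and ram_gb are filled in per job by cryoSPARC itself rather than hard-coded in the template, which is why we are asking where on the cryoSPARC side they can be set.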
Thanks for any tips/input that can help us.