Hi.
My goal is to run cryoSPARC on a cluster with SLURM. Following the installation manual I installed the master on the login node, started cryosparcm, created the admin user, installed the worker (also on the login node), created cluster_info.json and cluster_script.sh, and connected the cluster. I can see the lane and verify that the cluster is visible, but jobs always run only on the master (login) node. The full installation path is shared with all compute nodes. For this first test there is no SSD, and CUDA is visible everywhere. What is missing, or what am I doing wrong?
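For reference, the lane was connected roughly like this (the cluster_setup directory name is just an example, and the file contents below are reconstructed from the get_scheduler_target_cluster_info output further down, so they should match what the scheduler actually stored):

[cryosparc@login home]$ cat /home/cryosparc/cluster_setup/cluster_info.json
{
    "name": "test",
    "title": "test",
    "worker_bin_path": "/home/cryosparc/cryosparc2_worker/bin",
    "cache_path": "",
    "send_cmd_tpl": "{{ command }}",
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "qinfo_cmd_tpl": "sinfo"
}
[cryosparc@login home]$ cd /home/cryosparc/cluster_setup
[cryosparc@login cluster_setup]$ cryosparcm cluster connect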
[cryosparc@login home]$ cryosparcm cli "verify_cluster('test')"
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
default* up 1-00:00:00 20 idle node[01-20]
[cryosparc@login home]$ cryosparcm cli "get_scheduler_lanes()"
[{u'title': u'Lane test (cluster)', u'type': u'cluster', u'name': u'test', u'desc': u''}]
[cryosparc@login home]$ cryosparcm cli "get_worker_nodes()"
[]
[cryosparc@login home]$ cryosparcm cli "get_scheduler_target_cluster_info('test')"
{
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "worker_bin_path": "/home/cryosparc/cryosparc2_worker/bin",
    "title": "test",
    "cache_path": "",
    "qinfo_cmd_tpl": "sinfo",
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "send_cmd_tpl": "{{ command }}",
    "name": "test"
}
[cryosparc@login home]$ cryosparcm cli "get_scheduler_target_cluster_script('test')"
#!/bin/bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --output={{ job_log_path_abs }}
#SBATCH --error={{ job_log_path_abs }}
#SBATCH --ntasks={{ num_cpu }}
#SBATCH --mem={{ (ram_gb*1000)|int }}M
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --gres-flags=enforce-binding
srun {{ run_cmd }}
[cryosparc@login home]$
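For context on the template above: when a job is actually dispatched to the cluster, cryoSPARC fills in the {{ }} variables and submits the rendered script with sbatch, so the submitted file should look roughly like this (values purely illustrative, paths and run_cmd abbreviated):

#!/bin/bash
#SBATCH --job-name=cryosparc_P1_J1
#SBATCH --output=/path/to/P1/J1/job.log
#SBATCH --error=/path/to/P1/J1/job.log
#SBATCH --ntasks=4
#SBATCH --mem=32000M
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --gres-flags=enforce-binding
srun /home/cryosparc/cryosparc2_worker/bin/cryosparcw run ...

But as the log below shows, the scheduler never gets that far: the job is dispatched to the master node directly.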
Log output during job launch:
License is valid.
Running job on master node
Project P1 Job J1 Started
Master running v2.9.0, worker running v2.9.0
Running on lane test
Resources allocated:
Worker: login.my.domain.here
Also in the logs:
---------- Scheduler running ---------------
Lane test cluster : Jobs Queued (nonpaused, inputs ready): [u'J1']
Now trying to schedule J1
Need slots : {}
Need fixed : {}
Need licen : False <-- my comment: is this correct?
Master direct : True
---- Running project UID P1 job UID J1
License Data: {"token": "here-is-my-license", "token_valid": true, "request_date": NUMBERS, "license_valid": true}
License Signature: MANYNUMBERS
Running job on master node directly
Running job using: /home/cryosparc/cryosparc2_master/bin/cryosparcm
Changed job P1.J1 status launched
---------- Scheduler done ------------------
Changed job P1.J1 status started
Changed job P1.J1 status running
Changed job P1.J1 status completed
[cryosparc@login home]$ cryosparcm status
CryoSPARC System master node installed at
/home/cryosparc/cryosparc2_master
Current cryoSPARC version: v2.9.0
cryosparcm process status:
command_core RUNNING pid 39834, uptime 1:25:12
command_proxy RUNNING pid 40246, uptime 1:25:07
command_vis RUNNING pid 40126, uptime 1:25:08
database RUNNING pid 39642, uptime 1:25:15
watchdog_dev STOPPED Not started
webapp RUNNING pid 41126, uptime 1:25:05
webapp_dev STOPPED Not started
global config variables:
export CRYOSPARC_LICENSE_ID="my-license-here"
export CRYOSPARC_MASTER_HOSTNAME="login.my.domain.here"
export CRYOSPARC_DB_PATH="/home/cryosparc/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
This is on v2.9.0; the same problem occurred on v2.8.3.
Thanks for any help.