Hi.
My goal is to run cryoSPARC on a cluster with SLURM. Following the installation manual I installed the master on the login node, started cryosparcm, created the admin user, installed the worker (also on the login node), created cluster_info.json and cluster_script.sh, and connected the cluster. I can see the lane and verify that the cluster is visible, but jobs always run only on the master (login) node. The full installation path is shared with all compute nodes. For this first test there is no SSD, and CUDA is visible everywhere. What is missing, or what am I doing wrong?
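For reference, the lane was connected roughly like this (the cluster_setup directory name is just an example, and the file contents below are reconstructed from the get_scheduler_target_cluster_info output further down, so they should match what the scheduler actually stored):

[cryosparc@login home]$ cat /home/cryosparc/cluster_setup/cluster_info.json
{
    "name": "test",
    "title": "test",
    "worker_bin_path": "/home/cryosparc/cryosparc2_worker/bin",
    "cache_path": "",
    "send_cmd_tpl": "{{ command }}",
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "qinfo_cmd_tpl": "sinfo"
}
[cryosparc@login home]$ cd /home/cryosparc/cluster_setup
[cryosparc@login cluster_setup]$ cryosparcm cluster connect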
[cryosparc@login home]$ cryosparcm cli "verify_cluster('test')"
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
default* up 1-00:00:00 20 idle node[01-20]
[cryosparc@login home]$ cryosparcm cli "get_scheduler_lanes()"
[{u'title': u'Lane test (cluster)', u'type': u'cluster', u'name': u'test', u'desc': u''}]
[cryosparc@login home]$ cryosparcm cli "get_worker_nodes()"
[]
[cryosparc@login home]$ cryosparcm cli "get_scheduler_target_cluster_info('test')"
{
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "worker_bin_path": "/home/cryosparc/cryosparc2_worker/bin",
    "title": "test",
    "cache_path": "",
    "qinfo_cmd_tpl": "sinfo",
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "send_cmd_tpl": "{{ command }}",
    "name": "test"
}
[cryosparc@login home]$ cryosparcm cli "get_scheduler_target_cluster_script('test')"
#!/bin/bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --output={{ job_log_path_abs }}
#SBATCH --error={{ job_log_path_abs }}
#SBATCH --ntasks={{ num_cpu }}
#SBATCH --mem={{ (ram_gb*1000)|int }}M
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --gres-flags=enforce-binding
srun {{ run_cmd }}
[cryosparc@login home]$
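For context on the template above: when a job is actually dispatched to the cluster, cryoSPARC fills in the {{ }} variables and submits the rendered script with sbatch, so the submitted file should look roughly like this (values purely illustrative, paths and run_cmd abbreviated):

#!/bin/bash
#SBATCH --job-name=cryosparc_P1_J1
#SBATCH --output=/path/to/P1/J1/job.log
#SBATCH --error=/path/to/P1/J1/job.log
#SBATCH --ntasks=4
#SBATCH --mem=32000M
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --gres-flags=enforce-binding
srun /home/cryosparc/cryosparc2_worker/bin/cryosparcw run ...

But as the log below shows, the scheduler never gets that far: the job is dispatched to the master node directly.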
Log output during job launch:
License is valid.
Running job on master node
Project P1 Job J1 Started
Master running v2.9.0, worker running v2.9.0
Running on lane test
Resources allocated:
Worker: login.my.domain.here
Also in the logs:
---------- Scheduler running ---------------
Lane test cluster : Jobs Queued (nonpaused, inputs ready): [u'J1']
Now trying to schedule J1
Need slots : {}
Need fixed : {}
Need licen : False <-- my comment: is this correct?
Master direct : True
---- Running project UID P1 job UID J1
License Data: {"token": "here-is-my-license", "token_valid": true, "request_date": NUMBERS, "license_valid": true}
License Signature: MANYNUMBERS
Running job on master node directly
Running job using: /home/cryosparc/cryosparc2_master/bin/cryosparcm
Changed job P1.J1 status launched
---------- Scheduler done ------------------
Changed job P1.J1 status started
Changed job P1.J1 status running
Changed job P1.J1 status completed
[cryosparc@login home]$ cryosparcm status
CryoSPARC System master node installed at
/home/cryosparc/cryosparc2_master
Current cryoSPARC version: v2.9.0
cryosparcm process status:
command_core RUNNING pid 39834, uptime 1:25:12
command_proxy RUNNING pid 40246, uptime 1:25:07
command_vis RUNNING pid 40126, uptime 1:25:08
database RUNNING pid 39642, uptime 1:25:15
watchdog_dev STOPPED Not started
webapp RUNNING pid 41126, uptime 1:25:05
webapp_dev STOPPED Not started
global config variables:
export CRYOSPARC_LICENSE_ID="my-license-here"
export CRYOSPARC_MASTER_HOSTNAME="login.my.domain.here"
export CRYOSPARC_DB_PATH="/home/cryosparc/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
This is on v2.9.0; the same problem occurred on v2.8.3.
Thanks for any help.