Question about job dispatching on a SLURM cluster

Our CryoSPARC software runs on a Slurm cluster. The cluster comprises 5 GPU servers, each equipped with 8 Nvidia RTX 3090 graphics cards. CryoSPARC users do not submit jobs directly to a specific GPU server; instead, they submit jobs to the cluster, and the Slurm cluster management software dispatches each job to a GPU server for computation. Overall operation is currently satisfactory, but we have encountered the following two issues.

The first issue is that CryoSPARC-Slurm tends to spread jobs across multiple GPU servers, so every server ends up with a running job and can no longer accept new ones, even though most of its computational resources are still free. For example, with 5 CryoSPARC jobs that each require only 2 GPUs, a single 8-GPU server could in principle host 4 of them, which would leave the 2nd to 5th GPU servers free to accept other work. Instead, CryoSPARC-Slurm currently places the 5 jobs on all 5 GPU servers rather than packing jobs 1 to 4 onto the first server, leaving 6 idle GPUs on every server that cannot accept new jobs. This scheduling approach does not fully utilize the computational resources and wastes a large amount of capacity. Is there an issue with our software configuration?

The second issue is that when no computational resources are immediately available at submission time, CryoSPARC runs squeue to check the queue status, but the error message below appears. Why is this happening?

Cluster job status update for P26 J84 failed with exit code 1 (0 status update request retries)
squeue: error: Unrecognized option: |
Usage: squeue [-A account] [--clusters names] [-i seconds] [--job jobid]
[-n name] [-o format] [--only-job-state] [-p partitions]
[--qos qos] [--reservation reservation] [--sort fields] [--start]
[--step step_id] [-t states] [-u user_name] [--usage]
[-L licenses] [-w nodes] [--federation] [--local] [--sibling]
[--expand-patterns] [--json=data_parser] [--yaml=data_parser]
[-ahjlrsv]


@yunch Please can you post the output of the command

cryosparcm cli "get_scheduler_targets()"

Thanks a lot! The output is:

[{'cache_path': '/scratch/CryoSPARC_Cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': [], 'custom_vars': {}, 'desc': None, 'hostname': 'HY_slurm', 'lane': 'HY_slurm', 'name': 'HY_slurm', 'qdel_cmd_tpl': '/opt/slurm/24.11.0/bin/scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': '/opt/slurm/24.11.0/bin/sinfo', 'qstat_cmd_tpl': '/opt/slurm/24.11.0/bin/squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': '/opt/slurm/24.11.0/bin/squeue -j {{ cluster_job_id }} --format=%T | sed -n 2p', 'qsub_cmd_tpl': '/opt/slurm/24.11.0/bin/sbatch {{ script_path_abs }}', 'script_tpl': '#!/bin/sh\n#### cryoSPARC cluster submission script template for SLURM\n## Available variables:\n## {{ run_cmd }} - the complete command string to run the job\n## {{ num_cpu }} - the number of CPUs needed\n## {{ num_gpu }} - the number of GPUs needed.\n## Note: the code will use this many GPUs starting from dev id 0\n## the cluster scheduler or this script have the responsibility\n## of setting CUDA_VISIBLE_DEVICES so that the job code ends up\n## using the correct cluster-allocated GPUs.\n## {{ ram_gb }} - the amount of RAM needed in GB\n## {{ job_dir_abs }} - absolute path to the job directory\n## {{ project_dir_abs }} - absolute path to the project dir\n## {{ job_log_path_abs }} - absolute path to the log file for the job\n## {{ worker_bin_path }} - absolute path to the cryosparc worker command\n## {{ run_args }} - arguments to be passed to cryosparcw run\n## {{ project_uid }} - uid of the project\n## {{ job_uid }} - uid of the job\n## {{ job_creator }} - name of the user that created the job (may contain spaces)\n## {{ cryosparc_username }} - cryosparc username of the user that created the job (usually an email)\n## {{ job_type }} - CryoSPARC job type\n##\n## What follows is a simple SLURM script:\n\n#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH -n {{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --partition=HY\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n\n{{ run_cmd }}\n\n', 'send_cmd_tpl': '{{ command }}', 'title': 'HY_slurm', 'tpl_vars': ['cluster_job_id', 'command', 'run_args', 'job_uid', 'cryosparc_username', 'worker_bin_path', 'ram_gb', 'job_creator', 'num_gpu', 'job_dir_abs', 'project_uid', 'job_type', 'project_dir_abs', 'num_cpu', 'run_cmd', 'job_log_path_abs'], 'type': 'cluster', 'worker_bin_path': '/hy003/Software/CryoSPARC_Yunlab/cryosparc_worker/bin/cryosparcw'}]

The | may be problematic in this context. If the pipe functionality is strictly required you may try

  1. wrapping the piped sequence in a shell script,
  2. making the shell script executable and
  3. defining qstat_code_cmd_tpl in terms of the shell script and {{ cluster_job_id }}, as sketched below.
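
For example, a minimal sketch of such a wrapper, assuming it is saved as /opt/slurm/scripts/squeue_state.sh (that path is only an illustration) and made executable and readable on the CryoSPARC master:

#!/bin/sh
# squeue_state.sh -- print only the STATE column for a single Slurm job.
# Usage: squeue_state.sh <cluster_job_id>
# Because the pipe runs inside this script, squeue itself never sees the "|".
/opt/slurm/24.11.0/bin/squeue -j "$1" --format=%T | sed -n 2p

The qstat_code_cmd_tpl entry would then become something like

/opt/slurm/scripts/squeue_state.sh {{ cluster_job_id }}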

Thank you very much!

However, 'qstat_code_cmd_tpl': '/opt/slurm/24.11.0/bin/squeue -j {{ cluster_job_id }} --format=%T | sed -n 2p' actually comes from the CryoSPARC v4.6.2 package itself, not from our own script. Once we unpack the tarball and grep for squeue,

tar zxfv cryosparc_master.tar.gz
cd cryosparc_master/bin
grep squeue *

we see

cryosparcm: "qstat_cmd_tpl" : "squeue -j {{ cluster_job_id }}",
cryosparcm: "qstat_code_cmd_tpl": "squeue -j {{ cluster_job_id }} --format=%T | sed -n 2p",

I am not sure what I should do to this line in cryosparcm.

By the way, although the "squeue: error: Unrecognized option: |" message is annoying, it does not seem to interfere with the calculations. The real problem for us is the first issue described above: jobs are spread across all five GPU servers, leaving six idle GPUs on each server that cannot accept new work.


Could you please give us some instructions on how to optimize the job dispatching?
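
For example, are Slurm-side settings like the following what we should be looking at? This is only a sketch of slurm.conf fragments that we suspect may be relevant; the node names, GRES counts, and weights below are illustrative, not our live configuration:

# slurm.conf fragments that may influence packing vs. spreading (illustrative only)
SelectType=select/cons_tres
# CR_LLN ("least loaded node") schedules jobs onto the least loaded nodes and
# tends to spread work across the cluster; omitting it keeps the default
# best-fit placement.
SelectTypeParameters=CR_Core_Memory
# Lower Weight values are preferred, so increasing weights encourage Slurm to
# fill gpu01 before placing jobs on gpu02, and so on.
NodeName=gpu01 Gres=gpu:8 Weight=1
NodeName=gpu02 Gres=gpu:8 Weight=2
NodeName=gpu03 Gres=gpu:8 Weight=3
NodeName=gpu04 Gres=gpu:8 Weight=4
NodeName=gpu05 Gres=gpu:8 Weight=5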

Dear wtempel,

Thank you very much for your kind help!

I think the two issues have been resolved. The problems seem to have stemmed from bad cluster_info.json and cluster_script.sh files. I did the following, which solved all the problems.

  1. I deleted the old installation and re-installed CryoSPARC v4.6.2.
  2. Instead of using the old cluster_info.json and cluster_script.sh (I do not remember where I got them), I used the sample files from your website (CryoSPARC Cluster Integration Script Examples | CryoSPARC Guide).

cluster_info.json [one more line should be added to this file: "cache_path": "/scratch/CryoSPARC_Cache",]
{
"name": "slurm-lane1",
"worker_bin_path": "/path/to/cryosparc_worker/bin/cryosparcw",
"send_cmd_tpl": "{{ command }}",
"qsub_cmd_tpl": "/opt/slurm/bin/sbatch {{ script_path_abs }}",
"qstat_cmd_tpl": "/opt/slurm/bin/squeue -j {{ cluster_job_id }}",
"qdel_cmd_tpl": "/opt/slurm/bin/scancel {{ cluster_job_id }}",
"qinfo_cmd_tpl": "/opt/slurm/bin/sinfo"
}

cluster_script.sh
#!/usr/bin/env bash

#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --mem={{ ram_gb|int }}G
#SBATCH --comment="created by {{ cryosparc_username }}"
#SBATCH --output={{ job_dir_abs }}/{{ project_uid }}_{{ job_uid }}_slurm.out
#SBATCH --error={{ job_dir_abs }}/{{ project_uid }}_{{ job_uid }}_slurm.err

{{ run_cmd }}
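
For completeness, the two files are registered with CryoSPARC by running the cluster connect command from the directory that contains them (the path below is only a placeholder):

cd /path/to/directory/containing/the/two/files
cryosparcm cluster connect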

Now everything works fine!