Stuck in queue forever

I followed your instructions and sent you the log file by email. J70 was still waiting in the queue.

After turning off the logging and running cryosparcm restart, J70 started running.

Thanks for sending the logs. Please can you also post

  1. outputs of the commands
    cryosparcm cli "get_job('P16', 'J69', 'job_type', 'created_at', 'queued_at', 'launched_at', 'started_at', 'completed_at')"
    cryosparcm cli "get_job('P16', 'J70', 'job_type', 'created_at', 'queued_at', 'launched_at', 'started_at', 'completed_at')"
    
  2. the expanded Inputs section of job J70.
cryosparcm cli "get_job('P16', 'J69', 'job_type', 'created_at', 'queued_at', 'launched_at', 'started_at', 'completed_at')"
{'_id': '659d956481a656dde2f4a856', 'completed_at': 'Fri, 19 Jan 2024 21:54:13 GMT', 'created_at': 'Tue, 09 Jan 2024 18:50:12 GMT', 'job_type': 'restack_particles', 'launched_at': 'Fri, 19 Jan 2024 21:49:33 GMT', 'project_uid': 'P16', 'queued_at': 'Fri, 19 Jan 2024 21:49:31 GMT', 'started_at': 'Fri, 19 Jan 2024 21:50:10 GMT', 'uid': 'J69'}
cryosparcm cli "get_job('P16', 'J70', 'job_type', 'created_at', 'queued_at', 'launched_at', 'started_at', 'completed_at')"
{'_id': '659d95ca81a656dde2f4e484', 'completed_at': None, 'created_at': 'Tue, 09 Jan 2024 18:51:54 GMT', 'job_type': 'nonuniform_refine_new', 'launched_at': 'Fri, 19 Jan 2024 21:59:22 GMT', 'project_uid': 'P16', 'queued_at': 'Fri, 19 Jan 2024 21:49:37 GMT', 'started_at': 'Fri, 19 Jan 2024 22:01:23 GMT', 'uid': 'J70'}
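(For reference, J69 launched within seconds of being queued, while J70 sat queued for nearly ten minutes before launching. A quick, hedged check of that gap from the timestamps above, assuming GNU date is available:)

# compute the queued -> launched delay from the get_job() timestamps above
delay() { echo $(( $(date -ud "$2" +%s) - $(date -ud "$1" +%s) )); }
delay "Fri, 19 Jan 2024 21:49:31 GMT" "Fri, 19 Jan 2024 21:49:33 GMT"   # J69: 2 seconds
delay "Fri, 19 Jan 2024 21:49:37 GMT" "Fri, 19 Jan 2024 21:59:22 GMT"   # J70: 585 seconds (~10 minutes)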

Interesting. Please can you also run this command:

cryosparcm cli "get_job('P16', 'J53', 'job_type', 'created_at', 'queued_at', 'launched_at', 'started_at', 'completed_at')"

cryosparcm cli "get_job('P16', 'J53', 'job_type', 'created_at', 'queued_at', 'launched_at', 'started_at', 'completed_at')"
{'_id': '6596bf229906ac8299b73c58', 'completed_at': 'Tue, 02 Jan 2024 03:58:58 GMT', 'created_at': 'Tue, 02 Jan 2024 03:12:59 GMT', 'job_type': 'nonuniform_refine_new', 'launched_at': 'Tue, 02 Jan 2024 03:13:53 GMT', 'project_uid': 'P16', 'queued_at': 'Tue, 02 Jan 2024 03:13:52 GMT', 'started_at': 'Tue, 02 Jan 2024 03:14:06 GMT', 'uid': 'J53'}

@jhzhu We appreciate your efforts in gathering debugging information. Unfortunately, we could not identify the cause of the problem from the logs. It is possible that some additional job(s) whose lower-level inputs were required did not complete. We suggest starting processing from scratch in a new project.

OK. I started a new project to test, using just the “Extensive Validation” workflow. I still have the same problem.

@jhzhu We unfortunately do not know what is causing this problem. I understand that currently, cryosparc_master services

  1. run within their own slurm job allocation
  2. launch additional slurm jobs,

which, taken together, constitutes two “layers” of workload management. I wonder whether simplifying workload management would help in either diagnosing or circumventing the problem.

You could try:
Alternative 1. running cryosparc_master processes outside slurm

  • CryoSPARC jobs would be submitted to slurm

Alternative 2. running cryosparc_master processes as a slurm job

  • would need to ensure that cryosparc_master processes would not be interrupted by slurm
  • would need to ensure that cryosparc_master processes are running on a GPU node
  • would need to configure CryoSPARC in single workstation mode
  • would run jobs on the same host as cryosparc_master processes
  • could simplify setup by setting CRYOSPARC_MASTER_HOSTNAME, CRYOSPARC_HOSTNAME_CHECK, and the
    cryosparcw connect --worker parameter all to localhost. (These settings are incompatible with a CryoSPARC instance that has worker nodes in addition to the cryosparc_master node.) A sketch of such a submission script is shown below.
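A minimal sketch of an Alternative 2 submission script, assuming a hypothetical site with a gpu partition and CryoSPARC installed under /path/to/cryosparc_master; resource values, paths, and the keep-alive mechanism are placeholders to adapt for your cluster:

#!/bin/bash
#SBATCH --job-name=cryosparc_master
#SBATCH --partition=gpu              # placeholder partition name
#SBATCH --gres=gpu:4                 # master and its jobs share this node's GPUs
#SBATCH --cpus-per-task=16
#SBATCH --mem=128G
#SBATCH --time=7-00:00:00            # long enough that slurm does not interrupt the master

# Assumes config.sh already carries CRYOSPARC_MASTER_HOSTNAME=localhost and
# CRYOSPARC_HOSTNAME_CHECK=localhost, and that the worker was connected once with
#   cryosparcw connect --worker localhost --master localhost

# stop the master cleanly when the allocation ends
trap '/path/to/cryosparc_master/bin/cryosparcm stop' TERM EXIT

/path/to/cryosparc_master/bin/cryosparcm start

# keep the allocation (and with it the master processes) alive
sleep infinity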

@jhzhu Did you try any of the suggestions here? Do any of them work on the cluster, or did you find a better solution for this after 6 months?

We observed the same issue on our cluster. It has only happened in versions after v4.4.

However, interestingly, from our observations so far the “sleeping” queued job will be awakened in two situations:

  • Only once, after freshly restarting the master with cryosparcm restart; it does not work on a second attempt.
  • If multiple parallel jobs are queued together, cloning and submitting one of them will activate the others.
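(For anyone trying to confirm this state, a hedged check along the lines of the get_job calls earlier in the thread: a “sleeping” job keeps status 'queued' with no launch timestamp long after queuing. For example:)

cryosparcm cli "get_job('P16', 'J70', 'status', 'queued_at', 'launched_at')"
# a stuck job typically shows 'status': 'queued' with a queued_at timestamp
# but 'launched_at': None, indefinitely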

@wtempel Thanks for the previous suggestions, but unfortunately they’re not feasible in our cluster environment. I’m just curious why this worked before v4.4 but not in the versions after it. Thanks for any updates or follow-up on this issue, as we, and perhaps many other users, depend heavily on a cluster environment.

Welcome to the forum @morganyang422.

Do your CryoSPARC master processes run outside cluster workload management, that is, CryoSPARC jobs are submitted to a cluster-type scheduler lane, but commands like cryosparcm are not run “inside” a cluster job?
Please can you post the output of these commands on the computer where CryoSPARC master processes run:

ls -1 /tmp/mongo*.sock /tmp/cryosparc*.sock
ps -eo pid,ppid,start,command | grep -e cryosparc -e mongo

Thanks @wtempel for the quick response. And sorry for any delay, but I had to finish a presentation this morning. To answer your questions:
“Do your CryoSPARC master processes run outside cluster workload management, that is, CryoSPARC jobs are submitted to a cluster-type scheduler lane, but commands like cryosparcm are not run “inside” a cluster job?”
No. Given the size of our CryoSPARC user pool on the cluster, it’s not feasible for us to set up dedicated “login nodes” to host the CryoSPARC master processes. Both our CryoSPARC master and worker jobs have to run on compute nodes (as cluster jobs). I think this is how most HPC sites would want to run CryoSPARC in a cluster environment, and it worked just fine before v4.4.

“Please can you post the output of these commands on the computer where CryoSPARC master processes run:”
“ls -1 /tmp/mongo*.sock /tmp/cryosparc*.sock”

/tmp/cryosparc-supervisor-e29cccc7c796955a404d28cb9d299db7.sock
/tmp/mongodb-39023.sock

“ps -eo pid,ppid,start,command | grep -e cryosparc -e mongo”

1908613       1   Oct 30 /bin/bash /usr/local/apps/cryosparc/libexec/auto_archive.sh
1916547       1   Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/supervisord.conf
1916662 1916547   Oct 30 mongod --auth --dbpath /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_database --port 39023 --oplogSize 64 --replSet meteor --wiredTigerCacheSizeGB 4 --bind_ip_all
1916775 1916547   Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39024 cryosparc_command.command_core:start() -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916793 1916775   Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39024 cryosparc_command.command_core:start() -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916819 1916547   Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:39025 -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916821 1916547   Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39027 -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916834 1916821   Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39027 -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916835 1916819   Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:39025 -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916898 1916547   Oct 30 /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/cryosparc_app/nodejs/bin/node ./bundle/main.js
2037000 2036287 15:35:29 grep --color=auto -e cryosparc -e mongo

Let me know if there is anything else I can do to help with the troubleshooting here. Thanks!

Interesting. In this case, may I ask you to run those commands again, plus a few additional commands:

ls -1 /tmp/mongo*.sock /tmp/cryosparc*.sock
ps -eo pid,ppid,start,command | grep -e cryosparc -e mongo
grep HOSTNAME /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/config.sh
/gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/bin/cryosparcm cli "get_scheduler_targets()"
env | grep SLURM_ | grep -e NPROC -e CPUS -e MEM -e TRES
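If your Slurm installation confines jobs with cgroups, one more optional check (a sketch, using the supervisord PID from your earlier listing) would show whether the master processes live inside a Slurm job’s cgroup:

grep -i slurm /proc/1916547/cgroup
# no output would suggest supervisord is not running inside a Slurm job cgroup;
# paths containing slurm/uid_<uid>/job_<jobid> would suggest that it is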

Thanks again for the speedy reply @wtempel. Yes, sure:

“ls -1 /tmp/mongo*.sock /tmp/cryosparc*.sock”
/tmp/cryosparc-supervisor-e29cccc7c796955a404d28cb9d299db7.sock
/tmp/mongodb-39023.sock

“ps -eo pid,ppid,start,command | grep -e cryosparc -e mongo”
1908613 1 Oct 30 /bin/bash /usr/local/apps/cryosparc/libexec/auto_archive.sh
1916547 1 Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/supervisord.conf
1916662 1916547 Oct 30 mongod --auth --dbpath /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_database --port 39023 --oplogSize 64 --replSet meteor --wiredTigerCacheSizeGB 4 --bind_ip_all
1916775 1916547 Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39024 cryosparc_command.command_core:start() -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916793 1916775 Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39024 cryosparc_command.command_core:start() -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916819 1916547 Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:39025 -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916821 1916547 Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39027 -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916834 1916821 Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39027 -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916835 1916819 Oct 30 python /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:39025 -c /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/gunicorn.conf.py
1916898 1916547 Oct 30 /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/cryosparc_app/nodejs/bin/node ./bundle/main.js
2042350 1910777 19:16:54 grep --color=auto -e cryosparc -e mongo

“grep HOSTNAME /gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/config.sh”
export CRYOSPARC_MASTER_HOSTNAME="cn1637"

“/gpfs/gsfs12/users/yangr3/apps/cryosparc/cryosparc_master/bin/cryosparcm cli "get_scheduler_targets()"”
[{‘cache_path’: None, ‘cache_quota_mb’: None, ‘cache_reserve_mb’: 10000, ‘custom_var_names’: [‘lscratch’, ‘ram_gb_multiplier’, ‘gpu_type’, ‘num_hours’], ‘custom_vars’: {‘gpu_type’: ‘v100x’, ‘lscratch’: ‘460’, ‘num_hours’: ‘5’, ‘ram_gb_multiplier’: ‘1’}, ‘desc’: None, ‘hostname’: ‘slurm_auto’, ‘lane’: ‘slurm_auto’, ‘name’: ‘slurm_auto’, ‘qdel_cmd_tpl’: ‘/usr/local/slurm/bin/scancel {{ cluster_job_id }}’, ‘qinfo_cmd_tpl’: “/usr/local/slurm/bin/sinfo --format=‘%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E’”, ‘qstat_cmd_tpl’: ‘/usr/local/bin/sjobs --squeue -j {{ cluster_job_id }}’, ‘qstat_code_cmd_tpl’: None, ‘qsub_cmd_tpl’: ‘/usr/local/slurm/bin/sbatch {{ script_path_abs }}’, ‘script_tpl’: ‘#!/bin/bash\n#SBATCH --job-name=cryosparc_{{ project_uid }}{{ job_uid }}\n#SBATCH --output={{ job_dir_abs }}/output.txt\n#SBATCH --error={{ job_dir_abs }}/error.txt\n#SBATCH --mem={{ (ram_gb|float * (ram_gb_multiplier|default(1))|float)|int }}GB\n#SBATCH --time={{ (num_hours|default(4))|int }}:00:00\n#SBATCH --nodes=1\n#SBATCH --ntasks={{ num_cpu }}\n#SBATCH --cpus-per-task=1\n{%- if num_gpu == 0 %}\n#SBATCH --partition=norm\n#SBATCH --gres=lscratch:{{ (lscratch|default(100))|int }}\n{%- else %}\n#SBATCH --partition=gpu\n#SBATCH --gres=lscratch:{{ (lscratch|default(100))|int }},gpu:{{ gpu_type }}:{{ num_gpu }}\n{%- endif %}\n{{ run_cmd }}\n’, ‘send_cmd_tpl’: ‘{{ command }}’, ‘title’: ‘slurm_auto’, ‘tpl_vars’: [‘num_gpu’, ‘lscratch’, ‘command’, ‘project_uid’, ‘run_cmd’, ‘ram_gb_multiplier’, ‘num_cpu’, ‘job_uid’, ‘gpu_type’, ‘num_hours’, ‘ram_gb’, ‘cluster_job_id’, ‘job_dir_abs’], ‘type’: ‘cluster’, ‘worker_bin_path’: ‘/data/yangr3/apps/cryosparc/cryosparc_worker/bin/cryosparcw’}, {‘cache_path’: ‘/lscratch/25547743’, ‘cache_quota_mb’: None, ‘cache_reserve_mb’: 10000, ‘custom_var_names’: [‘lscratch’, ‘ram_gb_multiplier’, ‘gpu_type’, ‘num_hours’], ‘custom_vars’: {‘gpu_type’: ‘v100x’, ‘lscratch’: ‘460’, ‘num_hours’: ‘5’}, ‘desc’: None, ‘hostname’: ‘slurm_custom’, ‘lane’: ‘slurm_custom’, ‘name’: ‘slurm_custom’, ‘qdel_cmd_tpl’: ‘/usr/local/slurm/bin/scancel {{ cluster_job_id }}’, ‘qinfo_cmd_tpl’: “/usr/local/slurm/bin/sinfo --format=‘%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E’”, ‘qstat_cmd_tpl’: ‘/usr/local/bin/sjobs --squeue -j {{ cluster_job_id }}’, ‘qstat_code_cmd_tpl’: None, ‘qsub_cmd_tpl’: ‘/usr/local/slurm/bin/sbatch {{ script_path_abs }}’, ‘script_tpl’: '#!/bin/bash\n#SBATCH --job-name=cryosparc{{ project_uid }}{{ job_uid }}\n#SBATCH --output={{ job_dir_abs }}/output.txt\n#SBATCH --error={{ job_dir_abs }}/error.txt\n#SBATCH --mem={{ (ram_gb|float * (ram_gb_multiplier|default(1))|float)|int }}GB\n#SBATCH --time={{ (num_hours|default(4))|int }}:00:00\n#SBATCH --nodes=1\n#SBATCH --ntasks={{ num_cpu }}\n#SBATCH --cpus-per-task=1\n{%- if num_gpu == 0 %}\n#SBATCH --partition=norm\n#SBATCH --gres=lscratch:{{ (lscratch|default(100))|int }}\n{%- else %}\n#SBATCH --partition=gpu\n#SBATCH --gres=lscratch:{{ (lscratch|default(100))|int }},gpu:{{ (gpu_type|default(“p100”)) }}:{{ num_gpu }}\n{%- endif %}\n{{ run_cmd }}\n’, ‘send_cmd_tpl’: ‘{{ command }}’, ‘title’: ‘slurm_custom’, ‘tpl_vars’: [‘num_gpu’, ‘lscratch’, ‘command’, ‘project_uid’, ‘run_cmd’, ‘ram_gb_multiplier’, ‘num_cpu’, ‘job_uid’, ‘gpu_type’, ‘num_hours’, ‘ram_gb’, ‘cluster_job_id’, ‘job_dir_abs’], ‘type’: ‘cluster’, ‘worker_bin_path’: ‘/data/yangr3/apps/cryosparc/cryosparc_worker/bin/cryosparcw’}, {‘cache_path’: None, ‘cache_quota_mb’: None, ‘cache_reserve_mb’: 10000, ‘custom_var_names’: [‘lscratch’, ‘ram_gb_multiplier’, 
‘gpu_type’, ‘num_hours’], ‘custom_vars’: {}, ‘desc’: None, ‘hostname’: ‘multi_gpu’, ‘lane’: ‘multi_gpu’, ‘name’: ‘multi_gpu’, ‘qdel_cmd_tpl’: ‘/usr/local/slurm/bin/scancel {{ cluster_job_id }}’, ‘qinfo_cmd_tpl’: “/usr/local/slurm/bin/sinfo --format=‘%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E’”, ‘qstat_cmd_tpl’: ‘/usr/local/bin/sjobs --squeue -j {{ cluster_job_id }}’, ‘qstat_code_cmd_tpl’: None, ‘qsub_cmd_tpl’: ‘/usr/local/slurm/bin/sbatch {{ script_path_abs }}’, ‘script_tpl’: '#!/bin/bash\n#SBATCH --job-name=cryosparc{{ project_uid }}{{ job_uid }}\n#SBATCH --output={{ job_dir_abs }}/output.txt\n#SBATCH --error={{ job_dir_abs }}/error.txt\n#SBATCH --mem={{ (ram_gb|float * (ram_gb_multiplier|default(1))|float)|int }}GB\n#SBATCH --time={{ (num_hours|default(4))|int }}:00:00\n#SBATCH --nodes=1\n#SBATCH --ntasks={{ num_cpu }}\n#SBATCH --cpus-per-task=1\n{%- if num_gpu == 0 %}\n#SBATCH --partition=norm\n#SBATCH --gres=lscratch:{{ (lscratch|default(100))|int }}\n{%- else %}\n#SBATCH --partition=gpu\n#SBATCH --gres=lscratch:{{ (lscratch|default(100))|int }},gpu:{{ (gpu_type|default(“p100”)) }}:{{ num_gpu }}\n{%- endif %}\n{{ run_cmd }}\n’, ‘send_cmd_tpl’: ‘{{ command }}’, ‘title’: ‘multi_gpu’, ‘tpl_vars’: [‘num_gpu’, ‘lscratch’, ‘command’, ‘project_uid’, ‘run_cmd’, ‘ram_gb_multiplier’, ‘num_cpu’, ‘job_uid’, ‘gpu_type’, ‘num_hours’, ‘ram_gb’, ‘cluster_job_id’, ‘job_dir_abs’], ‘type’: ‘cluster’, ‘worker_bin_path’: ‘/data/yangr3/apps/cryosparc/cryosparc_worker/bin/cryosparcw’}, {‘cache_path’: None, ‘cache_quota_mb’: None, ‘cache_reserve_mb’: 10000, ‘custom_var_names’: [‘lscratch’, ‘ram_gb_multiplier’, ‘gpu_type’, ‘num_hours’], ‘custom_vars’: {}, ‘desc’: None, ‘hostname’: ‘single_gpu’, ‘lane’: ‘single_gpu’, ‘name’: ‘single_gpu’, ‘qdel_cmd_tpl’: ‘/usr/local/slurm/bin/scancel {{ cluster_job_id }}’, ‘qinfo_cmd_tpl’: “/usr/local/slurm/bin/sinfo --format=‘%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E’”, ‘qstat_cmd_tpl’: ‘/usr/local/bin/sjobs --squeue -j {{ cluster_job_id }}’, ‘qstat_code_cmd_tpl’: None, ‘qsub_cmd_tpl’: ‘/usr/local/slurm/bin/sbatch {{ script_path_abs }}’, ‘script_tpl’: '#!/bin/bash\n#SBATCH --job-name=cryosparc{{ project_uid }}{{ job_uid }}\n#SBATCH --output={{ job_dir_abs }}/output.txt\n#SBATCH --error={{ job_dir_abs }}/error.txt\n#SBATCH --mem={{ (ram_gb|float * (ram_gb_multiplier|default(1))|float)|int }}GB\n#SBATCH --time={{ (num_hours|default(4))|int }}:00:00\n#SBATCH --nodes=1\n#SBATCH --ntasks={{ num_cpu }}\n#SBATCH --cpus-per-task=1\n#SBATCH --partition=gpu\n#SBATCH --gres=lscratch:{{ (lscratch|default(100))|int }},gpu:{{ (gpu_type|default(“p100”)) }}:1\n{{ run_cmd }}\n’, ‘send_cmd_tpl’: ‘{{ command }}’, ‘title’: ‘single_gpu’, ‘tpl_vars’: [‘lscratch’, ‘command’, ‘project_uid’, ‘run_cmd’, ‘ram_gb_multiplier’, ‘num_cpu’, ‘job_uid’, ‘gpu_type’, ‘num_hours’, ‘ram_gb’, ‘cluster_job_id’, ‘job_dir_abs’], ‘type’: ‘cluster’, ‘worker_bin_path’: ‘/data/yangr3/apps/cryosparc/cryosparc_worker/bin/cryosparcw’}, {‘cache_path’: None, ‘cache_quota_mb’: None, ‘cache_reserve_mb’: 10000, ‘custom_var_names’: [‘lscratch’, ‘ram_gb_multiplier’, ‘gpu_type’, ‘num_hours’], ‘custom_vars’: {}, ‘desc’: None, ‘hostname’: ‘mig_gpu’, ‘lane’: ‘mig_gpu’, ‘name’: ‘mig_gpu’, ‘qdel_cmd_tpl’: ‘/usr/local/slurmdev/bin/scancel {{ cluster_job_id }}’, ‘qinfo_cmd_tpl’: “/usr/local/slurmdev/bin/sinfo --format=‘%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E’”, ‘qstat_cmd_tpl’: ‘/usr/local/bin/sjobs --squeue -j {{ cluster_job_id }}’, ‘qstat_code_cmd_tpl’: None, ‘qsub_cmd_tpl’: 
‘/usr/local/slurmdev/bin/sbatch {{ script_path_abs }}’, ‘script_tpl’: '#!/bin/bash\n#SBATCH --job-name=cryosparc{{ project_uid }}_{{ job_uid }}\n#SBATCH --output={{ job_dir_abs }}/output.txt\n#SBATCH --error={{ job_dir_abs }}/error.txt\n#SBATCH --mem={{ (ram_gb|float * (ram_gb_multiplier|default(1))|float)|int }}GB\n#SBATCH --time={{ (num_hours|default(4))|int }}:00:00\n#SBATCH --nodes=1\n#SBATCH --ntasks={{ num_cpu }}\n#SBATCH --cpus-per-task=1\n{%- if num_gpu == 0 %}\n#SBATCH --partition=norm\n#SBATCH --gres=lscratch:{{ (lscratch|default(100))|int }}\n{%- else %}\n#SBATCH --partition=gpu\n#SBATCH --gres=lscratch:{{ (lscratch|default(100))|int }},gpu:{{ (gpu_type|default(“a100_1g.10g”)) }}:1\n{%- endif %}\n{{ run_cmd }}\n’, ‘send_cmd_tpl’: ‘{{ command }}’, ‘title’: ‘mig_gpu’, ‘tpl_vars’: [‘num_gpu’, ‘lscratch’, ‘command’, ‘project_uid’, ‘run_cmd’, ‘ram_gb_multiplier’, ‘num_cpu’, ‘job_uid’, ‘gpu_type’, ‘num_hours’, ‘ram_gb’, ‘cluster_job_id’, ‘job_dir_abs’], ‘type’: ‘cluster’, ‘worker_bin_path’: ‘/data/yangr3/apps/cryosparc/cryosparc_worker/bin/cryosparcw’}]
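(For reference, the slurm_auto script_tpl above would render roughly as follows for a hypothetical 1-GPU job with num_cpu=4 and ram_gb=24; the lscratch, num_hours, and gpu_type values come from that lane’s custom_vars, the output paths are placeholders, and run_cmd is filled in by CryoSPARC:)

#!/bin/bash
#SBATCH --job-name=cryosparc_P16J70
#SBATCH --output=/path/to/P16/J70/output.txt
#SBATCH --error=/path/to/P16/J70/error.txt
#SBATCH --mem=24GB
#SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --partition=gpu
#SBATCH --gres=lscratch:460,gpu:v100x:1
{{ run_cmd }}   # the cryosparcw command substituted by CryoSPARC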

“env | grep SLURM_ | grep -e NPROC -e CPUS -e MEM -e TRES”
### Nothing was shown by this command, since Slurm is provided as a module and loaded only when needed:

“env | grep slurm”
__LMOD_REF_COUNT_PATH=/usr/local/apps/cryosparc/sbin:1;/data/yangr3/apps/dutree/bin:1;/usr/local/slurm/bin:1;/usr/local/bin:2;/usr/X11R6/bin:1;/usr/local/jdk/bin:1;/usr/bin:1;/usr/local/sbin:1;/usr/sbin:1;/usr/local/mysql/bin:1;/home/yangr3/.local/bin:1;/home/yangr3/bin:1
MANPATH=/usr/local/slurm/share/man:/usr/local/lmod/lmod/8.7/share/man:
PATH=/data/yangr3/apps/cryosparc/cryosparc_master/bin:/data/yangr3/apps/cryosparc/cryosparc_worker/bin:/data/yangr3/apps/cryosparc/cryosparc_master/bin:/data/yangr3/apps/cryosparc/cryosparc_worker/bin:/usr/local/apps/cryosparc/sbin:/data/yangr3/apps/dutree/bin:/usr/local/slurm/bin:/usr/local/bin:/usr/X11R6/bin:/usr/local/jdk/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/mysql/bin:/home/yangr3/.local/bin:/home/yangr3/bin