@dirk @maxim Do you submit your CryoSPARC jobs to a cluster? If so, please can you
- post the output of the command
cryosparcm cli "get_scheduler_targets()"
- indicate the name of the relevant cluster scheduler lane (as configured inside CryoSPARC)
Yes, the users submit jobs to a SLURM queuing system with three queues, 2gpu, 4gpu, and 6gpu, with the number of available GPUs per SLURM node in the queue name. The abnormally long jobs were running on the 4gpu and 6gpu queues. The longest 2D classification with v4.6.0 ran on the 4gpu queue for ~120 h with fewer than 1 million particles.
Here is the output of the get_scheduler_targets command:
$ cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/scratch', 'cache_quota_mb': 3400000, 'cache_reserve_mb': 10240, 'custom_var_names': ['slurmnode'], 'custom_vars': {}, 'desc': None, 'hostname': '4gpu', 'lane': '4gpu', 'name': '4gpu', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/bin/bash -e\n#\n# cryoSPARC script for SLURM submission with sbatch\n#\n# 24-01-2022 Dirk Kostrewa Original file\n# 23-05-2023 Dirk Kostrewa Added cluster submission script variable "slurmnode"\n# 01-03-2024 Dirk Kostrewa cgroups: no CUDA_VISIBLE_DEVICES, no --gres-flags=enforce-binding\n# 06-03-2024 Dirk Kostrewa Double memory allocation in "--mem="\n\n#SBATCH --partition=4gpu\n#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --mem={{ (ram_gb*2)|int }}G\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --mail-type=NONE\n#SBATCH --mail-user={{ cryosparc_username }}\n#SBATCH --nodelist={{ slurmnode }}\n\n{{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': 'slurm 4gpu', 'tpl_vars': ['job_log_path_abs', 'cluster_job_id', 'project_uid', 'ram_gb', 'cryosparc_username', 'num_gpu', 'num_cpu', 'run_cmd', 'command', 'slurmnode', 'job_uid'], 'type': 'cluster', 'worker_bin_path': '/home/cryosparc/cryosparc/cryosparc_worker/bin/cryosparcw'},
{'cache_path': '/scratch', 'cache_quota_mb': 3400000, 'cache_reserve_mb': 10240, 'custom_var_names': ['slurmnode'], 'custom_vars': {}, 'desc': None, 'hostname': '6gpu', 'lane': '6gpu', 'name': '6gpu', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/bin/bash -e\n#\n# cryoSPARC script for SLURM submission with sbatch\n#\n# 24-01-2022 Dirk Kostrewa Original file\n# 23-05-2023 Dirk Kostrewa Added cluster submission script variable "slurmnode"\n# 01-03-2024 Dirk Kostrewa cgroups: no CUDA_VISIBLE_DEVICES, no --gres-flags=enforce-binding\n# 06-03-2024 Dirk Kostrewa Double memory allocation in "--mem="\n\n#SBATCH --partition=6gpu\n#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --mem={{ (ram_gb*2)|int }}G\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --mail-type=NONE\n#SBATCH --mail-user={{ cryosparc_username }}\n#SBATCH --nodelist={{ slurmnode }}\n\n{{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': 'slurm 6gpu', 'tpl_vars': ['job_log_path_abs', 'cluster_job_id', 'project_uid', 'ram_gb', 'cryosparc_username', 'num_gpu', 'num_cpu', 'run_cmd', 'command', 'slurmnode', 'job_uid'], 'type': 'cluster', 'worker_bin_path': '/home/cryosparc/cryosparc/cryosparc_worker/bin/cryosparcw'},
{'cache_path': '/scratch', 'cache_quota_mb': 3400000, 'cache_reserve_mb': 10240, 'custom_var_names': ['slurmnode'], 'custom_vars': {}, 'desc': None, 'hostname': '2gpu', 'lane': '2gpu', 'name': '2gpu', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/bin/bash -e\n#\n# cryoSPARC script for SLURM submission with sbatch\n#\n# 24-01-2022 Dirk Kostrewa Original file\n# 23-05-2023 Dirk Kostrewa Added cluster submission script variable "slurmnode"\n# 01-03-2024 Dirk Kostrewa cgroups: no CUDA_VISIBLE_DEVICES, no --gres-flags=enforce-binding\n# 06-03-2024 Dirk Kostrewa Double memory allocation in "--mem="\n\n#SBATCH --partition=2gpu\n#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --mem={{ (ram_gb*2)|int }}G\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --mail-type=NONE\n#SBATCH --mail-user={{ cryosparc_username }}\n#SBATCH --nodelist={{ slurmnode }}\n\n{{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': 'slurm 2gpu', 'tpl_vars': ['job_log_path_abs', 'cluster_job_id', 'project_uid', 'ram_gb', 'cryosparc_username', 'num_gpu', 'num_cpu', 'run_cmd', 'command', 'slurmnode', 'job_uid'], 'type': 'cluster', 'worker_bin_path': '/home/cryosparc/cryosparc/cryosparc_worker/bin/cryosparcw'}]
@dirk Thanks for sharing the scheduler target information. Please can you also let us know the output of
cat /sys/kernel/mm/transparent_hugepage/enabled
on your cluster worker nodes, and the value of the
#SBATCH --cpus-per-task=
parameter.
We are still looking into this. One other thing which anyone encountering this issue could do to help us is to share your /etc/slurm/slurm.conf
and /etc/slurm/cgroup.conf
files with us (or the corresponding ones, if located elsewhere). They can be DM’d to me or posted in this thread. There may be some dependence on certain details of cluster configuration.
Thanks
Dear wtempel and hsnyder,
I will send our SLURM configuration files in a separate direct message to hsnyder.
Meanwhile, complaints from our users force me to downgrade to v4.5.3. When the issues with the current v4.6.0 have been solved, I will upgrade CryoSPARC again.
Best regards,
Dirk
For troubleshooting job stalls as those described in this topic, we still recommend testing whether the stall is resolved by disabling THP.
It is unclear to us at this time whether the new IO subsystem introduced in v4.6.0 is affected by THP differently than earlier implementations of the IO subsystem.
For stalls observed for GPU-accelerated CryoSPARC jobs on a cluster-type CryoSPARC instance, one may try to increase the number of requested CPUs. We received feedback that this increase resolved the stall for jobs submitted with a modified script template.
For example, one might replace the line
#SBATCH --cpus-per-task={{ num_cpu }}
in the current template with
{% set increased_num_cpu = 8 -%}
#SBATCH --cpus-per-task={{ [1, num_cpu, [increased_num_cpu*num_gpu, increased_num_cpu]|min]|max }}
This recommendation is based on user feedback. Please be aware that we have not yet reproduced stalls that were resolved by increasing the number of requested CPUs beyond {{ num_cpu }}. Our recommendation to increase the number of requested CPUs is subject to change, pending user feedback and our own testing. Please report your observations regarding the
--cpus-per-task=
or equivalent parameter in this thread.
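For anyone unsure what the modified template requests: the Jinja expression above reduces to max(1, num_cpu, min(8 * num_gpu, 8)). A plain-Python sketch of that arithmetic (illustrative only, not CryoSPARC code; `cpus_per_task` is a name made up here):

```python
# Illustrative re-statement of the Jinja expression
#   {{ [1, num_cpu, [increased_num_cpu*num_gpu, increased_num_cpu]|min]|max }}
# with increased_num_cpu = 8, as set in the template snippet above.
def cpus_per_task(num_cpu: int, num_gpu: int, increased_num_cpu: int = 8) -> int:
    """max(1, num_cpu, min(increased_num_cpu * num_gpu, increased_num_cpu))"""
    return max(1, num_cpu, min(increased_num_cpu * num_gpu, increased_num_cpu))

# A job requesting 4 CPUs and 2 GPUs is bumped up to 8 CPUs:
print(cpus_per_task(num_cpu=4, num_gpu=2))   # 8
# A job that already requests more than 8 CPUs is left unchanged:
print(cpus_per_task(num_cpu=12, num_gpu=2))  # 12
```

In effect, for any job with at least one GPU, the template raises the CPU request to at least 8 while never lowering what CryoSPARC asked for.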
Special considerations may apply to GPU-accelerated VMs in the cloud that are used with a cluster workload manager like slurm. A suitable cloud-based VM may have (just) the required number of virtual cores, but custom VM and/or workload manager settings may or may not be required in order for those virtual cores to be “recognized” for the purpose of CPU allocations.
Disabling THP requires a reboot of all GPU servers in our SLURM cluster, and this would affect other software users as well.
Anyway, I had to downgrade cryoSPARC to v4.5.3 in order to restore a working cryoSPARC environment for our more than 30 users.
Best regards,
Dirk
Just out of curiosity/naiveté, why does switching transparent_hugepages from madvise to never require a reboot in your cluster environment? Are the worker systems PXE booting an immutable environment? Thanks in advance.
A reboot is only required if you want to completely get rid of THP, which is what I would have done (see here and jump to “To disable THP at run time”).
Best regards,
Dirk
If you’re currently set to madvise, you can add the following line to your cryosparc_master/config.sh and cryosparc_worker/config.sh, which may help:
export NUMPY_MADVISE_HUGEPAGE=0
This will tell numpy not to request hugepages (which it does by default). This will not work if the system-wide THP setting is always. We are considering making this the default in the future, given the number of users who have had hugepage-related problems.
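If it helps anyone check their nodes, the kernel reports the active THP mode with brackets around the current setting in /sys/kernel/mm/transparent_hugepage/enabled. A small illustrative parser (not CryoSPARC code; `active_thp_mode` is a name invented here):

```python
# Read and parse the system-wide THP mode. The file contents look like
# "always [madvise] never", with brackets marking the active setting.
import re
from pathlib import Path

THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")

def active_thp_mode(text: str) -> str:
    """Parse e.g. 'always [madvise] never' -> 'madvise'."""
    m = re.search(r"\[(\w+)\]", text)
    return m.group(1) if m else "unknown"

if THP_PATH.exists():
    # On a real worker node, print the live setting.
    print(active_thp_mode(THP_PATH.read_text()))

print(active_thp_mode("always [madvise] never"))  # madvise
```

The NUMPY_MADVISE_HUGEPAGE=0 workaround described above only takes effect when this reports madvise.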
This is not true. Many users in this thread reported that disabling THP did not fix it, which led to the discovery of another category of stalls related to SLURM. Of all the stall reports related to v4.6, most have been resolved for users who disabled THP. We have also reproduced the THP-related stalls internally; they are very reliably reproducible.
My recommendation is still that the first thing tried by anyone experiencing stalls should be to disable THP, either system-wide or, if that is undesirable and the system-wide setting is madvise, via the environment variable I mentioned above.
This is very useful, thanks @hsnyder!
@hsnyder: Many thanks for this insight and your recommendation! This sounds very helpful! I will give it a try as soon as I update to v4.6.x again! (At the moment I want to calm the situation down first.)
Best regards,
Dirk
Hi everyone,
CryoSPARC v4.6.1, released today, contains a change which we believe will fix the non-transparent-hugepage-related stalls on cluster nodes. We were not able to reproduce the problem ourselves so we cannot be 100% certain, but with the help of forum users we discovered a possible stall scenario and fixed it. We would greatly appreciate it if anyone previously experiencing this issue could update to v4.6.1 and confirm that the problem is resolved.
v4.6.1 also reconfigures Python’s numerical library (numpy) to not request huge pages from the operating system. We have found that this change resolves stalls related to transparent huge pages and it is therefore no longer necessary to turn off THP at the system level (leaving the setting at the default “madvise” should no longer cause problems). In v4.6.1, jobs will also emit a warning if the OS is set to “always” enable THP. If you have already changed your OS configuration to disable THP, it is possible (though not necessary) to revert the OS configuration change after upgrade to v4.6.1.
–Harris
Hi @hsnyder ,
Today I have upgraded from 4.5.3 to 4.6.1.
Our cluster nodes have the following configuration:
cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
Job does emit a warning
[CPU: 254.7 MB Avail: 297.94 GB]
Transparent hugepages are enabled. You may encounter stalls or performance problems with CryoSPARC jobs.
Is this just a warning and nothing to be worried about, since numpy is no longer requesting huge pages from the operating system? I don’t think I can ask them to change this cluster-wide, so we may have to live with it. It would be great if you could clarify.
Good question, my previous message probably could have been clearer about this. Numpy by default will request THPs from the OS using the madvise system call. That’s what the madvise setting is about. The change we made in 4.6.1 is to prevent numpy from making that request. That change will only have an effect if the system-wide setting is madvise. If it’s always, then the kernel will always try to use THPs, whether an application requests them or not, and likewise never means the OS will never try to use THPs. There’s nothing we can really do about a system that is set to always use THP, which is why we issue the warning. That said, some users don’t experience these problems - it possibly depends on the Linux kernel version. The warning is just to bring to your attention the fact that CryoSPARC itself can’t do anything about the system trying to use THP, and if you experience jobs stalling or becoming egregiously slow, turning that system-wide setting off could be indicated.
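The interaction Harris describes can be summarized in a small decision sketch (illustrative only; `thp_active_for_numpy` is a name made up here, with mode names as reported by the kernel):

```python
# Whether numpy allocations may end up backed by transparent huge pages,
# given the system-wide THP mode and whether numpy opts in via madvise().
def thp_active_for_numpy(system_mode: str, numpy_madvise: bool) -> bool:
    if system_mode == "always":
        return True           # kernel applies THP regardless of the application
    if system_mode == "never":
        return False          # kernel never applies THP
    if system_mode == "madvise":
        return numpy_madvise  # THP only for memory the application opts in
    raise ValueError(f"unknown THP mode: {system_mode}")

# v4.6.1 behaviour: numpy no longer opts in, so 'madvise' systems are safe...
print(thp_active_for_numpy("madvise", numpy_madvise=False))  # False
# ...but an 'always' system still uses THP, hence the warning.
print(thp_active_for_numpy("always", numpy_madvise=False))   # True
```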
Harris
Thanks for the clarification @hsnyder
Would adding the following to config.sh on the worker help?
export NUMPY_MADVISE_HUGEPAGE=0
No, that’s exactly what we do in v4.6.1. We don’t do it via config.sh, but it’s exactly the same mechanism. It works the way I described previously.
Harris
Today I updated CryoSPARC from v4.6.0 to v4.6.1 on my workstation. Now I am running Local Refinement and NU Refinement, and in both cases I got these two warnings, which I have never seen before:
Should I worry about these warnings? Thanks!
Hi @donghuachen,
You don’t need to worry about them, but they do indicate potential problems. It seems you have transparent huge pages set to [always]. This is only a problem if it results in job stalls on your particular system, so it may or may not be something you should change. Also, CryoSPARC disabled io_uring due to lack of kernel support, which suggests you might be using a very old Linux distribution, like CentOS 7? I recommend upgrading for many reasons, but this is just a performance thing, not a correctness problem.
Harris
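On the io_uring point: io_uring was introduced in Linux kernel 5.1, so any older kernel triggers the fallback Harris mentions. A rough illustrative check (assuming the usual `uname -r` release format; `supports_io_uring` is a name invented here):

```python
# Rough check of io_uring availability from a `uname -r` style string.
# io_uring was added in Linux 5.1; older kernels (e.g. CentOS 7's 3.10,
# CentOS Stream 8's 4.18) lack it.
def supports_io_uring(release: str) -> bool:
    """Parse '5.15.0-91-generic' -> (5, 15) and compare against (5, 1)."""
    major, minor = (int(x) for x in release.split("-")[0].split(".")[:2])
    return (major, minor) >= (5, 1)

print(supports_io_uring("5.15.0-91-generic"))     # True
print(supports_io_uring("4.18.0-425.el8"))        # False (CentOS Stream 8)
print(supports_io_uring("3.10.0-1160.el7.x86_64"))  # False (CentOS 7)
```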
Hi @hsnyder ,
Thanks for your reply.
I just checked my linux version and found the following:
cat /etc/centos-release
CentOS Stream release 8