CryoSPARC v4.6.0 2D jobs never finish

@dirk @maxim Do you submit your CryoSPARC jobs to a cluster? If so, please can you

  1. post the output of the command
    cryosparcm cli "get_scheduler_targets()"
    
  2. indicate the name of the relevant cluster scheduler lane (as configured inside CryoSPARC)

Yes, the users submit jobs to a SLURM queuing system with three queues, 2gpu, 4gpu and 6gpu, with the number of available GPUs per SLURM node in the queue name. The abnormally long jobs were running on the 4gpu and 6gpu queues. The longest 2D classification with v4.6.0 ran on the 4gpu queue for ~120 h with fewer than 1 million particles.

Here is the output of the get_scheduler_targets command:

$ cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/scratch', 'cache_quota_mb': 3400000, 'cache_reserve_mb': 10240, 'custom_var_names': ['slurmnode'], 'custom_vars': {}, 'desc': None, 'hostname': '4gpu', 'lane': '4gpu', 'name': '4gpu', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/bin/bash -e\n#\n# cryoSPARC script for SLURM submission with sbatch\n#\n# 24-01-2022  Dirk Kostrewa  Original file\n# 23-05-2023  Dirk Kostrewa  Added cluster submission script variable "slurmnode"\n# 01-03-2024  Dirk Kostrewa  cgroups: no CUDA_VISIBLE_DEVICES, no --gres-flags=enforce-binding\n# 06-03-2024  Dirk Kostrewa  Double memory allocation in "--mem="\n\n#SBATCH --partition=4gpu\n#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --mem={{ (ram_gb*2)|int }}G\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --mail-type=NONE\n#SBATCH --mail-user={{ cryosparc_username }}\n#SBATCH --nodelist={{ slurmnode }}\n\n{{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': 'slurm 4gpu', 'tpl_vars': ['job_log_path_abs', 'cluster_job_id', 'project_uid', 'ram_gb', 'cryosparc_username', 'num_gpu', 'num_cpu', 'run_cmd', 'command', 'slurmnode', 'job_uid'], 'type': 'cluster', 'worker_bin_path': '/home/cryosparc/cryosparc/cryosparc_worker/bin/cryosparcw'},
 {'cache_path': '/scratch', 'cache_quota_mb': 3400000, 'cache_reserve_mb': 10240, 'custom_var_names': ['slurmnode'], 'custom_vars': {}, 'desc': None, 'hostname': '6gpu', 'lane': '6gpu', 'name': '6gpu', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/bin/bash -e\n#\n# cryoSPARC script for SLURM submission with sbatch\n#\n# 24-01-2022  Dirk Kostrewa  Original file\n# 23-05-2023  Dirk Kostrewa  Added cluster submission script variable "slurmnode"\n# 01-03-2024  Dirk Kostrewa  cgroups: no CUDA_VISIBLE_DEVICES, no --gres-flags=enforce-binding\n# 06-03-2024  Dirk Kostrewa  Double memory allocation in "--mem="\n\n#SBATCH --partition=6gpu\n#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --mem={{ (ram_gb*2)|int }}G\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --mail-type=NONE\n#SBATCH --mail-user={{ cryosparc_username }}\n#SBATCH --nodelist={{ slurmnode }}\n\n{{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': 'slurm 6gpu', 'tpl_vars': ['job_log_path_abs', 'cluster_job_id', 'project_uid', 'ram_gb', 'cryosparc_username', 'num_gpu', 'num_cpu', 'run_cmd', 'command', 'slurmnode', 'job_uid'], 'type': 'cluster', 'worker_bin_path': '/home/cryosparc/cryosparc/cryosparc_worker/bin/cryosparcw'},
 {'cache_path': '/scratch', 'cache_quota_mb': 3400000, 'cache_reserve_mb': 10240, 'custom_var_names': ['slurmnode'], 'custom_vars': {}, 'desc': None, 'hostname': '2gpu', 'lane': '2gpu', 'name': '2gpu', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/bin/bash -e\n#\n# cryoSPARC script for SLURM submission with sbatch\n#\n# 24-01-2022  Dirk Kostrewa  Original file\n# 23-05-2023  Dirk Kostrewa  Added cluster submission script variable "slurmnode"\n# 01-03-2024  Dirk Kostrewa  cgroups: no CUDA_VISIBLE_DEVICES, no --gres-flags=enforce-binding\n# 06-03-2024  Dirk Kostrewa  Double memory allocation in "--mem="\n\n#SBATCH --partition=2gpu\n#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --mem={{ (ram_gb*2)|int }}G\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --mail-type=NONE\n#SBATCH --mail-user={{ cryosparc_username }}\n#SBATCH --nodelist={{ slurmnode }}\n\n{{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': 'slurm 2gpu', 'tpl_vars': ['job_log_path_abs', 'cluster_job_id', 'project_uid', 'ram_gb', 'cryosparc_username', 'num_gpu', 'num_cpu', 'run_cmd', 'command', 'slurmnode', 'job_uid'], 'type': 'cluster', 'worker_bin_path': '/home/cryosparc/cryosparc/cryosparc_worker/bin/cryosparcw'}]
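
For readability, the script_tpl embedded in the 4gpu target above, with the escaped newlines expanded, corresponds to the following submission script (the 2gpu and 6gpu lanes differ only in the #SBATCH --partition line):

#!/bin/bash -e
#
# cryoSPARC script for SLURM submission with sbatch
#
# 24-01-2022  Dirk Kostrewa  Original file
# 23-05-2023  Dirk Kostrewa  Added cluster submission script variable "slurmnode"
# 01-03-2024  Dirk Kostrewa  cgroups: no CUDA_VISIBLE_DEVICES, no --gres-flags=enforce-binding
# 06-03-2024  Dirk Kostrewa  Double memory allocation in "--mem="

#SBATCH --partition=4gpu
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --output={{ job_log_path_abs }}
#SBATCH --error={{ job_log_path_abs }}
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --mem={{ (ram_gb*2)|int }}G
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --mail-type=NONE
#SBATCH --mail-user={{ cryosparc_username }}
#SBATCH --nodelist={{ slurmnode }}

{{ run_cmd }}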

@dirk Thanks for sharing the scheduler target information. Please can you also let us know

  1. the output of the command
    cat /sys/kernel/mm/transparent_hugepage/enabled 
    
    on your cluster worker nodes
  2. whether CPU resources available to jobs are constrained according to the
    #SBATCH --cpus-per-task=
    
    parameter

We are still looking into this. One other thing anyone encountering this issue could do to help us is to share your /etc/slurm/slurm.conf and /etc/slurm/cgroup.conf files with us (or the corresponding files, if located elsewhere). They can be DM’d to me or posted in this thread. There may be some dependence on certain details of the cluster configuration.
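
Purely as an illustration of the kind of settings we are interested in (your actual files will differ), CPU and memory confinement in Slurm is typically configured with TaskPlugin=task/cgroup (often combined with task/affinity) in slurm.conf and entries like these in cgroup.conf:

ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes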

Thanks

Dear wtempel and hsnyder,

  1. The output of the command “cat /sys/kernel/mm/transparent_hugepage/enabled” is: [always] madvise never
    I do not plan to change this, since cryoSPARC worked well with this setting in v4.5.3, and no user who reported changing it had any success with it.
  2. I use “#SBATCH --cpus-per-task={{ num_cpu }}” in our cluster_script.sh and “ConstrainCores=yes” in cgroup.conf - again, this worked well with v4.5.3.

I will send our SLURM configuration files in a separate direct message to hsnyder.

Meanwhile, the complaints of our users force me to downgrade to v4.5.3. When the issues with the current v4.6.0 have been solved, I will upgrade cryoSPARC again.

Best regards,

Dirk

For troubleshooting job stalls such as those described in this topic, we still recommend testing whether the stall is resolved by disabling THP.
It is unclear to us at this time whether the new IO subsystem introduced in v4.6.0 is affected by THP differently from earlier implementations of the IO subsystem.
For stalls observed for GPU-accelerated CryoSPARC jobs on a CryoSPARC instance where

  • the job was queued to a workload manager like slurm,
  • CPU resources available to the job are actually restricted to the resources requested, and,
  • the stall is not resolved by disabling THP:

one may try to increase the number of requested CPUs. We received feedback that the increase resolved the stall for jobs submitted with a modified script template.
For example, one might replace a line

#SBATCH --cpus-per-task={{ num_cpu }}

in the current template
with

{% set increased_num_cpu = 8 -%}
#SBATCH --cpus-per-task={{ [1, num_cpu, [increased_num_cpu*num_gpu, increased_num_cpu]|min]|max }}
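
As an illustration (assuming the increased_num_cpu = 8 set above), the expression evaluates as follows:

# num_cpu = 2,  num_gpu = 1  ->  max(1, 2,  min(8*1, 8)) = 8 CPUs requested
# num_cpu = 4,  num_gpu = 4  ->  max(1, 4,  min(8*4, 8)) = 8 CPUs requested
# num_cpu = 16, num_gpu = 4  ->  max(1, 16, min(8*4, 8)) = 16 CPUs requested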

This recommendation is based on user feedback. Please be aware that we have not yet reproduced stalls that were resolved by an increase of the number of requested CPUs beyond {{ num_cpu }}. Our recommendation to increase the number of requested CPUs is subject to change, pending user feedback and our own testing. Please report your observations regarding the
--cpus-per-task= or equivalent parameter in this thread.
Special considerations may apply to GPU-accelerated VMs in the cloud that are used with a cluster workload manager like slurm. A suitable cloud-based VM may have (just) the required number of virtual cores, but custom VM and/or workload manager settings may or may not be required in order for those virtual cores to be “recognized” for the purpose of CPU allocations.

Disabling THP requires a reboot of all GPU servers in our SLURM cluster, and this would affect other software users as well.
Anyway, I had to downgrade cryoSPARC to v4.5.3 in order to restore a working cryoSPARC environment for our more than 30 users.

Best regards,
Dirk

Just out of curiosity/naiveté, why does switching from madvise to never with transparent_hugepages require a reboot in your cluster environment? Are the worker systems PXE booting an immutable environment? Thanks in advance. :slight_smile:

A reboot is only required if you want to completely get rid of THP, which is what I would have done (see here and jump to “To disable THP at run time”).
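
For anyone who prefers the runtime route instead, the change is along these lines (run as root on each GPU node; it does not survive a reboot):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag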

Best regards,
Dirk

If you’re currently set to madvise, you can add the following line to your cryosparc_master/config.sh and cryosparc_worker/config.sh, which may help:

export NUMPY_MADVISE_HUGEPAGE=0

This tells numpy not to request hugepages (which it does by default). It will not work if the system-wide THP setting is always. We are considering making this the default in the future, given the number of users who have had hugepage-related problems.

This is not true. Many users in this thread reported that disabling THP did not fix it, which led to the discovery of another category of stalls related to SLURM. Of all the stall reports related to v4.6, most have been resolved for users who disabled THP. We have also reproduced the THP-related stalls internally; they are very reliably reproducible.

My recommendation is still that the first thing tried by anyone experiencing stalls should be to disable THP, either system wide, or, if that is undesirable and if the system-wide setting is madvise, via the environment variable I mentioned above.


This is very useful, thanks @hsnyder!


@hsnyder: Many thanks for this insight and your recommendation! This sounds very helpful! I will give it a try as soon as I update to v4.6.x again! (At the moment I want to calm down the situation first.)

Best regards,
Dirk

Hi everyone,

CryoSPARC v4.6.1, released today, contains a change which we believe will fix the non-transparent-hugepage-related stalls on cluster nodes. We were not able to reproduce the problem ourselves so we cannot be 100% certain, but with the help of forum users we discovered a possible stall scenario and fixed it. We would greatly appreciate it if anyone previously experiencing this issue could update to v4.6.1 and confirm that the problem is resolved.

v4.6.1 also reconfigures Python’s numerical library (numpy) to not request huge pages from the operating system. We have found that this change resolves stalls related to transparent huge pages and it is therefore no longer necessary to turn off THP at the system level (leaving the setting at the default “madvise” should no longer cause problems). In v4.6.1, jobs will also emit a warning if the OS is set to “always” enable THP. If you have already changed your OS configuration to disable THP, it is possible (though not necessary) to revert the OS configuration change after upgrade to v4.6.1.
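
For reference, if the run-time sysfs setting was switched to never, reverting it to the default is simply (as root):

echo madvise > /sys/kernel/mm/transparent_hugepage/enabled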

–Harris


Hi @hsnyder ,

Today I have upgraded from 4.5.3 to 4.6.1.

Our cluster nodes have the following configuration:

cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

The job does emit a warning:

[CPU:  254.7 MB  Avail: 297.94 GB]
Transparent hugepages are enabled. You may encounter stalls or performance problems with CryoSPARC jobs.

Is this just a warning and nothing to be worried about, since numpy is no longer requesting huge pages from the operating system? I don’t think I can ask them to change this cluster-wide, so we may have to live with it. It would be great if you could clarify.

@Rajan,

Good question; my previous message probably could have been clearer about this. Numpy by default will request THPs from the OS using the madvise system call - that’s what the madvise setting is about. The change we made in 4.6.1 is to prevent numpy from making that request. That change only has an effect if the system-wide setting is madvise. If it’s always, the kernel will always try to use THPs, whether an application requests them or not, and likewise never means the OS will never try to use THPs. There’s nothing we can really do about a system that is set to always use THP, which is why we issue the warning. That said, some users don’t experience these problems - it possibly depends on the Linux kernel version. The warning is just to bring to your attention that CryoSPARC itself can’t do anything about the fact that the system will try to use THP, and if you experience jobs stalling or becoming egregiously slow, turning that system-wide setting off may be indicated.
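
To summarize the three policies (the bracketed value in the sysfs file is the one in effect):

cat /sys/kernel/mm/transparent_hugepage/enabled
# always  - kernel uses THP for all eligible memory, regardless of what applications request
# madvise - kernel uses THP only where an application asks for it via madvise() (what numpy did before v4.6.1)
# never   - kernel never uses THP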

Harris

Thanks for the clarification @hsnyder

Would adding the following to config.sh in the worker help?

export NUMPY_MADVISE_HUGEPAGE=0

@Rajan,

No - that’s exactly what we already do in v4.6.1. We don’t do it via config.sh, but it’s the same mechanism, and it works the way I described previously.

Harris


@hsnyder

Today I updated CryoSPARC from v4.6.0 to v4.6.1 on my workstation. Now I am running Local Refinement and NU Refinement, and in both cases I got these two warnings, which I have never seen before:

  1. Transparent hugepages are enabled. You may encounter stalls or performance problems with CryoSPARC jobs
  2. WARNING: io_uring support disabled (not supported by kernel), I/O performance may degrade

Should I worry about these warnings? Thanks!

Hi @donghuachen,

You don’t need to worry about them, but they do indicate potential problems. It seems you have transparent huge pages set to [always]. This is only a problem if it results in job stalls on your particular system, so it may or may not be something you should change. Also, CryoSPARC disabled io_uring due to lack of kernel support, which suggests you might be using a very old Linux distribution, like CentOS 7? I recommend upgrading for many reasons, but this is just a performance issue, not a correctness problem.
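
If you want to check, something along these lines shows the kernel and distribution version (io_uring was introduced in mainline Linux 5.1; distribution kernels may or may not backport it):

uname -r
cat /etc/os-release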

Harris

Hi @hsnyder ,

Thanks for your reply.
I just checked my Linux version and found the following:
cat /etc/centos-release
CentOS Stream release 8