Jobs run very slowly

Hi all,

We are facing a problem where CryoSPARC runs extremely slowly for almost all job types. Jobs take twice as long or longer to process than before (tested by re-running previous jobs). Shutting down the system to let the GPU cool down didn't help.
What could be the possible reason?
Our CryoSPARC version is v4.4.1.

Welcome to the forum @Heng .

Please can you run this command for a few pairs of jobs where you observed the slow performance:

cli "get_job('P44', 'J465',  'instance_information', 'completed_at', 'started_at', 'job_type', 'params_spec', 'version', 'uid', 'project_uid')"

where you

  1. replace the project and job UIDs as appropriate.
  2. arrange the outputs in pairs of comparable jobs
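If several pairs need checking, the query can be wrapped in a small loop. This is a minimal sketch, assuming the cli command above is run as cryosparcm cli on the CryoSPARC master node; the helper names query_job and run_pairs and the OLD:NEW argument format are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: print the timing-related get_job fields for pairs of comparable jobs.
# Assumes the `cryosparcm` command is on PATH (i.e. this runs on the master node).

# query_job PROJECT_UID JOB_UID: the same get_job query as above, for one job
query_job() {
  cryosparcm cli "get_job('$1', '$2', 'instance_information', 'completed_at', 'started_at', 'job_type', 'params_spec', 'version', 'uid', 'project_uid')"
}

# run_pairs OLD:NEW ...: each argument pairs an earlier job with a comparable
# recent one, both written as PROJECT.JOB (hypothetical format, e.g. P22.J349:P22.J387)
run_pairs() {
  for pair in "$@"; do
    old=${pair%%:*}   # text before the colon: earlier job
    new=${pair#*:}    # text after the colon: recent job
    echo "== $old vs $new =="
    query_job "${old%%.*}" "${old#*.}"   # split PROJECT.JOB into its two UIDs
    query_job "${new%%.*}" "${new#*.}"
  done
}
```

For the jobs posted in this thread, that would be: run_pairs P22.J349:P22.J387 P23.J72:P23.J148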
  • PROJECT A

PREVIOUS RUN
{'_id': '65150aaaa71a5a8389453e15', 'completed_at': 'Thu, 28 Sep 2023 05:26:45 GMT', 'instance_information': {'CUDA_version': '11.6.0', 'available_memory': '320.36GB', 'gpu_info': [{'id': 0, 'mem': 25438126080, 'name': 'NVIDIA GeForce RTX 3090'}], 'max_cpu_freq': 3300.0, 'physical_cores': 24, 'platform_architecture': 'x86_64', 'platform_node': 'yuchi-SYS-740GP-TNRT', 'platform_release': '5.4.0-163-generic', 'platform_version': '#180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023', 'total_memory': '377.26GB', 'used_memory': '54.29GB'}, 'job_type': 'homo_refine_new', 'params_spec': {'compute_use_ssd': {'value': False}}, 'project_uid': 'P22', 'started_at': 'Thu, 28 Sep 2023 05:14:57 GMT', 'uid': 'J349', 'version': 'v4.2.1'}

NOW
{'_id': '65af7c1d3c38292cfc856690', 'completed_at': 'Tue, 23 Jan 2024 09:05:15 GMT', 'instance_information': {'CUDA_version': '11.8', 'available_memory': '226.82GB', 'cpu_model': 'Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz', 'driver_version': '12.0', 'gpu_info': [{'id': 0, 'mem': 25438126080, 'name': 'NVIDIA GeForce RTX 3090'}], 'ofd_hard_limit': 1048576, 'ofd_soft_limit': 1024, 'physical_cores': 24, 'platform_architecture': 'x86_64', 'platform_node': 'yuchi-SYS-740GP-TNRT', 'platform_release': '5.4.0-169-generic', 'platform_version': '#187-Ubuntu SMP Thu Nov 23 14:52:28 UTC 2023', 'total_memory': '251.51GB', 'used_memory': '23.08GB'}, 'job_type': 'homo_refine_new', 'params_spec': {'compute_use_ssd': {'value': False}}, 'project_uid': 'P22', 'started_at': 'Tue, 23 Jan 2024 08:43:33 GMT', 'uid': 'J387', 'version': 'v4.4.1'}

  • PROJECT B

PREVIOUS RUN
{'_id': '643ba9dd3f3665811b3e8eac', 'completed_at': 'Sun, 16 Apr 2023 09:43:17 GMT', 'instance_information': {'CUDA_version': '11.6.0', 'available_memory': '297.41GB', 'gpu_info': [{'id': 1, 'mem': 25447170048, 'name': 'NVIDIA GeForce RTX 3090'}], 'max_cpu_freq': 3300.0, 'physical_cores': 24, 'platform_architecture': 'x86_64', 'platform_node': 'yuchi-SYS-740GP-TNRT', 'platform_release': '5.4.0-146-generic', 'platform_version': '#163-Ubuntu SMP Fri Mar 17 18:26:02 UTC 2023', 'total_memory': '377.27GB', 'used_memory': '76.92GB'}, 'job_type': 'hetero_refine', 'params_spec': {'compute_use_ssd': {'value': False}}, 'project_uid': 'P23', 'started_at': 'Sun, 16 Apr 2023 09:25:50 GMT', 'uid': 'J72', 'version': 'v4.2.1'}

NOW
{'_id': '65af45b1a9b1ffb9d7a1e7af', 'completed_at': 'Tue, 23 Jan 2024 09:31:31 GMT', 'instance_information': {'CUDA_version': '11.8', 'available_memory': '245.76GB', 'cpu_model': 'Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz', 'driver_version': '12.0', 'gpu_info': [{'id': 0, 'mem': 25438126080, 'name': 'NVIDIA GeForce RTX 3090'}], 'ofd_hard_limit': 1048576, 'ofd_soft_limit': 1024, 'physical_cores': 24, 'platform_architecture': 'x86_64', 'platform_node': 'yuchi-SYS-740GP-TNRT', 'platform_release': '5.4.0-169-generic', 'platform_version': '#187-Ubuntu SMP Thu Nov 23 14:52:28 UTC 2023', 'total_memory': '251.51GB', 'used_memory': '4.34GB'}, 'job_type': 'hetero_refine', 'params_spec': {'compute_use_ssd': {'value': False}}, 'project_uid': 'P23', 'started_at': 'Tue, 23 Jan 2024 07:53:12 GMT', 'uid': 'J148', 'version': 'v4.4.1'}
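The run times can be read off the started_at and completed_at fields above. This is a minimal sketch, assuming GNU coreutils date (the default on Ubuntu hosts like the one shown), whose -d option can parse timestamps in this format; elapsed is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: wall-clock run time of a job from its started_at / completed_at fields.
# Assumes GNU coreutils `date`, whose -d option parses timestamps such as
# 'Tue, 23 Jan 2024 08:43:33 GMT'.

# elapsed STARTED_AT COMPLETED_AT: print the difference in whole seconds
elapsed() {
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $((end - start))
}

# Timestamps copied from the four get_job outputs above
a_prev=$(elapsed 'Thu, 28 Sep 2023 05:14:57 GMT' 'Thu, 28 Sep 2023 05:26:45 GMT')  # P22.J349
a_now=$(elapsed 'Tue, 23 Jan 2024 08:43:33 GMT' 'Tue, 23 Jan 2024 09:05:15 GMT')   # P22.J387
b_prev=$(elapsed 'Sun, 16 Apr 2023 09:25:50 GMT' 'Sun, 16 Apr 2023 09:43:17 GMT')  # P23.J72
b_now=$(elapsed 'Tue, 23 Jan 2024 07:53:12 GMT' 'Tue, 23 Jan 2024 09:31:31 GMT')   # P23.J148

echo "Project A: ${a_prev}s -> ${a_now}s"
echo "Project B: ${b_prev}s -> ${b_now}s"
```

With these timestamps it should print 708s -> 1302s for Project A (about 1.8x slower) and 1047s -> 5899s for Project B (about 5.6x slower).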

@Heng Thanks for posting this information.
Please can you test whether run times are affected by changing the transparent_hugepage setting:

  1. check the current setting
    cat /sys/kernel/mm/transparent_hugepage/enabled
    
  2. if the active (bracketed) value is not never:
    1. disable transparent_hugepage (as root):

      echo never > /sys/kernel/mm/transparent_hugepage/enabled
      

      (this setting may revert to default after a reboot)

    2. run clones of jobs P22.J387, P23.J148

    3. post timings output by appropriately modified commands like