Worker update to 4.7.0 failed

Hello,

I am encountering the following error:
./bin/cryosparcw update

Updating… checking versions

Current version v4.6.2 - New version v4.7.0

=============================

Updating worker…

=============================

Deleting old files…

rm: cannot remove ‘cryosparc_compute/blobio/.nfs00000005805e122600000001’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/engine/.nfs0000000702ed706e00000002’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/engine/.nfs0000000702ed706f00000003’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/engine/.nfs0000000702ed707000000004’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/engine/.nfs0000000702ed707100000005’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/engine/.nfs0000000702ed707200000006’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/engine/.nfs0000000702ed707400000007’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/engine/.nfs0000000702ed707500000008’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/engine/.nfs0000000702ed707600000009’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/gpu/.nfs0000000800527a670000000a’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/gpu/.nfs0000000800527a6c0000000b’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/ioengine/.nfs0000000901159e420000000c’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/jobs/hetero_refine/.nfs0000000a81d2ec390000000d’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/jobs/utilities/.nfs0000000300e8ea040000000e’: Device or resource busy

rm: cannot remove ‘cryosparc_compute/.nfs00000005008874e70000000f’: Device or resource busy

rm: cannot remove ‘cryosparc_tools/cryosparc/.nfs0000001500e2d62400000010’: Device or resource busy

Please advise

@Yehuda Could you please post the output of the command

fuser -v /path/to/cryosparc_worker/cryosparc_compute/blobio/.nfs00000005805e122600000001
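
fuser reports only processes on the host where it is run, so if cryosparc_worker/ is shared over NFS (the .nfsXXXX files suggest it is), an empty result on one node does not rule out a process on another node still holding a file open. The .nfsXXXX files are created by the NFS client when a file that is still open is deleted, and they disappear once the last process closes it. A rough sketch for checking each worker node, with hypothetical node names gpu01 and gpu02:

# Ask every worker node which process, if any, still holds the stale file open.
for node in gpu01 gpu02; do
  echo "== $node =="
  ssh "$node" fuser -v /path/to/cryosparc_worker/cryosparc_compute/blobio/.nfs00000005805e122600000001
done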

The output is empty. To clarify: CryoSPARC is running on a group of nodes in a cluster. The login node is the master; the GPU nodes are workers. Previous updates were done as follows: the master node was updated first, then I would ssh cryosparc_user@hpc… (whichever node, they are all the same) and update cryosparcw there. This time I am getting the error messages above. Thank you!

@Yehuda May I ask:

  1. Has it been ensured that no CryoSPARC jobs are running that use this particular cryosparc_worker/ installation?
  2. What is the output of the command
    cryosparcm cli "get_scheduler_targets()"
    
  3. Are the workers sharing a single cryosparc_worker/ installation?
  4. What is the output of the command
    cat /path/to/cryosparc_worker/version
    
  1. No CryoSPARC jobs are running (a quick way to double-check this on SLURM is sketched after this list)

  2. cryosparcm cli "get_scheduler_targets()"
    [{'cache_path': '/scratch', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': , 'custom_vars': {}, 'desc': None, 'hostname': 'cluster', 'lane': 'cluster', 'name': 'cluster', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': "sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E'", 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/bin/bash\n#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH --partition=hpcs\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --mem={{ (ram_gb*6500)|int }}M\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --gres-flags=enforce-binding\nsrun {{ run_cmd }}\n\n\n\n\n\n\n', 'send_cmd_tpl': '{{ command }}', 'title': 'cluster', 'tpl_vars': ['cluster_job_id', 'project_uid', 'num_cpu', 'run_cmd', 'ram_gb', 'job_uid', 'job_log_path_abs', 'num_gpu', 'command'], 'type': 'cluster', 'worker_bin_path': '/home/cryosparc_user/software/cryosparc3/cryosparc_worker/bin/cryosparcw'}]

  3. Yes

  4. cat version
    v4.6.2
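
Regarding point 1, a quick way to double-check on a SLURM cluster like this one (the user name and the cryosparc_ job-name prefix are taken from the configuration above; adjust if they differ):

# List any jobs still queued or running under the CryoSPARC service account.
squeue -u cryosparc_user

# Or filter the whole queue by the job-name prefix used in the cluster script template.
squeue -o "%.10i %.30j %.8u %.2t %.10M" | grep cryosparc_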

Unexpectedly, updating cryosparcw on another GPU node seems to have worked.
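
Since all workers share a single cryosparc_worker/ installation (answer 3 above), an update run from any one node applies to all of them. A quick way to confirm that every node now sees the new version, using the worker path from the scheduler-target output above (node names are placeholders):

for node in gpu01 gpu02; do
  echo "== $node =="
  ssh cryosparc_user@"$node" cat /home/cryosparc_user/software/cryosparc3/cryosparc_worker/version
done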
