Failed Exposures from mounted drive in live session

After updating to v4.6.0, importing exposures from a mounted drive (PATH1) in a CryoSPARC Live session fails with the following log:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/jobs/rtp_workers/run.py", line 269, in cryosparc_master.cryosparc_compute.jobs.rtp_workers.run.symlink_path
FileExistsError: [Errno 17] File exists: 'PATH1/FoilHole_10105489_Data_10086807_44_20240910_130434_EER.eer' -> 'PATH2/S1/import_movies/FoilHole_10105489_Data_10086807_44_20240910_130434_EER.eer'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/jobs/rtp_workers/run.py", line 381, in cryosparc_master.cryosparc_compute.jobs.rtp_workers.run.rtp_worker
  File "cryosparc_master/cryosparc_compute/jobs/rtp_workers/run.py", line 450, in cryosparc_master.cryosparc_compute.jobs.rtp_workers.run.process_movie
  File "cryosparc_master/cryosparc_compute/jobs/rtp_workers/run.py", line 479, in cryosparc_master.cryosparc_compute.jobs.rtp_workers.run.do_check
  File "cryosparc_master/cryosparc_compute/jobs/rtp_workers/run.py", line 272, in cryosparc_master.cryosparc_compute.jobs.rtp_workers.run.symlink_path
Exception: Failed to create symbolic link /PATH2/S1/import_movies/FoilHole_10105489_Data_10086807_44_20240910_130434_EER.eer
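For context, the inner error is Python's os.symlink() refusing to create a link at a path that already exists in import_movies (for example a leftover link from an earlier or interrupted import attempt). Below is a minimal, hypothetical sketch of the failing call with an idempotent guard around it; this is only an illustration of the error condition, not cryoSPARC's actual symlink_path() implementation:

    import os

    def ensure_symlink(source_path: str, link_path: str) -> None:
        """Create link_path -> source_path, tolerating an identical pre-existing link.

        Hypothetical helper for illustration only.
        """
        try:
            os.symlink(source_path, link_path)
        except FileExistsError:
            # If an identical link is already in place, treat it as success;
            # otherwise re-raise so the caller can surface the conflict.
            if os.path.islink(link_path) and os.readlink(link_path) == source_path:
                return
            raise

    # Example call (paths are placeholders matching the redacted PATH1/PATH2 above):
    # ensure_symlink("PATH1/FoilHole_..._EER.eer",
    #                "/PATH2/S1/import_movies/FoilHole_..._EER.eer")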

Anybody with the same issue?

Welcome to the forum @schaefer-jh. Could you please

  • post the outputs of these commands
    stat -f /PATH2/S1/import_movies/
    ls /PATH2/S1/import_movies/*eer | tail -n10
    
  • let us know if you successfully performed CryoSPARC Live processing on this instance prior to the update and using the same project directory storage
  • post the CryoSPARC version from which you updated
  • post a history of disruptions/crashes of the CryoSPARC instance, if any
  • post the output of the command
    cryosparcm cli "get_scheduler_targets()"
    and a screenshot of the Configuration panel of the CryoSPARC Live session showing the Preprocessing Lane and Number of Preprocessing GPU Workers
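If it is easier to gather these details programmatically, below is a rough Python equivalent of the two filesystem checks (standard library only; the path is the redacted placeholder from the log, and the filesystem type itself is still easiest to read from stat -f):

    import glob
    import os

    # Redacted placeholder path from the log above
    target = "/PATH2/S1/import_movies/"

    # Filesystem statistics, similar to `stat -f` (block size, totals, free space)
    vfs = os.statvfs(target)
    print("block size:", vfs.f_bsize,
          "blocks total:", vfs.f_blocks,
          "blocks free:", vfs.f_bavail,
          "inodes free:", vfs.f_favail)

    # Last ten EER files by name, similar to `ls ... | tail -n10`
    for path in sorted(glob.glob(os.path.join(target, "*eer")))[-10:]:
        print(path)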

Thanks @wtempel for helping:

  1. Command output:
    $ stat -f /PATH2/S1/import_movies/
    File: "/PATH2/S1/import_movies/"
    ID: 0 Namelen: 255 Type: nfs
    Block size: 65536 Fundamental block size: 65536
    Blocks: Total: 5851873792 Free: 985420500 Available: 985420500
    Inodes: Total: 14980802112 Free: 14955078965

$ ls /PATH2/S1/import_movies/*eer | tail -n10
/PATH2/S1/import_movies/FoilHole_10105481_Data_10086807_40_20240910_125351_EER.eer
/PATH2/S1/import_movies/FoilHole_10105482_Data_10086807_55_20240910_131334_EER.eer
/PATH2/S1/import_movies/FoilHole_10105483_Data_10086807_48_20240910_131339_EER.eer
/PATH2/S1/import_movies/FoilHole_10105484_Data_10086807_51_20240910_131344_EER.eer
/PATH2/S1/import_movies/FoilHole_10105485_Data_10086807_41_20240910_130414_EER.eer
/PATH2/S1/import_movies/FoilHole_10105486_Data_10086807_31_20240910_130419_EER.eer
/PATH2/S1/import_movies/FoilHole_10105487_Data_10086807_26_20240910_130424_EER.eer
/PATH2/S1/import_movies/FoilHole_10105488_Data_10086807_32_20240910_130429_EER.eer
/PATH2/S1/import_movies/FoilHole_10105489_Data_10086807_44_20240910_130434_EER.eer
/PATH2/S1/import_movies/FoilHole_10105490_Data_10086807_42_20240910_125826_EER.eer

  2. Yes, live processing worked before the update using the same configuration.
  3. Updated from v4.5.3 to v4.6.0.
  4. No crashes or similar disruptions.
  5. $ cryosparcm cli 'get_scheduler_targets()'
     [{'cache_path': '/scratch', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': [], 'custom_vars': {}, 'desc': None, 'hostname': 'cryosparc', 'lane': 'cryosparc', 'name': 'cryosparc', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': "sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E'", 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch  {{ script_path_abs }}', 'script_tpl': '#!/bin/bash\n#SBATCH --job-name {{ project_uid }}_{{ job_uid }}\n#SBATCH --partition=cryosparc\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --mem={{ (ram_gb*1000)|int }}M\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --gres-flags=enforce-binding\n##SBATCH --exclusive\n\nsrun {{ run_cmd }}\n\n\n', 'send_cmd_tpl': '{{ command }}', 'title': 'cryosparc', 'tpl_vars': ['command', 'run_cmd', 'num_gpu', 'project_uid', 'ram_gb', 'job_uid', 'num_cpu', 'job_log_path_abs', 'cluster_job_id'], 'type': 'cluster', 'worker_bin_path': '/home_local/hpc/cryosparc2/cryosparc2_worker/bin/cryosparcw'}, {'cache_path': '/scratch', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': [], 'custom_vars': {}, 'desc': None, 'hostname': 'cryosparc1', 'lane': 'cryosparc1', 'name': 'cryosparc1', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': "sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E'", 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch  {{ script_path_abs }}', 'script_tpl': '#!/bin/bash\n#SBATCH --job-name {{ project_uid }}_{{ job_uid }}\n#SBATCH --partition=cryosparc1\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --mem={{ (ram_gb*1000)|int }}M\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --gres-flags=enforce-binding\n##SBATCH --exclusive\n\nsrun {{ run_cmd }}\n\n\n\n', 'send_cmd_tpl': '{{ command }}', 'title': 'cryosparc1', 'tpl_vars': ['command', 'run_cmd', 'num_gpu', 'project_uid', 'ram_gb', 'job_uid', 'num_cpu', 'job_log_path_abs', 'cluster_job_id'], 'type': 'cluster', 'worker_bin_path': '/home_local/hpc/cryosparc2/cryosparc2_worker/bin/cryosparcw'}, {'cache_path': '/scratch', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': [], 'custom_vars': {}, 'desc': None, 'hostname': 'cryosparc2', 'lane': 'cryosparc2', 'name': 'cryosparc2', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': "sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E'", 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch  {{ script_path_abs }}', 'script_tpl': '#!/bin/bash\n#SBATCH --job-name {{ project_uid }}_{{ job_uid }}\n#SBATCH --partition=cryosparc2\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --mem={{ (ram_gb*1000)|int }}M\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --gres-flags=enforce-binding\n##SBATCH --exclusive\n\nsrun {{ run_cmd }}\n\n\n\n', 'send_cmd_tpl': '{{ command }}', 'title': 'cryosparc2', 'tpl_vars': ['command', 'run_cmd', 'num_gpu', 'project_uid', 'ram_gb', 'job_uid', 'num_cpu', 'job_log_path_abs', 'cluster_job_id'], 'type': 'cluster', 
'worker_bin_path': '/home_local/hpc/cryosparc2/cryosparc2_worker/bin/cryosparcw'}, {'cache_path': '/scratch', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': [], 'custom_vars': {}, 'desc': None, 'hostname': 'cryosparc3', 'lane': 'cryosparc3', 'name': 'cryosparc3', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': "sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E'", 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch  {{ script_path_abs }}', 'script_tpl': '#!/bin/bash\n#SBATCH --job-name {{ project_uid }}_{{ job_uid }}\n#SBATCH --partition=cryosparc3\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --mem={{ (ram_gb*1000)|int }}M\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --gres-flags=enforce-binding\n##SBATCH --exclusive\n\nsrun {{ run_cmd }}\n\n\n\n', 'send_cmd_tpl': '{{ command }}', 'title': 'cryosparc3', 'tpl_vars': ['command', 'run_cmd', 'num_gpu', 'project_uid', 'ram_gb', 'job_uid', 'num_cpu', 'job_log_path_abs', 'cluster_job_id'], 'type': 'cluster', 'worker_bin_path': '/home_local/hpc/cryosparc2/cryosparc2_worker/bin/cryosparcw'}, {'cache_path': '/scratch', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': [], 'desc': None, 'hostname': 'cryosparc4', 'lane': 'cryosparc4', 'name': 'cryosparc4', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': "sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E'", 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qsub_cmd_tpl': 'sbatch  {{ script_path_abs }}', 'script_tpl': '#!/bin/bash\n#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH --partition=cryosparc4\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --mem={{ (ram_gb*1000)|int }}M\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --gres-flags=enforce-binding\n##SBATCH --exclusive\n\nsrun {{ run_cmd }}\n\n\n', 'send_cmd_tpl': '{{ command }}', 'title': 'cryosparc4', 'tpl_vars': ['command', 'run_cmd', 'num_gpu', 'project_uid', 'ram_gb', 'job_uid', 'num_cpu', 'job_log_path_abs', 'cluster_job_id'], 'type': 'cluster', 'worker_bin_path': '/home_local/hpc/cryosparc2/cryosparc2_worker/bin/cryosparcw'}]
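As an aside for readers skimming the dump: each script_tpl above is a Jinja-style template whose last non-empty command line is srun {{ run_cmd }}. A minimal sketch of rendering a trimmed copy of such a template (assuming the jinja2 package is available, and using made-up values) shows the shape of the sbatch script that gets submitted:

    from jinja2 import Template

    # Trimmed copy of one lane's script_tpl; the values below are made up
    # purely to illustrate how the placeholders are filled in.
    script_tpl = (
        "#!/bin/bash\n"
        "#SBATCH --job-name {{ project_uid }}_{{ job_uid }}\n"
        "#SBATCH --mem={{ (ram_gb*1000)|int }}M\n"
        "#SBATCH --cpus-per-task={{ num_cpu }}\n"
        "#SBATCH --gres=gpu:{{ num_gpu }}\n"
        "\n"
        "srun {{ run_cmd }}\n"
    )

    print(Template(script_tpl).render(
        project_uid="P1", job_uid="J42", ram_gb=16, num_cpu=4, num_gpu=1,
        run_cmd="/path/to/cryosparcw run ...",  # placeholder command
    ))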
    

Hope this helps.

Thanks @schaefer-jh .
I wonder whether the problem would persist if the last (non-empty) line of the cluster lane's script template were modified from the current

srun {{ run_cmd }}

to simply

{{ run_cmd }}
You may want to create a test lane with that change by specifying a unique "name": inside the cluster_info.json file that is used with the
cryosparcm cluster connect command (guide). As a starting point for your edits, you may write out the current configuration by running the command
cryosparcm cluster dump name-of-your-existing-lane (guide).
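For a concrete starting point, here is a minimal sketch of preparing such a test lane. It assumes that cryosparcm cluster dump has written cluster_info.json and cluster_script.sh into the current directory and that the edited copies will then be registered with cryosparcm cluster connect from that same directory; the test-lane name used here is made up:

    import json
    from pathlib import Path

    # Give the copied lane a unique name/title so it does not replace the
    # existing lane; the keys mirror those visible in get_scheduler_targets().
    info_path = Path("cluster_info.json")
    info = json.loads(info_path.read_text())
    info["name"] = "cryosparc1-test"    # hypothetical test-lane name
    info["title"] = "cryosparc1-test"
    info_path.write_text(json.dumps(info, indent=2))

    # Drop the srun prefix from the final command line of the script template,
    # leaving everything else unchanged.
    script_path = Path("cluster_script.sh")
    script_path.write_text(
        script_path.read_text().replace("srun {{ run_cmd }}", "{{ run_cmd }}")
    )

    # Then register the test lane from the same directory:
    #   cryosparcm cluster connect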

The issue turned out to be a faulty fiber-optic connection and has been resolved. Thanks for your support, @wtempel.
