Hi,
we have a shared cache that is accessed by 7 nodes in a master/worker installation.
We use the improved SSD cache. I also did a full cache reset according to:
For ~20% of started jobs we get a FileNotFoundError (see below).
The error occurs more frequently when the load is higher (more jobs running).
If the same job is resubmitted, it usually starts running, although sometimes multiple attempts are needed.
Thanks for helping,
Florian
Worker Configuration
cat config.sh
export CRYOSPARC_LICENSE_ID="xxxxxxxxx"
export CRYOSPARC_USE_GPU=true
export CRYOSPARC_IMPROVED_SSD_CACHE=true
export CRYOSPARC_CACHE_NUM_THREADS=6
Error Message:
[CPU: 186.4 MB Avail: 665.48 GB]
Master running v4.4.1, worker running v4.4.1
[CPU: 186.6 MB Avail: 665.48 GB]
Working in directory: /fs/pool/pool-cryosparc/users/user23/xxxxxxxx/J485
[CPU: 186.6 MB Avail: 665.48 GB]
Running on lane h9002-chkGPU
[CPU: 186.6 MB Avail: 665.48 GB]
Resources allocated:
[CPU: 186.6 MB Avail: 665.48 GB]
Worker: hpcl9002
[CPU: 186.6 MB Avail: 665.48 GB]
CPU : [8, 9]
[CPU: 186.6 MB Avail: 665.48 GB]
GPU : [2]
[CPU: 186.6 MB Avail: 665.48 GB]
RAM : [2, 6, 7]
[CPU: 186.6 MB Avail: 665.48 GB]
SSD : True
[CPU: 186.6 MB Avail: 665.48 GB]
--------------------------------------------------------------
[CPU: 186.6 MB Avail: 665.48 GB]
Importing job module for job type class_2D_new…
[CPU: 218.3 MB Avail: 665.21 GB]
Job ready to run
[CPU: 218.3 MB Avail: 665.21 GB]
[CPU: 268.8 MB Avail: 666.04 GB]
Using random seed of 368515461
[CPU: 269.1 MB Avail: 666.04 GB]
Loading a ParticleStack with 58638 items…
[CPU: 269.1 MB Avail: 666.04 GB]
──────────────────────────────────────────────────────────────
SSD cache ACTIVE at /fs/pool/pool-briggs-scratch/cryoSparc/instance_brcryosparc:xxxxx (10 GB reserve) (52 TB quota)
Checking files on SSD …
[CPU: 1.28 GB Avail: 664.07 GB]
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/class2D/newrun.py", line 73, in cryosparc_master.cryosparc_compute.jobs.class2D.newrun.run_class_2D
  File "/fs/gpfs41/lv07/fileset02/home/b_baumei/cryosparcuser/csV4.4.1/cryosparc_worker_hpcl900x/cryosparc_compute/particles.py", line 120, in read_blobs
    u_blob_paths = cache_run(u_rel_paths)
  File "/fs/gpfs41/lv07/fileset02/home/b_baumei/cryosparcuser/csV4.4.1/cryosparc_worker_hpcl900x/cryosparc_compute/jobs/cache_v2.py", line 796, in run
    return run_with_executor(rel_sources, executor)
  File "/fs/gpfs41/lv07/fileset02/home/b_baumei/cryosparcuser/csV4.4.1/cryosparc_worker_hpcl900x/cryosparc_compute/jobs/cache_v2.py", line 828, in run_with_executor
    state = drive.allocate(sources, active_run_ids=info["active_run_ids"])
  File "/fs/gpfs41/lv07/fileset02/home/b_baumei/cryosparcuser/csV4.4.1/cryosparc_worker_hpcl900x/cryosparc_compute/jobs/cache_v2.py", line 612, in allocate
    self.create_run_links(sources)
  File "/fs/gpfs41/lv07/fileset02/home/b_baumei/cryosparcuser/csV4.4.1/cryosparc_worker_hpcl900x/cryosparc_compute/jobs/cache_v2.py", line 511, in create_run_links
    link.symlink_to(f"…/…/{STORE_DIR}/{source.key_prefix}/{source.key}")
  File "/fs/gpfs41/lv07/fileset02/home/b_baumei/cryosparcuser/csV4.4.1/cryosparc_worker_hpcl900x/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/pathlib.py", line 1384, in symlink_to
    self._accessor.symlink(target, self, target_is_directory)
  File "/fs/gpfs41/lv07/fileset02/home/b_baumei/cryosparcuser/csV4.4.1/cryosparc_worker_hpcl900x/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/pathlib.py", line 446, in symlink
    return os.symlink(a, b)
FileNotFoundError: [Errno 2] No such file or directory: '…/…/store-v2/6f/6f0fede7b9d211ba2b2492bac7a3680ddf2f2090' -> '/fs/pool/pool-briggs-scratch/cryoSparc/instance_brcryosparc:38001/links/P182-J485-1712812342/6f0fede7b9d211ba2b2492bac7a3680ddf2f2090.mrc'
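
A note on reading this error, with a small sketch that may help narrow it down. This is not CryoSPARC code; it only uses Python's standard os.symlink and assumes a POSIX filesystem (a shared filesystem such as the one used here could behave differently). On POSIX, os.symlink does not check whether the link target exists, so an ENOENT from it usually points at the directory that should hold the new link being absent at that moment (in the traceback, that would be the links/P182-J485-1712812342 directory), rather than at the missing file in the store. All directory and file names in the sketch are hypothetical.

```python
import os
import tempfile


def try_symlink(target, link_path):
    """Create a symlink; return None on success or the FileNotFoundError raised."""
    try:
        os.symlink(target, link_path)
        return None
    except FileNotFoundError as exc:
        return exc


with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-in for a per-run links directory inside the cache.
    links_dir = os.path.join(root, "links", "run-1")
    os.makedirs(links_dir)

    # Case 1: the target does not exist, but the link's parent directory does.
    # Dangling symlinks are allowed, so this succeeds.
    dangling = try_symlink(
        "../../store-v2/aa/missing", os.path.join(links_dir, "a.mrc")
    )

    # Case 2: the link's parent directory ("run-2") was never created
    # (or was removed by another process). This raises ENOENT.
    missing_dir = os.path.join(root, "links", "run-2")
    err = try_symlink(
        "../../store-v2/aa/missing", os.path.join(missing_dir, "b.mrc")
    )

    print("dangling target:", dangling)
    print("missing parent :", err)
```

If the second case is what happens here, it would be worth checking whether the run's links directory exists on the failing worker right after the error, and whether another node removed or had not yet made it visible on the shared filesystem.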