A user has experienced a problem with SSD particle caching. The cache is an SSD array on a BeeGFS filesystem, and it looks like they were running two jobs that tried to cache particles at the same time. One of the jobs failed, possibly because of the cache lock. I have attached the error messages below. Could you please advise on which configuration would best avoid this problem? We are using CryoSPARC 4.6.0 with the following settings:
export CRYOSPARC_CACHE_LOCK_STRATEGY=master
export CRYOSPARC_SSD_CACHE_LIFETIME_DAYS=7
SSD cache ACTIVE at /mnt/beegfs/fast_cache/sauer/instance_cryosparc.cosmic:30247 (8 TB reserve) (8 TB quota)
┌─────────────────────┬───────────────────────┐
│ Cache usage │ Amount │
├─────────────────────┼───────────────────────┤
│ Total / Usable │ 2.28 PiB / 7.28 TiB │
│ Used / Free │ 902.70 GiB / 6.39 TiB │
│ Hits / Misses │ 615.69 GiB / 0.00 B │
│ Acquired / Required │ 665.11 GiB / 0.00 B │
└─────────────────────┴───────────────────────┘
Progress: [▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇----] 16874/18236 (93%)
Elapsed: 0h 47m 38s
Active jobs: P9-J76, P9-J77
1362 pending file(s) (49.43 GiB) locked by other jobs (waiting for past 0h 19m 15s)
[CPU: 363.6 MB]
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 116, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/hetero_refine/run.py", line 91, in cryosparc_master.cryosparc_compute.jobs.hetero_refine.run.run_hetero_refine
  File "/mnt/beegfs/software/structural_biology/release/cryosparc/sauer/cryosparc/cryosparc_worker/cryosparc_compute/particles.py", line 120, in read_blobs
    u_blob_paths = cache_run(u_rel_paths)
  File "/mnt/beegfs/software/structural_biology/release/cryosparc/sauer/cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache_v2.py", line 821, in run
    return run_with_executor(rel_sources, executor)
  File "/mnt/beegfs/software/structural_biology/release/cryosparc/sauer/cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache_v2.py", line 859, in run_with_executor
    state = drive.allocate(sources, active_run_ids=info["active_run_ids"])
  File "/mnt/beegfs/software/structural_biology/release/cryosparc/sauer/cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache_v2.py", line 560, in allocate
    self.refresh()
  File "/mnt/beegfs/software/structural_biology/release/cryosparc/sauer/cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache_v2.py", line 478, in refresh
    cached = CacheFile(full_key)
  File "/mnt/beegfs/software/structural_biology/release/cryosparc/sauer/cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache_v2.py", line 226, in __init__
    stat = path.stat()
  File "/mnt/beegfs/software/structural_biology/release/cryosparc/sauer/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/pathlib.py", line 1097, in stat
    return self._accessor.stat(self, follow_symlinks=follow_symlinks)
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/beegfs/fast_cache/sauer/instance_cryosparc.cosmic:30247/store-v2/31/31dc29b42e1fcb1638dbabf554403e186d495051:P9-J76-1728057181'
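
To illustrate what I suspect is happening: the missing path ends in a job-specific suffix (`:P9-J76-1728057181`), which I assume is a temporary name used while J76 is still copying the file. If J77 lists the cache directory and then calls `path.stat()` on each entry during `refresh`, an entry that J76 renames or removes in between would raise exactly this `FileNotFoundError`. The sketch below reproduces that kind of list-then-stat race in plain Python and shows one way a scan could tolerate it. It is only my guess at the failure mode, not CryoSPARC's actual code; `stat_listed_files`, `demo`, and the file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def stat_listed_files(paths: Iterable[Path]) -> Dict[str, int]:
    """Return {file name: size in bytes}, skipping entries that no longer exist."""
    sizes = {}
    for path in paths:
        try:
            sizes[path.name] = path.stat().st_size
        except FileNotFoundError:
            # The entry disappeared between the directory listing and the
            # stat call (for example, another job renamed its temporary file).
            continue
    return sizes


def demo() -> Dict[str, int]:
    """Simulate a second job renaming its in-flight file mid-scan."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        (store / "finished").write_bytes(b"abcd")
        in_flight = store / "partial-J76"  # hypothetical temporary name
        in_flight.write_bytes(b"ab")

        listing = sorted(store.iterdir())         # the scanning job lists the directory
        in_flight.rename(store / "partial-done")  # the other job finalises its file
        return stat_listed_files(listing)         # the stale entry is skipped


if __name__ == "__main__":
    print(demo())
```

Without the `try`/`except`, the same scan would fail on the stale entry, much as the traceback above does.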