We’ve tried to reproduce the issue on our 2nd Cryosparc instance, but we were unable to do so.
On the 2nd instsance, the caches are per-node local SSDs.
The instance where the issue is triggered, it is a cluster lane with a shared filesystem (Beegfs).
We observe this as a regression on the shared filesystem, since we’ve updated to v4.4.1. We assume it’s related to the cache locking implementation on BeeGFS.
Do you recognise this as a bug, and otherwise what is your recommended setup in this situation?
Should we disable the cache? Or, is there a way to configure a cache path that is scoped to each compute job in the cluster (i.e. Dynamic SSD Cache Paths), but on an ephemeral filsystem?
i.e. /beegfs/per-job-dir
that will be wiped after the job run. - My impression was that the cluster lane effectively requires a shared filesystem as cache.