Simultaneous SSD caching by multiple jobs results in OSError: No space left on device

Hi,

I recently started a few 2D classification jobs simultaneously, with “Cache particles on SSD” enabled. The first job went through, but the other three failed in the middle of caching with “OSError: No space left on device”, even though the requested node has ~1.5 TB of SSD cache and each job needs only ~200 GB of particles.

However, when I ran them one by one, so that no simultaneous caching would occur, they all worked fine and then successfully did the 2D classification at the same time.

So, my assumption is that, when deleting unnecessary particles from cache and copying new ones, different jobs are unaware of each other, which results in not enough cache being freed.
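
To make that guess concrete, here is a toy Python model of the suspected race. It is only a sketch of my assumption, not CryoSPARC's actual caching code: the function names are made up, and the 100 GB free / 1400 GB stale split is a hypothetical starting state for a 1.5 TB cache.

```python
# Toy model of the suspected race. Hypothetical names and numbers;
# this is NOT how CryoSPARC actually manages its cache.

def plan_eviction(free_gb, need_gb):
    """GB of stale data one job decides to delete, looking only at its own need."""
    return max(0, need_gb - free_gb)

def uncoordinated(free_gb, stale_gb, n_jobs, need_gb):
    """All jobs snapshot the same free space, then evict and copy independently.

    Returns the space balance after all copies; negative means a shortfall,
    i.e. some job hits "No space left on device".
    """
    # Every job sees the same snapshot and so plans the same eviction.
    plan = plan_eviction(free_gb, need_gb)
    # The same stale files can only be deleted once, however many jobs ask.
    freed = min(plan, stale_gb)
    return free_gb + freed - n_jobs * need_gb

def coordinated(free_gb, stale_gb, n_jobs, need_gb):
    """Jobs cache one at a time, each re-reading the free space first.

    Returns the free space left, or a negative shortfall if a job cannot fit.
    """
    for _ in range(n_jobs):
        evict = min(plan_eviction(free_gb, need_gb), stale_gb)
        stale_gb -= evict
        free_gb += evict
        if free_gb < need_gb:
            return free_gb - need_gb
        free_gb -= need_gb
    return free_gb

# Assumed state: 1500 GB cache, 1400 GB of stale particles, 100 GB free,
# four jobs needing 200 GB each.
print(uncoordinated(100, 1400, 4, 200))  # negative: jobs run out of space
print(coordinated(100, 1400, 4, 200))    # non-negative: every job fits
```

In this model the total demand (800 GB) fits in the cache easily, but the simultaneous jobs together free only what a single job would have freed, which would match what I saw.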

@marinegor Patch 230302 for CryoSPARC v4.2.0 includes an update of particle caching.


amazing, thanks for the backtracking!
