Multiple jobs on our instance throw file I/O errors. It seems we are hitting the system limit on the number of open files. One NU-refinement job currently has more than 70,000 open files! Is that by design? The job is processing a large data set with 1.4M particles from 22k micrographs.
The instance is running v4.7.0. In this thread, it is said that v4.6 increased the number of open files and that v4.6.1 reduced it somewhat.
@daniel.s.d.larsson,
What you are seeing may be intended behavior. Once particle stacks are copied to the SSD, there is a substantial performance benefit in keeping the file descriptors to them open. For that reason, yes, we do keep file descriptors open - as many as possible. During extraction, one particle stack (an individual MRC file) is created per micrograph, so it is unusual that you would have 70,000 open file descriptors from only 22,000 micrographs. Did you, by chance, combine particle sets from multiple extraction jobs run on the same micrographs? If not, then I would want to investigate further.
Assuming that’s the issue and nothing is behaving incorrectly, you could try a Restack Particles job to combine your particles into fewer independent MRC files. That would reduce the number of file descriptors that downstream jobs need to keep open in order to achieve good IO performance.
As I explained in a previous thread (Queue message [Errno 24] Too many open files: - #11 by hsnyder), there are several things on Linux that limit the number of open file descriptors: there are system-wide limits and per-process limits. If you haven’t already, I recommend reading the linked post - raising your system limits may save you future headaches, unless that’s impractical for organizational / non-technical reasons.
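For quick reference, these standard Linux commands (not CryoSPARC-specific; exact paths and values depend on your distribution and site policy, and the username below is only a placeholder) show the relevant limits and how many descriptors a given process currently holds:

```
# Per-process soft and hard limits on open files, for the current shell/user
ulimit -Sn
ulimit -Hn

# System-wide maximum number of open file handles
cat /proc/sys/fs/file-max

# Count open file descriptors held by a running job (substitute the job's process ID)
ls /proc/<PID>/fd | wc -l

# Per-user limits are commonly raised in /etc/security/limits.conf, e.g.:
#   cryosparcuser  soft  nofile  65536
#   cryosparcuser  hard  nofile  131072
```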
By default, we limit our particle IO system to using at most N - 768 file descriptors, where N is the OS-reported per-process maximum open-FD limit. If you cannot change the open file descriptor limits, you can control CryoSPARC’s behaviour with the environment variable CRYOSPARC_IO_FD_LIMIT in your worker installation’s config.sh file. If set to a negative integer, it replaces the -768 in the formula above; if set to a positive integer, it directly overrides the default limit.
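As a concrete sketch (assuming config.sh is sourced as a shell script, as in a standard worker install; the values here are illustrative, not recommendations):

```
# cryosparc_worker/config.sh  (path may differ in your installation)

# Positive value: use at most this many file descriptors for particle IO
export CRYOSPARC_IO_FD_LIMIT=4096

# Negative value: keep the "N minus margin" formula but change the margin,
# e.g. reserve 1024 descriptors instead of the default 768
# export CRYOSPARC_IO_FD_LIMIT=-1024
```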
In summary:
- This is quite possibly intended behaviour.
- You might try restacking particles.
- If that doesn’t work, and unless it’s impractical for non-technical reasons, raising the OS fd limits is the next best solution.
- If you can’t do that, you can tell CryoSPARC to cache fewer open file descriptors via CRYOSPARC_IO_FD_LIMIT.
–Harris