Dear cryoSPARC Developers,
I wonder if you could help us with cryoSPARC's poor performance when accessing files on petabyte-scale disk arrays.
We have new servers with NVIDIA RTX6000 cards. cryoSPARC runs fast when the data is kept on local drives, but about 20 times slower when it has to retrieve the files from GPFS or Lustre file systems.
The affected steps are movie import, motion correction, and CTF estimation. Interestingly, if we run the same import job a second time, i.e., when the data is already in the GPFS/Lustre cache, it runs 50-100 times faster than the first time. Our disk-array engineers looked into the issue and assured me that a relatively trivial code change in cryoSPARC could fix this.
Given the typical size of current datasets and the proliferation of petabyte disk arrays in cryo-EM research, this issue could be critical for many cryoSPARC users. I would gladly connect you with our IT department and work closely with you to resolve this issue for everyone's benefit.