Slow reading of headers when importing large datasets

olibclarke · July 9, 2019, 3:31pm

Hi,

Importing large datasets of TIFF movies into cryoSPARC takes a very long time: ~30 min to import 6000 movies. Most of this time seems to be spent reading the header of each stack. Is this expected? Would it be possible to parallelize it to make it faster?
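Header reading is I/O-bound, so one way to parallelize it is a thread pool that reads many files concurrently. Below is a minimal sketch, assuming the per-file check only needs the 8-byte TIFF header (byte order plus the magic number 42). The function names `has_tiff_header` and `check_headers_parallel` are hypothetical and are not part of cryoSPARC.

```python
import struct
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def has_tiff_header(path: str) -> bool:
    """Return True if the file starts with a valid 8-byte classic TIFF header."""
    with open(path, "rb") as f:
        head = f.read(8)
    if len(head) < 8:
        return False
    # "II" = little-endian, "MM" = big-endian
    if head[:2] == b"II":
        endian = "<"
    elif head[:2] == b"MM":
        endian = ">"
    else:
        return False
    (magic,) = struct.unpack(endian + "H", head[2:4])
    return magic == 42


def check_headers_parallel(paths: List[str], max_workers: int = 16) -> Dict[str, bool]:
    """Check every file concurrently; results keep the order of `paths`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(has_tiff_header, paths)
        return dict(zip(paths, results))
```

Threads rather than processes are a reasonable fit here because CPython releases the GIL during blocking file reads. The `max_workers` default is only a guess; the best value depends on the storage the movies live on.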

Cheers
Oli

jesseyoder · July 15, 2019, 9:53pm

Hi Oli,

I had the same thought, but after using an incorrectly flipped gain reference during import, I realized that gain correction is done during import. I wonder if this process (and not reading headers) is what takes so long.

Jesse

olibclarke · July 15, 2019, 10:12pm

I think it only does the gain correction for the first few movies (which are displayed in the log), not the rest, because import does not generate gain-corrected movies for the dataset (which would take up a ginormous amount of disk space).

Oli

apunjani · August 14, 2019, 4:06am

Indeed, Import Movies does not gain-correct movies; it just loads the gain reference so it can display the first 3 movies as a sanity check.

I think the import time is due to reading headers… with TIFF I’m not sure why header reading should take so long (since the header is not compressed, unlike .mrc.bz2), but we can add an option to not check the headers of all the movies.
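In a classic TIFF, every frame has its own Image File Directory (IFD), and the IFDs form a linked list: each one ends with the file offset of the next. Reading the full header of a multi-frame movie therefore means following that chain with one seek per frame. This is one plausible reason uncompressed TIFF headers can still be slow to read, especially on network storage, but that is an assumption, not a diagnosis from this thread. The sketch below uses only the standard library, assumes classic (non-BigTIFF) files, and its function name `walk_tiff_ifds` is hypothetical, not cryoSPARC's code.

```python
import struct
from typing import List, Optional, Tuple


def walk_tiff_ifds(path: str, max_frames: int = 100000) -> List[Tuple[Optional[int], Optional[int]]]:
    """Follow the IFD chain of a classic TIFF; return (width, height) per frame."""
    dims = []
    with open(path, "rb") as f:
        header = f.read(8)
        endian = {b"II": "<", b"MM": ">"}.get(header[:2])
        if endian is None or len(header) < 8:
            raise ValueError("not a TIFF file")
        magic, offset = struct.unpack(endian + "HI", header[2:8])
        if magic != 42:
            raise ValueError("not a classic (non-BigTIFF) TIFF file")
        while offset != 0 and len(dims) < max_frames:
            f.seek(offset)  # one seek per frame: IFDs may sit anywhere in the file
            (n_entries,) = struct.unpack(endian + "H", f.read(2))
            width = height = None
            for _ in range(n_entries):
                # Each 12-byte entry: tag, field type, count, value/offset field
                tag, typ, count, value = struct.unpack(endian + "HHI4s", f.read(12))
                if tag in (256, 257):  # ImageWidth, ImageLength
                    # SHORT (3) or LONG (4); a single value is stored inline, left-justified
                    fmt = "H" if typ == 3 else "I"
                    (v,) = struct.unpack_from(endian + fmt, value)
                    if tag == 256:
                        width = v
                    else:
                        height = v
            dims.append((width, height))
            (offset,) = struct.unpack(endian + "I", f.read(4))  # 0 marks the last IFD
    return dims
```

An option like the one proposed above could, for example, call this only on the first movie and assume the remaining movies share its frame count and dimensions, trading a per-file check for a single one.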

spunjani · February 3, 2020, 4:39pm

This is done as of v2.12.