Slow reading of headers when importing large datasets

request_recorded

(Olibclarke) #1

Hi,

Importing large datasets of TIFF movies into cryoSPARC takes a very long time - ~30 min to import 6000 movies. Most of this time seems to be taken up by reading the headers of each stack. Is this expected? Would it be possible to parallelize the header reading to make it faster?

Cheers
Oli


(Jesse Yoder) #2

Hi Oli,

I had the same thought, but after using an incorrectly flipped gain reference during import I realized that gain correction is done during import. I wonder if this process (and not reading headers) is what takes so long.

Jesse


(Olibclarke) #3

I think it just does the gain correction for the first few movies (which are displayed in the log), not the rest, because import is not generating gain-corrected movies for the dataset (which would take up a ginormous amount of disk space).

Oli


(Ali Punjani) #4

Indeed, Import Movies does not gain-correct movies; it just loads the gain reference to be able to display the first 3 movies as a sanity check.

I think the import time is due to reading headers… With TIFF I’m not sure why header reading should take so long (since the header is not compressed, unlike .mrc.bz2), but we can add an option to not check the headers of all the movies.
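
For anyone who wants to test the header-reading theory outside of cryoSPARC, here is a rough Python sketch that reads only the TIFF header and IFD chain of each movie and spreads the files over a thread pool. This is not how Import Movies is implemented; the function names `read_tiff_header` and `read_headers_parallel` are made up for illustration, and the parser assumes classic TIFF (not BigTIFF). Note that counting frames means following the IFD chain through the whole file, so a many-frame stack needs one seek per frame even though no pixel data is read.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

TAG_IMAGE_WIDTH = 256
TAG_IMAGE_LENGTH = 257
TYPE_SHORT = 3  # 16-bit unsigned; otherwise we assume LONG (32-bit)


def read_tiff_header(path):
    """Return (width, height, n_frames) for a classic TIFF file.

    Hypothetical helper: reads only the 8-byte header and the IFDs,
    never the pixel data.
    """
    with open(path, "rb") as f:
        byte_order = f.read(2)
        if byte_order == b"II":
            e = "<"  # little-endian
        elif byte_order == b"MM":
            e = ">"  # big-endian
        else:
            raise ValueError(f"{path}: not a TIFF file")
        magic, offset = struct.unpack(e + "HI", f.read(6))
        if magic != 42:
            raise ValueError(f"{path}: not a classic TIFF (magic={magic})")

        width = height = None
        n_frames = 0
        while offset != 0:  # one IFD per frame; offset 0 ends the chain
            f.seek(offset)
            (n_entries,) = struct.unpack(e + "H", f.read(2))
            entries = f.read(12 * n_entries)
            if n_frames == 0:
                # Take the dimensions from the first frame only.
                for i in range(n_entries):
                    tag, typ, _count = struct.unpack_from(e + "HHI", entries, 12 * i)
                    if tag in (TAG_IMAGE_WIDTH, TAG_IMAGE_LENGTH):
                        fmt = "H" if typ == TYPE_SHORT else "I"
                        (value,) = struct.unpack_from(e + fmt, entries, 12 * i + 8)
                        if tag == TAG_IMAGE_WIDTH:
                            width = value
                        else:
                            height = value
            n_frames += 1
            (offset,) = struct.unpack(e + "I", f.read(4))
    return width, height, n_frames


def read_headers_parallel(paths, max_workers=8):
    """Read headers of many files concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_tiff_header, paths))
```

Timing `read_headers_parallel(paths)` against a plain loop over `read_tiff_header` on a few hundred movies should show whether header reading alone accounts for the import time. Whether threads actually help will depend on the storage; on a network filesystem the per-file latency is often the bottleneck, which is the case where a thread pool tends to pay off.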