RBMC: unexpected EER overhead?

Hi,

We’ve been testing RBMC on a mixture of Falcon4i EER and K3 unnormalised TIFF datasets. We’ve noticed that processing EER stacks is accompanied by a significantly higher overhead at the read step.

refmotion worker 0 (NVIDIA GeForce RTX 2080 Ti)
scale (alpha): 15.672696
noise model (sigma2): 52.908432
TIME (s) SECTION
0.000077585 sanity
68.702019488 read movie
0.035934693 get gain, defects
0.154643410 read bg
0.012367303 read rigid
0.943594613 prep_movie
1.229257339 extract from frames
0.004386706 extract from refs
0.000027244 adj
0.000000252 bfactor
0.210973914 rigid motion correct
0.000995545 get noise, scale
71.294278091 --- TOTAL ---

Could someone help shed some light on what it might be busy doing in the background? It seems to spend a long time reporting on the unknown 65002 tag, although I understand this to be largely a cosmetic issue. If it’s just a header check, is it possible to skip it like we do at Import? Or perhaps perform the header check only once it’s copied into memory?

Cheers,
Yang

From what I can see based on network utilisation, it’s not a header check per se but it’s reading one frame at a time, hence the overhead.

The 65002 warning needs suppressing, though, as a large dataset at medium magnification can have millions of lines of that warning, resulting in logs being multiple gigabytes.

Yes, this would be good. We see our job.log expanding dramatically.

From our perspective, it looks superficially similar to the overhead we see when we disable the skip-header-check option at import, i.e. hours added to a job. If it’s that, then at least it’s consistent. However, we’re wondering whether there’s more to it than that?

Cheers,
Yang

Hi @leetleyang, are your movies mounted on a network filesystem of some sort? A while ago we noticed that reading TIFFs and EERs over a network was extremely slow for some users and we implemented a workaround involving copying the file into shared memory and reading it from there. Unfortunately, that workaround didn’t play nicely with reference based motion correction, so we disabled it in RBMC jobs. If you’re on a network filesystem of some sort, I expect this is the problem that you’re experiencing.

It’s on our radar to fix this, and while I can’t be too specific on a timeline, a future release of CryoSPARC won’t suffer from this performance anomaly. In the meantime, here’s one possible workaround: if you have enough space on your cache SSD or any locally attached storage, you could copy the movies onto a local SSD and read them from there.
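For anyone who wants to script that staging step, here is a minimal Python sketch. It is only an illustration, not part of CryoSPARC: the function name `stage_movies`, the `*.eer` pattern, and any directories you pass in are assumptions, and you would still need to point your import at the local copies yourself.

```python
import shutil
from pathlib import Path


def stage_movies(src_dir: Path, dst_dir: Path, pattern: str = "*.eer") -> list[Path]:
    """Copy movies matching `pattern` from src_dir to dst_dir.

    Hypothetical helper: files that already exist in dst_dir with the
    same size are skipped, so the function can be re-run safely.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in sorted(src_dir.glob(pattern)):
        dst = dst_dir / src.name
        # Only copy when the local file is missing or its size differs.
        if not dst.exists() or dst.stat().st_size != src.stat().st_size:
            shutil.copy2(src, dst)  # one sequential copy per file
        staged.append(dst)
    return staged
```

The size comparison is a cheap way to avoid re-copying; it does not verify file contents, so use a checksum instead if you need that guarantee.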

Hi Harris,

Yes, in our case the EERs reside on network-mounted space. Thanks for the suggestion. We will see if we can implement something along those lines.

Although I’m curious whether you think the issue stems from the network aspect or from the fact that these frame integrity checks(?) are essentially random read events, which may not jibe well with spinning media in general. On our setup, TIFF stacks (50-70 frames) are read in 2-3s, but of course our EER stacks have more than an order of magnitude more frames.

Cheers,
Yang

Hi @leetleyang.

Although I’m curious whether you think the issue stems from the network aspect or from the fact that these frame integrity checks(?) are essentially random read events, which may not jibe well with spinning media in general.

If you’re interested, one of the underlying problems is that in the TIFF file format, you can’t tell how many frames are present or where in the file each frame is without reading the entire file. So we go through the file twice: once to count the frames (so we know how much memory we need to allocate), and once to actually read in the decompressed data (in principle the page cache should help with this). On top of that, libtiff seems to access the file in a suboptimal pattern (I’m not sure exactly how), and the suboptimality shows up most acutely on network filesystems. Without knowing the exact access pattern I can’t say for certain how random it is. But essentially, yes, I suspect the access pattern isn’t ideal for spinning media in general, and I know for a fact it isn’t ideal for network file access.
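To illustrate the counting pass, here is a minimal Python sketch of walking a TIFF's chain of image file directories (IFDs). It is not CryoSPARC's code and not how libtiff does it internally; `count_tiff_frames` is a hypothetical name. It relies only on the published layout: each IFD stores the offset of the next one, so the frame count is known only after following every link, for both classic TIFF (magic 42) and BigTIFF (magic 43).

```python
import io
import struct
from typing import BinaryIO


def count_tiff_frames(stream: BinaryIO) -> int:
    """Count IFDs (frames) by following the next-IFD offsets."""
    stream.seek(0)
    header = stream.read(16)
    if header[:2] == b"II":
        endian = "<"  # little-endian
    elif header[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF stream")
    (magic,) = struct.unpack(endian + "H", header[2:4])
    if magic == 42:  # classic TIFF: 16-bit entry count, 32-bit offsets
        count_fmt, entry_size, offset_fmt = "H", 12, "I"
        (offset,) = struct.unpack(endian + "I", header[4:8])
    elif magic == 43:  # BigTIFF: 64-bit entry count and offsets
        count_fmt, entry_size, offset_fmt = "Q", 20, "Q"
        (offset,) = struct.unpack(endian + "Q", header[8:16])
    else:
        raise ValueError("unrecognised TIFF magic number")
    count_size = struct.calcsize(endian + count_fmt)
    offset_size = struct.calcsize(endian + offset_fmt)

    frames = 0
    seen = set()
    while offset != 0:  # an offset of 0 marks the last IFD
        if offset in seen:
            raise ValueError("IFD chain loops back on itself")
        seen.add(offset)
        stream.seek(offset)  # one seek plus two small reads per frame
        (n_entries,) = struct.unpack(endian + count_fmt, stream.read(count_size))
        stream.seek(n_entries * entry_size, io.SEEK_CUR)  # skip the tag entries
        (offset,) = struct.unpack(endian + offset_fmt, stream.read(offset_size))
        frames += 1
    return frames
```

Because the function accepts any seekable binary stream, you can compare `count_tiff_frames(open(path, "rb"))` against `count_tiff_frames(io.BytesIO(open(path, "rb").read()))`: the second form pulls the file in with one sequential read and does all the seeking in memory, which is roughly the idea behind the shared-memory workaround mentioned earlier in the thread.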

On our setup, TIFF stacks (50-70 frames) are read in 2-3s, but of course our EER stacks have more than an order of magnitude more frames.

Yes, that sounds (unfortunately) about right to me and I think you’ve correctly identified the reason.

Hi Harris,

Thanks for expanding further on the issue. Very informative.

Cheers,
Yang