Reference based motion correction, where are the movies being read from?

The job.log of my RBMC job says each EER read takes about 107 seconds, averaged over 282 files. However, when I dd another EER from the same location (and several other random EERs), reading those files takes only about 3 seconds on average.

cryosparc RBMC:
grep 'read movie' job.log | awk '{ total += $1; count++ } END { print total/count; print count }'
107.685
282

using dd
time dd if=FoilHole_28625605_Data_27658066_34_20240827_212833_EER.eer of=/dev/zero bs=8k
47903+1 records in
47903+1 records out
392427106 bytes (392 MB) copied, 3.08231 s, 127 MB/s

real 0m3.166s
user 0m0.015s
sys 0m0.183s

What could account for the discrepancy? I will note that the EERs read in the dd test are on a different file system than the CryoSPARC project directory.
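One thing that can skew a comparison like this is the page cache: a file read recently comes back from memory rather than disk. Assuming GNU coreutils dd, repeating the read with direct IO bypasses the cache (the file name below is just the one from the example above):

time dd if=FoilHole_28625605_Data_27658066_34_20240827_212833_EER.eer of=/dev/null bs=8k iflag=direct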

Hi,

Perhaps this may be relevant:

Cheers,
Yang

Thanks @leetleyang. Has the route of copying to local SSD and then reading into memory been explored?
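A rough sketch of that test (the paths below are hypothetical placeholders for a local scratch SSD and the network source):

mkdir -p /scratch/eer_stage
rsync -a /net/raid/session/FoilHole_*.eer /scratch/eer_stage/
time dd if=/scratch/eer_stage/FoilHole_28625605_Data_27658066_34_20240827_212833_EER.eer of=/dev/null bs=8k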

@francis.reyes ,

Your topic is timely: CryoSPARC v4.6, which was just released today, includes improvements to TIFF and EER read times for RBMC (along with other significant IO-related speed improvements).

–Harris


Thanks, @hsnyder! Looking forward to trying it. :pray:

@francis.reyes Our tests went as far as swapping the HDD-based RAID array source for an SSD-based one, still mounted over the network (10Gb interconnect). That entirely negated the EER read overheads. Hopefully the point is now moot with v4.6.

Cheers,
Yang

No change with 4.6: 180 seconds per EER (averaged over 800 'read movie' steps). Copying to SSD first might be a better alternative.

In fact, 4.6 is slower: "read bg" is now taking 118 seconds, versus less than a second in previous versions.

4.4.2:

grep 'read bg' job.log | awk '{ total += $1; count++ } END { print total/count; print count }'
0.198038
1637

4.6:

grep 'read bg' job.log | awk '{ total += $1; count++ } END { print total/count; print count }'
118.303
812

Ouch.

Hi,

FWIW, we’ve opted to save LZW-TIFF stacks directly at the facilities since EPU first started supporting it.

Our samples are generally hampered by other issues and do not benefit from the fine temporal slicing or upsampling offered by EER. TIFF stacks are a significant space saving as well.

Cheers,
Yang


@francis.reyes, wow, very odd - thanks for posting that. Could you do a quick spot check of how the "read bg" times are distributed (e.g. just grep for it and eyeball the numbers)? I just did some tests and wasn't able to reproduce anything near what you're seeing (I'm reading from NFS), but I'm curious whether your reads are consistently slow or whether a few reads stalled for a very long time.
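For example, assuming the time is in the first field of those log lines (as the averages above assume), something like this gives a quick picture of the spread:

grep 'read bg' job.log | awk '{ print $1 }' | sort -n | awk '{ v[NR] = $1 } END { print "min:", v[1], "median:", v[int((NR+1)/2)], "max:", v[NR] }'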

"read bg", as you probably inferred, is the portion where the background estimate file is read. Those files are very small MRCs; reading them shouldn't take much time at all…
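As a raw-IO spot check outside of CryoSPARC, timing a direct read of one of those background files (the path below is a hypothetical placeholder for wherever your background estimates live) should come back essentially instantly:

time dd if=/path/to/rbmc_job/background_estimate.mrc of=/dev/null bs=1M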