The job.log of my RBMC job says each EER read takes about 107 seconds, averaged over 282 files. However, when I dd another EER from the same location (and several other random EERs), reading those files takes only about 3 seconds on average.
Using dd:
time dd if=FoilHole_28625605_Data_27658066_34_20240827_212833_EER.eer of=/dev/zero bs=8k
47903+1 records in
47903+1 records out
392427106 bytes (392 MB) copied, 3.08231 s, 127 MB/s
real 0m3.166s
user 0m0.015s
sys 0m0.183s
What could account for the discrepancy? I should note that the EERs in the dd command are on a different file system than the CryoSPARC project.
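(One control worth running, in case the ~3 s dd read mostly reflects a warm Linux page cache rather than true disk or network throughput. This is a sketch only, assuming a Linux host with root access; it uses /dev/null as the sink and the same file as above:)

# Drop the page cache so the next read has to hit storage (needs root)
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
time dd if=FoilHole_28625605_Data_27658066_34_20240827_212833_EER.eer of=/dev/null bs=8k

# Or bypass the cache for this single read (O_DIRECT; not supported on every filesystem)
time dd if=FoilHole_28625605_Data_27658066_34_20240827_212833_EER.eer of=/dev/null bs=8k iflag=direct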
Your topic is timely: CryoSPARC v4.6, which was just released today, includes improvements to TIFF and EER read times for RBMC (along with other significant IO-related speed improvements).
@francis.reyes Our tests went as far as swapping the HDD-based RAID array source for an SSD-based one, still mounted over the network (10 Gb interconnect). That entirely negated the EER overheads. Hopefully the point is now moot with v4.6.
FWIW, we’ve opted to save LZW-TIFF stacks directly at the facilities since EPU first started supporting it.
Our samples are generally hampered by other issues and do not benefit from the fine temporal slicing or upsampling that EER offers. TIFF stacks also give a significant space saving.
@francis.reyes, wow, very odd - thanks for posting that. Could you do a quick spot check of how the read bg times are distributed (e.g. just grep for them and eyeball the numbers)? I just ran some tests and wasn’t able to reproduce anything near what you’re seeing (I’m reading from NFS), but I’m curious whether your reads are consistently slow, or whether a few reads stalled for a very long time.
read bg, as you probably inferred, is the portion where the background estimate file is read. Those files are very small MRCs; reading them shouldn’t take much time at all…
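(If it helps, a minimal sketch of that spot check. The grep pattern and the assumption that the elapsed time is the last whitespace-separated field on each matching line are guesses; adjust them to the actual job.log format:)

# List the per-file "read bg" timings in sorted order, to see whether they are
# uniformly slow or mostly fast with a few long stalls
grep 'read bg' job.log | awk '{print $NF}' | sort -n

# Optional quick summary: count, min, max, mean (same last-field assumption)
grep 'read bg' job.log | awk '{t=$NF; n++; s+=t; if(n==1||t<min)min=t; if(t>max)max=t} END{print n, min, max, s/n}'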