Reference-based motion correction fails at cross-validation for EER data

Dear CryoSPARC team,

Post copied from other thread as requested by @hsnyder :slight_smile:

I’m having a similar issue to Connor’s in another thread with an EER dataset, except that mine never gets anywhere at all: it starts cross-validation, then dies before making any progress. The job log is full of “Unknown field tag 65002” warnings (standard for EER data; it would be nice if you added a way to suppress that specific warning, as with EER data the logs end up being tens of gigabytes of that one line, over and over!). Box size is 440 pixels, 4K sampling (EER upsampling 1), with 1344 total EER frames. All frames read successfully, but then the job heartbeats twice, reports complete for both the main and monitor processes, and fails.
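Until there is a built-in way to silence that warning, an existing bloated log can at least be shrunk after the fact. Below is a minimal sketch of a hypothetical helper (the names `drop_eer_tag_warnings` and `filter_log` are made up here), assuming each warning sits on its own line and contains “Unknown field” followed by “tag 65002”; check a few lines of your own log first, since the exact wording may differ:

```python
import re
from typing import Iterable, Iterator

# Assumed warning wording: "Unknown field tag 65002" or "Unknown field with tag 65002".
EER_TAG_WARNING = re.compile(r"Unknown field (?:with )?tag 65002")


def drop_eer_tag_warnings(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that are not the repeated EER TIFF-tag warning."""
    for line in lines:
        if not EER_TAG_WARNING.search(line):
            yield line


def filter_log(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path without the warning lines, streaming line by line."""
    with open(src_path, "r", errors="replace") as src, open(dst_path, "w") as dst:
        dst.writelines(drop_eer_tag_warnings(src))


# Example with made-up lines:
sample = ["Unknown field tag 65002\n", "some other log line\n"]
print(list(drop_eer_tag_warnings(sample)))  # ['some other log line\n']
```

Streaming line by line keeps memory use flat even for a log of tens of gigabytes, e.g. `filter_log("job.log", "job.filtered.log")`; writing to a new file and swapping it in only after the job has finished avoids touching a log that is still being written.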

dmesg shows that Python segfaulted in bin_motion.so with error 6. Memory usage for the process never exceeds 32 GB on a server with 1 TB, even when I allow it 500 GB+.

I’ve previously had 450-pixel boxes work fine with both 4K and 8K TIFF data; this is the first time I’m trying EER data.

The only other possible cause I can think of, besides EER, is perhaps too few particles per micrograph? I’m actually playing with a contaminant as a test run (~0.5% of the total dataset), so across 13,000 micrographs there are fewer than 40,000 particles (about 3 per micrograph on average). It still reaches 2.4 Å, though, which both amuses and depresses me. :sweat_smile:

Data are on a network drive (symlinked to a local directory). Network utilisation appears healthy, although not fast (is it reading one frame at a time?).

The Python segfault is reproducible by one of my collaborators on a completely different dataset on a completely different system (also EER, however).

Hi @rbs_sci,

Thanks for opening a separate topic. Right off the bat I have one question: When you say there are 1344 total EER frames, is that what you have the “EER Number of Fractions” set to in your import job? Or is that just the number of frames present in the actual tiff container file?

Hi @hsnyder,

Thanks for the quick response! It’s reporting the total number of raw EER frames in the micrograph movie. I left the number of fractions at the default of 40 during Import.
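For reference, 1344 raw frames do not divide evenly into 40 fractions (1344 / 40 = 33.6), so the fractions cannot all hold the same number of frames. A minimal Python sketch of one near-even split; the function `fraction_sizes` is hypothetical and is not necessarily how CryoSPARC groups EER frames internally:

```python
def fraction_sizes(n_frames: int, n_fractions: int) -> list[int]:
    """Split n_frames raw EER frames into n_fractions groups whose sizes differ by at most 1.

    Illustrative only: one even-as-possible scheme, not necessarily
    the grouping CryoSPARC applies.
    """
    if n_fractions <= 0 or n_fractions > n_frames:
        raise ValueError("need 1 <= n_fractions <= n_frames")
    base, remainder = divmod(n_frames, n_fractions)
    # The first `remainder` fractions take one extra frame each.
    return [base + 1] * remainder + [base] * (n_fractions - remainder)


sizes = fraction_sizes(1344, 40)
print(sizes.count(34), sizes.count(33), sum(sizes))  # 24 16 1344
```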

I have a job log zipped and ready to go if you let me know where to send it (although I’ve had a brief look and nothing obviously points to the specific issue)…

@rbs_sci actually if you have the dmesg output, could you paste it? Including the seemingly nonsensical hexadecimal output that it (hopefully) prints? Ideally something like this:

[76771.355512] python[39968]: segfault at 854 ip 00007f66deb53b65 sp 00007f66beffb1b0 error 4 in blobio_native.so[7f66deb4a000+2f000]
[76771.355533] Code: 00 00 0f 29 9c 24 90 00 00 00 0f 29 a4 24 a0 00 00 00 0f 29 ac 24 b0 00 00 00 0f 29 b4 24 c0 00 00 00 0f 29 bc 24 d0 00 00 00 <48> 63 bb 54 08 00 00 4c 8d 63 54 48 8d 84 24 10 01 00 00 c7 04 24

@hsnyder righto, got it.

[520037.289402] python[240741]: segfault at 7f713e675000 ip 00007f9050bee488 sp 00007f90115fbea0 error 6 in bin_motion.so[7f9050bd7000+45000]
[520037.289416] Code: 1f 00 44 8b 4e 0c 48 63 16 48 63 46 04 8b 7e 08 45 85 c9 7e 25 85 ff 7e 19 49 0f af c3 48 01 d0 48 01 c8 0f 1f 80 00 00 00 00 <c6> 00 01 48 ff c0 eb f8 eb fe 66 0f 1f 44 00 00 48 83 c6 10 49 39
[520446.594638] python[241370]: segfault at 7fc20a675000 ip 00007fe11bf13488 sp 00007fe0c2b98ea0 error 6 in bin_motion.so[7fe11befc000+45000]
[520446.594654] Code: 1f 00 44 8b 4e 0c 48 63 16 48 63 46 04 8b 7e 08 45 85 c9 7e 25 85 ff 7e 19 49 0f af c3 48 01 d0 48 01 c8 0f 1f 80 00 00 00 00 <c6> 00 01 48 ff c0 eb f8 eb fe 66 0f 1f 44 00 00 48 83 c6 10 49 39
[520834.265872] python[241777]: segfault at 7efe7e8e9000 ip 00007f1d8c183488 sp 00007f1d51bfeea0 error 6 in bin_motion.so[7f1d8c16c000+45000]
[520834.265887] Code: 1f 00 44 8b 4e 0c 48 63 16 48 63 46 04 8b 7e 08 45 85 c9 7e 25 85 ff 7e 19 49 0f af c3 48 01 d0 48 01 c8 0f 1f 80 00 00 00 00 <c6> 00 01 48 ff c0 eb f8 eb fe 66 0f 1f 44 00 00 48 83 c6 10 49 39
[521781.395726] python[242789]: segfault at 7f05428e9000 ip 00007f2468150488 sp 00007f240fffcea0 error 6 in bin_motion.so[7f2468139000+45000]
[521781.395750] Code: 1f 00 44 8b 4e 0c 48 63 16 48 63 46 04 8b 7e 08 45 85 c9 7e 25 85 ff 7e 19 49 0f af c3 48 01 d0 48 01 c8 0f 1f 80 00 00 00 00 <c6> 00 01 48 ff c0 eb f8 eb fe 66 0f 1f 44 00 00 48 83 c6 10 49 39
[522368.814295] python[243449]: segfault at 7edf46675000 ip 00007efe7ffb9488 sp 00007efe1adf9ea0 error 6 in bin_motion.so[7efe7ffa2000+45000]
[522368.814313] Code: 1f 00 44 8b 4e 0c 48 63 16 48 63 46 04 8b 7e 08 45 85 c9 7e 25 85 ff 7e 19 49 0f af c3 48 01 d0 48 01 c8 0f 1f 80 00 00 00 00 <c6> 00 01 48 ff c0 eb f8 eb fe 66 0f 1f 44 00 00 48 83 c6 10 49 39
[553795.114883] python[251288]: segfault at 7fb3c2fad000 ip 00007fd2dc096488 sp 00007fd29e14eea0 error 6 in bin_motion.so[7fd2dc07f000+45000]
[553795.114896] Code: 1f 00 44 8b 4e 0c 48 63 16 48 63 46 04 8b 7e 08 45 85 c9 7e 25 85 ff 7e 19 49 0f af c3 48 01 d0 48 01 c8 0f 1f 80 00 00 00 00 <c6> 00 01 48 ff c0 eb f8 eb fe 66 0f 1f 44 00 00 48 83 c6 10 49 39

Hope this sheds some light on the issue, I have to be AFK for the rest of the day so anything further will be a bit slow. Thanks for your support.
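A small Python sketch for reading these kernel lines, assuming the `segfault at ADDR ip IP sp SP error N in LIB[BASE+SIZE]` layout shown above (the `parse_segfault` helper is hypothetical). It subtracts the library’s load address from the instruction pointer and decodes the x86 page-fault error code bits: error 6 means a user-mode write to a page that was not present, whereas the example’s error 4 is a user-mode read. Applied to the six bin_motion.so lines above, every crash resolves to the same offset, 0x17488, i.e. the same faulting instruction each time:

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the dmesg lines above:
#   ... segfault at ADDR ip IP sp SP error N in LIB[BASE+SIZE]
SEGFAULT_RE = re.compile(
    r"segfault at [0-9a-f]+ ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>\d+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+[0-9a-f]+\]"
)


class Segfault(NamedTuple):
    lib: str
    offset: int        # instruction pointer minus the library's load address
    write: bool        # page-fault error code bit 1: write access (else read)
    user_mode: bool    # bit 2: fault raised while running in user mode
    protection: bool   # bit 0: protection violation (else page not present)


def parse_segfault(line: str) -> Optional[Segfault]:
    """Extract the faulting library, offset and decoded error bits, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"])
    return Segfault(
        lib=m["lib"],
        offset=int(m["ip"], 16) - int(m["base"], 16),
        write=bool(err & 0b010),
        user_mode=bool(err & 0b100),
        protection=bool(err & 0b001),
    )


line = ("[520037.289402] python[240741]: segfault at 7f713e675000 ip 00007f9050bee488 "
        "sp 00007f90115fbea0 error 6 in bin_motion.so[7f9050bd7000+45000]")
fault = parse_segfault(line)
print(fault.lib, hex(fault.offset), fault.write, fault.user_mode)  # bin_motion.so 0x17488 True True
```

Because shared libraries load at a different address in each process, comparing offsets rather than raw `ip` values is what makes repeated crashes recognisable as the same bug.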

@rbs_sci, nice, thanks. Are you using a camera defect file by chance?

Yes, I am. Is that the cause?

I think there may be a bug related to defect files, yes. It’s a bit of a pain, but you can work around this for now by not using the defect file and instead putting zeros in the gain reference at the defect pixels, which has the same effect. If you don’t want to re-run Patch Motion Correction, you can use cryosparc tools to set the “mscope_params/defect_path” field to an empty string on the movies dataset.
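The gain-reference workaround amounts to zeroing the defect pixels. A minimal NumPy sketch, assuming the defects have already been converted to an (N, 2) array of integer (x, y) coordinates in the same pixel grid, sampling and orientation as the gain reference; the helper name is hypothetical, and parsing the defect file and reading/writing the gain file are left out because those formats depend on your setup:

```python
import numpy as np


def zero_defects_in_gain(gain: np.ndarray, defects: np.ndarray) -> np.ndarray:
    """Return a copy of `gain` with each listed defect pixel set to 0.

    Assumes `defects` is an (N, 2) integer array of (x, y) coordinates in the
    same pixel grid and orientation as `gain`.
    """
    out = np.array(gain, copy=True)
    xs = defects[:, 0]
    ys = defects[:, 1]
    # NumPy images are indexed [row, column], i.e. [y, x].
    out[ys, xs] = 0
    return out


# Toy example: a 4x4 all-ones "gain reference" with two hypothetical defects.
gain = np.ones((4, 4), dtype=np.float32)
defects = np.array([[1, 0], [3, 2]])  # (x, y) pairs
fixed = zero_defects_in_gain(gain, defects)
print(int(fixed.sum()))  # 14
```

Whole bad rows or columns can be handled the same way with slices, e.g. `out[:, col] = 0`.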


OK, I’ll try that. I’ll send other info as I get a chance.

edit: Info sent. Cheers.

Hi @hsnyder

I ran two quick tests on a different EER dataset without the defect file (gain reference modified appropriately).

Reference-based motion correction works OK so far at both 4K (currently on iteration 9) and 8K (currently on iteration 7) sampling.

Trying it on the larger dataset now.


Quick note for the forum…

While editing the defect file entry in the project via cryosparc tools works, just replacing the defect file with an empty file also works*. :slight_smile:

Thanks to @hsnyder for his support with cryosparc tools!

*Adjust gain reference to include defect map as appropriate.
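For the empty-file variant, here is a hypothetical Python helper that keeps a backup before blanking the file (the name `blank_defect_file` and the `.orig` suffix are illustrative). As the footnote says, fold the defects into the gain reference first; also note that if the same defect file is shared with other projects or jobs, they will see the empty file too:

```python
import shutil
from pathlib import Path


def blank_defect_file(defect_path: str, backup_suffix: str = ".orig") -> Path:
    """Back up the defect file, then truncate the original to zero bytes.

    The path is unchanged, so anything that refers to it still finds a file;
    the file simply no longer lists any defects. Returns the backup's path.
    """
    src = Path(defect_path)
    backup = src.with_name(src.name + backup_suffix)
    if backup.exists():
        # Never clobber an earlier backup of the real defect list.
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.copy2(src, backup)  # keeps contents and timestamps
    src.write_bytes(b"")       # same path, now empty
    return backup
```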


I’ll post this here as it’s relevant, although it might be better as a separate “Tip” topic, at least until CryoSPARC suppresses the TIFFRead warning for EER data…

For CryoSPARCers using EER data: if you don’t want your RBMC job.log spammed with millions of lines of TIFFRead warnings (and bloating accordingly), and you don’t mind losing job.log, a quick symlink works wonders and doesn’t make CryoSPARC freak out (simply deleting the file did):

  • Navigate to the job directory (e.g. cd data/CS-project/J155), then run:
    mv job.log job.log.backup && ln -s /dev/null job.log

This only affects one of the background logs (accessed via Job > Metadata > Log), which so far hasn’t contained any useful diagnostic information about crashed EER RBMC jobs that wasn’t already available elsewhere (i.e. the main log or dmesg).

Many may not care about or want the log (which can hang the CryoSPARC UI if opened once it grows beyond a certain size), but depending on the filesystem or storage media in use, redirecting it might reduce load, wear and/or I/O queueing, because a TIFFRead warning is written for every single EER frame read… which adds up pretty quickly.
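To put a number on “adds up pretty quickly”: with the figures from this thread (about 13,000 movies of 1344 EER frames each) and one warning line per frame read, a single pass over the dataset writes 17,472,000 warning lines. A rough Python estimate, where the 90-byte line length and the single read pass are assumptions:

```python
def estimated_warning_bytes(n_movies: int, frames_per_movie: int,
                            bytes_per_line: int, passes: int = 1) -> int:
    """Rough size of the warning lines alone: one line per EER frame read,
    with every movie read `passes` times (both assumptions)."""
    return n_movies * frames_per_movie * bytes_per_line * passes


# Numbers from this thread; the 90-byte line length is an assumption.
n_lines = 13_000 * 1344
print(f"{n_lines:,} warning lines per pass")  # 17,472,000 warning lines per pass
size = estimated_warning_bytes(13_000, 1344, bytes_per_line=90)
print(f"~{size / 1e9:.1f} GB per pass")  # ~1.6 GB per pass
```

Logs reaching tens of gigabytes would imply more read passes or longer lines than assumed here, so treat the per-pass figure as an order-of-magnitude guide only.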

The reference-based motion correction crash related to defect files is fixed in CryoSPARC v4.4.1, released today.
