Option to not generate non-dose weighted motion corrected averages?

olibclarke · March 8, 2023, 9:31pm

Hi,

Patch motion generates for each stack two motion corrected averages - one with dose weighting applied, and one without.

This is (if I understand correctly) so that one can perform CTF estimation using the non-dose-weighted average, while extracting from the one with dose-weighting applied.

However, it also takes up twice the storage space, for the job that takes up the single largest amount of space (usually) in our workflow. E.g. for a current 13k mic super-res dataset, the Patch Motion job alone is taking up 10TB (!).
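As a rough sanity check on that figure, here is a back-of-the-envelope sketch. The frame size (11520 × 8184 pixels, as for a K3 detector in super-resolution mode) and 4-byte float32 output pixels are assumptions for illustration, not details confirmed for this dataset, and `motion_output_bytes` is a hypothetical helper name.

```python
def motion_output_bytes(n_mics, width, height, bytes_per_pixel=4, averages_per_mic=2):
    """Estimate bytes used by motion-corrected averages (file headers ignored)."""
    return n_mics * width * height * bytes_per_pixel * averages_per_mic


TB = 1e12  # decimal terabytes

# Assumed: 13k micrographs, 11520 x 8184 px, float32, DW + non-DW averages
both = motion_output_bytes(13_000, 11_520, 8_184)
# Same dataset, writing only the dose-weighted average
dw_only = motion_output_bytes(13_000, 11_520, 8_184, averages_per_mic=1)
# Dose-weighted only, stored as float16 (2 bytes per pixel)
dw_only_f16 = motion_output_bytes(13_000, 11_520, 8_184,
                                  bytes_per_pixel=2, averages_per_mic=1)

for label, size in [("DW + non-DW", both),
                    ("DW only", dw_only),
                    ("DW only, float16", dw_only_f16)]:
    print(f"{label}: {size / TB:.1f} TB")
```

Under these assumptions the two averages together come to just under 10 TB, in line with what the job is using, and dropping the non-dose-weighted average would halve that.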

I think in most cases the results are comparable when performing Patch CTF on dose weighted vs non-dose weighted averages. So would it be possible to add an option to not generate the non-dose-weighted averages? Or alternatively/additionally, to allow for their removal after Patch CTF is complete? This would allow us to save quite a lot of space…

Cheers
Oli

rbs_sci · March 9, 2023, 2:27am

It’s like you’re reading my mind!

I’ve been thinking this for a while, and was about to write a post up.

Recently we have had several big datasets that become unwieldy in CryoSPARC when using super-resolution (and, for one dataset, even without it - had I used super-res it would have exceeded 18TB in CryoSPARC just for motion correction…)

edit:

Hm. I get quite different CTF estimation results between non-dose-weighted and dose-weighted micrographs using the CryoSPARC algorithm.

Non dose weighted:

Dose weighted:

Using CTFFIND, the difference in results between non-dose-weighted and dose-weighted micrographs is negligible.

Non dose weighted:

Dose weighted:

Obviously this is a quick test using a small dataset so others should probably run a quick test on [data of choice] too.

Micrograph 24 varying the most in CTFFIND doesn’t surprise me - it’s a bad micrograph (extreme drift) left in for the purposes of testing/training.

DanielAsarnow · March 9, 2023, 2:56am

Surely those totals include the movies though? Anyway, I agree - the non-dose-weighted averages are basically useless since CTF estimation also works fine on the dose-weighted ones.

float16 would be nice too

@rbs_sci I don’t think that level of variance from Patch Motion is significant/important (~30 Å), I think it’s more a reflection of the average patch value being more subject to slight variation than the whole-micrograph value from CTFFIND4.

rbs_sci · March 9, 2023, 3:01am

Doesn’t include the movies. Or a recent processing directory would be 60TB before starting.

I’m not a fan of float16, but that’s from limited testing; one dataset doesn’t give me good results with float16 mics, not quite sure why…

Edited my earlier post to include some CTF estimation comparisons between dose/non-dose mics in CryoSPARC using patch estimation and CTFFIND. Interesting. If others could check a small subset of their own data, I’d appreciate hearing whether you see something similar.

rbs_sci · March 9, 2023, 3:17am

@DanielAsarnow Probably. However, given that we often discard micrographs based on CTF estimated resolution (“Fit”), using the patch estimation algorithm with dose-weighted micrographs would have resulted in discarding an extra three micrographs.

I’d test on more (and larger) datasets, though.

DanielAsarnow · March 9, 2023, 3:41am

How much variation do you get from changing the resolution range (e.g. to 30-5Å like CTFFIND4), or the search domain?

rbs_sci · March 9, 2023, 4:21am

If I set the patch estimation to the same parameters as CTFFIND I see the same effect. Non-dose weighted are fine, dose weighted are not so good.

Non-dose weighted (patch estimation)(same params as CTFFIND):

Dose weighted (patch estimation)(same params as CTFFIND):

DanielAsarnow · March 9, 2023, 5:06am

If you compare Patch CTF noDW to Patch CTF noDW, then even with just the slightly different resolution range there’s almost as much variance as from using the different micrographs. Except for the close to focus ones where you might just need the lower resolution to fit well (or to decrease minimum defocus to 200 Å), I think this is an example of the numerical roughness that’s intrinsic to our current software/hardware given performance constraints. After all, even summing an array is numerically unstable.
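To illustrate that last point: floating-point addition is not associative, so the same numbers summed in a different order can give different totals. A minimal standard-library sketch, with values chosen purely for illustration (an explicit loop is used because the built-in `sum` may apply compensated summation on newer Python versions):

```python
import math


def naive_sum(xs):
    """Plain left-to-right float64 accumulation."""
    total = 0.0
    for x in xs:
        total += x
    return total


values = [1e16, 1.0, -1e16, 1.0]  # the exact sum is 2.0

in_order = naive_sum(values)           # small terms are lost next to 1e16
reordered = naive_sum(sorted(values))  # same numbers, different order
exact = math.fsum(values)              # correctly rounded reference

print(in_order, reordered, exact)
```

The three results disagree even though the inputs are identical, which is the same kind of order-dependence that GPU reductions and patch averaging can introduce at a much smaller scale.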

If you want to guard against it, you could run another CTF estimation job using CTFFIND4 or Patch CTF w/ no or few knots and then take the union of the good micrographs.
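Taking the union could look something like this sketch. The micrograph names, fit resolutions and the 6 Å cutoff are made up for illustration, and `good_micrographs` is a hypothetical helper, not a CryoSPARC function; in practice the values would come from the exported results of each job.

```python
def good_micrographs(fit_resolution, cutoff_angstrom):
    """Return the set of micrographs whose CTF fit resolution passes the cutoff."""
    return {mic for mic, res in fit_resolution.items() if res <= cutoff_angstrom}


# Hypothetical CTF fit resolutions (in Å) from two independent estimation jobs
patch_ctf = {"mic_001": 3.8, "mic_002": 7.5, "mic_003": 4.2}
ctffind4 = {"mic_001": 3.9, "mic_002": 4.6, "mic_003": 9.1}

# Keep a micrograph if either estimator considers it good
keep = good_micrographs(patch_ctf, 6.0) | good_micrographs(ctffind4, 6.0)
print(sorted(keep))
```

Here each estimator alone would reject one micrograph, but the union keeps all three.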

I bet the problem micrographs just have a few bad patches that can’t be fit well, the other patches may still be fine. I also imagine that if the average of the frame power spectra were used (instead of the power spectrum of the average), then even the close to focus micrographs might be fit well at the default resolution range.
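One reason the two quantities can differ is that the power spectrum of a single frame is unchanged by a shift, so averaging per-frame spectra is insensitive to residual misalignment, whereas averaging the frames first lets misaligned signal partially cancel. The toy 1D sketch below shows only that effect; it assumes residual shifts between frames, uses a slow pure-Python DFT to stay self-contained, and is not how CryoSPARC or CTFFIND4 compute their spectra.

```python
import cmath
import math


def power_spectrum(signal):
    """Squared magnitude of the discrete Fourier transform (naive O(n^2) DFT)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))) ** 2
            for k in range(n)]


def shifted(signal, shift):
    """Circularly shift a signal by `shift` samples."""
    n = len(signal)
    return [signal[(t - shift) % n] for t in range(n)]


n = 32
base = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]  # single frequency, k = 4
frames = [shifted(base, s) for s in (0, 1, 2, 3)]             # assumed residual drift

# Power spectrum of the averaged frames
average = [sum(col) / len(frames) for col in zip(*frames)]
ps_of_average = power_spectrum(average)

# Average of the per-frame power spectra
per_frame = [power_spectrum(f) for f in frames]
average_of_ps = [sum(col) / len(frames) for col in zip(*per_frame)]

print(f"k=4 power, spectrum of average:  {ps_of_average[4]:.1f}")
print(f"k=4 power, average of spectra:   {average_of_ps[4]:.1f}")
```

With these shifts the peak in the spectrum of the average is less than half as strong as in the average of the spectra, even though every frame contains exactly the same signal.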

I’d also add that if the defocus estimate is accurate, but the resolution at which the fit correlation drops below 0.4 is changing then the problem might be that resolution metric and not the fit.

rbs_sci · March 9, 2023, 7:18am

Agreed. Do the same thing ten times and get ten slight variations in answer. Variations in motion correction, CTF estimation, in picking, angular assignment, etc, etc. Also limited by how computers deal with non-integer numbers.

I’m not particularly worried, just highlighting that if using the dose weighted micrographs only, when following “normal” procedures it might result in throwing a lot of micrographs away which are otherwise entirely useable.

So, basically, it’s just something to be aware of.

olibclarke · March 10, 2023, 8:28pm

Is the CTF fit reported by Patch CTF derived from the average of the patch based fits? I thought based on the log file it was based on the fit to the 1D average of the PS, is this not correct?

DanielAsarnow · March 11, 2023, 7:16am

Great point, I have no idea. I just assumed the fit quality wouldn’t be based on the 1D fit because in CTFFIND4 it’s based on the final (monolithic) 2D fit. All the more reason not to filter by CTF “resolution” if so…

I usually do 6 Å, and if that costs more than ~7% of the data, then 8 Å. Maybe I’ll stop doing that and just use relative ice thickness.
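Written out as a sketch, that rule of thumb looks like the following. `choose_cutoff` and the example numbers are hypothetical; the 6 Å, 8 Å and 7% values are the ones mentioned above.

```python
def choose_cutoff(fit_resolutions, strict=6.0, relaxed=8.0, max_loss=0.07):
    """Use the strict cutoff unless it would discard more than max_loss of the data."""
    rejected = sum(1 for r in fit_resolutions if r > strict)
    if rejected / len(fit_resolutions) > max_loss:
        return relaxed
    return strict


# Hypothetical CTF fit resolutions in Å for 100 micrographs
mostly_good = [4.0] * 95 + [9.0] * 5   # strict cutoff costs 5% of the data
more_bad = [4.0] * 90 + [9.0] * 10     # strict cutoff costs 10% of the data

print(choose_cutoff(mostly_good), choose_cutoff(more_bad))
```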

olibclarke · March 12, 2023, 12:40am

I think you are right about it being the average of the patches @DanielAsarnow. For a mic with a big section of gold in the frame, here is a comparison:

Default settings:

Force 1x1 patches (1x1 X/Y knots):

Quite a difference…

DanielAsarnow · March 12, 2023, 2:56am

The defocus isn’t too far apart at least! If you use a higher low res limit does it work with patches?

Maybe the average should down weight or exclude outlier patches, at least for the CC calculation. In the “bad” fit presumably all the other patches are still fine.
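A minimal sketch of what excluding outlier patches could look like, using a median/MAD rule. The per-patch correlation values and the 3-MAD threshold are invented for illustration, and this is not a description of what Patch CTF actually does internally.

```python
from statistics import mean, median


def robust_mean(scores, n_mads=3.0):
    """Mean of the scores lying within n_mads median absolute deviations of the median."""
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    if mad == 0:
        return med  # all (or most) scores identical; nothing to reject
    kept = [s for s in scores if abs(s - med) <= n_mads * mad]
    return mean(kept)


# Hypothetical per-patch fit correlations; one patch sits on gold and fits badly
patch_cc = [0.82, 0.85, 0.80, 0.84, 0.83, 0.12]

print(f"plain mean:  {mean(patch_cc):.3f}")
print(f"robust mean: {robust_mean(patch_cc):.3f}")
```

The single bad patch drags the plain mean down noticeably, while the robust mean reflects the patches that fit well.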

olibclarke · March 12, 2023, 3:11pm

I think in this case maybe the best way to go is to do whole-frame CTF and then per-particle defocus refinement - most of the micrographs have a lot of gold in the frame, and it throws off any of the patches that are nearby, whereas the whole-frame estimate is pretty robust. It’s a big-ish particle so should be fine! Would be good to be able to mask out gold/contamination for these kinds of cases though…

hsnyder · September 6, 2023, 4:08pm

Hi everyone,

Thanks for the feature requests. Broadly, we agree that there’s an inefficiency here and that combining motion correction and CTF estimation together, and/or float16 output, would reduce that inefficiency. I’ve recorded these requests.

Harris

olibclarke · September 6, 2023, 4:17pm

Thanks @hsnyder! If combining motion correction and CTF estimation together, one option might be to perform patch CTF on the aligned (non dose-weighted) movie, rather than the motion corrected micrograph.

Using CTFFIND this can give better results than CTF estimation on the micrograph alone (e.g. see Fig 1 in https://www.sciencedirect.com/science/article/pii/S030439911500128X), but it is not currently possible in CryoSPARC.

As @DanielAsarnow suggested, some kind of outlier removal for the patches (for both motion and CTF) would also be very useful I think!