I would like to compare the severity of preferred orientation between several datasets for the same protein sample but of different size and quality. I am working with a test sample (a rod-like protein which definitely has a preferred orientation), but I would like to develop an approach applicable to screening any other sample. Ideally I would like to use one or two quantitative metrics to find a correlation with sample preparation conditions. So I was wondering if the scores from the Orientation Diagnostics job, in particular cFAR and SCF, are dependent on the quality and amount of data collected? And if so, is there anything else I could use? For example, could I use the results of the Rebalance Orientations job by dividing the number of excluded particles by the total number?
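To be concrete, the ratio I have in mind is just the fraction of particles the rebalancing step removed. A minimal sketch (the counts below are made-up placeholders; in practice they would come from the job's reported particle counts):

```python
# Hypothetical sketch: exclusion fraction after a Rebalance Orientations run.
# The example counts are placeholders, not real job output.

def exclusion_fraction(n_excluded: int, n_total: int) -> float:
    """Fraction of particles removed to flatten the orientation distribution."""
    if n_total <= 0:
        raise ValueError("total particle count must be positive")
    return n_excluded / n_total

# e.g., 38,000 particles excluded out of 120,000 total:
print(f"{exclusion_fraction(38_000, 120_000):.2%}")  # -> 31.67%
```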
Conceptually, both cFAR and SCF* are independent of the number of particles collected, as they are designed to measure anisotropy at different resolution values (i.e., they should report equally high values for an isotropic 8 Å structure refined with 5K particles and an isotropic 3 Å structure refined with 100K particles). cFAR in particular is completely unaware of the number of particles used to refine the volume (since it only looks at half maps), while SCF* is fairly stable above a certain particle count (by default, Orientation Diagnostics computes SCF* with a random fixed-size set of at most 10K particles).
In practice, however, there can be some dependence on particle count if more data means better poses, better CTF estimates, or (conversely) more junk. The same goes for data “quality” – it depends on whether improving quality lets you improve poses, CTF estimates, and junk removal. If better-quality data results in higher-SNR particles being refined into the exact same poses, then these values will be largely unchanged.
What we have found to work reasonably well internally is to compare cFAR scores from refinements with equal numbers of particles. Note that, in our hands, cFAR can sometimes report lower-than-expected scores, but never (in our experience) reports a high value (>0.5) when there is visual evidence of anisotropy. In the former case, you might also be able to make use of the new “tFAR” score that we report in v5.
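The equal-particle-count comparison can be sketched as follows. This is just an illustration of the subsampling idea (the function name and counts are assumptions, not CryoSPARC API); in practice you would draw the subsets with a Particle Sets Tool job and then run identical refinements on each subset before comparing cFAR:

```python
# Hypothetical sketch: draw equal-size random subsets of particle indices from
# several datasets, so that cFAR comparisons are not confounded by particle
# count. Names and counts here are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility

def equal_size_subsets(particle_counts, n_target=None):
    """Return one index array per dataset, each of the same size n_target."""
    if n_target is None:
        # Largest subset size that every dataset can supply.
        n_target = min(particle_counts)
    return [rng.choice(n, size=n_target, replace=False) for n in particle_counts]

# Three datasets of different sizes -> three subsets of 52,000 indices each.
subsets = equal_size_subsets([52_000, 118_000, 87_000])
print([len(s) for s in subsets])
```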