3DVA total variance

DanielAsarnow · July 11, 2020, 6:10pm

Is there a method for computing the total variance, so that the amount of variance explained by each component can be calculated? The latter is simple (just the variance of the component values across the particles).
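As a minimal sketch of the "simple" part, assuming the per-particle component values have been exported into an `(n_particles, n_components)` NumPy array (the array layout, the function name `component_variances`, and the synthetic data are illustrative assumptions, not cryoSPARC output formats): the variance of each component is a column-wise variance. The shares computed below are relative to the solved components only, not to the unknown total variance, which is exactly the missing piece.

```python
import numpy as np

def component_variances(coords):
    """Per-component variance of per-particle coordinates.

    coords: (n_particles, n_components) array (hypothetical layout).
    Returns the variance of each column and each column's share of the
    variance summed over the *computed* components only.
    """
    coords = np.asarray(coords, dtype=float)
    var = coords.var(axis=0, ddof=1)   # sample variance per component
    share = var / var.sum()            # NOT a fraction of total variance
    return var, share

# Synthetic stand-in for real component values: three components
# with standard deviations 3, 2 and 1.
rng = np.random.default_rng(0)
coords = rng.normal(size=(10000, 3)) * np.array([3.0, 2.0, 1.0])
var, share = component_variances(coords)
print(var, share)
```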

I think it’s definitely something reviewers will be asking when a 3DVA axis is presented in a manuscript.

apunjani · July 14, 2020, 8:50pm

@DanielAsarnow we’ve been thinking about this a little and it might not be completely straightforward to compute the total variance in 3D structure space, but maybe more plausible to compute the fraction of variance explained in 2D image space… we’re still figuring it out.

Any more details about what you think it might be nice to be able to report? Maybe it would be good to be able to say how large is the variance along a given variability component vs. the noise level in the images, to give a sense for the significance of the variability?

DanielAsarnow · July 25, 2020, 11:54pm

Something like the error from the whole linear subspace image model, and the amount gained from each component, might make sense too.
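A rough sketch of what such a report could look like, on synthetic data rather than real images: given mean-subtracted observations `X` and a small set of orthonormal components `V` (both hypothetical stand-ins here, as is the function name `residual_by_component`), the residual of the linear subspace model after adding each component is the total squared norm minus the energy captured so far. This ignores CTF, projection geometry and per-image noise weighting, all of which a real image-space calculation would need.

```python
import numpy as np

def residual_by_component(X, V):
    """Residual sum of squares of a linear subspace model.

    X: (n, d) mean-subtracted data.
    V: (k, d) components with orthonormal rows (assumed).
    Returns (residual, energy): residual[j] is the squared error after
    using the first j components (j = 0..k); energy[j] is the squared
    norm captured by component j alone.
    """
    total = np.sum(X ** 2)
    energy = np.sum((X @ V.T) ** 2, axis=0)
    residual = total - np.concatenate([[0.0], np.cumsum(energy)])
    return residual, energy

# Synthetic example: 3 true directions plus isotropic noise.
rng = np.random.default_rng(0)
n, d, k = 500, 50, 3
Q, _ = np.linalg.qr(rng.normal(size=(d, k)))   # orthonormal columns
V = Q.T
Z = rng.normal(size=(n, k)) * np.array([5.0, 3.0, 1.0])
X = Z @ V + 0.5 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

residual, energy = residual_by_component(X, V)
gain = energy / residual[0]   # fraction of total error removed per component
print(residual, gain)
```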

In a full PCA/SVD analysis, say of data with N observations and D dimensions, one might expect that the component magnitudes fall off logarithmically, after the first k << N components that explain most of the variance. The noise and anything that can’t be represented as a linear combination of components are split across many small, random-looking components. Then it’s pretty easy to see how much each PC is worth and how good the rank-k approximation is.
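For reference, this is roughly what the full-spectrum case looks like on a small synthetic data set (the sizes, noise level and variable names below are arbitrary assumptions chosen for illustration). The squared singular values of the mean-subtracted data give the variance per PC, and the squared Frobenius error of the best rank-k approximation is the sum of the remaining squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 400, 60, 3

# Low-rank "signal" along k orthonormal directions, plus isotropic noise.
basis, _ = np.linalg.qr(rng.normal(size=(d, k)))          # (d, k)
scores = rng.normal(size=(n, k)) * np.array([8.0, 5.0, 3.0])
X = scores @ basis.T + 0.5 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# Singular values only; they are returned in descending order.
s = np.linalg.svd(X, compute_uv=False)

explained = s ** 2 / np.sum(s ** 2)     # fraction of variance per PC
cumulative = np.cumsum(explained)

# Relative Frobenius error of the best rank-k approximation.
rank_k_error = np.sqrt(np.sum(s[k:] ** 2) / np.sum(s ** 2))

print(explained[:6])
print(cumulative[k - 1], rank_k_error)
```

Plotting `explained` against component index gives the usual scree curve: a few large values followed by a long tail of small, noise-dominated ones. With only a handful of 3DVA components the tail is not available, so the denominator `np.sum(s ** 2)` has to come from somewhere else.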

For 3DVA, we can’t really compute hundreds of PCs to see the full curve, but we can at least be confident that structural changes within a component were most parsimoniously described together. I would really like to be able to say, for example, whether a PC I connect to some biological process is genuinely more significant (justifying ignoring most of the other PCs) vs. there being very many dynamical modes of similar magnitude, from which we are plucking the ones that have interesting spatial correlations across the structure.