Minimum group size for beam shift clustering?


The new beam shift clustering feature in Exposure Group Utilities is excellent!

However, with Leginon-generated ad hoc beam tilt values as the input, there can be some small residual clusters of only a few images (at least when using the agglomerative clustering method).

These groups can sometimes be too small to allow for reliable beam tilt estimation.

I wonder if it would be possible to set a minimum group size? The agglomerative clustering would first split into n clusters, then disperse any clusters below the minimum group size amongst their nearest neighbor groups.
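To make the idea concrete, here is a rough sketch of the dispersal step in plain Python. It is not how Exposure Group Utilities works internally; `merge_small_groups`, `shifts`, `labels` and `min_size` are all hypothetical names, and "nearest" is assumed to mean nearest group centroid in beam shift space. It takes the labels from any clustering method and reassigns each image in an undersized group to the closest group that meets the minimum size.

```python
import math
from collections import Counter


def merge_small_groups(shifts, labels, min_size):
    """Reassign images from undersized groups to the nearest large group.

    shifts:   list of (x, y) beam shift values, one per image (assumed input)
    labels:   list of integer group labels from any clustering method
    min_size: hypothetical minimum number of images per group
    """
    counts = Counter(labels)
    keep = sorted(g for g, n in counts.items() if n >= min_size)
    if not keep:
        raise ValueError("no group reaches min_size")

    # Centroids of the surviving groups, computed from their original members
    # only; they are not updated as images are reassigned.
    centroids = {}
    for g in keep:
        members = [s for s, lab in zip(shifts, labels) if lab == g]
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        centroids[g] = (cx, cy)

    new_labels = []
    for s, lab in zip(shifts, labels):
        if lab in centroids:
            new_labels.append(lab)  # group is large enough, keep as-is
        else:
            # Image belongs to an undersized group: move it to the group
            # whose centroid is closest in beam shift space.
            nearest = min(keep, key=lambda g: math.dist(s, centroids[g]))
            new_labels.append(nearest)
    return new_labels


# Toy example: group 2 has a single image and min_size is 3.
shifts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (4.0, 4.2)]
labels = [0, 0, 0, 1, 1, 1, 2]
print(merge_small_groups(shifts, labels, min_size=3))
# → [0, 0, 0, 1, 1, 1, 1]
```

Each image is reassigned individually, so a small cluster sitting between two large groups can be split between them, which matches the "disperse amongst nearest neighbor groups" behavior I have in mind.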

Also, is there any data-driven way to optimize the number of clusters? There is a trade-off between the size of the groups and the accuracy and precision of the beam tilt estimates. Would it be possible to optimize the grouping on the fly during Global CTF Refinement? I imagine this could make a difference for high-resolution structures.



Hi @olibclarke,

Thanks for the feature requests! A minimum group size makes sense, and we have recorded it internally so we can consider how best to implement it. It could be done within the exposure group clustering step, as you noted, or potentially within Global CTF Refinement itself.

For now, have you had better luck with using k-means clustering instead? For data that shows more of a continuous distribution over beam shift values, it often produces more even cluster sizes than the agglomerative (hierarchical) method.



Thanks Michael! Haven’t tried k-means yet for this data but will give it a go, thanks!