Reference based motion correction: different hyperparameters from the same dataset

Guillaume · April 15, 2024, 8:36am

Hello,

I have a dataset with clear discrete heterogeneity giving three populations of particles. They are all the same complex, some complete, some with missing subunits, and were all easily sorted by heterogeneous refinement.

They all reached ~2 Å using particle images at the original pixel size (0.65 Å/pix), and given that there is still some margin before hitting the Nyquist limit, I tried RBMC. I made sure to run “Remove duplicates” on each particle set with default options, to avoid the error caused by duplicates in RBMC. Then I connected my three sets of particles to a single RBMC job but still got the “matrix is singular” error caused by duplicates, not right away but at a much later stage (I think at the motion correction stage, but I am not sure, and I have since deleted this job so I can’t check).

My guess is that these duplicates are particles contributing to more than one reconstruction (with different weights), and I could probably fix it either by running the “Remove duplicates” jobs with stricter settings or by re-running heterogeneous refinement with “hard classification” on. I did not try this. Instead, I set up independent RBMC jobs for all three sets of particles (it was before the weekend, so I didn’t mind it taking a long time).
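One way to test this guess before re-running anything would be to look for particle identifiers shared between the sets. Below is a minimal sketch, assuming each particle set can be exported as a list of unique identifiers. The IDs are made up for illustration, and `shared_particles` is a hypothetical helper, not a CryoSPARC function.

```python
from itertools import combinations


def shared_particles(sets_by_name):
    """Return {(name_a, name_b): identifiers present in both sets} for every pair."""
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(sets_by_name.items(), 2):
        overlaps[(name_a, name_b)] = set(ids_a) & set(ids_b)
    return overlaps


# Made-up identifiers, for illustration only (not taken from the real dataset).
example_sets = {
    "class_1": [101, 102, 103],
    "class_2": [103, 104],
    "class_3": [105, 106],
}

overlaps = shared_particles(example_sets)
for pair, ids in overlaps.items():
    # A non-empty list means the two sets share at least one particle.
    print(pair, sorted(ids))
```

Any non-empty overlap would support the idea that the same particle reaches RBMC through more than one input set. Note that this only compares identifiers; duplicates defined by pick location on the micrograph would need a distance-based check instead.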

Now I am a bit puzzled because two of the three sets of particles gave the same hyperparameters, but the remaining one (the second listed below) differs:

Using hyperparameters:
	Spatial prior strength:       4.6001e-03
	Spatial correlation distance: 500
	Acceleration prior strength:  6.4435e-02

Using hyperparameters:
	Spatial prior strength:       5.4788e-03
	Spatial correlation distance: 500
	Acceleration prior strength:  1.4751e-02

Using hyperparameters:
	Spatial prior strength:       4.6001e-03
	Spatial correlation distance: 500
	Acceleration prior strength:  6.4435e-02

With particles from the same set of micrographs, I would expect the search to converge to the same hyperparameters even with different subsets of particles and micrographs.

I will let the jobs complete and run homogeneous refinements on the outputs to see what happens. If anything looks suspicious, my first instinct would be to re-run RBMC on the middle set with hyperparameters from the two other sets.

So, my questions are:

Is it suspicious to get different hyperparameters for one of the three sets of particles in this dataset?
If so, what is the most reasonable thing to do about it?

Thank you!

olibclarke · April 15, 2024, 1:04pm

Hi @Guillaume,

My first guess would be that these parameters are going to depend to some degree on the sparsity of your particles in the micrographs - are these particle sets more or less the same size? I would also say that these parameters are not that different - not off by orders of magnitude. Possible that this is just a stochastic difference. If anything it is surprising to me that the first and third sets give exactly the same parameters…
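To put a number on "not off by orders of magnitude", here is a small sketch that compares each set of hyperparameters to the first one on a log10 scale. The values are the ones quoted in the first post; the dictionary keys and the `log10_ratio` helper are names chosen for this illustration only.

```python
import math

# Hyperparameters reported by the three RBMC jobs, in the order they were posted.
hyperparameters = [
    {"spatial_prior": 4.6001e-03, "correlation_distance": 500, "acceleration_prior": 6.4435e-02},
    {"spatial_prior": 5.4788e-03, "correlation_distance": 500, "acceleration_prior": 1.4751e-02},
    {"spatial_prior": 4.6001e-03, "correlation_distance": 500, "acceleration_prior": 6.4435e-02},
]


def log10_ratio(value, reference):
    """Difference between two positive values, in orders of magnitude."""
    return math.log10(value / reference)


reference = hyperparameters[0]
differences = [
    {name: log10_ratio(params[name], reference[name]) for name in params}
    for params in hyperparameters
]

for index, diff in enumerate(differences, start=1):
    print(f"set {index}:", {name: round(value, 3) for name, value in diff.items()})
```

An absolute value below 1 means the parameter is within one order of magnitude of the first set. Here the spatial prior of the second set is about 19% higher and its acceleration prior is a bit more than four times lower, so both stay within that range.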

Cheers
Oli

Guillaume · April 15, 2024, 1:28pm

Yes, I initially found this surprising too. But I think landing on the exact same optimum could make sense, depending on the sampling. In this sense, it would mean that the search is robust because it finds the same hyperparameters with two different subsets of the micrographs and particles. Under this reasoning, it is the different set that I find more surprising.

The three sets have this many particles (in the same order as the hyperparameters I listed above):

142 619
299 578
270 946

That doesn’t strike me as very imbalanced. Based on the trajectory plots, the first set seems much more sparse than the two others (consistent with the lower total number of particles).

Maybe this observation is nothing to worry about. The first two RBMCs are still running, but I will take a closer look when I have the refinement results after that.

Guillaume · April 16, 2024, 8:41am

Well, all final maps look perfectly reasonable. For due diligence, I am now re-running the last heterogeneous refinement with “Force hard classification” on, to see whether the class distribution and reconstructions change much. If they do, I will repeat the downstream steps with the new classes.