Non-uniform refinement- confused about input and output!

Noha_Elshamy · January 31, 2024, 2:14pm

Hi, please forgive me if my question reflects a poor understanding of particle analysis- this is my first ever cryoEM dataset and analysis.

I used a subset of particles (25 k), extracted with a box size of 512 px and binned to 256 px, for ab-initio reconstruction, followed by one round of homogeneous refinement. I then used that output to run another homogeneous refinement job with the built-in CTF refinement enabled, and got a map at 3.3 angstroms.

I wanted to do the reconstruction with unbinned particles at a box size of 512 px, so I re-extracted the 25 k particles without binning and immediately used them as the input for a non-uniform refinement job (I wanted to see whether this type of refinement would improve my map). For the input map, I used the 3.3 angstrom map I got from the refinement of the binned particles.

At the same time, I could see that my 25 k particles lacked any good top views, so I re-extracted from the same micrographs (after adjusting some parameters to discard more of the bad ones and improve the particle picks) a set of about 24 k particles that contained 500 intact top views. I extracted them at the same box size of 512. In both this case and the unbinned re-extraction mentioned above, I used an input from 2D classification, so I believe the particles have 2D alignment information. For some reason, I also added this 24 k particle set to the input of the non-uniform refinement, not realizing at the time that there could be duplicate particles. The job took a very long time, around 4 hours, running on two GPUs, and yielded a map at 2.8 angstroms.

Given two problems, 1) there must be duplicates between my two particle subsets, and 2) the input map had 3D alignment information from only half of the particles, I was extremely confused about how the refinement reached such a high resolution when I imagined it shouldn’t have worked. When I combined the particle stacks and used them as the input for 2D classification with the “remove duplicates” parameter applied, I got only 1260 duplicates, so it’s still a very large subset. However, when I used this combined stack to repeat the non-uniform refinement (still using the same 3.3 angstrom map from the initial reconstruction), it yielded a very poor result of about 8 angstroms. I believe this happened because combining the particles in new 2D classes messed up their alignment with the map I already had? To avoid the mess, I tried running non-uniform refinement for each particle stack separately (still with the same initial map input), and it seems to be doing better, but the run time is extremely long given that the stacks are just 25 k each.

To summarize this very long message I want to ask:
1- How could I get such a high resolution map when half the particles contributed none of the 3D alignment info in the input map?

2- How can I best combine these two particle stacks for a refinement job?

3- Why is non-uniform refinement taking a huge amount of time ~ 7 hours for such a low number of particles?

Thanks

rposert · January 31, 2024, 4:14pm

Hi @Noha_Elshamy, and welcome to single particle analysis! I’d love to try to help you out here.

How did you get a high-resolution map when half of the particles didn’t have 3D alignments?

This is a great question! The answer is that Non-Uniform Refinement does not use any of the particles’ input alignment information, so it doesn’t matter that half of your particles were missing alignments in the first place.

This is because Non-Uniform Refinement is a global alignment — it checks all* orientations for each particle, every time. This process is very fast in CryoSPARC because we use a branch and bound algorithm to quickly arrive at the best pose. If you’re curious how this algorithm works, it’s described in the guide page on ab initio reconstruction, and described in greater detail in the original CryoSPARC publication. Don’t feel like you need to understand the algorithmic details to use the method though — I only link these resources in case you’re curious!

There is a refinement that uses the input particle information, called Local Refinement. Local Refinement is generally most useful for improving the map quality for a particular sub-region of your volume, especially if that region is rigidly flexible relative to the rest of the volume. The tutorial and job details for Local Refinement were both recently updated to be more helpful, so when you get to that stage be sure to check out those guide pages! The case study especially goes into detail about why we might prefer to have some alignments be global and some local, and when to use which type.

* when I say it checks all orientations, I really mean it considers all orientations — the Branch and Bound algorithm discards a large number of possible poses without ever checking them if it is extremely unlikely they would be the best pose. There’s more detail in the guide page if you’re curious!
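To make the idea of discarding poses without checking them concrete, here is a toy sketch of branch and bound over a single rotation angle. It is not CryoSPARC's implementation: the function name, the `lipschitz` bound, and the cosine score used in the usage note are all assumptions chosen for illustration. The real algorithm derives its bounds differently and searches over 3D orientations and shifts.

```python
import math

def branch_and_bound_max(score, lipschitz, lo=0.0, hi=2 * math.pi, tol=1e-3):
    """Find an angle in [lo, hi] that (nearly) maximizes score(angle).

    Assumes |score(a) - score(b)| <= lipschitz * |a - b|, which gives an
    upper bound on the score anywhere inside an interval from the score
    at its centre. Returns (best_angle, best_score, n_evaluations).
    """
    centre = 0.5 * (lo + hi)
    best_angle, best_score = centre, score(centre)
    n_evals = 1
    # Each entry: (interval start, interval end, score at interval centre)
    stack = [(lo, hi, best_score)]
    while stack:
        a, b, centre_score = stack.pop()
        half = 0.5 * (b - a)
        # Bound: nothing in [a, b] can score higher than this.
        upper = centre_score + lipschitz * half
        if upper <= best_score or half < tol:
            continue  # prune: cannot beat the best pose, or already fine enough
        mid = 0.5 * (a + b)
        # Branch: split the interval and score the centre of each half.
        for sub_a, sub_b in ((a, mid), (mid, b)):
            c = 0.5 * (sub_a + sub_b)
            s = score(c)
            n_evals += 1
            if s > best_score:
                best_angle, best_score = c, s
            stack.append((sub_a, sub_b, s))
    return best_angle, best_score, n_evals
```

For example, `branch_and_bound_max(lambda t: math.cos(t - 1.0), lipschitz=1.0)` homes in on an angle near 1.0 while scoring far fewer angles than a uniform grid with spacing `tol` would need, because whole intervals are dropped as soon as their upper bound falls below the best score found so far.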

How should you combine particle stacks?

Generally, your procedure sounds right to me — you can just plug all of your particle sets into the Remove Duplicates job and the output will have only one copy of each particle. If you’ve run a 2D or 3D refinement on the particles, you can change the job to use the particle with the better alignment. There are more details in the job page.
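As a rough illustration of what duplicate removal does conceptually, here is a minimal sketch. It is not the Remove Duplicates job's actual code: the dictionary keys (`micrograph`, `x`, `y`, `score`) and the `min_separation` parameter are hypothetical names, and the sketch assumes a higher `score` means a better alignment.

```python
import math
from collections import defaultdict

def remove_duplicates(particles, min_separation):
    """Keep one particle per location on each micrograph.

    particles: list of dicts with hypothetical keys 'micrograph', 'x', 'y'
    (pick coordinates in pixels) and 'score' (higher = better alignment).
    Two picks on the same micrograph closer than min_separation are treated
    as the same particle, and the better-scored one is kept.
    """
    by_micrograph = defaultdict(list)
    for p in particles:
        by_micrograph[p["micrograph"]].append(p)

    kept = []
    for picks in by_micrograph.values():
        accepted = []
        # Visit best-scored picks first so a duplicate always loses to the better copy.
        for p in sorted(picks, key=lambda q: q["score"], reverse=True):
            far_enough = all(
                math.hypot(p["x"] - k["x"], p["y"] - k["y"]) >= min_separation
                for k in accepted
            )
            if far_enough:
                accepted.append(p)
        kept.extend(accepted)
    return kept
```

Picks on different micrographs are never compared, so two particles that happen to share coordinates on separate micrographs both survive.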

Why are refinements slow?

This is a trickier question that depends on a lot of things. Do you mind if I ask you a few questions?

1- Can you explain a bit more what you meant when you said that the Non-Uniform Refinement took 4 hours running on 2 GPUs? Did you have one job using 2 GPUs? Or were you counting the time for two jobs, each using 1 GPU?
2- What made you want to use your full-size (512px) images rather than the 256px downsampled images?
3- How many particles were in the Non-Uniform Refinement job that is taking about 7 hours with a box size of 512px?

You also may want to take a look at the recommended hardware and ensure that your system meets those requirements.

Why did your alignment get worse?

If you like, I can try to help you understand why your alignment went from 3.3 Å to 8 Å when you added the new particles from your 2D classification. If you could post images of both the 3.3 Å and 8 Å maps and FSC curves as well as the 2D classes of the particles that you added, I might see something to help guide you in the right direction.

Thanks for your great questions!

Noha_Elshamy · February 1, 2024, 7:03am

Thank you very much for your prompt reply! It is very insightful to me.

To answer your questions:

1- Yes, I parallelized 2 GPUs to run one non-uniform refinement job, using 50 k particles. These were at 512px.

2- I wanted to try the reconstruction with the full-size particles because, after doing local refinement of a part of my particle, I was close to my resolution limit of 3.152 angstroms (I did not actually reach it; the map was at 3.21). So I thought that if I used the pixel size from my data acquisition (0.788) instead of the downsampled images, I could extend the resolution limit and see whether the map would get better. I intended to just use homogeneous refinement and see if the resolution would improve, but I had just seen your tutorial on non-uniform refinement and thought it might yield better results, so I was practically just experimenting with the data.

3- I used 25 k particles for the non-uniform refinement job that took almost 7 hours and because I thought it should be less computationally intensive, I used only 1 GPU.
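The 3.152 angstrom figure in point 2 appears to correspond to the Nyquist limit of the binned data: binning a 512 px box to 256 px doubles the pixel size, and the Nyquist limit is twice the pixel size. A small sketch of that arithmetic, assuming the 0.788 value is in angstroms per pixel (the function names are made up for illustration):

```python
def binned_pixel_size(pixel_size, box_in, box_out):
    """Pixel size (Å/px) after Fourier cropping a box from box_in to box_out pixels."""
    return pixel_size * box_in / box_out

def nyquist_limit(pixel_size):
    """Best resolution (Å) representable at a given pixel size (Å/px)."""
    return 2.0 * pixel_size

acquisition_pixel_size = 0.788  # Å/px, taken from the post above

binned = binned_pixel_size(acquisition_pixel_size, box_in=512, box_out=256)
print(f"binned pixel size: {binned:.3f} Å/px")
print(f"Nyquist limit, binned:   {nyquist_limit(binned):.3f} Å")
print(f"Nyquist limit, unbinned: {nyquist_limit(acquisition_pixel_size):.3f} Å")
```

Under that assumption, the binned particles cannot resolve anything finer than 3.152 angstroms, while the unbinned particles allow up to 1.576 angstroms.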

I attached pics to show the maps and the GSFSC curves. You will notice that the 50k particle map from the two combined sets looks a bit different. This is because my particles are, for some reason I still don’t know, very close together (stuck top to top or bottom to bottom, in a filamentous arrangement). I tried to solve this problem by changing the circular mask diameter and recentering in 2D classification, but it doesn’t help. If anything, it makes things worse by shifting the center to where the particles are stuck together, probably because the density is higher there.

Noha_Elshamy · February 1, 2024, 7:05am

rposert · February 1, 2024, 6:37pm

Thanks for those details @Noha_Elshamy — I have a few more questions, and then a few suggestions!

First, the questions:

1- Could you show me the setting or settings you used to parallelize a single non-uniform refinement across multiple GPUs?
2- Which version of CryoSPARC are you using?
3- Are you using low-memory mode for any of these refinements?
4- Could you give me a little more detail about how these particle stacks were generated? Especially your last one, which you’ve labeled “2 sets of particles (2D classified-ab initio)” with 37k unbinned particles. That particle stack has the best FSC curve (where it smoothly descends to 0), so I think that should be the one we focus on for now.
5- Does your particle have any defined symmetry? And are you applying symmetry at any point?
6- Could you show me some of the 2D or 3D classes you discarded? I wonder if you may have more good particles that could improve your (already good!) final resolution.

Now, a few suggestions:

Have you tried Orientation Diagnostics on your fifth result (which goes to 2.56 Å)? These maps look very beautiful to me, and the top views may not be as important as you think. If Orientation Diagnostics indicates that your map is well-sampled, I wouldn’t worry too much about the top views. The thing we ultimately care about is the map quality!
You could try downsampling to a slightly smaller box size, maybe 448 pixels. This would likely speed up processing without harming your resolution. Make sure you use the Fourier crop option in that job!
You could try creating a mask around the central copy of your protein and then performing a Local Refinement of just that one copy. This would remove the influence of the adjacent copies and might improve your resolution as a result. We recently updated the guide pages for both of these jobs. If your particle has defined symmetry, you could apply symmetry here too.
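To put rough numbers on the 448-pixel suggestion, here is a small sketch. It assumes the 0.788 Å/px acquisition pixel size and the 2.56 Å result mentioned earlier in the thread; the function name and the returned keys are made up for illustration, and the data fractions are only a proxy for cost, not a prediction of actual run time.

```python
def fourier_crop_summary(pixel_size, box_in, box_out, target_resolution):
    """Summarize the effect of Fourier cropping a box from box_in to box_out pixels."""
    new_pixel_size = pixel_size * box_in / box_out   # Å/px after cropping
    nyquist = 2.0 * new_pixel_size                   # finest representable resolution, Å
    scale = box_out / box_in
    return {
        "pixel_size": new_pixel_size,
        "nyquist": nyquist,
        "nyquist_ok": nyquist < target_resolution,   # is the target still reachable?
        "image_fraction": scale ** 2,                # pixels per particle image vs. original
        "volume_fraction": scale ** 3,               # voxels per 3D map vs. original
    }

summary = fourier_crop_summary(0.788, box_in=512, box_out=448, target_resolution=2.56)
for key, value in summary.items():
    print(key, value)
```

With these inputs the Nyquist limit stays near 1.8 Å, comfortably finer than 2.56 Å, while each particle image shrinks to about 77% of its pixels and each volume to about 67% of its voxels.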

Noha_Elshamy · February 5, 2024, 5:51am

Hello again and sorry for my late reply:

1- I apologize for this, but when I checked the first NU-refinement job I ran, I found that only 1 GPU was used. So for all of the NU-refinement jobs I executed, whether with all the particles or a subset, only 1 GPU was used.
2- I am using CryoSPARC v4.
3- I am not using the low-memory mode for any of the jobs, and the “cache particle images on SSD” is always “on”.
4- This particle stack was created by first putting all 50k particles into a 2D classification job with “remove duplicate particles” on, resulting in ~49k particles. I input these into a “select 2D” job, discarded some low-quality top view classes, and input the remaining 37k particles into an “ab-initio reconstruction” job (1 class), because I thought that getting a map from these particles and then aligning them to it might yield better results than using the initial map I constructed from the first set of 25k binned particles. I then used the map from this ab-initio job as the reference in the NU-refinement job, together with the 37k particles.
5- My particle is symmetric, but I am not sure what symmetry it has. That is why I wanted to get good top views, which I finally did when I re-picked from the micrographs. In ab initio, I don’t apply any symmetry, but in refinement, I always apply a symmetry.
6- I attached a photo of some of my top views, which I see as bad, or at best uninformative. I discarded these from the 49k stack, bringing it down to 37k.

Thank you so much for the suggestions! All sound very helpful to me. I appreciate your time.