Generalized protocol for 3D classification of particles

mjmcleod64 · October 19, 2023, 4:31pm

Hi all,

I am looking for a protocol to clean particle stacks in 3D - using multi-class ab initio and heterogeneous refinement. Briefly:
I have a heterogeneous sample (monomer (127 kDa), dimer, tetramer, filament (2-5 MDa)) ranging from 100-500 Å (or more), with some globular shapes, some oval shapes, square shapes (ringish) and a pseudofilament. I was first trying to discriminate the different complexes by 2D classification, but it was recommended to use 3D classification. I thought this would be the best because the box size for the pseudofilament is vastly different than what is needed for the monomer.

My thought is to select a picking protocol that gets everything, indiscriminately. Then, do many rounds of 2D classification with the box size required for the largest object (pseudofilament). Use 2D classification to get rid of bad picks, and keep anything that looks like a protein. I would likely use 150 classes with default settings.

My question now is what sort of procedure/settings do I use for 3D classification? My thought is ab initio with 7 classes (3+ more than # of complexes), with class similarity = 0. Then use all particles and all volumes in heterogeneous refinement. Next, remove all bad refinements that look like noise? What metric/validation should be used to discard classes/particles? I am not sure how to discriminate between a poorly populated class and junk. Do you let multi-class ab initio give you a junk volume? I commonly see that it doesn’t necessarily produce a junk class, but rather splits a volume into two views. Then, redo this again…over and over until there are effectively no particles in the junk volume classes.

Is there something else I should be considering? Will the small particles (and their views) be selected with a large extraction box?

Matt

rwaldo · October 19, 2023, 5:29pm

Hi @mjmcleod64! Sounds like a tricky sample, hopefully we can help!

Could you post a few example micrographs, just so I can see what you’re working with? I’m especially interested in how many filaments/pseudofilaments there are, and how long they typically are, as well as the general quality of the ice/SNR.

Now, as for the general workflow, we’ve got two major recommendations:

Blob picking and cleaning

We’d recommend starting with particle picking as you say — pick basically every oligomerization state and put them all in one big stack.

Next, use 2D classification to only remove very obvious junk. With so many different sizes of particle, you’ll want to request a lot of 2D classes, and be very very generous about what you keep, since it will be hard to tell what’s good and what’s bad.

Then, move into the third dimension. Plug all the particles you kept into an ab initio job to keep cleaning them up. As far as the number of classes, we’d recommend that you set up a bunch of jobs and keep increasing the number of classes until either 1) you run out of memory or 2) you start to get classes which look similar (e.g., you see two classes of a dimer or some other oligomer).
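
As a rough sketch of stopping criterion 2), the check below flags a set of class volumes as redundant when any pair correlates strongly. It is only an illustration: the function names and the 0.9 threshold are assumptions, and it assumes the volumes share a box size and have already been aligned to a common frame (ab initio classes generally are not, so in practice this judgment is usually made by eye).

```python
import numpy as np

def volume_correlation(a, b):
    """Pearson correlation between two equally shaped density maps."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def has_redundant_classes(volumes, threshold=0.9):
    """True if any pair of class volumes correlates above the threshold.

    The threshold is a hypothetical choice, not a recommended value.
    """
    for i in range(len(volumes)):
        for j in range(i + 1, len(volumes)):
            if volume_correlation(volumes[i], volumes[j]) > threshold:
                return True
    return False

# Synthetic stand-ins for class volumes (not real data):
rng = np.random.default_rng(0)
dimer_a = rng.normal(size=(16, 16, 16))
dimer_b = dimer_a + 0.1 * rng.normal(size=(16, 16, 16))  # near-duplicate class
other = rng.normal(size=(16, 16, 16))                    # unrelated class
print(has_redundant_classes([dimer_a, other]))
print(has_redundant_classes([dimer_a, dimer_b, other]))
```

If the check fires, the previous (smaller) class count is the one to carry forward.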

Once you’ve got a volume for each of your classes, you can either proceed with heterogeneous refinement or ab initio (I’d probably recommend trying both), cleaning the particle stack at each step by discarding junk classes. You don’t want to separate your particles out into separate stacks for each oligomer until you’ve got them as clean as you possibly can.
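
The bookkeeping for "discarding junk classes" at each step can be sketched as below. This is a minimal illustration on a toy list of per-particle class assignments; `discard_junk` and the class numbering are hypothetical, and which classes count as junk is still decided by inspecting the volumes.

```python
import numpy as np

def discard_junk(class_ids, junk_classes):
    """Return a boolean keep-mask and the fraction of particles discarded."""
    class_ids = np.asarray(class_ids)
    keep = ~np.isin(class_ids, list(junk_classes))
    junk_fraction = 1.0 - keep.mean() if class_ids.size else 0.0
    return keep, junk_fraction

# Toy example: 10 particles in classes 0-3, with classes 2 and 3 judged junk by eye.
class_ids = [0, 1, 2, 0, 3, 1, 1, 2, 0, 0]
keep, junk_fraction = discard_junk(class_ids, junk_classes={2, 3})
print(int(keep.sum()), "kept;", f"{junk_fraction:.0%} discarded")
```

Tracking the discarded fraction from round to round gives a simple signal for when further cleaning rounds are no longer removing much.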

Trained picking

You may also want to consider training a picker such as Topaz or crYOLO to pick specific oligomerization states of your protein independently. You could manually pick a few micrographs if it’s easy to distinguish the oligomerization states, or use your clean 3D classes from blob picking if it’s harder to tell by eye.

mjmcleod64 · October 19, 2023, 6:13pm

Thanks, this is helpful. Hopefully this isn’t TMI!

My general workflow thus far has been to optimize the box size for each complex and clean in 2D (I more or less know what I am looking for based on the crystal structure). Then, when I have a set of stacks that represent a complex, I create templates, run the template picker, and curate the picks specifically for that complex. The reason for this is that with a large box size I get good-looking 2D classes of some things, but rarer views are obscured.

Large box (1260 px, 0.969 Å/px)

green = filament
blue = tetramer
red = dimer
orange = monomer

Now with a smaller box (620 px)

It is easier to find smaller things and the different views (see later for different orientations of monomer with a 392 px box)
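
For reference, the physical extent of each box mentioned here can be worked out as below. This is just arithmetic, and it assumes the 0.969 Å quoted above is the pixel size of the extracted particles (no binning).

```python
PIXEL_SIZE_A = 0.969  # Å per pixel, assumed from the value quoted in the post

def box_extent_angstrom(box_px, pixel_size=PIXEL_SIZE_A):
    """Edge length of a square extraction box in Å."""
    return box_px * pixel_size

for box_px in (1260, 620, 392):
    print(f"{box_px} px -> {box_extent_angstrom(box_px):.0f} Å")
# 1260 px -> 1221 Å
# 620 px -> 601 Å
# 392 px -> 380 Å
```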

For the monomer, I had orientation bias, so I used Topaz and recovered some rarer views, which helped.

Part of the problem in general is that the tetramer is a dimer of dimers, so they share particular views, and it’s likely very hard to distinguish whether a view is solely a dimer or a tetramer at the same orientation.

Here is a micrograph:

blue = tetramer (edge view)
red = dimer (common view with top view of tetramer)
orange = monomer
A lot of the junk is the pseudofilament, but here is an especially clear view. It is the most abundant complex in the sample.

It’s not what I would call a true filament, since it’s just a long ropey thing of tetramers that aren’t always straight, but if you reconstruct a subsection they align very well. This is so far the best structure (3.5 Å with 10% of micrographs being utilized). Filament tracer doesn’t work since they are squiggly.

3D classification:
As for ab initio and looking forward to 3D classification, this ab initio run is in progress, using a particle stack that I was cleaning for the dimer. The 7th class is the dimer; its first view is top-down, which is the same as the top-down tetramer (classes 2, 3, 4). Class 1 fits what I would expect the monomer to be. Classes 5 and 6 look like junk to me. I think some of the problem with similar classes is that the box is currently too small (since it was primarily chosen for the dimer), so the tetramer is being cut off depending on centering and the pieces are sorted into different classes.
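
The clipping concern can be checked with simple geometry, as sketched below. The particle diameter, box size, and centering offset in the example are purely illustrative placeholders (they are not measurements from this dataset), and `is_clipped` is a hypothetical helper that treats the particle as a disc of the given diameter.

```python
def is_clipped(diameter_A, box_px, pixel_size_A, center_offset_A=0.0):
    """True if a particle of the given diameter extends past the box edge.

    center_offset_A is how far the particle centre sits from the box centre.
    """
    half_box_A = 0.5 * box_px * pixel_size_A
    return 0.5 * diameter_A + abs(center_offset_A) > half_box_A

# Illustrative numbers only: a 300 Å particle in a 392 px box at 0.969 Å/px.
print(is_clipped(300, 392, 0.969))                      # centred pick
print(is_clipped(300, 392, 0.969, center_offset_A=50))  # off-centre pick
```

A particle that fits when centred can still be cut off once the pick is offset, which is consistent with the same complex ending up in several classes.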

My general thoughts:
The pseudofilament should be straightforward to pull out since it’s so much larger than everything else.
Dimer and tetramer will be hard since they share views that I don’t think can be distinguished. Another confounding issue is that if, say, the tetramer view contaminates the dimer ab initio, then you get a second dimer emerging at low thresholds. I would think this could be purified out with enough classification runs. I think I am going to try your 3D classification approach with monomer, dimer and tetramer mixed and see if that helps.
Monomer: with my template-based picking and small-box 2D classification there is an orientation problem (homogeneous refinement to ~4.5 Å, 90K particles).

Thanks for your help and really open for any suggestions!!

mjmcleod64 · October 19, 2023, 6:21pm

Also,

For 3D classification, do you have recommendations for settings besides increasing the class #?

rwaldo · October 19, 2023, 6:41pm

Your 2D classes look very nice, as do your micrographs!

Since the monomer, dimer, and tetramer seem to separate fairly well from the filamentous oligomers, I’ll just consider those here.

I think that extracting everything with the tetramer-sized box will give you the best results here. You can come back later and use jobs that are more suited to the task of separating out oligomers (3D Classification would likely work quite well here) once you’ve got a very clean, well-aligned stack. So I think proceeding with the combined ab initio / heterogeneous refinement strategy is the right first step.

I would be wary of that monomer map — it looks mostly like noise to me, but of course I don’t know your target as well as you do!

As for your last question, I assume you’re referring to ab initio. For now I think I’d leave everything except the class number as default and see how it does.

Good luck!

mjmcleod64 · October 19, 2023, 6:43pm

Thanks, that’s the general sense I have now for moving forward.

I definitely think the monomer map is noisy. I am still trying to sort out a way to clean that as well, and maybe the ab initio / hetero workflow will help.

Thanks!

mjmcleod64 · October 24, 2023, 6:25pm

Hi,
I’m starting 3D cleaning of the filament structure. I’m choosing enough ab initio classes that I get at least 1 junk class (~7-8), with class similarity set to 0. Then I use all particles in hetero-refine. Then I run a new ab initio, same as before, using only the particles from the good hetero classes.

Is there anything else to consider? Is the class similarity score fine or should that be tuned?
Matt

rwaldo · October 24, 2023, 6:33pm

When you say filament structure, you mean the particles we’ve been discussing with all the different oligomers picked at once?

I would follow this protocol to start:

1. Only use ab initio to filter out noise from your particle stack.
2. Once the noise is gone and you have a starting volume for each oligomer, use heterogeneous refinement and not ab initio.
3. Once you are confident that the particles are properly classified, use homogeneous or non-uniform refinement to get high-resolution maps of each state.

The class similarity may need to be tuned a bit. I would start with it at the default, which I believe is 0.1. If you only ever see one or two good classes, you could try increasing it. But the oligomerization states are different enough that a low value will probably be fine.

mjmcleod64 · October 24, 2023, 7:02pm

The filament is easy to distinguish, so there are not too many other oligomers present.

I will work through ab-inits. Thanks!