Hi everyone,
I searched the forum and did not find a similar problem posted, but I suspect my situation is not unique. If there is an existing thread, please just point me there.
I am processing a dataset of a protein that is highly flexible. I never manage to obtain 2D classes that show clear high-resolution features, but that is expected. What is strange is that the 2D classification jobs keep telling me that the resolution of most classes is 3Å while some classes are 7Å. Obviously, the 3Å classes are not really 3Å, and there are no classes in between 3 and 7Å resolution.
This is at least suspicious. Can anyone help me to understand what is going on here?
Also: if anyone has a nice strategy that allows for better 2D classification of large, asymmetric, approximately spherical but highly flexible proteins, suggestions are very welcome.
Thanks!
Resolution estimates for 2D classes are unreliable - I would pay very little attention to them and focus on what you can see. In terms of suggestions: try increasing the batch size and the number of iterations (and the number of final full iterations), and try switching off Force/max.
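If you script your jobs with cryosparc-tools, something like the sketch below is roughly what I mean. The job-building calls follow the cryosparc-tools docs, but the parameter keys are my guesses from the UI labels, so check the 2D classification builder on your version for the exact names:

```python
# Rough sketch with cryosparc-tools -- NOT a drop-in recipe. The parameter keys
# below are guesses based on the UI labels ("Number of final full iterations",
# "Batchsize per class", "Force Max over poses/shifts") and may differ between
# CryoSPARC versions, so check the 2D classification job builder for the real names.
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(
    license="<license-id>",
    host="localhost",
    base_port=39000,
    email="<email>",
    password="<password>",
)

project = cs.find_project("P1")            # your project UID
job = project.create_job(
    "W1",                                  # workspace UID
    "class_2D",                            # 2D classification job type
    connections={"particles": ("J10", "particles")},  # upstream particle stack (hypothetical job UID)
    params={
        "class2D_K": 100,                                   # number of classes
        "class2D_num_full_iter": 5,                         # more final full iterations (key is a guess)
        "class2D_num_full_iter_batchsize_per_class": 400,   # bigger batch size per class (key is a guess)
        "class2D_force_max_over_poses": False,              # switch off Force/max (key is a guess)
    },
)
job.queue("default")                       # or your GPU lane
```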
Hi @olibclarke thanks for the suggestions!
Indeed, I never pay much attention to the resolution estimates, but what caught my attention this time was the discrepancy and the insistence of the 2D classification job on estimating either 3 or 7Å and nothing in between. So I thought this could be a symptom of something else.
Regarding your other suggestions, I have explored those to a large extent and I don’t get better 2D class averages, nor anything obviously junk that should be excluded. I am starting to think this is just not a protein to study by EM (at least not with this dataset).
The suspicious resolution is a little worrisome, but I can’t speak to that.
I can speak to classifying a highly flexible (60 A worth of motion), large, asymmetric protein. It is difficult, but I would say not impossible (if you have enough particles). Long story short: play with it endlessly until you find what works.
My method has been to do a blob pick, 2D classify that a few times at different windows and different inner/outer masks, while only removing the noisiest particles. Then I take the best-looking 2D classes to use for template picking.
When I go from there I always err on the side of including things that look noisy but miiiight have a few good particles in them. I do several rounds of 2D classification again with different masks etc, then attempt a 3D ab-initio job with 2 or 3 classes. I take those classes separately and AGAIN do a round of 2D classification and remove only the particles that look the noisiest. Then a 3D ab-initio with 1 class.
From there I have a better understanding of what size my molecule is. If I think I need to increase or decrease my 2D mask or change something I go back to start and use those parameters instead.
My molecule in particular is about 90x130A, but when I mask I find that it works best for me to have an inner and outer 2D mask of 180 and ~320 A, respectively. It is surprising to me how much the masking can affect the ability of the algorithm to find and center my particles.
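If it helps, here is the back-of-the-envelope arithmetic behind those numbers. This is only how I rationalize them after the fact; in practice I landed on 180/~320 A by trial and error:

```python
# Back-of-the-envelope sketch of how my 2D mask diameters relate to particle size
# and motion. This is just how I rationalize the values after the fact; the actual
# numbers (180 / ~320 A) came from trial and error.
longest_dim_A = 130.0    # my molecule is roughly 90 x 130 A
motion_A = 60.0          # ~60 A worth of domain motion

# Inner mask: room for the particle in any conformation (I rounded down to 180 A).
inner_mask_A = longest_dim_A + motion_A            # ~190 A
# Outer mask: a wide, soft falloff so the edge never clips signal (~320 A for me).
outer_mask_A = inner_mask_A + 2 * motion_A         # ~310 A

print(f"inner mask ~{inner_mask_A:.0f} A, outer mask ~{outer_mask_A:.0f} A")
```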
I can discuss further if you’d like. It’s not easy but possible. I have a 3.0A structure of one of the conformations for all my efforts.
Hi @jenchem!
I feel like your sharing of experience is highly relevant, not only because my protein seems to have very similar flexibility, but also because I have noticed that playing with window and mask sizes makes a difference.
I had almost reached similar conclusions, but after pausing data processing for a few weeks I had totally forgotten that I was seeing considerable differences when varying window or mask sizes, although I never varied my size parameters as much as you describe. I will continue data processing with further variation of these parameters.
One thing left me wondering: when you wrote “I take those classes separately and AGAIN do a round of 2D classification and remove only the particles that look the noisiest. Then a 3D ab-initio with 1 class.”, did you mean that you do a 3D ab-initio with 1 class for each of the 2 or 3 independent 2D classifications from the previous 3D classes, or do you join all the particles from the 2 or 3 2D classification jobs for the last 3D ab-initio? I would guess the first scenario. Could you also share your experience with downstream processing to handle the 3D heterogeneity?
Grateful!
Andre
I take each of the 3D ab initio models and their particles and do 2D-classification separately for each group of particles. Then I combine all of my picked particles from all of those 2D classification jobs from all of the initial ab initio models, and do a single 3D ab initio job.
The reasoning for the first step is to break apart the heterogeneity a bit, to more easily see which are real particles and which are likely junk (assuming the 3D ab initio job separated the particles into realistic models and isn’t creating noise models).
The reasoning for the second step (recombining everything) is to create an ab initio model that I can then attempt to use for 3DVA. 3DVA in itself is a huge pain to optimize, but it is such a useful tool, even if you’re not using it to visualize movement. Once I figured out the 3DVA parameters that worked for my molecule and didn’t just refine to noise, I was able to cluster my particles along 3 components of variability and pull out the clusters that were close together at one end of the highest-ranking component of variability (making sure I had enough particles to make a decent model). Then I took those particles, did ANOTHER 3D ab initio and NU-refinement job, and voila, out popped a beautiful 3.0A map with great connectivity.
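For the cluster-pulling step, here is a sketch of the kind of sanity check I do before committing to a cluster. The file path and the component field name are just examples from my setup, so inspect the field names on your own 3DVA output first:

```python
# Sketch: peek at the per-particle 3DVA latent coordinates to check how many
# particles sit at one end of the top-ranked variability component before
# pulling out a cluster. The path and the "components_mode_0/value" field name
# are examples from my setup; inspect arr.dtype.names on your own output.
import numpy as np

arr = np.load("cryosparc_P1_J123_particles.cs")  # .cs files are numpy structured arrays
print(arr.dtype.names[:10])                      # confirm the field names first

z0 = arr["components_mode_0/value"]              # coordinate along the top component
lo = np.percentile(z0, 15)                       # "one end" of the motion, roughly

n_end = int(np.sum(z0 <= lo))
print(f"{n_end} particles at the low end of component 0")
# If that count is big enough for a decent map, I pull those particles out
# (I did the actual split with the clustering mode of 3D Variability Display)
# and feed them into another ab initio + NU-refinement.
```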
My first goal with this data processing was simply to be able to dock residues into the map in order to make point mutations that may affect binding or catalytic activity. Figuring out how to parse all of the flexibility is proving much more difficult because the majority of my molecules are in a single conformation, so my bottleneck is getting enough particles in the more flexible conformations to be able to resolve them to a decent enough resolution to confidently assign domains that have moved.
The other bottleneck is having a 200-residue intrinsically disordered region (IDR). Not only will this be next to impossible to resolve, but it adds noise to my micrographs that makes it harder for the 2D and 3D classification/refinement jobs to align the rest of the molecule. I got around this by creating a construct with this 200-residue region removed. Luckily, that didn’t affect the secondary or tertiary structure of the molecule to my knowledge, and it helped me understand which parameters I should use for processing my WT dataset.
This hasn’t been an easy process, but it’s been very educational!
Jennifer
One other thing I think I forgot to mention: increasing the Recenter Mask Threshold in 2D classification from the default to ~0.8 has really helped me. It may not make a difference in your case, but in my case the center of mass has no density, so I have real difficulty centering during picking/2D classification.
I’ve ended up starting 2D classification from picks using a 0.3 threshold, increasing to 0.8 on a second round, and then for my final round of 2D classification I don’t recenter at all. I don’t know if this is the optimal way to do it, but I think doing this and only removing what is definitely not a real 2D class helps remove noise/junk bit by bit, so I don’t throw out the good particles with the bad ones.
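Spelled out, the schedule looks something like this. I set these by hand in the UI, and the dictionary keys below are just my own labels, not CryoSPARC parameter names:

```python
# The staged 2D classification schedule I described, written out as data. These
# are the UI settings I change by hand; the dictionary keys are my own labels,
# not CryoSPARC parameter names.
rounds = [
    {"round": 1, "recenter": True,  "recenter_mask_threshold": 0.3,
     "note": "straight from picks; remove only the obviously-not-a-particle classes"},
    {"round": 2, "recenter": True,  "recenter_mask_threshold": 0.8,
     "note": "helps because my center of mass has no density"},
    {"round": 3, "recenter": False, "recenter_mask_threshold": None,
     "note": "final round; no recentering, prune junk bit by bit"},
]

for r in rounds:
    print(f"round {r['round']}: recenter={r['recenter']}, "
          f"threshold={r['recenter_mask_threshold']}, {r['note']}")
```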
Good luck!
Jennifer
Dear Jennifer (@jenchem),
Your sharing of experience is much appreciated, and I deeply thank you for the time and heart you put into writing these messages and replying to my request. I am also sure that this thread will be a good resource for many.
Above all, I have to say, you motivated me not to lose hope in this project and dataset. Insisting on cleaning the particle set with varying 2D classification parameters, when I already thought the set was clean, is already helping me to move in a good direction. The ab-initio 3D reconstruction already looks reasonable and better matches my expectations, so now I just need to sort out the variability of conformations and flexibility of this dataset. At the moment I am down to 100k particles, so I am afraid this won’t be enough to sort out much of the heterogeneity and still get a satisfying resolution such as the 3Å you obtained. I would already be happy to achieve 4Å, to be honest. How many particles did you have at this point?
(Sooner or later I might change the topic of this thread to better match the pleasant direction it took.)
Thanks for today!
I do have a structure of the main conformation at 3.03A, with 197k particles, but it is missing a lot of flexible density. So I ended up collecting a lot more data and trying again.
My very best map is not much different as far as overall resolution goes, at 3.02A, but it has 477k particles and almost the entire protein is there (over 90% of about 3500 residues).
I have two maps of another conformation at 3.71A with 83k particles and 3.77A with 87k particles. Half of the molecule is fine enough, but again there’s so much flexibility that the other half of the molecule isn’t resolved. It might be different for you if only outlying domains are flexible, rather than the entire body of the protein making a huge hinging shift.
Depending on the inherent nature of the flexibility of your protein, you might not get much past 3.5A with 100k particles. But sometimes taking out a chunk of ~10k flexible particles can really help, and 3DVA and cluster-pruning might get you there. Are you able to collect more data and combine datasets?
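If you want a rough feel for the particle-count side of things (separate from the flexibility, which is usually the real limit), the usual Rosenthal-Henderson scaling is a quick sanity check. This is only a sketch with a placeholder B-factor; fit your own from a ResLog-type plot of your refinements:

```python
# Back-of-the-envelope Rosenthal-Henderson scaling: the particle count needed
# grows roughly as N ~ exp(B / (2 * d^2)), so
#     N2 / N1 = exp( (B / 2) * (1/d2^2 - 1/d1^2) )
# B is the overall B-factor of YOUR data (read it off a ResLog-type plot);
# the 100 A^2 below is only a placeholder, and none of this accounts for
# flexibility, which is usually the real limit.
import math

def particle_ratio(d1_A: float, d2_A: float, b_factor_A2: float) -> float:
    """How many times more particles are needed to go from resolution d1 to d2."""
    return math.exp((b_factor_A2 / 2.0) * (1.0 / d2_A**2 - 1.0 / d1_A**2))

B = 100.0  # placeholder overall B-factor in A^2
for target_A in (3.5, 3.0):
    ratio = particle_ratio(4.0, target_A, B)
    print(f"4.0 A -> {target_A} A: ~{ratio:.1f}x more particles (B = {B:.0f} A^2)")
```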
Best of luck,
Jennifer