I am trying to identify and solve the structure of a protein in complex with the ribosome-multipass translocon. Our biochemical assays show that this protein should be part of this super-complex, but the occupancy would be only ~2-5%. I generate the complex by in vitro translation followed by detergent solubilization and pull-down via the nascent chain, and I apply the eluate to grids. I collected a dataset of ~4.5k and had to discard almost half of it during curation. I tried creating a focused mask around the region above the detergent micelle, where I expect my protein's lumenal domain to protrude. So far I could only assign density to the multipass translocon components observed in previous studies (Smalinskaite et al., 2022, Nature) and to TRAP, which is almost always seen in such preparations because it is abundant. My goal is to separate the rare class that contains my protein of interest (a 324 kDa membrane protein). Can anyone suggest data processing strategies in CryoSPARC or RELION that I could try to separate this rare class?
Hello @mayabisw ! Welcome to the CryoSPARC Discussion Forum, and thanks for your question!
It can be quite a challenge to find very rare states in cryo-EM data processing. If your map shows expected features of RNA and protein at higher contour thresholds, then you could start off trying something like this:
Use 3DVA to see if there is preliminary evidence in any component frames for your target protein density.
1. Run 3DVA on ~100-200k particles with a very generous, soft mask that covers the entire ribosome and the region where you expect your protein of interest, using a fairly low filter resolution (maybe 10-12 Å) and 5-6 components. A low resolution might be best here because the luminal-side density is poorly defined in the parent map, so at higher resolution the analysis might be looking largely at noise.
2. Run 3D Variability Display in simple mode with a filter resolution of 8-10 Å, apply downsampling and/or box cropping as appropriate to reduce the box size and speed up file download, and visualise the series in ChimeraX.
3. Ribosomes can have complicated heterogeneity, so if none of the output components show density that resembles your protein of interest, repeat steps 1-2 with a generous, soft mask around the micelle and luminal domain region.
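As a rough guide for choosing the downsampling in the display step, here is a minimal sketch based on the Nyquist limit (the finest resolvable detail is twice the pixel size). The function names, the safety factor of 1.5 and the example box and pixel sizes are assumptions for illustration; they are not CryoSPARC parameters or values from this dataset.

```python
import math


def max_pixel_size(filter_resolution_A: float, safety: float = 1.0) -> float:
    """Largest pixel size (Å/px) whose Nyquist limit still reaches the filter resolution.

    safety > 1 keeps the Nyquist limit finer than the filter resolution.
    """
    return filter_resolution_A / (2.0 * safety)


def downsampled_box(box_px: int, pixel_size_A: float,
                    filter_resolution_A: float, safety: float = 1.5) -> int:
    """Smallest even box size that keeps the field of view and supports the filter resolution."""
    target_px = max_pixel_size(filter_resolution_A, safety)
    if target_px <= pixel_size_A:
        # The data are already sampled coarsely enough; do not downsample further.
        return box_px
    new_box = math.ceil(box_px * pixel_size_A / target_px)
    new_box += new_box % 2  # round up to an even box size
    return min(new_box, box_px)


if __name__ == "__main__":
    # Hypothetical example: 480 px box at 1.1 Å/px, volumes filtered to 10 Å.
    print(downsampled_box(480, 1.1, 10.0))  # 160
```

With these made-up numbers the volume series could be written at a 160 px box instead of 480 px without losing anything that a 10 Å filter would keep.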
Run 3D Classification with custom settings. When using 3D Classification, it might be a good idea to work at fairly low resolution (similar to the 3DVA test above) and to set the Class similarity quite low, such as 0.1. You can use a mask around the region where you expect to see your target protein, but try both including and excluding the micelle region. If you are using CryoSPARC ≤v4.7, try classifying into ~50 classes: in some cases the classification tends towards an equal distribution of particles across the classes, and you expect only ~2% of the particles to have your target bound. If you are using CryoSPARC v5, you can try using fewer classes and enabling Use latent mixing coefficients.
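The reasoning behind the ~50 classes can be sketched numerically: if particles tend to be split evenly, each of K classes holds about 1/K of the data, so K should be at least 1/occupancy for a rare state to be able to fill a class on its own. The helper names and the 150k particle count below are assumptions for illustration only.

```python
import math


def classes_for_rare_state(occupancy: float) -> int:
    """Smallest K for which an even split (1/K per class) does not exceed the occupancy."""
    if not 0.0 < occupancy <= 1.0:
        raise ValueError("occupancy must be in (0, 1]")
    # Round first so that floating-point noise (e.g. 50.000000001) does not add a class.
    return math.ceil(round(1.0 / occupancy, 9))


def expected_rare_particles(n_particles: int, occupancy: float) -> int:
    """Expected number of particles that carry the rare state."""
    return round(n_particles * occupancy)


if __name__ == "__main__":
    n_particles = 150_000  # hypothetical stack size
    for occupancy in (0.02, 0.05):
        k = classes_for_rare_state(occupancy)
        n_rare = expected_rare_particles(n_particles, occupancy)
        print(f"occupancy {occupancy:.0%}: K >= {k}, ~{n_rare} target particles")
```

The second number is worth checking too: a few thousand particles in the rare class limits the resolution you can expect from the final reconstruction of that subset.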
We show an example of classifying a rare state in this case study; Section 8 in particular may provide some further ideas for how you might proceed.
I hope there is something in those suggestions to help you get started!
Another idea would be to give cryoDRGN a shot (GitHub: ml-struct-bio/cryodrgn, "Neural networks for cryo-EM reconstruction"). In my hands, it was frequently able to identify rare classes in a single round of training once I had found the right downsampling parameter and removed the junk. Just keep in mind that the output volumes from cryoDRGN are generated by the decoder network rather than reconstructed directly from the particles. You will have to select the cluster, export the particles, and perform a reconstruction or refinement of that particle subset.
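The "select the cluster, export the particles" step boils down to turning per-particle cluster labels into a list of particle indices. Below is a minimal, library-free sketch of that bookkeeping; the label values, the output file name and the pickled-list format are assumptions for illustration, so check the cryoDRGN documentation for the exact selection format it expects.

```python
import pickle
from collections import Counter


def cluster_fractions(labels):
    """Fraction of particles assigned to each cluster label."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: counts[cluster] / total for cluster in sorted(counts)}


def indices_for_cluster(labels, cluster_id):
    """Zero-based indices of the particles assigned to one cluster."""
    return [i for i, label in enumerate(labels) if label == cluster_id]


def save_indices(indices, path):
    """Write the selected indices to a pickle file (hypothetical export format)."""
    with open(path, "wb") as handle:
        pickle.dump(list(indices), handle)


if __name__ == "__main__":
    # Toy labels: 95 particles in cluster 0, 3 in cluster 1, 2 in cluster 2.
    labels = [0] * 95 + [1] * 3 + [2] * 2
    print(cluster_fractions(labels))
    rare = indices_for_cluster(labels, 2)
    save_indices(rare, "rare_cluster_indices.pkl")  # hypothetical file name
```

Looking at the cluster fractions first is a quick sanity check: a cluster whose size is close to the expected ~2-5% occupancy is a natural candidate to reconstruct.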
I also had a good experience with RELION 3D classification (3DC) without image alignment. You will have to create a mask around the region of interest (where you think your protein is bound) and then test: 1. different class numbers, 2. different T values (the regularisation parameter), 3. limiting the resolution in the E-step, 4. downsampling your particles, and 5. different combinations of 1-4. For my datasets it was typically important to repeat the refinement in RELION before 3DC, and sometimes I also had to re-extract the particles in RELION.
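To keep track of the combinations in point 5, it can help to enumerate the test grid up front. This is only a bookkeeping sketch: the dictionary keys, the run-name pattern and the example values are made up for illustration and are not RELION options.

```python
from itertools import product


def build_grid(class_numbers, t_values, estep_limits_A, binning_factors):
    """Enumerate every combination of the four settings as one planned run each."""
    grid = []
    for k, t, res, binning in product(class_numbers, t_values,
                                      estep_limits_A, binning_factors):
        grid.append({
            "name": f"cls3d_K{k}_T{t:g}_E{res:g}_bin{binning}",
            "classes": k,
            "T": t,
            "estep_limit_A": res,
            "binning": binning,
        })
    return grid


if __name__ == "__main__":
    # Hypothetical values; choose ranges that suit your own data.
    runs = build_grid(class_numbers=[4, 8], t_values=[4, 20],
                      estep_limits_A=[10, 15], binning_factors=[1, 2])
    print(len(runs), "planned runs")
    for run in runs[:3]:
        print(run["name"])
```

Even two values per setting gives 16 runs, so it is usually worth scanning one setting at a time first and only combining the values that made a difference.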