Case study: End-to-end processing of a ligand-bound GPCR (EMPIAR-10853)

Hi CryoSPARC users! We are excited to share a new end-to-end data processing case study of an active GPCR with a peptide ligand bound, which you can follow along with!

Here we guide you all the way from downloading an EMPIAR dataset, through improving the quality of the particle stack and separating different ligand poses, to generating maps ready for model building. Our final sharpened maps, half maps, and masks are downloadable so you can compare them with your own results.

The case study covers a 140 kDa membrane protein target, and within the complete processing pipeline it includes ideas on how you might approach:

  • Junk regions in micrographs
  • Contamination of a particle stack with low-quality or damaged particles
  • Apparent late-frame high-frequency data in RBMC dose weighting
  • Over-fitting in local refinement
  • Separation and assessment of different peptide ligand binding poses

We hope this detailed guide is a useful resource for users old and new! If there are other types of targets, or particular single particle cryo-EM challenges that you would like to see a case study on in the future, let us know!

Hi cryoSPARC team,

Thank you so much for posting this informative case study. I have a question about the strategy for removing junk particles.

In the case study, you first take the “best” particles from a Heterogeneous Refinement job, then run a new Ab-Initio Reconstruction with those particles to generate multiple volumes, and finally use those ab initio volumes as references in the next Heterogeneous Refinement (again with the same “best” particles).

Could you please explain why this approach is preferred over simply taking the best volume from the previous Heterogeneous Refinement as the “good” reference, and combining it with one or more “junk” volumes (also generated from the same best-particle set) as references for the next Heterogeneous Refinement job?

Thank you,

Junjie

Hi @Junjie!

Thanks for your question; you are clearly paying attention to critical decision-making steps, and that’s great!

In this case study, Ab-Initio volumes are regenerated after Hetero Refine 1, once the worst of the junk has already been removed. This is done to generate junk volumes that are more similar to the target, representing species such as free G protein or damaged GPCR that might still be present. In the case study, all of the volumes from Ab-Initio 3 are used as the volume input to Hetero Refine 2.
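For anyone who prefers to script this chaining rather than build it in the UI, a minimal sketch with cryosparc-tools might look like the following. The project, workspace, and job UIDs, the output names, and the class count below are all placeholders for illustration; check the actual input and output names of your own jobs before connecting them.

```python
# Sketch: re-run Ab-Initio on the "best" particles from Hetero Refine 1,
# then feed all resulting volumes (plus the same particles) into Hetero
# Refine 2. UIDs and output names are placeholders for illustration.
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # your licence ID
    host="localhost",
    base_port=39000,
    email="user@example.com",
    password="password",
)
project = cs.find_project("P1")  # placeholder project UID

K = 4  # number of Ab-Initio classes; tune for how much junk remains

# Ab-Initio 3: regenerate volumes from the particles kept after Hetero
# Refine 1 (here assumed to be class 0 of placeholder job J10).
abinit = project.create_job(
    "W1",
    "homo_abinit",
    connections={"particles": ("J10", "particles_class_0")},
    params={"abinit_K": K},
)
abinit.queue()
abinit.wait_for_done()

# Hetero Refine 2: same particles, all Ab-Initio volumes as references.
hetero2 = project.create_job(
    "W1",
    "hetero_refine",
    connections={
        "particles": ("J10", "particles_class_0"),
        "volume": [(abinit.uid, f"volume_class_{k}") for k in range(K)],
    },
)
hetero2.queue()
```

The same pattern works if you prefer to swap the good Hetero Refine 1 volume in as one of the references instead.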

If we look closely at Figure 7, the pink volume in Hetero Refine 1 actually looks better than the pink volume in Ab-Initio 3; therefore, the “good” volume from Hetero Refine 1 could be used for downstream Hetero Refinements instead of the best volume from Ab-Initio 3, if preferred.

We expect the Hetero Refine results to be very similar with either volume input in this case, but it is often useful to explore different options, such as trying the Hetero Refine volume, to see which gives the best result.

Thank you very much. Honestly, I tried both approaches: using the best volume from a previous hetero-refine job and using the best volume from an ab-initio job. I found that the best volume from hetero-refine consistently removes junk particles more efficiently. This made me think that the ab-initio volume I generated might not be as good as the best volume from hetero-refine.
Therefore, my question is: how can I generate a high-quality best volume for a GPCR complex using an ab-initio job?

In addition, if there is already a 3.5 Å map of the GPCR complex generated from part of the dataset, can I use this map as an input reference for a hetero-refine job when processing the full dataset to remove junk particles? Someone pointed out that this could introduce model bias, but based on my understanding, hetero-refine low-pass filters the input reference to 20 Å when the initial resolution is set to 20 Å. Therefore, model bias should be avoided.

Could you let me know if my understanding is correct when you have a chance?

Much appreciated,

Junjie

Hi @Junjie!

Thanks for getting back to us with the results of your tests - that is very good to know!

If you would like to generate Ab-Initio volumes at a higher resolution than the default settings allow, you could try adjusting the Initial resolution to 20-12 Å and the Maximum resolution to 10-5 Å, although note that these jobs may take longer to run than with the default resolutions. We have recently uploaded a preprint that describes automation of data processing pipelines for GPCRs using CryoSPARC workflows. It uses a Maximum resolution of 10 Å in Ab-Initio and an Initial lowpass of 12 Å for the subsequent Heterogeneous Refinement, and these settings worked well across the board. We find using higher-than-default resolutions for Ab-Initio to be most beneficial for inactive GPCRs or those without a fiducial, such as a Fab fragment, bound. You might find some other useful tips in the preprint for your processing, if you have not yet had a chance to take a look!
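In script form, those settings might look roughly like this, continuing the hypothetical cryosparc-tools session from the earlier sketch. The parameter keys shown, particularly the Heterogeneous Refinement initial-lowpass key, are assumptions that should be verified against the job builder in your CryoSPARC version.

```python
# Sketch: Ab-Initio with tighter resolution limits, then Heterogeneous
# Refinement with a 12 A initial lowpass, mirroring the preprint settings.
# Parameter keys are assumptions; verify them in your job builder.
abinit_hires = project.create_job(
    "W1",
    "homo_abinit",
    connections={"particles": ("J10", "particles_class_0")},  # placeholder
    params={
        "abinit_K": 3,
        "abinit_init_res": 12,  # Initial resolution (A); try 20-12
        "abinit_max_res": 10,   # Maximum resolution (A); try 10-5
    },
)
abinit_hires.queue()
abinit_hires.wait_for_done()

hetero = project.create_job(
    "W1",
    "hetero_refine",
    connections={
        "particles": ("J10", "particles_class_0"),
        "volume": [(abinit_hires.uid, f"volume_class_{k}") for k in range(3)],
    },
    params={"multirefine_res_init": 12},  # Initial lowpass (A); assumed key
)
hetero.queue()
```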

It is good that you are considering the possibility of reference bias in your processing work; it’s something we should always keep in mind. If you already have a 3.5 Å map of the GPCR complex generated from part of the dataset, this should be safe to use as an input for Heterogeneous Refinement because, as you mentioned, the reference is lowpass filtered to 20 Å by default at the start of the job. Generally, we expect that any map features in the output volumes at a higher resolution than the lowpass (in this case 20 Å) will genuinely come from your data rather than the reference. You may find it interesting to know that in the automated pipeline described in the preprint above, the initial volumes used for early Heterogeneous Refinement jobs come from a different dataset entirely and still work fine!
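To make the lowpass argument concrete, here is a small standalone sketch in plain NumPy (not CryoSPARC’s internal filter, which has its own shape and edge handling) showing that a 20 Å lowpass zeroes every Fourier component beyond the cutoff, so the 3.5 Å detail in the reference simply cannot survive into the starting volume:

```python
# Minimal sketch of a hard spherical lowpass filter, illustrating why a
# reference filtered to 20 A cannot seed sub-20 A features. Plain NumPy
# for illustration only, not CryoSPARC's internal implementation.
import numpy as np

def lowpass_filter(vol: np.ndarray, pixel_size_A: float, cutoff_A: float) -> np.ndarray:
    """Zero all Fourier components of a cubic volume beyond cutoff_A resolution."""
    n = vol.shape[0]
    freqs = np.fft.fftfreq(n, d=pixel_size_A)  # spatial frequencies in 1/A
    fx, fy, fz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    radius = np.sqrt(fx**2 + fy**2 + fz**2)
    keep = radius <= 1.0 / cutoff_A            # e.g. 0.05 1/A for a 20 A cutoff
    return np.real(np.fft.ifftn(np.fft.fftn(vol) * keep))

# Stand-in for a 3.5 A map on a 128-voxel box at 0.85 A/pixel:
vol = np.random.rand(128, 128, 128).astype(np.float32)
filtered = lowpass_filter(vol, pixel_size_A=0.85, cutoff_A=20.0)
```

Any features sharper than 20 Å that appear later in the refinement therefore have to be rebuilt from the particle images themselves.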