Distinguish heterogeneous particles with only a few amino acid differences

I’m working on a gene that has two copies in the genome. The two copies of the protein differ by only two amino acids (N-E, E-A), and this protein is one of the components of a C4-symmetric protein complex. My question is whether both proteins are incorporated into the complex and, if they are, whether they are incorporated in equal proportions.

I purified the protein complex and obtained a structure at 2.55 Å local resolution using NU-Refine with C4 symmetry. I then ran symmetry expansion and created a mask covering only 3 amino acids (one of them at a site that differs between the two copies) to run 3D classification, followed by local refinement of the resulting classes. After many attempts with a wide variety of mask sizes, the only maps that can be told apart in Coot are those at the E-A site, and only when using hard classification. However, 3D classification always assigns the same number of particles to every class.

My plan is that, if I can eventually separate distinct maps at both sites, I will take the intersection of the particle sets to define each version of the protein. This approach feels unsound to me. Is there a better way to solve this problem? One picture shows the results for my E-A site, which I divided into four classes: two of them I think are E and A, and the others I think can be discarded. The other picture shows the 3D classification, with the same number of particles in each class.
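For the intersection idea above, here is a minimal sketch in Python. It assumes the hard class assignments from each classification run have already been exported into a dictionary keyed by a hypothetical (particle id, symmetry copy index) pair, and that the copy index refers to the same subunit in both runs. The function names, keys, and class numbers are all made up for illustration and are not part of any package.

```python
from typing import Dict, Hashable, Set, Tuple

# Hypothetical key for one symmetry-expanded subunit copy:
# (original particle id, symmetry copy index 0..3 under C4).
SubunitKey = Tuple[Hashable, int]


def subunits_in_classes(
    assignments: Dict[SubunitKey, int], wanted_classes: Set[int]
) -> Set[SubunitKey]:
    """Return the subunit copies whose hard class label is in wanted_classes."""
    return {key for key, cls in assignments.items() if cls in wanted_classes}


def intersect_sites(
    site1: Dict[SubunitKey, int],
    site1_classes: Set[int],
    site2: Dict[SubunitKey, int],
    site2_classes: Set[int],
) -> Set[SubunitKey]:
    """Subunit copies selected at BOTH sites (set intersection)."""
    return subunits_in_classes(site1, site1_classes) & subunits_in_classes(
        site2, site2_classes
    )


# Toy assignments (invented, not real data): class label per subunit copy.
site_ea = {("p1", 0): 0, ("p1", 1): 1, ("p2", 0): 0, ("p2", 1): 3}
site_ne = {("p1", 0): 2, ("p1", 1): 2, ("p2", 0): 1, ("p2", 1): 2}

# Assume class 0 at the first site and class 2 at the second site
# were judged by eye to correspond to the same version of the protein.
variant_a = intersect_sites(site_ea, {0}, site_ne, {2})
print(sorted(variant_a))
```

Subunit copies that are selected at one site but not the other are left out of the intersection. How many of them there are may give a rough sense of how consistent the two classifications are with each other.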

This is a difficult and interesting problem to solve with particles/refinement alone, especially considering that radiation damage will destroy your carboxyl groups to some extent. That being said, your densities in the top panel look reasonable if the left side is AE (but the bottom right panel is too cropped to see the residue, and you probably shouldn’t trust any 3D classification scheme that breaks your peptide bond). Is the carboxyl group in the region of interest in your 3D-separated results better resolved than carboxyl groups in other regions of the protein? If so, it may be an artificial enhancement.

On the biology side, you could get the structure of recombinantly expressed just-NE or just-EA sequence protein. To test your 3D classification method, you could generate a recombinant concatenated protein in which 2 or 4 chains are fused together to force a true C2- or C4-symmetric homo-oligomer with known EA-site composition, then get the structure and run 3D classification. You could generate various combinations, such as chainA(NE)_chainB(EA), chainA(NE)_chainB(NE), chainA(EA)_chainB(EA), and chainA(EA)_chainB(NE), and get the structures to test the hypothesis that they still assemble and that you can separate particles based on the site of interest. Perhaps the easiest way to start would be heterologous expression (+ EM structure) of just-NE vs. just-EA containing sequence. Test the 3D classification on the site of interest again and check that the same type of density seen above doesn’t come up. If it does, you will know it is artificial, because the density map(s) were generated from defined components.

Using your current dataset alone, you could test your 3D classification method on other sites where there should be no compositional heterogeneity, ideally sites with similarly sized side chains and similar local resolution.
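One rough way to put numbers next to that comparison is to look at how evenly each classification run splits the particles. The sketch below is plain Python, with invented counts and hypothetical function names. It reports class fractions and the largest deviation from a perfectly even split. An equally even split at a control site and at the site of interest would suggest that the evenness is not specific to real heterogeneity, but it proves nothing by itself, and looking at the control-site maps remains the main check.

```python
from typing import Dict, List, Sequence


def class_fractions(counts: Sequence[int]) -> List[float]:
    """Fraction of particles assigned to each class."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one particle")
    return [c / total for c in counts]


def max_deviation_from_even(counts: Sequence[int]) -> float:
    """Largest absolute difference between a class fraction and 1/n_classes."""
    even = 1.0 / len(counts)
    return max(abs(f - even) for f in class_fractions(counts))


# Invented per-class particle counts for two classification runs.
runs: Dict[str, List[int]] = {
    "site_of_interest": [2500, 2500, 2500, 2500],
    "control_site": [2510, 2490, 2495, 2505],
}

for name, counts in runs.items():
    fractions = [round(f, 4) for f in class_fractions(counts)]
    print(name, fractions, round(max_deviation_from_even(counts), 4))
```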

Thank you very much for your advice, it helped me a lot. I’m going to check the carboxyl groups at the amino acid positions I’m interested in. I had been worried about the possibility of artificial enhancement. Thank you for your suggestion.