Identification of Post-translational Modifications of Proteins

Reed Harris, Genentech, Inc.

Topics covered by this workshop included several (but not all) types of covalent protein modification, including some co-translational, post-translational and spontaneous modifications. The workshop was organized primarily for the benefit of sequencing labs with access to amino acid analysis and mass spectrometry. No general technique exists for detecting and characterizing all types of protein modifications, except perhaps for the increasing applications of mass spectrometry to this field. A number of excellent reference books are available (1-4). An overview was given by Reed Harris (Genentech), with detailed reports on N-glycosylation and phosphorylation given by Betty Yan (Eli Lilly) and John Stults (Genentech), respectively.


N-linked oligosaccharides fall into several major types (oligomannose, complex, hybrid, sulfated), all of which have (Man)3-GlcNAc-GlcNAc- cores attached via the amide nitrogen of Asn residues that fall within -Asn-Xaa-Thr/Ser- sequences (where Xaa is not Pro). Glycosylation at an -Asn-Xaa-Cys- site has been reported for coagulation protein C. N-linked sites are often indirectly assigned by the appearance of a "blank" cycle during sequencing. Positive identification can be made after release of the oligosaccharide by PNGase F, which converts the glycosylated Asn to Asp. After PNGase F release, N-linked oligosaccharides can be purified using Bio-Gel P-6 chromatography, with the oligosaccharide pool subjected to preparative high pH anion exchange chromatography (HPAEC) (5). Certain oligosaccharide isomers can be resolved using HPAEC. Fucose residues shift elution positions earlier in the HPAEC chromatogram, while additional sialic acid residues increase the retention time. Concurrent treatment of glycoproteins whose oligosaccharide structures are known (e.g., bovine fetuin, alpha-1-acid glycoprotein, ovalbumin, RNase B, transferrin) can facilitate assignment of the oligosaccharide peaks. The collected oligosaccharides can be characterized by a combination of compositional and methylation linkage analyses (6), with anomeric configurations assigned by NMR spectroscopy (7).
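The sequon rule above lends itself to a quick sequence scan. The following is a minimal sketch, assuming a one-letter-code sequence; the function names and the test sequence are illustrative only and do not come from the workshop. A sequon marks a potential site, not proof of occupancy.

```python
import re

# Asn-Xaa-Thr/Ser with Xaa != Pro. The lookahead keeps the match to the Asn
# itself, so overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")
# Variant that also accepts the rarer Asn-Xaa-Cys site noted for protein C.
SEQUON_WITH_CYS = re.compile(r"N(?=[^P][STC])")

def find_n_glyc_sequons(seq, include_cys=False):
    """Return 1-based positions of Asn residues that sit in a sequon."""
    pattern = SEQUON_WITH_CYS if include_cys else SEQUON
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]

def after_pngase_f(seq, sites):
    """Sequence expected after PNGase F release: glycosylated Asn reads as Asp."""
    residues = list(seq.upper())
    for pos in sites:
        if residues[pos - 1] != "N":
            raise ValueError(f"position {pos} is not Asn")
        residues[pos - 1] = "D"
    return "".join(residues)

# Hypothetical sequence, not taken from any real protein.
example = "MKNVTANPSGNGSNKC"
sites = find_n_glyc_sequons(example)
print(sites)
print(after_pngase_f(example, sites))
```

In this sketch the Asn at position 7 is skipped because it is followed by Pro, and the Asn at position 14 is reported only when `include_cys=True`.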

O-linked glycosylation sites are less predictable. The "mucin-type" O-linked structures have GalNAc at the reducing end (at the linkage between the polypeptide and glycan). Sialyl Gal-GalNAc- structures are common. These glycans are attached to Thr/Ser residues that are usually part of beta-turns near Pro residues. Characterization of O-linked oligosaccharides can proceed as for N-linked, except that the glycan is usually released by hydrazinolysis [such as with the Oxford GlycoSystems GlycoPrep 1000 (8)]. Thr/Ser residues that are modified by GlcNAc (O-GlcNAc) have been identified by mass spectrometric methods (9) or by the incorporation of UDP-[3H]-galactose (10). Two types of O-linked modifications are found only in EGF domains: xylosylglucose glycans attached to Ser residues within CXSPC sequences, and O-fucosyl moieties attached to Thr/Ser residues within CXXGGT/SC sequences (11). In general, O-glycosylated residues are indicated by a low recovery (or no recovery) of PTH-Thr/Ser during N-terminal sequence analysis (the O-fucosyl types are an exception). O-glycosylated residues will beta-eliminate under alkaline conditions; the modified Thr or Ser residue can then be reduced to alpha-aminobutyrate or alanine, respectively (12). (Phosphorylated Thr/Ser also beta-eliminate, however.) Glycosaminoglycans of proteoglycans are attached to Ser residues within -Ser-Gly- sequences.
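The two EGF-domain consensus sequences can be screened for in the same way. This is a sketch under the assumption that X stands for any residue; the dictionary, function name and test sequence are hypothetical, and a match indicates only a candidate site.

```python
import re

# Consensus -> (pattern, 0-based offset of the modified residue in the match).
# Lookaheads are used so that overlapping motifs are not missed.
EGF_MOTIFS = {
    "xylosylglucose on Ser (CXSPC)": (re.compile(r"(?=C.SPC)"), 2),
    "O-fucose on Thr/Ser (CXXGGT/SC)": (re.compile(r"(?=C..GG[TS]C)"), 5),
}

def find_egf_o_glyc_sites(seq):
    """Return sorted (1-based position, residue, motif label) tuples."""
    seq = seq.upper()
    hits = []
    for label, (pattern, offset) in EGF_MOTIFS.items():
        for m in pattern.finditer(seq):
            pos = m.start() + offset + 1
            hits.append((pos, seq[pos - 1], label))
    return sorted(hits)

# Hypothetical sequence containing one copy of each motif.
for hit in find_egf_o_glyc_sites("AACESPCQNAACLNGGSCKD"):
    print(hit)
```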

Hydroxyamino acids.

Beta-hydroxyaspartate (Hya) and beta-hydroxyasparagine (Hyn) are only found within CXN/DXXXXTY/FXCXC sequences of EGF domains. Hya can be found in acid hydrolysates (it elutes ahead of Asp on a Beckman 6300); PTH-Hya and PTH-Hyn elute in the HPLC injection artifact (before PTH-Asp) in ABI PTH-identification HPLC systems. 5-Hydroxylysine (Hyl) and 3/4-hydroxyproline (3Hyp, 4Hyp) are essential components of collagens and can have Gal- and/or Glc-Gal- glycans attached. Hyl and Hyp are found in -Xaa-Hyl/4Hyp-Gly- sequences, with the Xaa position sometimes occupied by 3Hyp. PTH-Hyl elutes between PTH-Val and DPTU in the ABI PTH-analyzer but co-elutes with DPTU in the HP PTH-analyzer. PTH-Hyp appears as 2 peaks, before and after PTH-Ala, during sequencing. Hyl and Hyp co-elute with His and Thr, respectively, on a Beckman 6300 using NaCitrate buffers, but can be resolved by using LiCitrate buffers.

Fatty acid modifications.

Glycosylphosphatidylinositol (GPI) structures are found at the C-terminus of several membrane proteins (13). Ethanolamine phosphoglycerol attached to Glu residues generates "blank" sequencer cycles. Some membrane-spanning proteins have cytoplasmic Cys (or possibly Ser) residues that are acylated by palmitate or stearate. N-myristoylation can occur on proteins with N-terminal Gly residues (14, 15) or on the epsilon-amino side chain of Lys (16). Acyl groups can be identified by GC, GC-MS or RP-HPLC analysis after acid hydrolysis, extraction with ether or chloroform, and methylation. S- or O-acyl groups will be removed by base (0.1 M methanolic KOH, 90 min, 23°C) or hydroxylamine (1 M NH2OH, 20 h, pH 7, 23°C) treatment, while N-acyl groups are base- and hydroxylamine-stable and cause "blocked" N-termini. Lipoic acid groups have also been found on Lys in several 2-oxo-acid dehydrogenase complex proteins. Isoprenylation of Cys residues has been reported for a number of Ras-type proteins (17). Geranylgeranyl (C20) or farnesyl (C15) isoprenoids are added to Cys side chains at -Cys-Aaa-Aaa-Xaa C-termini, then the Aaa-Aaa-Xaa tripeptide is removed, followed by methylation of the new C-terminal COOH.
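The three isoprenylation steps (lipid addition, tripeptide removal, carboxyl methylation) can be followed in a short sketch. It assumes only that a Cys sits four residues from the C-terminus; it does not check what the Aaa positions are, and the choice of isoprenoid is left to the caller because the text gives no rule for it. The mass constants are monoisotopic values computed from the elemental compositions noted in the comments, and all names are hypothetical.

```python
# Net monoisotopic mass additions (assumed values, computed from composition).
FARNESYL = 204.1878        # C15H24 added to the Cys sulfur
GERANYLGERANYL = 272.2504  # C20H32 added to the Cys sulfur
METHYL_ESTER = 14.0157     # CH2 added on methylation of the new C-terminal COOH

ISOPRENOIDS = {"farnesyl": FARNESYL, "geranylgeranyl": GERANYLGERANYL}

def process_caax(seq, isoprenoid="farnesyl"):
    """Describe the mature C-terminus of a -Cys-Aaa-Aaa-Xaa protein.

    Returns None when no Cys is found four residues from the C-terminus.
    'added_mass' covers the lipid and the methyl ester only; the mass of the
    removed tripeptide must be subtracted separately.
    """
    seq = seq.upper()
    if len(seq) < 4 or seq[-4] != "C":
        return None
    return {
        "mature_sequence": seq[:-3],
        "removed_tripeptide": seq[-3:],
        "modified_cys_position": len(seq) - 3,  # 1-based
        "added_mass": round(ISOPRENOIDS[isoprenoid] + METHYL_ESTER, 4),
    }

# Illustrative C-terminal fragment only.
print(process_caax("GCMSCKCVLS"))
print(process_caax("GCMSCKCVLS", isoprenoid="geranylgeranyl"))
```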

Cross-Linking Modifications.

Cystine (-CH2-S-S-CH2-) disulfides spontaneously form under oxidizing conditions; lanthionine (-CH2-S-CH2-) has also been found, often as an artifact of peptide synthesis. Intramolecular thioester linkages between Cys and Gln residues have been found in complement C3 and C4, and in alpha-2-macroglobulin. Epsilon-(gamma-glutamyl)lysine cross-links are catalyzed by transglutaminases such as factor XIIIa. The C-terminal COOH group of ubiquitin is similarly linked to Lys epsilon-amino groups.

Additional modifications.

Radiolabelled phosphopeptides can be identified in peptide maps after incorporation of 32P, with the phosphorylated Thr, Ser or Tyr identified by MS/MS techniques (+80 amu) (18, 19). Phosphorylated peptides are selectively retained (under acidic conditions) by Fe3+-chelating iminodiacetyl-Sepharose; the phosphopeptides are subsequently eluted at alkaline pH (20). Ser(P) can be converted to a stable derivative (S-ethylcysteine) that can be directly observed during sequence analysis (21). Sulfotyrosine residues have the same nominal mass as phosphotyrosine; the sulfate is more acid-labile and can be selectively liberated by hydrolysis in 1 N HCl at 100°C for 4 min (as suggested by Agnes Henschen). Methylated Lys, His and Arg can be identified by LiCitrate-buffered amino acid analysis. PTH-methylhistidine elutes between PTH-Ala and PTH-Arg, while PTH-methylarginine elutes between PTH-Tyr and PTH-Pro. The PTH-derivative of gamma-N-methyl-Asn (from phycobiliproteins) co-elutes with PTH-Ser (22). Delta-N-methyl-Gln has been found on some bacterial ribosomal proteins. Iodinated (mono- and di-iodo) Tyr residues are produced by a thyroid peroxidase. Cysteine can be modified by glutathione. Selenocysteine (replacement of sulfur with selenium) and selenomethionine have been reported. Gamma-carboxyglutamic acid (Gla) is found, often in tandem, in N-terminal "Gla" domains of certain vitamin K-dependent coagulation proteins, bone and shark matrix proteins, and osteocalcin. Gla is acid-labile, but can be recovered after base hydrolysis (23). PTH-Gla residues can be identified after methylation (24).
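The +80 amu shift can be turned into a list of expected masses for a candidate phosphopeptide. The sketch below assumes standard monoisotopic residue masses and treats every Ser, Thr and Tyr as a possible site; the function names and the test peptide are illustrative. It also shows why phosphate and sulfate are indistinguishable at nominal mass: the two additions differ by only about 0.01 Da.

```python
# Monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PHOSPHO = 79.966331  # HPO3
SULFO = 79.956815    # SO3

def peptide_mass(seq):
    """Monoisotopic mass of the unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER

def phospho_series(seq):
    """Masses for 0..n phosphate groups, n = number of Ser/Thr/Tyr residues."""
    n_sites = sum(seq.upper().count(aa) for aa in "STY")
    base = peptide_mass(seq)
    return [round(base + k * PHOSPHO, 4) for k in range(n_sites + 1)]

# Hypothetical tetrapeptide with two candidate sites (Ser and Tyr).
print(phospho_series("GSYK"))
print(round(PHOSPHO - SULFO, 4))  # small mass gap between phosphate and sulfate
```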

N-terminal modifications.

N-acetyl "blocked" N-termini of eukaryotic proteins are common; the N-terminal residues are often Ala, Ser, Met, Gly or Thr. N-acetyl residues can be enzymatically removed from peptides (25). N-methylation of mammalian ribosomal proteins usually occurs at Ala/Pro/Phe-Pro-Lys- N-termini. N-formyl Met is usually processed by a deformylase; the exception is bee melittin. Glutamine and S-carboxymethylcysteine can form cyclic "blocked" N-terminal residues; the former can be removed by pyroglutamate aminopeptidase.

C-terminal modifications.

C-terminal amidation is common in peptide hormones. The amide is contributed by Gly from a precursor C-terminal sequence of -XGXX. CpY will release C-terminal amides, while CpA and CpB will not. GPI anchors, methylation (associated with isoprenylation) and ADP-ribosylated C-terminal Lys can also be present.

Peculiar protein modifications (biotech headaches).

Norleucine (Nle) can be incorporated at Met positions in E. coli-derived proteins. This results from two errors: 1) the Leu biosynthetic pathway produces Nle, and 2) tRNAMet is charged with Nle instead of Met. Nle is easy to detect by AAA (especially in LiCitrate buffer systems) or by N-terminal sequence analysis. Methionine sulfoxide [Met(O)] is detected in peptides by MS (+16 amu) or by its resistance to CNBr digestion. Met(O) is reduced back to Met in ABI sequencers, but some appears (as three peaks) in the HP G1000A sequencer. Asparagine deamidation and aspartate isomerization proceed through a succinimide intermediate that is usually unstable; stable succinimides can be detected by sequencing after alkaline hydroxylamine cleavage. Isoaspartate is often formed at succinimide sites; isoAsp stops the Edman degradation (the sequence disappears at the isoAsp). Spontaneous diketopiperazine degradation may occur when the second residue of a protein is Pro (in XPY- sequences, the XP dipeptide is cleaved off, leaving a Y- N-terminus). Mammalian cells can secrete carboxypeptidase B, so the absence of expected C-terminal Lys or Arg residues (based on cDNA-derived sequences) is common.
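Two of these headaches change the termini in a predictable way, so the expected variants can be listed from the cDNA-derived sequence before sequencing or mass analysis. This is a rough sketch with hypothetical names; it assumes that carboxypeptidase B removes the entire run of C-terminal Lys/Arg residues, which is a simplification and may not hold for a given protein.

```python
def expected_termini(seq):
    """List terminal variants that the cDNA-derived sequence may give rise to."""
    seq = seq.upper()
    variants = {"as_predicted": seq}
    # XPY-: loss of the N-terminal X-Pro dipeptide as a diketopiperazine.
    if len(seq) > 2 and seq[1] == "P":
        variants["after_diketopiperazine_loss"] = seq[2:]
    # Assumption: every trailing Lys/Arg is removed by carboxypeptidase B.
    trimmed = seq.rstrip("KR")
    if trimmed and trimmed != seq:
        variants["after_carboxypeptidase_B"] = trimmed
    return variants

# Hypothetical sequence with Pro at position 2 and a C-terminal Lys.
for name, variant in expected_termini("GPEVQLVESGK").items():
    print(name, variant)
```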

Sequencing "Blind Spots".

Many post-translational modifications have been detected because of the appearance of a "blank" cycle or an unusual PTH-amino acid during sequencing. Several types of modification are not detected in this fashion. These include phosphorylated Thr/Ser [e.g., ABRF-92SEQ (26)], Met(O), O-fucosyl modifications, isoAsp, acylated Cys, and Gla (which gives some Glu).

A General Approach.

Mass spectrometry should be performed on protein/peptide samples after the primary sequence is determined; this will confirm the sequence assignment, and, should a discrepancy exist, the mass of the covalent modification is already available (27). A comparison of the observed sequence with the known consensus sequences should help to determine the type(s) of modification(s) (28). Observed HexNAc (m/z = 204) or Hex-HexNAc (m/z = 366) ions in an LC-MS experiment may indicate the presence of a glycopeptide (29). The absence of an expected peptide mass or the appearance of an unexpected peptide mass in a peptide digestion/LC-MS experiment may suggest a site of peptide modification, especially if the investigator has ruled out the possibility of unexpected or incomplete digestion; careful scrutiny is often required to identify sites with partial or heterogeneous modifications.
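The mass-comparison step can be sketched as a lookup of the observed-minus-expected mass difference against a short table of known shifts, together with a check for the two glycopeptide marker ions. The table covers only some of the modifications discussed here. Its monoisotopic values, the tolerances, the accurate m/z values for the marker ions and all names are assumptions added for illustration. A match suggests a candidate and does not identify the modification.

```python
# Monoisotopic mass shifts (Da) for a few of the modifications discussed above.
KNOWN_SHIFTS = {
    "deamidation (Asn to Asp/isoAsp)": 0.9840,
    "methylation": 14.0157,
    "oxidation (e.g. Met sulfoxide)": 15.9949,
    "acetylation": 42.0106,
    "gamma-carboxylation (Gla)": 43.9898,
    "sulfation": 79.9568,
    "phosphorylation": 79.9663,
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "farnesylation": 204.1878,
    "myristoylation": 210.1984,
    "palmitoylation": 238.2297,
    "geranylgeranylation": 272.2504,
}

# Singly charged marker ions corresponding to m/z 204 and 366 in the text.
MARKER_IONS = {"HexNAc": 204.087, "Hex-HexNAc": 366.140}

def candidate_modifications(observed_mass, expected_mass, tol=0.02):
    """Names of single modifications whose shift matches the mass difference."""
    delta = observed_mass - expected_mass
    return sorted(
        name for name, shift in KNOWN_SHIFTS.items() if abs(delta - shift) <= tol
    )

def glyco_marker_ions(peaks_mz, tol=0.05):
    """Marker ions present in a list of fragment m/z values."""
    return [
        name
        for name, mz in MARKER_IONS.items()
        if any(abs(peak - mz) <= tol for peak in peaks_mz)
    ]

# Hypothetical masses and peak list.
print(candidate_modifications(1079.97, 1000.00))
print(candidate_modifications(1015.995, 1000.00))
print(glyco_marker_ions([175.12, 204.09, 366.14]))
```

At a tolerance of 0.02 Da the first example returns both phosphorylation and sulfation, which is the ambiguity noted for phosphotyrosine and sulfotyrosine; a selective chemical step such as mild acid hydrolysis is then needed to tell them apart.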


1. Wold, F. (1981) Annu. Rev. Biochem. 50, 783-814.

2. Wold, F. and Moldave, K. (1984) in Methods Enzymol 106 and 107.

3. Burlingame, A.L. and McCloskey, J. (1990) in Biological Mass Spectrometry. Elsevier.

4. Seifter, S. and Englard, S. (1990) in Methods Enzymol 182, pp. 626-646.

5. Townsend, R.R. et al. (1989) Anal. Biochem. 182, 1-8.

6. Waeghe, T.J., Darvill, A.G., McNeil, M., and Albersheim, P. (1983) Carbohydr Res. 123, 281-304.

7. Van Halbeek, H. (1993) in Methods Enzymol 230, in press.

8. Merry, A.H., Bruce, J., Bigge, C., and Ioannides, A. (1992) Biochem. Soc. Trans. 20, 91s.

9. Reason, A.J., Morris, H.R., Panico, M., Marais, R., Treisman, R.H., Haltiwanger, R.S., Hart, G.W., Kelly, W.G., and Dell, A. (1992) J Biol Chem. 267, 16911-16921.

10. Kelly, W.G., Dahmus, M.E., and Hart, G.W. (1993) J Biol Chem. 268, 10416-10424.

11. Harris, R.J., and Spellman, M.W. (1993) Glycobiology 3, 219-224.

12. Anderson, B., Seno, N., Sampson, P., Riley, J.G., Hoffman, P., Meyer, K. (1964) J Biol Chem. 239, 2716-2717.

13. Ferguson, M.A.J. (1992) Biochem. Soc. Trans. 20, 243-256.

14. Sefton, B.M., and Buss, J.E. (1987) J. Cell Biol 104, 1449-1453.

15. Grand, R.J.A. (1989) Biochem. J 258, 625-638.

16. Stevenson, F.T., Bursten, S.L., Fanton, C., Locksley, R.M., and Lovett, D.H. (1993) Proc. Natl Acad Sci. 90, 7245-7249.

17. Clarke, S. (1992) Annu. Rev. Biochem. 61, 355-386.

18. Palczewski, K., Buczylko, J., Van Hooser, P., Carr, S.A., Huddleston, M.J., and Crabb, J.W. (1992) J Biol Chem. 267, 18991-18998.

19. Hou, J., McKeehan, K., Kan, M., Carr, S.A., Huddleston, M.J., Crabb, J.W., and McKeehan, W.L. (1993) Protein Sci. 2, 86-92.

20. Nuwaysir, L.M., and Stults, J.T. (1993) J Am. Soc. Mass Spectrom. 4, 662-669.

21. Meyer, H.E., Hoffmann-Posorske, E., Korte, H., and Heilmeyer, Jr., L.M.G. (1986) FEBS Lett. 204, 61-66.

22. Klotz, A.V., and Glazer, A.N. (1987) J Biol Chem. 262, 17350-17355.

23. Hauschka, P.V. (1977) Anal Biochem. 80, 212-223.

24. Cairns, J.R., Williamson, M.K., and Price, P.A. (1991) Anal Biochem. 199, 93-97.

25. Krishna, R.G. (1992) in Techniques in Protein Chemistry III, pp. 77-84, Academic Press (San Diego).

26. Mische, S.M., Yuksel, K.U., Mende-Mueller, L.M., Matsudaira, P., Crimmins, D.L., and Andrews, P.C. (1993) in Techniques in Protein Chemistry IV (Angeletti, R.H., ed.), pp. 453-461, Academic Press (San Diego).

27. Krishna, R.G., and Wold, F. (1993) in Methods in Protein Sequence Analysis (Imahori, K., and Sakiyama, F., Eds.), pp. 167-171, Plenum Press (New York).

28. Aitken, A. (1990) in Identification of Protein Consensus Sequences. Ellis Horwood (London).

29. Carr, S.A., Huddleston, M.J., and Bean, M.F. (1993) Protein Sci. 2, 183-196.

Created: 11th September 1995
Last modified: 11th September 1995