Carboxyl-Terminal Sequencing Methods


John E. Shively

Beckman Research Institute of the City of Hope

A routine method for carboxyl-terminal sequence analysis of proteins should allow characterization of intact proteins for the purpose of identifying proteolytic processing at the carboxyl-terminus and for producing oligonucleotide probes for cloning cDNAs. The goals of the method are high sensitivity (pmol range), quantitation, ability to identify all 20 common amino acids, multiple cycles, and applicability to all or most proteins. Most methods developed to date fall short of meeting all these criteria, but good progress has been made, especially in chemical methods for the analysis of proteins. In addition, good enzymatic methods coupled to mass spectrometry are available for the analysis of peptides and some small proteins (less than 20 kDa). This workshop allowed direct comparison of two related chemical methods, one already commercially available (Hewlett-Packard) and the other under evaluation at beta-sites (Applied Biosystems). Both chemical methods allow the analysis of 0.1-1.0 nmol of sample for 1-5 cycles and are compatible with gel-blot methods. Both methods start by generating peptidyl thiohydantoin derivatives but differ in subsequent steps: the HP chemistry liberates the thiohydantoin-amino acids by treatment with potassium trimethylsilanolate, whereas in the ABI method the thiohydantoin ring is alkylated with bromomethylnaphthalene and then cleaved with thiocyanate. Both methods allow assignment of all 20 common amino acids except proline and cysteine (but see below).

John E. Shively, "Enzymatic Methods"

Release of amino acids from samples digested with carboxypeptidases is variable and dependent on the sample, the amino acid to be released, and the digestion time. The most popular carboxypeptidases are Y, C, and P, or a combination of Y and C, chosen for their broad amino acid specificities. To achieve reliable analysis, many digestion time points must be evaluated, ranging from minutes to hours. In some cases equivalent analyses can be achieved by varying carboxypeptidase concentration and analyzing fewer time points. For peptide samples, the released amino acids can be analyzed by conventional amino acid analysis or the shortened peptide can be analyzed by mass spectrometry. Mass analysis methods are now preferred due to their speed.

Two mass spectrometric methods were reviewed in this presentation. The first was electrospray ionization (ESI) MS, a method allowing the direct introduction of liquids into the mass spectrometer. In this method, a digested sample is slowly infused, allowing collection of all time points in a single unmonitored run. Rosnack and Stroh (Rapid Commun. Mass Spectrom. 6, 637-640, 1992) have used this method to analyze glucagon and apomyoglobin digested with carboxypeptidase P.

For glucagon, 19 carboxyl-terminal residues were identified, but only portions of the spectrum were shown and the peak heights for identified fragments varied, probably due to non-uniform release of amino acids. The rate of amino acid release depends on the carboxypeptidase, the amino acid, and the sample's sequence. Because the full spectrum was not shown, it is difficult to evaluate the confidence in sequence assignments.

The spectrum for apomyoglobin was even more problematic: it contained many low-intensity peaks, and many of these were not labeled. This spectrum illustrates a common problem with low-quality spectra: one can simply ignore peaks that do not fit the expected pattern. This, together with the variable peak heights, makes sequencing unknown samples difficult. For these reasons, the application seems non-quantitative and limited to confirming known sequences.

Shortened peptides may also be identified by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS), a method requiring spotting samples with matrix on a metal stage. Patterson et al. (Anal. Chem. 67, 3971-3978, 1995) have used this method to analyze pmol amounts of a fragment of adrenocorticotropic hormone treated with carboxypeptidase Y. Nineteen amino acids were identified. The results were then subjected to a statistical analysis to determine the suitability of the method for samples with unknown sequences. They concluded that, except for ambiguities in the isomass pairs (Ile/Leu and Gln/Lys, and sometimes Gln/Lys/Glu/Met), it should be possible to make confident sequence assignments.

However, when the authors used the method with other peptides, they obtained only 1 to 5 residues of sequence. It seems more realistic to expect no more than 5 residue assignments for an average peptide and to be prepared to see one or no residue assignments with some samples. They also showed that analysis can be simplified by combining several digestion time points or carboxypeptidase concentrations in a single spectrum. More recent studies address applying the method to proteins as large as 30 kDa (meeting abstract): it appears that only some proteins can be analyzed in this manner and that mass accuracy is too low for analysis of samples with unknown sequences.
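The ladder read-out described above, and the reason mass accuracy limits it, can be illustrated with a short sketch. This is not the statistical procedure used by Patterson et al.; it only assumes that each successive mass difference in the ladder is compared against the standard monoisotopic residue masses within a tolerance. The function names, the 2000.0 Da starting mass, and the tolerance values are hypothetical choices made for illustration.

```python
# Monoisotopic residue masses in Da (standard reference values).
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Leu": 113.08406,
    "Ile": 113.08406, "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858,
    "Lys": 128.09496, "Glu": 129.04259, "Met": 131.04049, "His": 137.05891,
    "Phe": 147.06841, "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}


def candidates(delta, tol=0.3):
    """Return every residue whose mass matches `delta` within `tol` Da."""
    return sorted(aa for aa, mass in RESIDUE_MASS.items()
                  if abs(mass - delta) <= tol)


def read_ladder(masses, tol=0.3):
    """Read a carboxypeptidase ladder from the C-terminus inward.

    `masses` holds the intact peptide and its successively shortened
    forms; each gap between neighbouring masses is one released residue.
    """
    ordered = sorted(masses, reverse=True)
    return [candidates(hi - lo, tol) for hi, lo in zip(ordered, ordered[1:])]


# Hypothetical ladder: a peptide of assumed mass 2000.0 Da that loses
# Gly, then Gln, then Phe from its carboxyl-terminus.
ladder = [2000.0]
for aa in ("Gly", "Gln", "Phe"):
    ladder.append(ladder[-1] - RESIDUE_MASS[aa])

for cycle, calls in enumerate(read_ladder(ladder), start=1):
    print(cycle, "/".join(calls))
```

With the 0.3 Da tolerance assumed here, the second residue cannot be resolved beyond Gln/Lys, and Ile/Leu can never be separated because their masses are identical. Loosening the tolerance to 1.0 Da adds Glu to the Gln/Lys group, which is one way to see why lower mass accuracy on larger samples makes unknown sequences hard to assign.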

In conclusion, enzymatic methods for analysis of peptides, especially when coupled to MALDI-TOF, offer advantages of speed and sensitivity, so this approach is likely to become more popular over the next few years. However, analysis of proteins is better performed with chemical methods.

Victoria L. Boyd, David Dupont, Sam Woo, MeriLisa Bozzini, John Bergot, and Pau Yuan, "Chemical Side-Reactions during Carboxyl-Terminal Sequencing"

The carboxyl-terminal sequencing method first described in 1992 (Anal. Biochem. 206, 344-352, 1992) is being adapted to a reconfigured PE Applied Biosystems 490 protein sequencer, the Procise C. A scheme depicting a cycle of chemistry by the PE-Applied Biosystems method is provided in Figure 1. The sequencing method first forms a thiohydantoin (TH) at the carboxyl-terminal residue using acetic anhydride and piperidine thiocyanate. After the first TH is formed, the method deviates from the conventional TH approach for carboxyl-terminal sequencing. The TH is alkylated with bromomethylnaphthalene in the presence of diethylpropylamine as a base. The resulting alkylated-TH (ATH) is a better leaving group than the parent TH and is readily cleaved from the rest of the protein with thiocyanate under acidic conditions using trifluoroacetic acid (TFA). Simultaneous with ATH cleavage, thiocyanate ion converts the adjacent amino acid residue into a TH. The sequencing continues with a cycle that consists of alkylation followed by cleavage/derivatization.

The advantages of this sequencing method are the mild cleavage conditions and simultaneous formation of a TH at the carboxyl-terminus. This method eliminates the need for repeated activation of the carboxyl group and also introduces a tag onto the ATH (a methylnaphthyl group), allowing for flexibility in detection and chromatographic separation. The tag remains on the ATH during HPLC analysis. By choosing a different alkylating reagent, such as one with a fluorescent label, a fluorescence detector may be used. The chromatographic separation is also influenced by the hydrophobic or hydrophilic character of the chosen alkylating reagent, which can dominate the elution character.

Our research has recently focused on derivatization of Asp, Glu, Ser, and Thr prior to sequencing in order to avoid side-reactions with these amino acids. The necessary modifications have been incorporated in our automated sequencing method. Selective amidation of the Asp and Glu side-chains (J. Org. Chem. 60, 2581-2587, 1995) is performed under basic conditions after delivery of the carboxyl-group activating reagent, acetic anhydride (Ac2O). Selective amidation of Asp and Glu side-chains is possible because the carboxyl-terminus forms a different activated species than the side-chain carboxyl groups: the activated α-carboxyl group forms an oxazolone, while the side-chain carboxyl groups of Asp and Glu residues form mixed anhydrides. The side-chain mixed anhydrides are susceptible to nucleophilic reactions under basic conditions, but the carboxyl-terminal oxazolone is ionized and therefore unreactive. Piperidine thiocyanate dissociates under basic conditions, and piperidine reacts with the activated (but neutral) Asp and Glu side-chain mixed anhydride but does not react with the ionized oxazolone at the carboxyl-terminus. Asp and Glu residues are thereby converted into the corresponding piperidine amides. The carboxyl-terminal oxazolone is converted into a TH under acidic conditions with tetrabutylammonium thiocyanate. Amidation of Asp and Glu may also be performed after converting the carboxyl-terminus into a TH. Like the oxazolone, a TH ionizes under basic conditions and will not react with piperidine under the reaction conditions that convert a mixed anhydride into the amide.

The hydroxyl groups of Ser and Thr are acetylated on the Procise C with Ac2O during sequencing by using an acylation catalyst, N-methylimidazole (NMI). However, the use of NMI with Ac2O also catalyzes unwanted side-reactions with the carboxyl-terminal carboxyl group (oxazolone) or TH. Therefore, acetylation of the Ser and Thr hydroxyls is performed after forming an ATH at the C-terminus. Sequencing is improved using the Ser/Thr acylation protocol, and the residues are detected as their respective dehydro-ATH derivatives.

A 5-minute pretreatment of the PVDF-bound protein with phenylisocyanate (PIC) also helps to protect the hydroxyl groups of Ser and Thr by acylation. The "capping" of the hydroxyl groups by PIC is similar to the capping of the 5'-hydroxyl during DNA synthesis. PIC also efficiently derivatizes the ε-amino group of lysine residues (and the amino-terminus), forming phenylureas. The increased hydrophobicity of a protein due to PIC modification aids in retaining the protein on the hydrophobic PVDF support. In our experience, most proteins sequence with a higher initial yield if first modified by PIC.

Figure 1: Carboxyl-terminal sequencing chemistry used on the PE/Applied Biosystems Model 490 Procise C Sequencer.

Each cycle is analyzed for the cleaved ATH by HPLC, and the ATH is identified by comparison to an amino acid ATH standard chromatogram. A strength of the sequencing method is the absence of a "build-up" of background in the HPLC during successive cycles of sequencing. However, residual bromomethylnaphthalene and its byproducts are seen during HPLC analysis and constitute a regular chemical fingerprint seen during each analysis. The largest of these peaks is seen at a retention time of 30 minutes and has been identified as methylnaphthylthiocyanate (MNTC), formed from the reaction of the alkylating reagent with thiocyanate. To reduce the presence of MNTC, we wash the valve blocks and reaction chamber with gaseous TFA and with ethyl acetate at the onset of each cycle. Reducing the size of the MNTC peak and adjusting the mobile phase and gradient during HPLC analysis resolves the reference standards from each other and from the MNTC peak. A chromatogram of 17 of the ATH reference standards and the MNTC peak is provided in Figure 2. Missing from the chromatogram are Pro, Ser, and Cys. Pro generally stops sequencing. Ser is seen occasionally as dehydroalanine and is resolved from the other ATH reference standards. The consistent detection of dehydroalanine, however, requires further cycle development. Cys, if alkylated with acrylamide, is readily detected as the ATH at about 11 minutes (but was not yet available as a reference standard at the time of this presentation).

Figure 2: Typical and standard chromatograms obtained during carboxyl-terminal sequencing with the PE/Applied Biosystems Procise C. The upper panel is a chromatogram obtained during sequencing and demonstrates the relative levels of signal from sample and chemistry artifacts. The lower panel shows the separation of ATH-AA standards.

The first 3 of 13 cycles of sequence analysis on 1 nmol of apomyoglobin, obtained at a verification test site, are shown in Figure 3. To our knowledge, no other chemical carboxyl-terminal sequencing method to date has demonstrated the ability to sequence any protein for 13 cycles. Other examples of proteins that were sequenced by this method for 4, 5, and 7 cycles and that contain Ser, Thr, Asp, and Glu (carboxyl-terminal) can be found in the conference proceedings from the 1994 Methods in Protein Structure Analysis Meeting.

Figure 3: Chromatograms obtained by carboxyl-terminal sequencing of apomyoglobin with the PE/Applied Biosystems Procise C. The first 3 of 13 interpretable cycles are shown.

Jerome M. Bailey and Chad G. Miller, "Automated Thiohydantoin Chemistry for Carboxyl-Terminal Sequence Analysis"

Samples for carboxyl-terminal sequencing are prepared by a single-step application of the protein or peptide to a Zitex membrane (Bailey et al., Anal. Biochem. 212, 366-374, 1993). The use of PVDF is not recommended because this support is unstable to the basic conditions used during the cleavage reaction. No pre-sequencing attachment (covalent coupling) chemistries are required. All samples for carboxyl-terminal sequence analysis are non-covalently adsorbed to the Zitex supports. This sample application method is amenable to product formulations, as well as samples that are isolated using routine separation procedures involving various buffer systems, salts, and detergents. The matrix components are conveniently eliminated during the initial steps of the sequencing cycle. The chemical method is universal for any of the twenty common amino acids and easily affords the thiohydantoin derivatives of Ser and Thr, which have been frequently found at the carboxyl-terminus of proteins.

Sequencing Chemistry

The sequencer column reactions that occur on the Zitex membrane include chemical coupling and cyclization of the carboxyl-terminal residue to a thiohydantoin and cleavage to release the thiohydantoin amino acid derivative. The thiohydantoin amino acid is then extracted from the Zitex membrane in the sequencer column to the sequencer transfer flask where it is prepared for HPLC injection. There are no special begin cycles, because pre-sequencing chemical modifications or derivatizations are not required. This avoids any sequencing ambiguities associated with incomplete or non-specific amino acid side-chain reactions.

The chemical scheme for carboxyl-terminal sequencing is shown in Figure 4. The protein is first treated with trifluoroacetic acid (TFA) to generate a protonated carboxylic acid. Reaction of the protein carboxyl-terminus with diphenyl phosphoroisothiocyanatidate (DPP-ITC) followed by pyridine results in the formation of a peptidylthiohydantoin. The sample is then treated with TFA to stabilize the peptidylthiohydantoin formed from carboxyl-terminal proline (Bailey et al., Anal. Biochem. 224, 588-596, 1995). This acid treatment has no effect on peptidylthiohydantoins formed from the other 19 amino acids. The peptidylthiohydantoin is then treated with potassium trimethylsilanolate to generate a shortened peptide ready for continued sequencing and a thiohydantoin amino acid that is identified and quantitated by on-line reversed-phase HPLC. Because the thiohydantoin amino acids produced have UV absorption spectra and extinction coefficients similar to the phenylthiohydantoin amino acids formed during the Edman degradation, the sensitivity of carboxyl-terminal sequencing is expected to rapidly approach that currently possible with amino-terminal sequencing.

Figure 4: Carboxyl-terminal sequencing chemistry used on the Hewlett-Packard Model G1009A Sequencer.

HPLC Analysis of the TH-Amino Acid Derivatives

A stable thiohydantoin amino acid standard mixture is incorporated on the sequencer for on-line automated peak calibration and quantitation and consists of the synthetic thiohydantoin derivatives corresponding to the actual sequencing products for each of the twenty amino acids (Figure 5). Because serine and cysteine yield the same degradation product (dehydroalanine thiohydantoin), the residue assignment of cysteine in unknown sequences requires prior chemical modification of cysteine (S-alkylation), as is routinely done with amino-terminal sequencing methods. The TH-Ile analogue elutes as a doublet due to the formation of the allo-isomer during the synthesis of the thiohydantoin.

Figure 5: Typical and standard chromatograms obtained during carboxyl-terminal sequencing with the Hewlett-Packard Model G1009A Sequencer. The upper panels are chromatograms obtained during sequencing of β-lactoglobulin A (1 nmol). The lower panel shows the separation of TH-AA standards.

 

Sequence Analysis of a Standard

The results for cycles 1-4 of automated carboxyl-terminal sequence analysis of β-lactoglobulin A (1 nmol) are shown in Figure 5. The results demonstrate unambiguous carboxyl-terminal residue assignments for the first four cycles. Typical initial yields for protein samples are 50-60% with repetitive yields also at 50-60%. The method is currently able to provide 3 to 5 cycles of sequence information. However, the first cycle provides the most unambiguous and definitive information because all protein forms, whether they result from internal processing, clippings, or single residue truncations during purification, are available for analysis in their relative proportions during the first cycle.
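The link between these yields and the 3 to 5 cycle limit can be made concrete with a small calculation. The sketch below assumes a simple geometric model, in which the signal at each cycle is the previous cycle's signal multiplied by a constant repetitive yield. That model and the function name are illustrative assumptions, not a description of how the instrument software quantitates.

```python
def expected_signal(load_pmol, initial_yield, repetitive_yield, cycle):
    """Expected thiohydantoin recovery (pmol) at a 1-based cycle number.

    Assumes cycle 1 recovers load * initial_yield and every later cycle
    recovers a fixed fraction (repetitive_yield) of the one before it.
    """
    return load_pmol * initial_yield * repetitive_yield ** (cycle - 1)


# 1 nmol load, bracketing the quoted 50-60% initial and repetitive yields.
for cycle in range(1, 6):
    low = expected_signal(1000, 0.50, 0.50, cycle)
    high = expected_signal(1000, 0.60, 0.60, cycle)
    print(f"cycle {cycle}: {low:.0f}-{high:.0f} pmol")
```

Under these assumptions a 1 nmol load falls from 500-600 pmol at cycle 1 to roughly 31-78 pmol by cycle 5, so the signal has dropped by about an order of magnitude after five cycles.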

Samples Purified by Gel Electrophoresis

Analytical gels are routinely used for the analysis of protein samples and recombinant products to assess sample homogeneity. The all too common observation of several closely resolved bands or unexplained lower molecular weight bands is frequently an indication of either cellular processing events or fragmentations induced during purification. Automated carboxyl-terminal sequence analysis provides a direct and rapid method for the characterization of these various protein species and facilitates the examination of their origin and control. A wide variety of proteins electroblotted onto Teflon tape have been analyzed at the 50-300 pmol range. Blotting onto Teflon tape has also permitted both amino-terminal and carboxyl-terminal sequencing to be performed on the same sample (Burkhart et al., Anal. Biochem. 236, 364-367, 1996).

Tandem Amino-Terminal and Carboxyl-Terminal Sequence Analysis

The tandem analysis process of amino-terminal protein sequencing followed by carboxyl-terminal sequencing of the same sample provides a combined, unequivocal technique for the determination of the structural integrity of expressed proteins and biologically processed precursors. The analyses are performed on a single sample of protein, thereby eliminating any ambiguities that might be attributed to sample-to-sample variability and sample preparation.

The protein sample is either applied as a liquid to a Zitex membrane or electroblotted to Teflon after SDS-PAGE and inserted into a membrane-compatible amino-terminal sequencer column. The sample is subjected to automated amino-terminal sequencing using the HP G1005A protein sequencer. The sample membrane is subsequently transferred to a carboxyl-terminal sequencer column and subjected to carboxyl-terminal sequencing using the HP G1009A protein sequencer. In this manner, a single sample is analyzed by both sequencing protocols, resulting in structural information that pertains precisely to a given population of proteins.

Figures 6A and 6B demonstrate the tandem sequencing protocol on an SDS gel sample of horse myoglobin. About 250 picomoles of myoglobin were applied to an SDS mini-gel across five lanes, subjected to electrophoresis, and electroblotted to Teflon membrane. After staining with sulforhodamine B, five bands were excised and inserted into the amino-terminal sequencer column. Five cycles of amino-terminal sequencing gave the sequence GLSDG, with 47 pmol Gly recovered at cycle 1 (Figure 6A). The sequenced Teflon bands were transferred to a carboxyl-terminal sequencer column, and the results (Figure 6B) show the expected carboxyl-terminal sequence GQF, with 79 pmol Gly recovered at cycle 1.
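For comparison with the yields quoted for liquid-applied samples, the cycle 1 recoveries above can be expressed as a fraction of the amount loaded. The short calculation below is only a sketch: it divides by the 250 pmol applied to the gel, so the result is an overall recovery that includes electrophoresis and blotting losses rather than a sequencer initial yield, and the function name is hypothetical.

```python
def recovery_percent(recovered_pmol, applied_pmol):
    """Cycle 1 recovery as a percentage of the amount applied to the gel."""
    return 100.0 * recovered_pmol / applied_pmol


applied = 250  # pmol of horse myoglobin loaded across five lanes
n_term = recovery_percent(47, applied)  # Gly at amino-terminal cycle 1
c_term = recovery_percent(79, applied)  # Gly at carboxyl-terminal cycle 1

print(f"amino-terminal cycle 1: {n_term:.1f}% of applied protein")
print(f"carboxyl-terminal cycle 1: {c_term:.1f}% of applied protein")
```

On this basis the amino-terminal run recovered about 19% of the applied protein at cycle 1 and the subsequent carboxyl-terminal run about 32%.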

Figure 6A: Amino-terminal sequencing of 250 pmol of horse myoglobin with the Hewlett-Packard Model G1005A Sequencer. The sample was prepared for sequencing by SDS-PAGE, then blotted to Teflon membrane. The first five cycles are shown.

Figure 6B: Carboxyl-terminal sequencing of 250 pmol of horse myoglobin with the Hewlett-Packard Model G1009A Sequencer. After amino-terminal sequencing (Figure 6A), the sample was transferred to a G1009A instrument for carboxyl-terminal sequencing. The first three cycles are shown.

 

John Shively may be contacted at the Beckman Research Institute, City of Hope Hospital, 1450 E. Duarte Rd., Duarte, CA 91010; Victoria L. Boyd at Perkin-Elmer/Applied Biosystems Division, Foster City, CA 94404; and Jerome M. Bailey at Hewlett-Packard Co., California Analytical Division, 1601 California Ave., Palo Alto, CA 94304.

 




Created: 21st September 1996
Last modified: 21st September 1996