Created: 1st June 2000, last updated: 30th August 2000, © 2000 ABRF
William J. Henzel (a), Arie Admon (b), Steve A. Carr (c), Gary Davis (d), Karen De Jongh (e), William Lane (f), Michael Rohde (g), and Laurey Steinke (h)
(a) Genentech, Inc., South San Francisco, CA; (b) Technion, Haifa, Israel; (c) SmithKline Beecham Pharmaceuticals, King of Prussia, PA; (d) Bayer Corp., West Haven, CT; (e) ZymoGenetics, Seattle, WA; (f) Harvard University, Cambridge, MA; (g) Amgen, Thousand Oaks, CA; (h) University of Nebraska Medical Center, Omaha, NE
The ABRF-98SEQ sample was the 11th in a series of amino acid sequencing studies performed by the Protein Sequence Research Group of the Association of Biomolecular Resource Facilities. This study was designed to aid participants' laboratories in determining their abilities to analyze the amino acid sequence of a peptide at high sensitivity using Edman degradation, mass spectrometry (MS), or both. ABRF-98SEQ is a 17-amino acid synthetic peptide (IFDDEIEEVQALYPTER) that resembles a typical tryptic peptide. It was distributed at the 2.8-pmol level. The sample was sent dried in a microfuge tube accompanied by instructions on solubilizing the sample and by a survey form. Including tentative calls, the correct sequence was obtained by 16% of the responding participants, compared with only 6% in the 1997 study when the low-level peptide was a minor component of a mixture. This increase probably reflects the purity of ABRF-98SEQ. A secondary factor in the increase in correct calls may be the larger number of respondents this year reporting that they perform sequence analysis at the 1- to 10-pmol level. Most respondents who obtained the correct sequence used a combination of Edman sequencing and molecular weight determination by MS. Overall, the accuracy and sensitivity of peptide sequencing by Edman degradation continue to improve and are clearly aided by the use of MS for molecular weight determination. Although peptide sequencing by MS is not yet routinely practiced by the participating laboratories, results of this study indicate that MS-derived sequence data, when properly interpreted, are valuable for correcting, completing, or corroborating sequence assignments derived by Edman. (J Biomol Tech 2000;11:92-99)
Key Words: ABRF-98SEQ, protein sequencing, mass spectrometry (MS), post-source decay (PSD), tandem mass spectrometry (MS/MS).
Address correspondence and reprint requests to: William J. Henzel, Department of Protein Biochemistry, Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080 (email: wjh@gene.com).
The Protein Sequence Research Group (PSRG) of the Association of Biomolecular Resource Facilities (ABRF) has conducted a series of studies over the past 11 years to evaluate the current range of capabilities of member laboratories in their ability to perform protein and peptide sequencing.1-10 These studies have been designed to evaluate the current state of the art of protein sequencing technology and to evaluate common problems in the utilization of this technology. The studies have also served as an education tool to assist members in improving their protein sequencing methodology. The ABRF-98SEQ is the 11th in this series of sequencing studies and was designed to evaluate peptide sequencing at the low-picomole level. The peptide was designed to be sequenced by Edman degradation, mass spectrometry (MS), or a combination of the two.
With the introduction of capillary phenylthiohydantoin (PTH) analysis, subpicomole sequencing by automated Edman degradation is now commonly used. Advances in MS instrumentation have also improved the detection limits of MS and tandem mass spectrometry (MS/MS) analysis of peptides. A goal of the ABRF-98SEQ study was to determine the ability of member laboratories to perform MS/MS analysis on a peptide with a sequence that was not in a protein sequence database. Another goal was to evaluate the recent improvements in Edman sequencing instrumentation for low-level peptide sequencing.
A 17-residue peptide was synthesized at Bayer Corporation using 9-fluorenylmethoxycarbonyl (Fmoc) chemistry on a PE-Applied Biosystems model 430A peptide synthesizer at the Adirondack Biomedical Research Institute. The peptide contained a carboxyl-terminal arginine residue to resemble a typical tryptic peptide. The protecting groups were Asp (OtBu), Glu (OtBu), Gln (Trt), Thr (tBu), Tyr (tBu), and Arg (Pbf). Cleavage was performed using 2.2% ethanedithiol, 4.4% thioanisole, 4.4% water, 89% trifluoroacetic acid (all v/v), and 67 mg/mL of phenol. The peptide was purified by high-performance liquid chromatography (HPLC) on a Vydac C18 column using 0.1% trifluoroacetic acid with an acetonitrile gradient. The dried peptide was reconstituted in water, and the concentration was determined in triplicate by ninhydrin amino acid analysis on a Beckman model 6300 analyzer. Aliquots of 2.8 pmol were transferred to Sarstedt 500-µL tubes and dried in a Savant Speed-Vac.
Samples were distributed to 312 laboratories, which were requested to sequence the peptide by using Edman chemistry, MS/MS, post-source decay (PSD), or a combination of these techniques. Participants were requested to assign each amino acid residue as positive, tentative, or no assignment. A positive call (PC) is defined as an amino acid residue determined with absolute confidence. Tentative calls (TC) are residues determined with some degree of uncertainty. The results were sent to a third party who removed all identifying marks before analysis by the sequencing committee.
Figure 1 summarizes the peptide sequencing results returned for the ABRF-98SEQ sample. A total of 56 data sets were received. The actual data may be viewed on the Internet (http://www.abrf.org/ABRF/ResearchCommittees/protseqrescomm.html).
FIGURE 1. Summary of the sequence assignments for the ABRF-98SEQ sample. The number of facilities is plotted versus the amino acid sequence of ABRF-98SEQ. TW, amino acid was tentatively assigned incorrectly; PW, amino acid was positively assigned incorrectly; TC, amino acid was tentatively assigned correctly; PC, amino acid was positively assigned correctly; dash (--), no sequence assignment was made.
A "call" in protein sequence terminology is the identification of the amino acid present at a particular residue. The accuracy of positive calls is defined as the percentage of positive calls that are correct. This is one of the most important measurements of the quality of protein sequence data. Accuracy is a result of careful data interpretation. The higher signal-to-noise ratio obtained by protein sequencers equipped with capillary HPLC can result in higher accuracy. Another factor that may contribute to accuracy is experience. Data that are examined by more than one experienced researcher are likely to have a higher accuracy. The ABRF studies of recent years have attempted to emphasize the importance of this measurement. Positive accuracy from the studies conducted in 1995 and 1997 was 78% and 74%, respectively, for the minor components of the studies. The amount of the minor component was 15 pmol in 1995 and 2 pmol in 1997. The accuracy in this year's study was 90.8%, which represents an improvement in overall positive accuracy.
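The definition of positive accuracy given above can be written out as a short calculation. The following is a minimal sketch; the function name and the example counts are hypothetical and are not taken from the study data.

```python
def positive_accuracy(positive_correct: int, positive_wrong: int) -> float:
    """Percentage of positive calls that are correct.

    positive_correct: number of PC calls (positively assigned, correct)
    positive_wrong:   number of PW calls (positively assigned, incorrect)
    Tentative calls and unassigned residues do not enter the calculation.
    """
    total_positive = positive_correct + positive_wrong
    if total_positive == 0:
        raise ValueError("no positive calls were made")
    return 100.0 * positive_correct / total_positive


# Hypothetical data set: 15 positive correct calls and 2 positive wrong calls.
example = positive_accuracy(15, 2)
print(f"{example:.1f}%")
```

Note that a laboratory reporting only tentative calls has no defined positive accuracy under this definition, which is why the sketch raises an error rather than returning zero.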
In the ABRF-98SEQ study, nine respondents obtained the correct sequence (TC + PC), and 18 made no incorrect positive calls. However, 20 of 56 respondents reported no positive correct calls on this low-level sequence. The failure of these laboratories to report positive calls may reflect the use of older protein sequencer instrumentation. Fifteen of these laboratories used the older PE/ABI 470, 473, or 477 sequencers. One laboratory used a HP G100x, and only three used the newer PE/ABI 49x Procise sequencers. This contrasts with the 18 laboratories that made no incorrect positive calls. Most of these investigators used new instrumentation, which included 2 HP G100x, 10 PE/ABI Procise 494s, 4 Procise cLC, and only two older instruments, a PE/ABI 470 and a 477. The higher level of signal produced by newer instrumentation results in increased confidence in amino acid assignment, allowing an investigator to assign an amino acid residue as a positive call rather than a tentative call.
Figure 2 shows the number of amino acid residues that were correctly called and the protein sequencer instrumentation that was used to obtain the data. Of the nine laboratories that correctly determined the entire sequence, six groups reported all 17 residues as positive calls, and three laboratories had both positive and tentative calls. These nine laboratories used PE/ABI 49x-HT, PE/ABI 49x-cLC, and HP G100x sequencers. Although the experience of the operator can strongly influence the number of correct calls, there was a strong correlation between the number of correct calls and the age of the instrumentation. Table 1 shows the protein sequencer instrument performance for the ABRF-98SEQ sample. Generally, the laboratories that used older protein sequencer instrumentation obtained very poor results. The best results were obtained with the PE/ABI 49x-cLC sequencer, which uses capillary HPLC for PTH analysis. The four laboratories that used this sequencer obtained an average of 16.8 cycles correct versus 11 for the instruments that use a conventional 2.1-mm (internal diameter) column.
FIGURE 2. Comparison of the number of correct sequence calls and the protein sequencer instruments used. MS only, the facility used only mass spectrometric data to make sequence calls; 49x-cLC, PE Biosystems automated Edman sequencer equipped with a capillary HPLC; 49x-HT, PE Biosystems automated Edman sequencer; HP, Hewlett Packard protein sequencer. The 473/6, 477, and 470 are older model protein sequencers produced by PE Biosystems.
TABLE 1
Comparison of Protein Sequencer Instruments With Positive Accuracy and the Number of Correct Sequence Calls

| Manufacturer | Model | Average Cycles Correct (a) | Positive Accuracy (%) | Age (years) | N |
|---|---|---|---|---|---|
| PE/ABI | 49x-cLC | 16.8 | 100 | 0.9 | 4 |
| PE/ABI | 49x-HT | 11.0 | 89 | 1.9 | 27 |
| Hewlett Packard | G100x | 7.4 | 95 | 3.0 | 7 |
| PE/ABI | 470 | 4.0 | 100 | 13.0 | 3 |
| PE/ABI | 477 | 2.7 | 66 | 7.1 | 7 |
| PE/ABI | 473/6 | 2.0 | 86 | 6.5 | 4 |
| Beckman/Porton | 2090/LF3000G | 1.0 | 0 | 6.0 | 3 |

(a) Average number of cycles correct includes positive and tentative calls. Instrument failures occurred on one Beckman/Porton, one PE/ABI 49x-HT, and one Hewlett Packard G100x.
In the ABRF-97SEQ sample, a matrix-assisted laser desorption and ionization (MALDI) mass spectrum was provided. In the ABRF-98SEQ study, most laboratories attempted to obtain the mass of the peptide (55%). Electrospray ionization was used by 23%, and 77% used MALDI MS. Most of the masses reported were within 1 dalton of the calculated mass (2066.99). However, four masses reported differed from the calculated mass by 6 to 21 daltons. This large mass error may have been a result of improper calibration. Two laboratories reported a mass of 1296.7 for the ABRF-98SEQ sample. This mass is the same as the mass calibrant angiotensin I that may have been left on the sample plate from a previous calibration.
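The calculated mass quoted above can be reproduced from the peptide sequence. The sketch below assumes that 2066.99 refers to the monoisotopic singly protonated ion, [M+H]+, and uses standard monoisotopic residue masses; the function name is illustrative. The angiotensin I sequence (DRVYIHPFHL) is not given in this report and is supplied here as an assumption (the human sequence) to check the 1296.7 value.

```python
# Standard monoisotopic residue masses (Da) for the amino acids needed here.
MONO = {
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "F": 147.06841,
    "H": 137.05891, "I": 113.08406, "L": 113.08406, "P": 97.05276,
    "Q": 128.05858, "R": 156.10111, "T": 101.04768, "V": 99.06841,
    "Y": 163.06333,
}
WATER = 18.010565   # H2O added for the free N and C termini
PROTON = 1.007276   # charge carrier for the singly protonated ion


def monoisotopic_mh(sequence: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified linear peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER + PROTON


abrf_98seq = monoisotopic_mh("IFDDEIEEVQALYPTER")
angiotensin_i = monoisotopic_mh("DRVYIHPFHL")  # assumed calibrant sequence

print(f"ABRF-98SEQ [M+H]+:    {abrf_98seq:.2f}")
print(f"Angiotensin I [M+H]+: {angiotensin_i:.2f}")
```

Comparing a reported mass against this value makes large deviations, such as the 6- to 21-dalton errors noted above, easy to flag.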
The ABRF-98SEQ sample was designed to be sequenced by MS, and four laboratories attempted to do so. This was an increase in the use of MS/MS and PSD over the previous study ABRF-97SEQ where only 1 out of 50 laboratories used these techniques. In this study two laboratories that used PSD and one facility that used electrospray MS/MS obtained useful data. One of the laboratories that used PSD and molecular weight determination as the only techniques of analysis obtained only 7 correct residues. One facility used PSD to complement Edman sequence data and was able to obtain the correct sequence except for residues 14 and 15. No sequence was obtained by Edman or PSD analysis for these residues. However, the PSD analysis did show a mass difference of 198, which corresponded to a dipeptide containing Thr and Pro. Because no y3 or b14 ions were observed, the order of these two amino acids could not be determined. However, the lack of a b14 ion strongly suggests that the proline would be N-terminal to the Thr residue, because peptides containing Pro residues are known to readily fragment at their imino bond in the gas phase.11 Instead of assigning the Thr and Pro residues as tentative, these residues were assigned as positive calls, and the reported sequence of these two residues was backward.
One respondent who used electrospray MS/MS obtained a nearly complete y ion series and a majority of the b ions. These data were used to confirm the Edman data and to determine the aspartic acid at residue 4 and the C-terminal Arg residue. The MS/MS data were also used in conjunction with Edman sequencing to establish Ile as the N-terminal residue. The mass difference observed for the N-terminal residue was 113 daltons, corresponding to an Ile or Leu residue. The absence of Leu and the presence of Ile in the first cycle of Edman data confirmed that the mass difference of 113 daltons corresponded to isoleucine. This was an excellent example of the complementarity of Edman and MS data.
Although this sample size is small, it appears that de novo sequence analysis by MS/MS is not a routine practice. Several publications have described manual and computer methods of de novo sequence interpretation.12-14 A review by Papayannopoulos described in great detail the process of MS/MS sequence interpretation.15 Despite the availability of literature on de novo interpretation and the capability of MS instruments that can easily perform MS/MS analysis, only a few laboratories returned results indicating that MS/MS was attempted.
We analyzed the ABRF-98SEQ sample by PSD on a PerSeptive Biosystems Voyager Elite time-of-flight mass spectrometer and by electrospray MS/MS analysis on a Finnigan LCQ ion trap and on a Sciex triple quadrupole. The MS/MS spectra obtained on the triple quadrupole provided almost a complete b and y ion series. All residues were represented by b or y ions, except the N-terminal residue (Fig. 3). The data obtained on the ion trap mass spectrometer (Fig. 4) contained fewer ions than the spectra obtained from PSD or from the triple quadrupole because of the low mass limit at both the N and C termini. Despite this limitation, the correct sequence could be assigned for residues 4 through 17. Neither the LCQ ion trap nor the Sciex triple quadrupole could distinguish between Ile and Leu for residues 6 and 12 because their relatively low-energy collision-induced dissociation (CID) does not result in appreciable side chain fragmentation. The lack of side chain fragmentation prevents the identification of the isobaric amino acids isoleucine and leucine.
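To make the b and y ion series discussed above concrete, the following sketch computes singly charged monoisotopic b and y fragment masses for the ABRF-98SEQ sequence. It is an illustration under stated assumptions (unmodified peptide, singly protonated fragments, standard residue masses) and does not reproduce the software used in the study; the function names are hypothetical.

```python
MONO = {
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "F": 147.06841,
    "I": 113.08406, "L": 113.08406, "P": 97.05276, "Q": 128.05858,
    "R": 156.10111, "T": 101.04768, "V": 99.06841, "Y": 163.06333,
}
WATER = 18.010565
PROTON = 1.007276


def b_ions(sequence: str) -> list[float]:
    """b1 .. b(n-1): N-terminal fragments, residue sum plus one proton."""
    masses, running = [], PROTON
    for aa in sequence[:-1]:
        running += MONO[aa]
        masses.append(running)
    return masses


def y_ions(sequence: str) -> list[float]:
    """y1 .. y(n-1): C-terminal fragments, residue sum plus water plus one proton."""
    masses, running = [], WATER + PROTON
    for aa in reversed(sequence[1:]):
        running += MONO[aa]
        masses.append(running)
    return masses


SEQ = "IFDDEIEEVQALYPTER"
b = b_ions(SEQ)
y = y_ions(SEQ)

# Ile and Leu have identical residue masses, so backbone fragments alone
# cannot distinguish them at residues 6 and 12.
ile_leu_isobaric = MONO["I"] == MONO["L"]

# Gap between b13 and b15 spans residues 14 and 15 (Pro + Thr).
gap_b13_b15 = b[14] - b[12]

print(f"y1 = {y[0]:.3f}, b1 = {b[0]:.3f}")
print(f"b15 - b13 = {gap_b13_b15:.2f}")
```

The gap between b13 and b15 equals the combined residue mass of Pro and Thr, which is why a spectrum lacking b14 and y3 gives the composition of the dipeptide but not the order of its two residues.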
FIGURE 3. LC-MS/MS on a PE-Sciex API III mass spectrometer. The ABRF-98SEQ peptide was dissolved in 3 µL of methanol:water:formic acid (45:45:10), and 1.5 µL was applied to the nanospray tip. MS and MS/MS data were acquired.
FIGURE 4. LC-MS/MS spectrum of ABRF-98SEQ obtained on a Finnigan LCQ ion-trap mass spectrometer. Twenty percent (420 fmol, 1 µL) of the peptide dissolved in 10% acetonitrile/0.1% trifluoroacetic acid (TFA) was loaded onto a 0.3-mm × 15-cm Vydac C-18 column (LC Packings) on-line with the spectrometer. The peptide was eluted with a gradient of acetonitrile in 0.05% TFA. A collision energy of 20% to 35% was used.
The spectra obtained by PSD were similar to the data obtained on the triple quadrupole, except that the presence of additional ions in the PSD spectrum increased the complexity of interpretation (Fig. 5). The complexity of the PSD data made it extremely difficult to obtain the correct sequence using data derived solely from this method. However, in combination with a sequence derived from Edman data, the masses in the PSD spectrum were easily assigned and confirmed the Edman data. In general, PSD data are more complex than electrospray MS/MS data. However, several papers have described in detail a de novo method of sequence interpretation.16-21 PSD data can provide information on amino acid composition from the presence of immonium ions. These data are easily obtained from the same sample that was used for measurement of the intact mass. Immonium ions for leucine/isoleucine, proline, glutamic acid, glutamine, phenylalanine, and tyrosine, corresponding to ions at m/z 86, 70, 102, 101, 120, and 136, respectively, were observed in the PSD spectrum of ABRF-98SEQ (Fig. 5).
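The nominal immonium ion masses can be derived from residue masses. The sketch below assumes the usual relationship, in which an immonium ion equals the residue mass minus CO plus one proton; the dictionary and function names are illustrative only.

```python
# Monoisotopic residue masses (Da) for the amino acids named in the text.
MONO = {
    "Leu/Ile": 113.08406, "Pro": 97.05276, "Glu": 129.04259,
    "Gln": 128.05858, "Phe": 147.06841, "Tyr": 163.06333,
}
CO = 27.994915      # carbon monoxide lost from the residue
PROTON = 1.007276


def immonium_mz(residue_mass: float) -> float:
    """m/z of the singly charged immonium ion for a given residue mass."""
    return residue_mass - CO + PROTON


immonium = {name: immonium_mz(mass) for name, mass in MONO.items()}

for name, mz in immonium.items():
    print(f"{name:8s} {mz:8.3f}  (nominal {round(mz)})")
```

Because glutamine and glutamic acid differ by only about 1 dalton, their immonium ions fall at adjacent nominal masses, so calibration in the low-mass region matters when reading composition from a PSD spectrum.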
FIGURE 5. Post-source decay (PSD) spectrum of ABRF-98SEQ obtained on a PerSeptive Biosystems Voyager Elite time-of-flight mass spectrometer. The ABRF-98SEQ peptide (2.8 pmol) was dissolved in 2 µL of 10% acetonitrile/0.1% trifluoroacetic acid. Ten percent (280 fmol) of this was applied to a premade spot of matrix (0.5 µL of 20 mg/mL alpha-cyano-4-hydroxycinnamic acid plus 5 mg/mL nitrocellulose in 50% acetone/50% 2-propanol) on the target plate. The spectrometer was operated in the reflector-delayed extraction mode. Fragment ions for selected precursor masses were obtained from PSD experiments. To enhance the ion abundance at low mass, collision gas (air) was introduced into the collision cell during the acquisition of the lower portion (<200 µm) of the fragment ion spectrum.
Nine laboratories assigned all amino acid residues in ABRF-98SEQ correctly, including tentative calls. This contrasts with the previous sample, ABRF-97SEQ, for which only three laboratories were able to assign all amino acids in the minor sequence correctly. That sample was distributed as a mixture of a 10-pmol peptide and a 2-pmol peptide, whereas this year's sample, ABRF-98SEQ, was distributed as a 2.8-pmol aliquot of a single peptide. The average accuracy of positive correct assignments was much higher for ABRF-98SEQ (91%) than it was for the 2-pmol peptide (minor component) in ABRF-97SEQ (74%). Elimination of all instruments using capillary HPLC from the study caused only a small drop in the average positive accuracy (to 89%), suggesting that the accuracy of calling the minor peptide was lower in ABRF-97SEQ because it was the minor component of a mixture. The purity of the sample is a major factor in the ability to call a positive correct sequence.
Analysis of ABRF-98SEQ by Edman degradation alone was used by 34 laboratories (positive accuracy of 87%). Only four laboratories attempted analysis of the sample by MS-based sequencing. Two of these laboratories attempted MS-based sequencing alone. They made no positive calls, whereas the two laboratories that used Edman and MS/MS analysis obtained data that were complementary from both techniques. Although protein identification by peptide mass fingerprinting and searching uninterpreted MS/MS data is now routine in most protein facilities, it appears from this study that the use of MS/MS to perform de novo sequencing is beyond the capability of most member ABRF laboratories. However, MS/MS data can be used to complement sequence data determined by Edman degradation. Many laboratories that used MALDI MS to determine the mass of the ABRF-98SEQ sample also have the capability to perform PSD analysis. Because PSD analysis can be performed on the same sample after a mass determination has been made, no additional sample is required. The additional data that may be obtained can be invaluable for the confirmation of low-level sequence analysis performed by Edman degradation.
Molecular weight information derived by MS is becoming more widely used as an aid to protein sequencing. According to the ABRF-98SEQ survey, 55% of member laboratories are using MS, as opposed to 42% in the previous year's study. Participating laboratories are able to sequence pure peptides with good positive accuracy from lower amounts of starting material than in the recent past. In 1994, the most recent year in which a pure sample was distributed, the accuracy was 95% for 50 pmol of sample distributed. A positive accuracy of 91% was obtained on this year's sample with a distribution of only 2.8 pmol. The average yield of phenylalanine in the second cycle of Edman degradation was 1.1 pmol, which was 39% of the 2.8 pmol provided. The high value of positive accuracy obtained on the low-level ABRF-98SEQ sample most likely reflects major advances in the technology of Edman degradation and the growing use of mass determination as an aid to sequencing.
We thank Tom Buckholz for the synthesis and purification of ABRF-98SEQ, Karen West for amino acid analysis, and Barb Roberts for establishing and maintaining the anonymity of the study.
1. Niece RL, Williams KR, Wadsworth CL, Elliott J, Stone KL, McMurray WJ, et al. A synthetic peptide for evaluating protein sequencer and amino acid analyzer performance in core facilities: design and results. In Hugli TE (ed): Techniques in Protein Chemistry. San Diego: Academic Press, 1989:89-101.
2. Speicher DW, Grant GA, Niece RL, Blacher RW, Fowler AV, Williams KR. Design, characterization and results of ABRF-89SEQ: a test sample for evaluating protein sequencer performance in protein microchemistry core facilities. In Hugli TE (ed): Current Research in Protein Chemistry. San Diego: Academic Press, 1990:159-166.
3. Yuksel KU, Grant GA, Mende-Muller LM, Niece RL, Williams KR, Speicher DW. Protein sequencing from polyvinylidenedifluoride membranes: design and characterization of a test sample (ABRF-90SEQ) and evaluation of results. In Villafranca JJ (ed): Techniques in Protein Chemistry II. San Diego: Academic Press, 1991:151-162.
4. Crimmins DL, Grant GA, Mende-Muller LM, Niece RL, Slaughter C, Speicher DW, Yuksel KU. Evaluation of protein sequencing core facilities: design, characterization, and results from a test sample (ABRF-91SEQ). In Angeletti RH (ed): Techniques in Protein Chemistry III. San Diego: Academic Press, 1992:35-35.
5. Mische SM, Yuksel KU, Mende-Muller LM, Matsudaira P, Crimmins DL, Andrews PC. Protein sequencing of post-translationally modified peptides and proteins: design, characterization and results of ABRF-92SEQ. In Angeletti RH (ed): Techniques in Protein Chemistry IV. San Diego: Academic Press, 1993:453-461.
6. Rush J, Andrews PC, Crimmins DL, Gambee JE, Grant GA, Mische SM, Speicher DW. A synthetic peptide for evaluating protein sequencing capabilities: design of ABRF-93SEQ and results. In Crabb JW (ed): Techniques in Protein Chemistry V. San Diego: Academic Press, 1994:133-141.
7. Gambee JE, Andrews PC, Grant GA, Merrill B, Mische SM, Rush J. Assignment of cysteine and tryptophan residues during protein sequencing: results of ABRF-94SEQ. In Crabb JW (ed): Techniques in Protein Chemistry VI. San Diego: Academic Press, 1995:209-217.
8. DeJongh KS, Fernandez J, Gambee JE, Grant GA, Merrill B, Stone KL, Rush J. Design and analysis of ABRF-95SEQ, a recombinant protein with sequence heterogeneity. In Marshak D (ed): Techniques in Protein Chemistry VII. San Diego: Academic Press, 1996:347-358.
9. Fernandez J, Admon A, De Jongh K, Grant G, Henzel W, Lane WS, et al. Evaluation of ABRF-96SEQ: a sequence assignment exercise. In Marshak D (ed): Techniques in Protein Chemistry VIII. San Diego: Academic Press, 1997:69-78.
10. Stone K, Fernandez J, Admon A, Henzel W, Lane W, Rohde M, Steinke L. ABRF-97SEQ: sequencing results of a low-level sample. J Biomol Tech 1999;10:26-32.
11. Biemann K. Sequencing of peptides by tandem mass spectrometry and high-energy collision-induced dissociation. Methods Enzymol 1990;193:455-479.
12. Fernandez-de-Cossio J, Gonzalez J, Betancourt L, Besada V, Padron G, Shimonishi Y, Takao T. Automated interpretation of high-energy collision-induced dissociation spectra of singly protonated peptides by "SeqMS," a software aid for de novo sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom 1998;12:1867-1878.
13. Qin J, Herring C, Zhang X. De novo peptide sequencing in an ion trap mass spectrometer with 18O labeling. Rapid Commun Mass Spectrom 1998;12:209-216.
14. Taylor J, Johnson R. Sequence database searches in de novo peptide sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom 1997;11:1067-1075.
15. Papayannopoulos I. The interpretation of collision-induced dissociation tandem mass spectra of peptides. Mass Spectrom Rev 1995;14:49-73.
16. Keough T, Youngquist RS, Lacey MP. A method for high-sensitivity peptide sequencing using postsource decay matrix-assisted laser desorption ionization mass spectrometry. Proc Natl Acad Sci USA 1999;96:7131-7136.
17. Spengler B. Post-source decay analysis in matrix-assisted laser desorption/ionization mass spectrometry of biomolecules. J Mass Spectrom 1997;32:1019-1036.
18. Pfeifer T, Rücknagel P, Kuellertz G, Schierhorn A. A strategy for rapid and efficient sequencing of Lys-C peptides by matrix-assisted laser desorption-ionisation time-of-flight mass spectrometry post-source decay. Rapid Commun Mass Spectrom 1999;13:362-369.
19. Pfeifer T, Drewello M, Schierhorn A. Using a matrix-assisted laser desorption/ionisation time-of-flight mass spectrometer for combined in-source decay/post-source decay experiments. J Mass Spectrom 1999;34:644-650.
20. Suckau D, Cornett D. Protein sequencing by ISD and PSD MALDI-TOF MS. Analysis Magazine 1998;26:M18-M21.
21. Dancik V, Addona TA, Clauser KR, Vath JE, Pevzner PA. De novo peptide sequencing via tandem mass spectrometry. J Comput Biol 1999;6:327-342.