CALLING YOUR ATTENTION TO.....


Researchers are always looking for techniques that provide more data, more quickly, using less sample and less of the expensive instrument time and reagents. Identifying a protein with little or no sequencing by Edman degradation represents a significant advance in streamlining the process for proteins listed in protein or nucleic acid sequence databases. Abstracts (202-S, 203-S, 204-S) presented in July at the 7th Symposium of the Protein Society in San Diego describe similar approaches. Three different groups (Henzel et al.; Mann; and Pappin et al.) describe the identification of proteins using primarily the masses of a few fragments. Identifying a protein using the masses of 3-5 fragments, and perhaps some partial sequence data, is possible with the aid of several types of mass spectrometry. An important part of the work described is the use of computers to manipulate the sequences in the databases. The abstracts are reproduced below.

A Novel Approach for Identifying Two-Dimensional Gel Proteins by Molecular Mass Searching of Peptide Fragments in Protein Sequence Databases.


William J. Henzel, Todd M. Billeci, John T. Stults, Susan C. Wong, Christopher Grimley and Colin Watanabe.
Genentech, Inc., 460 Pt. San Bruno Blvd., So. San Francisco, CA 94080

A rapid method for the identification of known proteins, separated by two-dimensional (2-D) gel electrophoresis, is described in which molecular masses of peptide fragments are used to search a protein sequence database. The peptides are generated by in situ reduction, alkylation, and tryptic digestion of proteins electroblotted from 2-D gels. Masses are determined at the sub-picomole level by matrix-assisted laser desorption/ionization mass spectrometry of the unfractionated digest. A computer program has been developed that searches the protein sequence database for multiple peptides of individual proteins that match the measured masses. To ensure that the most recent database updates are included, a theoretical digest of the entire database is generated each time the program is executed. This methodology facilitates simultaneous processing of a large number of 2-D gel spots. The methodology was applied to a 2-D gel of a crude E. coli extract that was electroblotted onto PVDF. Ten randomly chosen spots were analyzed. With as few as three peptide masses, each protein was uniquely identified from over 91,000 protein sequences. All identifications were verified by concurrent N-terminal sequencing of identical spots from a second blot. One of the spots contained an N-terminally blocked protein that required enzymatic cleavage, peptide separation, and Edman degradation for confirmation of its identity.
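The search described in this abstract can be illustrated with a minimal sketch. It is not the Genentech program: the cleavage rule (after K or R, but not before P), the monoisotopic residue masses, the 0.5 Da tolerance, the simple hit-count ranking, and the toy sequences are all assumptions made for illustration. Alkylation of cysteine, which the abstract mentions, is ignored here.

```python
import re

# Standard monoisotopic residue masses in daltons. Cysteine is left
# unmodified (assumption; the abstract alkylates it).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Assumed rule: cleave after K or R unless the next residue is P.

    Splitting on a zero-width match requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def rank_proteins(database, measured, tol=0.5):
    """Digest every entry afresh and count the measured masses it explains."""
    scores = []
    for name, seq in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(seq)]
        hits = sum(any(abs(m - t) <= tol for t in theoretical)
                   for m in measured)
        scores.append((hits, name))
    return sorted(scores, reverse=True)


# Toy database: invented sequences, not real proteins.
TOY_DB = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "protB": "GASPVTCLINDQKEMHFRYW",
}

# Pretend three peptide masses from protA were measured.
measured = [peptide_mass(p) for p in ("TAYIAK", "QISFVK", "SHFSR")]
print(rank_proteins(TOY_DB, measured))
```

Digesting the database on every run, as the abstract describes, trades speed for always reflecting the latest database release; a precomputed peptide-mass index would be the alternative.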

Identification of Proteins in Sequence Databases Using Molecular Weight and Partial Sequence Data


Matthias Mann
EMBL, D-6900 Heidelberg, Germany

Previously, a fast and reliable method to localize proteins in sequence databases using molecular weight (MW) information obtained by mass spectrometry (MS) has been described (1). The MW of the intact protein and/or the MWs of peptides obtained after degradation of the protein with specific cleavage agents uniquely identifies a protein in databases such as the Protein Identification Resource (PIR). Here an extension of that method is presented which incorporates partial information about the amino acid sequence into the search. This information could consist of the partial or ambiguous sequences obtained from either Edman degradation or MS/MS that are common when using low sample amounts. A recent pattern searching algorithm is used to find all occurrences of the partial sequences in the database subject to the constraint of the measured mass. Results show the method to be extremely specific, i.e., proteins can usually be located when just a few consecutive amino acids in one or two peptides and the peptide masses are known. This finding has implications for large scale protein identification projects such as 2D gel databases and cDNA projects. (1) M. Mann, P. Højrup and P. Roepstorff, Proc. of the 40th ASMS Conference on Mass Spectrometry and Allied Topics (1992) p. 957.
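The combination of a partial sequence with a mass constraint can be sketched as follows. The abstract does not name its pattern searching algorithm, so this sketch substitutes Python regular expressions; the tryptic cleavage rule, the 0.5 Da tolerance, the function names, and the toy sequences are likewise assumptions. Ambiguous positions (for example leucine versus isoleucine, which have the same mass) are written as character classes such as `[IL]`.

```python
import re

# Standard monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056


def mass_of(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def cleave(sequence):
    """Assumed specific cleavage: after K or R, not before P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def find_by_partial_sequence(database, partial, measured_mass, tol=0.5):
    """Return (protein, peptide) pairs whose peptide contains the partial
    sequence AND whose mass agrees with the measured peptide mass."""
    pattern = re.compile(partial)
    hits = []
    for name, seq in database.items():
        for pep in cleave(seq):
            if pattern.search(pep) and abs(mass_of(pep) - measured_mass) <= tol:
                hits.append((name, pep))
    return hits


# Toy database: invented sequences, not real proteins.
TAG_DB = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "protB": "GASPVTCLINDQKAAIGLKEMHFRYW",
}

# Three residues read with L/I ambiguity, plus the mass of the parent peptide.
partial = "[IL]G[IL]"
parent_mass = mass_of("LGLIEVQ")
print(find_by_partial_sequence(TAG_DB, partial, parent_mass))
```

In this toy case the partial sequence alone matches a peptide in each entry; the mass constraint is what narrows the result to one.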

Rapid Identification of Proteins By Database Matching of Proteolytic Peptide Masses.


Darryl J.D. Pappin, Alan J. Bleasby, Chris W. Sutton and John S. Cottrell
Imperial Cancer Research Fund, London, WC2A 3PX, UK.
SERC Daresbury Laboratory, Warrington, WA4 4AD, UK., and
Finnigan MAT Ltd., Hemel Hempstead, HP2 4TG, UK.

A method for identifying proteins from a mass spectrum of peptides, generated by a proteolytic or chemical cleavage, is described. The set of measured molecular weights is matched against a peptide mass database, generated by applying the same proteolytic or chemical cleavage rules to a database of known protein sequences. The database was tested with data from plasma desorption, electrospray, and matrix-assisted laser desorption (MALD). Using a probability-based search algorithm, experience has shown that proteins can be uniquely identified in a database of >48,000 sequences using only 4 or 5 molecular weight values. The greatest selectivity is provided by peptides in the range of 700 to 3000 Da. Mass measurement accuracy is not critical, and even errors of several daltons cause little distortion in the results. The incidence of random matches can be reduced to negligible levels by correlating data from two digests using different enzymes. The inclusion in the search algorithm of empirical rules relating incomplete cleavage to sequence was found to increase the accuracy substantially. MALD is an ideal ionization technique for this application, since it provides low-picomole sensitivity and tolerance to contaminants such as the digest buffer, and does not require chromatographic separation of the digest products. This approach has enormous potential for rapid identification of proteins resulting from whole cell lysis following




Created: 29th July 1995
Last modified: 29th July 1995