One of the primary functions of our protein identification service is to
"screen" <5% aliquots of samples submitted for internal sequencing so that
"known" proteins may be (relatively) quickly and inexpensively identified
before the PI embarks on the more expensive internal Edman sequencing
approach. Since we are self-supporting with regard to operating costs,
we have independently confirmed only those identifications for which the
PI has requested confirmation and agreed to pay the additional service
charges that will be incurred. Also keep in mind that samples are often
submitted to our protein identification service to confirm a tentative
identification that the PI has already made via immunological and/or other
approaches, such as characterizing the nucleic acid binding or other
properties of the protein. Unfortunately, we often do not have "records"
on identifications
made by our unit that previously or subsequently have been confirmed
outside our unit and sometimes, we neglect to record into our summary
tables our own confirmatory data (such as that obtained by database
searching of LC-MS/MS fragmentation data). There is, however, no doubt in
our mind that if and when we make an incorrect protein identification, the
PI involved will eventually prove our error and, at that time, will not
neglect to call our attention to it. So far we have not received such a
call. Another point to keep in mind is that often the protein that is
identified is a common protein such as a serum albumin or a glycolytic
enzyme that clearly cannot be responsible for the activity that is being
purified and hence, the PI (usually) has no wish to further confirm its
identity. Although we believe we take a conservative approach to protein
identification, designed (we hope) on the philosophy that it is far better
to miss identifications than to make incorrect ones, we certainly
recommend that all identifications be confirmed independently via some
other approach.
With regard to the certainty of protein identification (meaning that when
we "identify" a protein, that identification is in fact correct), we
believe it is very high when the search procedures outlined below are
followed. Please keep in mind that we believe the two major reasons for
not making an identification (i.e., the reasons that account for our
ability to identify only 65% of submitted samples) are an insufficient
amount of protein in the <5% aliquot of each digest that we screen by
MALDI-MS and the absence of the protein from the database searched.
The question of what fraction of known proteins can be identified by
peptide mass searching is an interesting one. One relevant set of data
comes from the analyses we carried out on twenty 2D gel spots from
Haemophilus influenzae (whose genome has been sequenced). In this case we
identified from one to four proteins in 19 of 20 spots, for a total of 23
proteins. A parallel set of MS/MS peptide fragmentation data searches
identified one or more proteins in 15 of the 20 spots, for a total of 17
proteins. Since one of the proteins identified by the MS/MS approach was
not identified by peptide mass mapping, one might estimate (from this
dataset) the probability of identifying a known protein by MALDI-MS
peptide mass searching as 23/24 = 96%. Kathy Stone will be providing more
details on this study at her talk at the ABRF Meeting in the ion trap
session.
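The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the denominator assumes that the proteins found by either method together approximate all the known proteins present in the spots.

```python
# Counts quoted above for the twenty H. influenzae 2D gel spots.
pmf_proteins = 23        # identified by MALDI-MS peptide mass searching
msms_only_proteins = 1   # identified by MS/MS but missed by peptide mass searching

# Assumption: the union of both methods stands in for every known protein present.
total_known = pmf_proteins + msms_only_proteins

pmf_detection_rate = pmf_proteins / total_known
print(f"{pmf_proteins}/{total_known} = {pmf_detection_rate:.0%}")
```

Note that this is a detection rate for proteins already in the database, which is a different quantity from the fraction of submitted samples that yield an identification.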
In terms of the number of peptide mass search identifications confirmed in
our laboratory (and recorded by us in our databases), the following are
the data we have as of this morning. We have identified (by our criteria;
see below) 90 proteins in the 141 samples analyzed (64%) and have
confirmed 28 identifications by MS/MS peptide fragmentation database
searching and 6 identifications by Edman sequencing. So far, no peptide
mass identification has been proven to be incorrect. The median number of
masses searched in this dataset was 38.
Briefly, we search all data on PeptideSearch and ProFound using the
following search parameters:
1. Taxonomy: all kingdoms
2. Modifications: none
3. Missed cleavage sites: 1
4. Mass tolerance: 0.3 Da or 0.015%, monoisotopic
5. MW range: from ½ to 2x the SDS PAGE estimated MW.
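As a rough illustration of parameters 4 and 5 above, the following Python sketch checks whether an observed peptide mass matches a theoretical one and computes the MW search window. The function names are hypothetical and do not come from PeptideSearch or ProFound. The sketch also assumes that a mass counts as a match when it falls within either the 0.3 Da or the 0.015% tolerance; the list above does not say how each program applies the two values.

```python
def within_tolerance(observed_mass, theoretical_mass,
                     abs_tol_da=0.3, rel_tol_pct=0.015):
    """Return True if two monoisotopic masses agree within either tolerance.

    Assumption: the absolute (Da) and relative (%) limits are alternatives,
    so passing either one is enough.
    """
    error_da = abs(observed_mass - theoretical_mass)
    rel_limit_da = theoretical_mass * rel_tol_pct / 100.0
    return error_da <= abs_tol_da or error_da <= rel_limit_da


def mw_window(sds_page_mw):
    """Return the (low, high) MW search range: half to twice the gel estimate."""
    return 0.5 * sds_page_mw, 2.0 * sds_page_mw


print(within_tolerance(1500.2, 1500.0))  # 0.2 Da error, inside the 0.3 Da limit
print(within_tolerance(1000.4, 1000.0))  # 0.4 Da error, outside both limits
print(mw_window(50000))                  # window for a 50 kDa gel estimate
```

Under this reading the percentage tolerance only becomes the looser of the two above 2000 Da, where 0.015% exceeds 0.3 Da.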
The primary criteria we use for an identification are a ProFound score of
1.0 for the top-ranked protein and a minimum sequence coverage of 20%;
both criteria must be met. The median sequence coverage for the 90
proteins identified was 34%.
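The two-part acceptance rule can be written down as a small Python sketch. The function and parameter names are hypothetical, and the sketch assumes the ProFound score and the sequence coverage have already been read off the search report by hand or by some other means.

```python
def accept_identification(profound_score, sequence_coverage_pct,
                          required_score=1.0, min_coverage_pct=20.0):
    """Apply both criteria: top-ranked ProFound score and minimum coverage.

    Both conditions must hold; failing either one means no identification
    is reported.
    """
    return (profound_score >= required_score
            and sequence_coverage_pct >= min_coverage_pct)


print(accept_identification(1.0, 34.0))   # meets both criteria
print(accept_identification(1.0, 15.0))   # coverage too low
print(accept_identification(0.98, 40.0))  # score too low
```

Requiring both conditions is what makes the approach conservative: a candidate with high coverage but a lower score is left unidentified rather than reported.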
Additional information regarding our protein identification service may
be found at:
http://info.med.yale.edu/wmkeck/procmald.htm#msid
If you happen to read the above Web pages and notice any errors or have
suggestions for improving them, by all means please contact us.
With best wishes for lots of successful protein identifications,
Kathy Stone & Ken Williams
___________
Dear ABRF Colleagues,
To take a slightly different tack and, perhaps, to "pour some more oil on
the fire", I would like to raise a discussion on the utility of the
peptide mass fingerprinting technique in the light of Ken Williams' data
below.
In his description of his MS-Fit routine
(http://prospector.ucsf.edu/htmlucsf/instruct/fitman.htm), Karl Clauser
ends his discussion on the improvement of the mass accuracy of modern
MALDI
mass spectrometers by suggesting that "proteins can now be confidently
identified by peptide mass fingerprinting using masses alone with MS-Fit.
Identification certainty is primarily a function of the level of mass
accuracy." I have continued to wonder what this certainty level actually
was with modern instruments.
Ken Williams has, certainly for the first time I can recall seeing, quoted
some real-life figures relating to this kind of protein identification.
These are, as Ken explains, genuine user-supplied samples and he, no doubt
with the greatest experimental care, has been able to identify 65% of a
set of 150 proteins by peptide mass searching. I take it, Ken, that this
refers to proteins that could be unequivocally identified using only the
information on the protein as it was presented to the laboratory and the
peptide mass list, with no additional structural information? I also take
it that these identifications were confirmed with some supplementary
structural techniques like Edman, MS/MS or PSD?
Does this mean that there is, in fact, an overall 35% uncertainty with the
peptide mass fingerprinting technique and, if this is so, shouldn't this
statistic bother us given the requirement of our collaborators/customers
to unequivocally identify their proteins?
Don't get me wrong here, I think a list of masses from a tryptic digest is
a fantastic thing to have to give you that warm "I know I have got the
right protein" feeling after getting some sequence information from Edman
or MS/MS.
It just bothers me that peptide mass fingerprinting is being promoted as a
primary technique of protein identification when I believe that Ken
Williams' data (and my personal experience with a much smaller set of
proteins) suggest that it isn't. I think this type of data is enormously
valuable and goes hand in hand with the biological information as well as
the physicochemical data (like pI and MW from a 2D gel) to give another
level of confidence to a protein identification, but if there is anyone
out there planning to set up a protein identification facility based on
peptide mass fingerprinting alone, without the support of Edman or MS/MS,
I would suggest they consider Ken's numbers carefully. I certainly won't
report a protein identification based on a technique that only identifies
two out of three proteins correctly.
If there is anyone else out there with a good set of data on real
unknowns, it would be great to hear about it. I look forward to the day
when this type of fingerprinting will achieve accuracies in the high 90s
(percent); until then, I think it important to interpret peptide mass
fingerprinting data with due statistical caution.
In view of Karl Clauser's comments on mass accuracy at the beginning of
this message, I think we need to discuss and promote techniques to drive
our mass accuracies up in order to evaluate, with the help of large
unknown sample sets, the real confidence limits of this technique. Looking
forward to continuing these discussions in San Diego.
Regards to all....Ken Mitchelhill