How far into a protein do you need to sequence to be 100% (or nearly)
sure of its identity? I have a researcher who would like to be
sure that protein X really is protein X. It's isolated from Pseudomonas,
and we know the sequence of this protein for a few different species of
Pseudomonas.
I typically use OWL through Genestream at the CNRS in France. This
composite database has always given me more successful searches than any
single database.
Entering the string of the first 10 amino acids of protein X, I get a
Smith-Waterman score of 56; the next hit down has a score of 47. That is
NOT a good enough distinction, but how much further should I go?
With longer strings, the top score and the next-highest score are:
15 aa's: 86 and 49
16 aa's: 94 and 53
17 aa's: 99 and 53
20 aa's: 114 and 59
Where to draw the line?
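
To make the trend easier to see, here is a minimal Python sketch that tabulates the scores quoted above and computes the gap and the ratio between the top hit and the next hit at each query length. The function name `margins` and the choice of difference and ratio as summaries are my own assumptions for illustration; they only describe the numbers and are not a measure of statistical significance, since raw Smith-Waterman scores depend on the scoring matrix and gap penalties used.

```python
# Scores quoted above: query length (aa) -> (top hit score, next-best hit score).
scores = {
    10: (56, 47),
    15: (86, 49),
    16: (94, 53),
    17: (99, 53),
    20: (114, 59),
}


def margins(table):
    """Return (length, top, runner_up, difference, ratio) rows sorted by length.

    Hypothetical helper: difference and ratio are descriptive summaries only,
    not a statistical test of whether the top hit is significant.
    """
    rows = []
    for length in sorted(table):
        top, runner_up = table[length]
        rows.append(
            (length, top, runner_up, top - runner_up, round(top / runner_up, 2))
        )
    return rows


for length, top, runner_up, diff, ratio in margins(scores):
    print(f"{length:>2} aa  top={top:>3}  next={runner_up:>2}  diff={diff:>2}  ratio={ratio}")
```

Read this way, the separation grows steadily as the query gets longer, but the table by itself does not say which length is long enough.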
If there is a reference with a clear explanation of Smith-Waterman
scores and how to use them (preferably in layman's terms, as I confess
my love of math disappeared when I graduated from college), I'd love to
hear about it.
Thanks,
Deb McMillen
Institute of Molecular Biology
University of Oregon
Eugene, OR 97403