Protseq

Deb McMillen (mcmillen@morel.uoregon.edu)
Sat, 24 Apr 1999 07:48:32 -0700 (PDT)

Hi, all,
A database question

How far into a protein do you need to sequence to be 100% (or nearly)
sure of its identity? I have a researcher who would like to be sure that
protein X really is protein X. It's isolated from Pseudomonas, and we
know the sequences of this protein for a few different species of
Pseudomonas.

I typically use OWL through Genestream at the CNRS in France--this
composite database has always given me more successful searches than any
one database.

Entering the string of the first 10 amino acids of protein X, I get a
Smith-Waterman score of 56--the next hit down has a score of 47. That is
NOT a good enough distinction, but how far should I go?
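
For readers who want to see what a Smith-Waterman score actually measures, here is a minimal sketch of the local-alignment scoring recurrence. It is not the scoring used by the search service mentioned above: the match/mismatch/gap values and the peptide strings are made-up assumptions for illustration, and real search tools typically use an amino-acid substitution matrix (such as BLOSUM or PAM) rather than a flat match/mismatch score.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score between strings a and b.

    Scoring values are illustrative assumptions, not those of any
    particular database search tool.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a fresh local alignment
                prev[j - 1] + step,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical sequences, invented for this example only.
target = "MKTAYIAKQRQISFVKSHFS"
for n in (5, 10, 15, 20):
    query = target[:n]
    print(n, smith_waterman_score(query, target))
```

With these assumed values, a query that matches its target exactly scores 2 points per residue, so the raw score rises steadily with query length. That is one reason a raw score is hard to judge without comparing it to the next-best hit.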

With a string of 15 aa's, the scores are 86 and 49 (top hit and next
highest).

16 aa's give scores of 94 and 53.

17 aa's give 99 and 53.

20 aa's give 114 and 59.

Where to draw the line?
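
One simple way to look at the numbers quoted in this message is to tabulate the gap and the ratio between the top hit and the next hit at each query length. The sketch below does only that arithmetic. It is not a significance test, and the function name `margins` is just an illustrative choice.

```python
# Scores quoted in this message: query length -> (top hit, next hit)
scores = {10: (56, 47), 15: (86, 49), 16: (94, 53), 17: (99, 53), 20: (114, 59)}


def margins(scores):
    """Return (length, gap, ratio) for each query length, shortest first."""
    rows = []
    for length, (top, runner_up) in sorted(scores.items()):
        rows.append((length, top - runner_up, round(top / runner_up, 2)))
    return rows


for length, gap, ratio in margins(scores):
    print(f"{length:>2} aa: gap {gap:>2}, ratio {ratio:.2f}")
```

On these figures the gap widens from 9 points at 10 aa's to 55 points at 20 aa's, and the ratio climbs from about 1.19 to about 1.93. Most of that jump happens between 10 and 15 aa's.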

If there is a reference with a clear explanation of the Smith-Waterman
values and how to use them (I do like these in layman's terms, as I
confess my love of math disappeared when I graduated from college), I'd
love to get it.

Thanks,
Deb McMillen
Institute of Molecular Biology
University of Oregon
Eugene, OR 97403