Protein analysis -Reply

Mark Lively (mlively@medcenter.wpmail.wfu.edu)
Mon, 31 Aug 1998 14:51:33 -0500

Terry,
If you use the BLAST server via the web at www.ncbi.nlm.nih.gov, you will have to choose the Advanced BLAST page. Enter your short sequence, then below the entry box choose the "Expect" option and set the value to 1000 (the maximum). Try the search with those parameters and you may get a hit if the exact sequence is there. Whether or not you get a hit will depend heavily on the amino acids that are in your sequence, since matches to rare residues such as Trp or Cys score higher than matches to common ones such as Ala or Leu.
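To see why the Expect value and the residue composition matter so much for a 5-residue query, here is a rough back-of-the-envelope sketch in Python. It is not how the server computes its statistics: it assumes BLOSUM62 scoring, the commonly quoted ungapped Karlin-Altschul parameters (lambda about 0.318, K about 0.134), a made-up database size of 100 million residues, and no length corrections. The two peptides and the function names are invented for illustration.

```python
import math

# Self-match (diagonal) scores from the BLOSUM62 substitution matrix.
BLOSUM62_SELF = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6,
    "H": 8, "I": 4, "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4,
    "T": 5, "W": 11, "Y": 7, "V": 4,
}

# ASSUMPTION: approximate Karlin-Altschul parameters for ungapped BLOSUM62.
LAMBDA = 0.318
K = 0.134


def exact_match_score(peptide):
    """Raw score of a peptide aligned perfectly against itself."""
    return sum(BLOSUM62_SELF[residue] for residue in peptide.upper())


def expect_value(score, query_len, db_residues):
    """Simplified E = K * m * n * exp(-lambda * S), with no edge corrections."""
    return K * query_len * db_residues * math.exp(-LAMBDA * score)


if __name__ == "__main__":
    DB_RESIDUES = 100_000_000  # ASSUMPTION: illustrative database size
    for peptide in ("WCHYP", "AVLSI"):  # hypothetical 5-residue queries
        score = exact_match_score(peptide)
        e_value = expect_value(score, len(peptide), DB_RESIDUES)
        print(f"{peptide}: score={score}, approximate E={e_value:.3g}")
```

Under these assumptions, a perfect match to the peptide rich in rare residues comes out below an Expect cutoff of 1000, while a perfect match to the peptide made of common residues comes out far above it and would not be reported.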
You can also try to force the program to use a smaller word size by entering the following in the "Other advanced options" box: "-W=1" (everything inside the quotes but NOT the quotes). This will cause the search word size to be only 1 and can sometimes help the program find a match.
These same parameters can be used with TBLASTN to search your protein query sequence against the nucleotide databases, which TBLASTN translates in all six reading frames.
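If only exact matches are of interest and you have a protein database on disk as a FASTA-format text file, a plain substring scan avoids the scoring statistics altogether. The following is a minimal Python sketch, not a substitute for BLAST; the function names and the two demo sequences are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-format lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_exact(peptide, records):
    """Return (header, 1-based position) for every exact occurrence."""
    peptide = peptide.upper()
    hits = []
    for header, sequence in records:
        sequence = sequence.upper()
        position = sequence.find(peptide)
        while position != -1:
            hits.append((header, position + 1))
            # Step forward by one so overlapping occurrences are found too.
            position = sequence.find(peptide, position + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical demo records, not real database entries.
    demo = [
        ">demo1 hypothetical",
        "MKTAYIAKQR",
        "QISFVKSHFS",
        ">demo2 hypothetical",
        "GGAKQRQAAA",
    ]
    for header, position in find_exact("AKQRQ", read_fasta(demo)):
        print(header, position)
```

To scan a real file, pass an open file handle instead of the demo list, for example read_fasta(open("proteins.fasta")), where the file name is a placeholder.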

If you fail to find anything using BLAST, you may want to try running a FASTA search. This search algorithm is much more sensitive and will report similarities even for very short query sequences. There are some sites out there that provide access to it, but I don't know them. I run the GCG version in house on my own computer. (If your sequence is not secret, I could run the FASTA search here and email the results to you.)
For web resources, look at the NCSA Biology Workbench, a site that provides several very useful tools, including FASTA. They are at http://biology.ncsa.uiuc.edu/. You must register with them, but it costs nothing, and I have found this site to be very useful, especially for multiple sequence alignments. Happy hunting!

Mark Lively

Mark O. Lively, Ph.D., Professor of Biochemistry
Molecular Genetics Program Director
Wake Forest University School of Medicine
Medical Center Blvd., Winston-Salem, NC 27157
Phone: 336-716-2969 Fax: 336-716-7200
Email: mlively@wfubmc.edu

>>> "Terry Stoming" <TSTOMING@MAIL.MCG.EDU> - 8/31/98 9:24 AM >>>
Could someone PLEASE give me a simple way to do a search for a protein sequence using ~5 residues