Prot Seq: URL's for search engines for multiple sequences

Laurey Steinke (lsteinke@molbio.unmc.edu)
Wed, 24 Mar 1999 13:06:05 -0600

Hello all, I have pasted in the instructions for searching with multiple
sequences. Bill Henzel mentioned the URLs in his talk, but they were in
the letter sent with the sample, not in the poster.

It was great to see you all at the meetings!

-laurey

Weather Inconsequential: I brought spring back with me from North
Carolina. It is beautiful!! Sunny, with the crocuses blooming in my yard.

Be aware that database search engines are available which will allow
multiple entries. Two examples of such are: MS-Edman at Protein
Prospector (http://prospector.ucsf.edu/), and Peptide Pattern at
http://www.mann.embl-heidelberg.de/Services/PeptideSearch/FR_PeptidePatternForm.html
The FindPatterns utility of GCG can also be used with parentheses
enclosing the alternatives. (For more complete instructions see the
enclosed sheet.)

The identities of protein mixtures can be determined by using the program
FindPatterns in the Wisconsin Sequence Analysis Package, also known as GCG.
It is not available on the WWW, but may be installed on a central computer
at your institution.

There are several interfaces for running the GCG package, and they are
likely to have been customized for your site. Basically, as shown in the
example below, you need to run the program FindPatterns, optionally select
the maximum number of mismatches that you will allow, select a database,
enter the pattern, and provide the name of the output file.

To enter a pattern with multiple residues at a specific sequence position,
the amino acids are separated by commas and enclosed in parentheses as
shown in the example. An X indicates any amino acid, and it can be used to
mark a sequence position at which no residue was identified.
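
As an illustration of the pattern syntax just described, here is a minimal Python sketch that translates such a pattern into a regular expression. It is not part of GCG and handles only the two features mentioned above (parenthesized alternatives and X); the function name gcg_pattern_to_regex and the flanking residues in the test string are made up for the illustration, and no mismatches are allowed here.

```python
import re


def gcg_pattern_to_regex(pattern: str) -> str:
    """Translate a FindPatterns-style protein pattern into a Python regex.

    Only two features are handled (an assumption of this sketch):
    parenthesized, comma-separated alternatives, and X for any residue.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "(":
            # Collect the alternatives up to the closing parenthesis.
            close = pattern.index(")", i)
            residues = [r.strip().upper() for r in pattern[i + 1:close].split(",")]
            parts.append("[" + "".join(residues) + "]")
            i = close + 1
        else:
            # X means "any residue"; any other letter must match itself.
            parts.append("." if ch.upper() == "X" else ch.upper())
            i += 1
    return "".join(parts)


regex = gcg_pattern_to_regex("(D,S,G)(E,T)(H,L)(E,K)X(T,E,S)IA(E,H,G)R")
print(regex)  # [DSG][ET][HL][EK].[TES]IA[EHG]R

# Hypothetical test string: the peptide DTHKSEIAHR with invented flanking residues.
print(bool(re.search(regex, "MKDTHKSEIAHRFK")))  # True
```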

EXAMPLE (UNIX command line)

findpatterns -mismatch=2

FindPatterns identifies sequences that contain short patterns like GAATTC
or YRYRYRYR. You can define the patterns ambiguously and allow mismatches.
You can provide the patterns in a file or simply type them in from the
terminal.

FINDPATTERNS in what sequence(s) ? swissprot:*

Enter patterns individually, one per line.
End the list with a blank line.

Pattern 1: (D,S,G)(E,T)(H,L)(E,K)X(T,E,S)IA(E,H,G)R
Pattern 2:

What should I call the output file (* findpatterns.find *) ? abrf99seq.find

100K_RAT len: 889
104K_THEPA len: 924
10KD_VIGUN len: 75
110K_PLAKN len: 296
.
.
.

This example, which allowed up to two mismatched residues per match,
returns a list of sequence entries that includes two proteins containing
the sequences:
DTHKSEIAHR
SELEKTREER
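
To see why both peptides are reported, the mismatches can be counted by hand or with a small Python sketch like the one below. This is not how GCG works internally; it assumes that a mismatch means a substituted residue (no insertions or deletions), and the helper names parse_pattern, count_mismatches and find_hits are invented for the illustration.

```python
def parse_pattern(pattern):
    """Return one entry per position: a set of allowed residues, or None for X."""
    positions = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "(":
            close = pattern.index(")", i)
            inner = pattern[i + 1:close].replace(" ", "").upper()
            positions.append(set(inner.split(",")))
            i = close + 1
        else:
            ch = pattern[i].upper()
            positions.append(None if ch == "X" else {ch})
            i += 1
    return positions


def count_mismatches(positions, peptide):
    """Count positions where the residue is not among the allowed ones."""
    return sum(
        1
        for allowed, residue in zip(positions, peptide)
        if allowed is not None and residue not in allowed
    )


def find_hits(positions, sequence, max_mismatches):
    """Slide the pattern along the sequence; keep windows within the limit."""
    n = len(positions)
    hits = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        mismatches = count_mismatches(positions, window)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


positions = parse_pattern("(D,S,G)(E,T)(H,L)(E,K)X(T,E,S)IA(E,H,G)R")
for peptide in ("DTHKSEIAHR", "SELEKTREER"):
    print(peptide, count_mismatches(positions, peptide))
# DTHKSEIAHR 0
# SELEKTREER 2
```

Under these assumptions the first peptide fits the pattern exactly, while the second differs at two positions (R for I and E for A), which is still within the limit of two set by -mismatch=2.
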

Laurey Steinke
Protein Structure Core Facility
University of Nebraska Medical Center
Omaha Nebraska, 68198-4525

Phone (402) 559-6647
FAX (402) 559-6650
lsteinke@molbio.unmc.edu
