Research Committee Reports and Updates


Amino Acid Analysis Research Committee

This Committee collaborated with 72 participating core facilities in this year's challenge to assess their abilities to determine the correct amino acid composition of an unknown protein. This year's study had two additional goals: (1) to test protein quantitation accuracy using amino acid analysis as routinely performed in core facilities and (2) to determine whether the observed composition was sufficient to identify the test protein by comparison to databases using programs available on the Internet. Additional information, such as species, molecular weight, and isoelectric point, can also be used with these programs to make correct identification more likely. We suggested that study participants submit their data to two different search programs, ExPASy and Propsearch.

In this study, the overall average error for amino acid composition accuracy was 11.9 ± 9.8% with a range of 4.0 to 58.9%. For the best 67 sites, the average error was lower, 9.6 ± 4.1%. The other 5 sites had errors ranging from 35-58%. Analysis accuracy improved from last year's study (21.2%), where samples on PVDF membranes were analyzed, but was slightly below that observed in the 1994 analysis of a soluble protein (10.9 ± 3.7%). The amino acids that were problematic included Met, His, Pro, and Ile, for both pre-column and post-column chemistries.

By far, the most commonly used chemistries were PITC and ninhydrin, which were about evenly used among the study participants. The differences in error for the two types of chemistries appeared insignificant. Of the top ten sites, six used post-column techniques; however, the laboratory with the lowest error used the pre-column PITC technique.

The average value for the total yield of protein sent to each laboratory was 6.8 ± 1.7 mg, with a range of 0.9 to 10.6 mg. Yield results appeared to be independent of the methodology used.

Ninety-three percent of the responding laboratories submitted their data to either the ExPASy or Propsearch Internet sites or to both. One laboratory used its own identification software, and four did not attempt to identify the test protein.

To compare the two search programs, data from all laboratories were resubmitted by this Committee to both sites. We found that ExPASy correctly identified the test protein as triosephosphate isomerase with 83% of the datasets and also correctly identified the species as rabbit with 30.6% of the datasets. With the Propsearch program, these figures were slightly higher: 90% correct identification of the enzyme and 44.4% correct identification of the enzyme and species. The ability to correctly identify the protein correlated rather well to dataset accuracy.

The ExPASy site allows a calibration protein to be included in the search, but a calibrant did not always improve the results. Of 41 sites using a calibrant, there was significant improvement in only 14 cases, no improvement in 16 cases, and actually a loss in identification for 11 cases. In these latter cases, the known protein was either mislabeled or possibly not pure enough to serve as a good calibrant. In addition, no improvement was noticed for poor quality data.

Most facilities (65%) used mass spectrometry to determine the mass of the unknown protein and obtained an average value of 26,804 ± 631.

In conclusion, this study demonstrates that at least 90% of typical core facilities are able to identify a pure protein simply by amino acid analysis.


DNA Sequence Research Committee

In its first study, this Committee sent a test sample to 95 facilities that perform automated DNA sequencing to evaluate how well they could sequence a moderately difficult template. The test sample chosen by the Committee was a double-stranded plasmid containing a GC-rich Thermus thermophilus DNA insert (69% GC over 926 bp). A survey questionnaire was included with this test sample to determine how DNA sequencing services have changed since the last survey was conducted two years ago. The Committee received 73 sequencing datasets from 50 facilities, and 53 completed surveys were returned.

Sequences submitted by study participants were ranked according to length of entirely correct sequence, i.e., the longest segment of sequence without ambiguous assignments, miscalls, deletions, or insertions. We chose this criterion for ranking because we felt it most closely matches the expectations of collaborating researchers, who rely on facilities to provide entirely correct data. The average length of entirely correct sequence was 284 bases, and the longest was 615 bases.
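The ranking criterion described above can be sketched in a few lines of Python. This is only an illustration, not the Committee's actual scoring procedure: the function name `longest_correct_run` is hypothetical, and it assumes the submitted sequence has already been aligned to the reference, with `-` marking an insertion or deletion and any non-ACGT letter (such as `N`) marking an ambiguous call.

```python
def longest_correct_run(reference: str, called: str) -> int:
    """Length of the longest stretch of entirely correct sequence.

    Assumes both strings are pre-aligned to the same length, with '-'
    marking a gap (insertion or deletion). A gap, an ambiguity code such
    as 'N', or a miscall ends the current run.
    """
    if len(reference) != len(called):
        raise ValueError("sequences must be aligned to equal length")
    best = current = 0
    for ref_base, call_base in zip(reference.upper(), called.upper()):
        if ref_base == call_base and ref_base in "ACGT":
            current += 1
            best = max(best, current)
        else:
            current = 0  # any error resets the run
    return best


# Toy example: an ambiguous call and a deletion split the read into runs
# of 3, 3, and 2 correct bases.
print(longest_correct_run("ACGTACGTAC", "ACGNACG-AC"))  # 3
```

Under this criterion a single miscall in the middle of an otherwise perfect read roughly halves its score, which is consistent with editing having a large effect on the ranking.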

We noticed that gel length, data review, and sequencing chemistry influenced sequence accuracy and length. Although only 17% of the datasets were obtained with 48 cm or longer plates, these tended to cluster near the top of the ranking: four of the top ten responses were obtained on ABI 373 Stretch instruments with 48 cm plates. Even though 70% of the datasets were not edited, six of the seven top-ranked responses were. Several facilities submitted both edited and non-edited datasets: comparing these for each facility showed that, on average, editing increased the length of entirely correct sequence by 157 bases. Nearly all study respondents submitted data obtained using dye-terminator sequencing chemistry and the enzyme TaqFS. Some facilities submitted data they obtained with and without DMSO in the sequencing reactions. In most cases DMSO increased the length of entirely correct sequence; however many of the top-ranked responses did not use DMSO.

The survey showed that the number of facilities performing automated DNA sequencing and the number of samples sequenced per year continue to grow: 56% of survey respondents sequenced 1000-5000 samples last year, and the total number of samples sequenced by the respondents was 223,548, corresponding to almost 100 million bases. Based on the survey, a typical DNA sequencing facility offered primer design and synthesis, DNA analysis and database searches, but not template preparation services. Of the templates sequenced, 76% were double-stranded plasmids, and 21% were PCR products. Dye-terminator chemistry was used by 82% of the facilities, and two-thirds used ABI's TaqFS enzyme.

The Committee will "publish" the study results on the ABRF WWW Homepage and will modify the report as it refines its analysis of the data. This homepage can be viewed by using the link "ABRF Research Committee Reports" and then selecting "DNA Sequence Committee 1996 Study Report".


cGMP/GLP Research Committee

This Committee presented a poster at ABRF '96: Biomolecular Techniques highlighting their current project, a series of articles on test method validation. The Committee's first article in this series will give an overview of method validation, with details on laboratory requirements for performing validations, basic components of method validations (method development parameters, system suitability measures, documentation), general versus specific validations, and an extensive reference list. Subsequent articles will each focus on the validation of a selected analytical method. The example shown in the poster was amino acid analysis. Over 60 individuals requested further information about method validation by leaving cards at the poster (30 industry-affiliated, 20 vendor-affiliated, 11 academic-affiliated), indicating high interest in the topic among attendees. These individuals have been placed on a distribution list for validation article information. To place your name on the distribution list, contact Tim Hayes (E-mail: tkh1506@vms1.tamu.edu or tkhayes@tamu.edu, Tel: (409) 845-8315). The committee is currently preparing the first article, with a target submission date of late 1996.


Internal Protein Sequencing Research Committee

The primary goal of this Committee's first study was to provide members with a means to independently and anonymously evaluate their ability to generate and isolate peptides from a protein separated by SDS-PAGE. In addition, for those laboratories that do not yet perform internal sequencing, we hoped to provide protocols and a sample that might help introduce this technology into their facilities. Finally, we hoped to obtain data that might help determine the efficacy of internal sequencing from PVDF blots versus from in-gel samples.

Study participants were sent 70 pmol of two proteins as either Coomassie blue-stained gel samples or amido black-stained PVDF samples. One protein was a known 28 kDa recombinant protein, and the second was a 30 kDa version of the first protein containing a unique 15-residue peptide insert. With each set of proteins, we also sent two control samples: a "blank" piece of gel or PVDF and an external peptide standard, which had the same amino acid composition as the unique 15-residue peptide but a different sequence. We requested that study participants perform tryptic digestion and HPLC comparative peptide mapping on the samples and controls to identify the unique 15-residue peptide. As an option, we also recommended characterization of the 15-residue peptide by mass spectrometry or amino acid sequencing.

Although 39 laboratories that requested samples returned data, so far only the 41 data sets (from 28 laboratories) that the Committee received before the ABRF '96 meeting have been evaluated by the Committee. Of these 41 samples, study participants analyzed 12 by mass spectrometry and 23 by Edman sequencing. Upon reviewing these data, it was clear there was wide diversity in the quality of the resulting chromatograms. The overall quality of the PVDF/gel digests and resulting comparative HPLC peptide maps correlated with the level of experience, the use of lower HPLC flow rates and smaller column inner diameters, and, in the case of in-gel digests, the absence of Tween 20 in the digest buffer. Many of the above-average chromatograms were obtained on 1 mm or smaller inner diameter columns eluted at flow rates in the 20-100 µl/min range. No significant differences were found in the recovery of the target peptide from in-gel or PVDF samples. In both cases the median initial sequencing yield for the target peptide was close to 8 pmol, which corresponds to an overall recovery of about 11% of the 70 pmol supplied. Two common problems were high background and, at the other extreme, lack of significant absorbance peaks. It was suggested that positive (i.e., a control digest on a similar amount of a known protein that had also been subjected to SDS-PAGE) and negative (i.e., a control digest on an equal-size section of PVDF membrane or gel that does not contain protein) controls should be helpful in diagnosing the causes of these two problems. The median number of residues sequenced from fifteen data sets that had correctly identified and sequenced the target peptide was 13 out of the possible 15 residues, with two PVDF and two in-gel data sets correctly identifying all 15 residues.
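The recovery figure quoted above follows from the amount of protein supplied to each participant (70 pmol) and the median initial sequencing yield (about 8 pmol). A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def overall_recovery(initial_yield_pmol: float, supplied_pmol: float) -> float:
    """Fraction of the supplied protein recovered as sequenceable peptide."""
    return initial_yield_pmol / supplied_pmol


# Median initial yield of ~8 pmol from the 70 pmol sent to participants.
print(f"{overall_recovery(8, 70):.1%}")  # 11.4%
```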


Nucleic Acids Research Committee

This Committee recently completed a study that analyzed crude, unpurified 25- and 50-base oligonucleotides synthesized in 71 participating core facilities. All oligonucleotides were evaluated for purity and coupling efficiency by capillary electrophoresis, and the 25-mers were evaluated for their ability to serve as sequencing primers. More than 85% of the submitted primers exceeded the industry standard of 98% coupling efficiency. On-line trityl monitors were found to be unreliable indicators of synthesis quality. All 25-mers more than 70% pure (greater than 98.5% coupling efficiency) performed essentially equally well as sequencing primers, i.e., there was no obvious relationship between purity and sequencing performance when purity was at least 70%. Primers 54-70% pure performed less well, but many still gave data essentially indistinguishable from those obtained with higher purity primers, suggesting that factors other than coupling efficiency are important. Primers less than 54% pure generally gave poor sequence data.
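The pairing of purity and coupling efficiency quoted above (70% pure at 98.5% coupling efficiency for a 25-mer) is consistent with the usual approximation that the full-length fraction of a crude oligonucleotide equals the average stepwise coupling efficiency raised to the number of couplings. The sketch below assumes that model and assumes n - 1 couplings for an n-mer (the first base being supplied on the support); neither assumption is stated in the report, and the function names are hypothetical.

```python
def full_length_fraction(coupling_efficiency: float, length: int) -> float:
    """Expected fraction of full-length product in a crude synthesis.

    Assumes every coupling step has the same efficiency and that an
    n-mer requires n - 1 couplings.
    """
    return coupling_efficiency ** (length - 1)


def coupling_efficiency_from_purity(purity: float, length: int) -> float:
    """Average stepwise coupling efficiency implied by an observed purity."""
    return purity ** (1.0 / (length - 1))


# 25-mer at 98.5% coupling efficiency: close to the 70% purity threshold.
print(round(full_length_fraction(0.985, 25), 3))  # 0.696
# A 54% pure 25-mer implies roughly 97.5% average coupling efficiency.
print(round(coupling_efficiency_from_purity(0.54, 25), 3))  # 0.975
```

Under the same assumptions, a 50-mer made at 98.5% efficiency would be under 50% full-length product, which illustrates why small changes in coupling efficiency matter more for longer oligonucleotides.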

Together these results suggest that a slight increase in the industry standard for coupling efficiency from 98 to 98.5% could obviate the need for any purification of sequencing primers. However, with current instrument and reagent specifications, our results underscored the importance of rigorous quality control in DNA synthesis.

The Committee is currently preparing to launch a new study investigating the strategies that core facilities use to design and select sequencing primers. Participating facilities will receive a fluorogram from an automated DNA sequencer and will be asked to design and synthesize primers to extend this sequence. The Committee will then use these as sequencing primers, after purifying them. Our goals are to examine the strategies and software used to select primers and to evaluate the effectiveness of each of these approaches.


Peptide Synthesis Research Committee

For its 1996 study, this Committee wanted to evaluate racemization during peptide synthesis. Before requesting that the membership synthesize test peptides, the Committee needed to design and test a short peptide whose synthesis would be considered straightforward. Design elements were considered that would enhance the suitability of the test peptide for our study. Evaluation of this peptide would also provide the opportunity to establish which tests were most suitable for use in the study.

The Committee first designed a peptide susceptible to racemization at two residues, histidine and cysteine (1-2), and with the sequence RDRHEC. Glutamic acid was placed before the cysteine residue to determine whether racemized cysteine has an effect on carboxypeptidase A digestion. Arginine was placed before the histidine residue to examine the effect of a D-histidine on cleavage with trypsin. The Committee synthesized all four isomers of this peptide and, during analysis of the isomers, made several observations pertinent to our planned study. A chiral chromatography column composed of cyclodextrin (Aztec Cyclobond II) did not resolve the four peptide isomers. A Vydac C-18 reversed-phase column did, and a Nucleosil C-18 column gave improved resolution of these highly polar peptides. However, the Committee decided this peptide was not suitable for its planned study, because too many variables existed to accurately analyze samples submitted by participating laboratories. In particular, we found that quantitatively evaluating the extent of racemization for both histidine and cysteine in the same peptide was difficult.

A simpler peptide, susceptible to racemization at only a histidine residue and with the sequence RERHAY, was considered next. A tyrosine was included so the peptide could be monitored by ultraviolet absorbance, and an alanine was placed next to the histidine residue, because alanine can be cleaved by carboxypeptidase A. Arginine was placed before the histidine to determine if a D-histidine would interfere with trypsin cleavage. Amino acid analysis after derivatization with Marfey's reagent (1-fluoro-2,4-dinitrophenyl-5-L-alanine amide) (3) permitted very sensitive detection of racemization. Standard reversed-phase HPLC conditions separated the two peptide isomers, and their elution order depended upon solvent conditions. The Aztec "chiral" column was unable to resolve the peptides. An on-line chiral detector was also used during reversed-phase separations, but we found it is not yet suitable for routine analysis of peptide samples, primarily due to lack of sensitivity. Enzymatic digestions of test peptide isomers were analyzed by MALDI-TOF mass spectrometry. Trypsin digested peptides containing L- or D-histidine equally well and so cannot be used to distinguish peptides with D- and L-amino acids adjacent to the cleavage site. However, carboxypeptidase A could not digest past alanine when the peptide contained D-histidine. Thus, this enzyme appeared to be very sensitive to the presence of D-amino acids.

Therefore, for this year's study, participating laboratories have been asked to synthesize the second test peptide candidate, RERHAY. The Committee will analyze the peptides submitted by participating laboratories to determine the extent of racemization during synthesis. The results of this study will be available this summer and should be published at a later date.

References

1. Jones, J.H., Ramage, W.I. and Witty, M.J. (1980) Int. J. Peptide Protein Res. 15, 301-303.

2. Atherton, E., Hardy, P.M., Harris, D.E. and Matthews, B.H. (1991) in Peptides 1990 (E. Giralt and D. Andreu, eds.), Escom, Leiden, The Netherlands, pp. 243-244.

3. Adamson, J.G., Hoang, T., Crivici, A. and Lajoie, G.A. (1992) Anal. Biochem. 202, 210-214.


Protein Sequence Research Committee

The 1996 protein sequencing study, ABRF-96SEQ, was designed as a sequence calling exercise, to evaluate the sequence assignment capabilities of participating laboratories. Data consisted of two sets of chromatograms that were generated on two different instruments by Committee members and identified as datasets A and B. Participants were asked to analyze the chromatograms and return their sequence assignments, as well as complete a survey. A total of 95 datasets were returned, including two from laboratories that do not carry out protein sequencing.

Calling accuracy for dataset A, which contained 32 cycles of sequence information from a novel protein with an initial yield of 20 pmol, was extremely good: 99.8% of sequence calls were correct. Eighty-seven responses called all 32 residues correctly, and only 6 of 95 responses contained calling errors. However, five of these errors were positive calls. The Committee felt that many positive calls in both datasets should have been tentative, especially when the data are complex or confusing.

Dataset B consisted of 25 cycles of sequence information from a mix of two peptides that were loaded at the level of 10 pmol (B major) and 2 pmol (B minor). A MALDI mass spectrum of this sample was also sent as part of the dataset. Overall accuracy on positive calls for B major was 94.8%, comparable to the best results seen in previous studies; overall accuracy on positive calls for B minor was somewhat lower, 85.2%. Calling accuracy for Cys (as carboxamidomethylated Cys) was 99%, a large improvement from previous years. While Trp was identified with less accuracy than other residues, it also was assigned more accurately than in previous years. This suggests that facilities can easily call these two residues, as well as the other usual residues, when the proper procedures to optimize their recoveries are followed.

Problem residues from dataset B included the first three cycles, where background was high, and cycle 11 (where the major and minor sequences were the same residue). Some responses made assignments beyond the ends of the actual peptide sequences. In many cases the calculated masses of the assigned sequences did not match the masses observed in the MALDI spectrum. This suggests that not all facilities are well-versed in the correct use of mass analysis data.
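The consistency check described above, comparing the mass calculated from an assigned sequence with the mass observed by MALDI, can be sketched as follows. This is an illustration under stated assumptions rather than the Committee's procedure: it uses monoisotopic residue masses (instruments of the period often reported average masses instead), treats Cys as carboxamidomethylated as in dataset B, computes the singly protonated ion, and the function names and the 1 Da default tolerance are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # added once per peptide (free N- and C-termini)
PROTON = 1.00728            # charge carrier for the [M+H]+ ion
CARBAMIDOMETHYL = 57.02146  # mass shift on Cys after carboxamidomethylation


def peptide_mh(sequence: str, carbamidomethyl_cys: bool = True) -> float:
    """Calculated monoisotopic [M+H]+ for a one-letter peptide sequence."""
    mass = WATER + PROTON
    for residue in sequence.upper():
        mass += RESIDUE_MASS[residue]  # KeyError flags a non-standard letter
        if residue == "C" and carbamidomethyl_cys:
            mass += CARBAMIDOMETHYL
    return mass


def matches_observed(sequence: str, observed_mh: float,
                     tolerance_da: float = 1.0) -> bool:
    """True if the assigned sequence is consistent with the observed mass."""
    return abs(peptide_mh(sequence) - observed_mh) <= tolerance_da
```

A sequence assignment that runs past the true end of a peptide, or that miscalls a residue, will generally fail such a check unless the error happens to be mass-neutral (for example Leu versus Ile).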

There are many resources available on the Internet for assisting facilities in interpreting mass analysis data. A brief list of these sites is given below.

Notes

1. Programs for peptide digest interpretation and calculation of mass from sequence.

2. Information about on-line searches and/or on-line database searching capability.

3. Library of matrices.

4. Hyperlinks to other MS, protein chemistry and molecular biology sites.

5. Information regarding ASMS meetings and short courses.

6. List of mass shifts due to amino acid modifications.


Survey Committee

In November 1995, this Committee surveyed directors of U.S. and international core facilities on their satisfaction with the commercially available biotechnology instrumentation used in their laboratories. A total of 217 surveys were returned, for a 78% response rate; 171 of the responding facilities (79%) were from the U.S. The instrumentation surveyed included peptide and DNA synthesizers, carbohydrate analysis and capillary electrophoresis instrumentation, amino acid analyzers, mass spectrometers, and DNA and protein sequencers. The equipment surveyed ranged from instruments no longer sold but still in use to instruments just being marketed. Instruments were graded from 1 (unacceptable) to 5 (exceptional) on performance, manufacturer's service, technical support, and reagents.

Except for carbohydrate analysis instruments, mass spectrometers, and multiple peptide synthesizers, there were statistically significant differences in instrument ranking in all categories. Instruments were summed by manufacturer, permitting overall evaluation of each major manufacturer in all four categories.

The number of each type of instrument and sum of instruments by manufacturer provided data on their extent of use in biotechnology core facilities. In 1995 in a typical facility there were 0.8 amino acid analyzers, 0.4 capillary electrophoresis instruments, 0.1 carbohydrate analyzers, 0.4 DNA sequencers, 0.9 DNA synthesizers, 0.6 mass spectrometers, 1.3 protein sequencers, and 1 peptide synthesizer. Compared to 1993 (Ivanetich et al., FASEB J. 7, 1109), the number of instruments increased in all categories, with mass spectrometry showing the most striking (900%) increase. The number of instruments and number of facilities also increased by 200% for all except amino acid analysis (no increase) and mass spectrometry (450% increase). On average there were 5.8 instruments per facility, compared to 4.6 in 1993.




Created: 1st June 1996
Last modified: 1st June 1996