submitted: 4/2/97, accepted: 4/8/97, last updated: 4/9/97, © 1997 ABRF

1997 ABRF DNA SEQUENCE RESEARCH COMMITTEE POSTER

Presented at

ABRF'97: Techniques at the Genome/Proteome Interface

Hyatt Regency Hotel
Baltimore, Maryland
February 9-12, 1997

2ND ANNUAL ABRF DNA SEQUENCE RESEARCH COMMITTEE STUDY: EFFECTS OF DMSO, THERMOCYCLING AND EDITING ON A TEMPLATE WITH A 72% GC RICH AREA

Association of Biomolecular Resource Facilities (ABRF)
DNA Sequence Research Committee (DSRC)
9650 Rockville Pike, Bethesda, MD 20814

Pamela Scott Adams
Mary Kay Dolejsi
Susan Hardin
Doug McMinimy
Paul Morrison
John Rush
W. Alton Jones Cell Science Center, Lake Placid, NY
Fred Hutchinson Cancer Research Center, Seattle, WA
University of Houston, Houston, TX
The Jackson Laboratory, Bar Harbor, ME
Dana-Farber Cancer Institute, Boston, MA
Howard Hughes Medical Institute, Harvard Medical School Boston, MA


ABSTRACT

An unknown GC-rich double-stranded DNA template and a sample survey sheet were sent to 134 research facilities that offer DNA sequencing as a service. The objective was to evaluate whether DMSO and/or altered thermocycling conditions improve the sequence obtained from a GC-rich template. Because both edited and non-edited data were requested, the impact of data editing was also assessed. In addition, the survey examined the types of machines, sequencing chemistry, sample cleanup methods, gel types and analysis software currently in use, and correlations between these factors and sequence quality were analyzed. This test provided a "snapshot" of the overall level of competence in DNA sequencing in core laboratories.

INTRODUCTION

The DNA Sequence Research Committee's first annual study (1, 2), conducted in 1996, was designed to assess member laboratories' ability to sequence a moderately difficult double-stranded plasmid DNA sample containing a GC-rich insert. It also examined the effectiveness of protocols developed for GC-rich samples, changes in sequencing chemistry, performance by sequencing hardware, the products used and the effect of editing. The study showed that gel length, manual data review and sequencing chemistry all influenced sequence accuracy and read length. A few facilities submitted data with and without DMSO in the sequencing reactions. In most cases DMSO increased the length of correct sequence; however, many of the top-ranked responses did not use DMSO. In several direct comparisons, manual editing yielded an average of 157 more bases of correct sequence for this difficult template.
These results prompted the DNA Sequence Research Committee to initiate a second study addressing three specific questions: whether adding DMSO improves the sequence obtained from a GC-rich template, whether altered thermocycling conditions improve it, and how editing affects sequence accuracy.

METHOD

This year's test sample, the recombinant plasmid fr30, was provided by Ed Laufer, Olivia Orozco and Cliff Tabin of the Department of Genetics at Harvard Medical School. The plasmid uses the Bluescript II SK phagemid as its vector and carries a 2,770 bp insert containing the chicken lunatic fringe gene, which encodes an intracellular signaling molecule involved in many aspects of embryonic development. The clone was isolated by screening a lambda ZapII embryonic chick cDNA library with a human lunatic fringe analog.
For this study, the plasmid was prepared as a DNA sequencing template by two rounds of CsCl/ethidium bromide equilibrium centrifugation, followed by isopropanol extraction and ethanol precipitation. Aliquots of 20 micrograms were dried in a SpeedVac. Each sample was mailed to one of 134 laboratories that offer DNA sequencing as a service, together with a sample survey, a floppy disk and a return envelope addressed to a third party. Participants were asked to sequence the unknown sample under the manufacturer's recommended conditions, with and without DMSO and/or altered thermocycling conditions. Unedited and edited data were requested for each condition. Samples were sent on November 27, 1996, and the deadline for submitting data was January 3, 1997. Data received by January 10, 1997 were included in the analysis. Data received after that date were included only in the rankings.
Databases contain the sequence of the Xenopus laevis homologue but do not yet contain the sequence of the chicken lunatic fringe gene. During the course of this study, we found three errors in the unpublished sequence of this clone and we used a corrected sequence for alignment against the study responses.

RESULTS

Forty-eight facilities, using 49 different machines, returned 246 sequencing chromatograms, of which 230 were included in this analysis. As responses arrived, any information revealing the identity of the responder was removed. The committee members tabulated the preliminary results of the sequencing "test" on the template with a 72% GC-rich area. Some of the highlights, presented at ABRF'97 in Baltimore, MD on February 9-12, 1997, are shown below.

THE "LUNATIC" SEQUENCE: The GenBank accession number for Lunatic fringe (3) is U91849.

The reference sequence used to analyze each submission consisted of 87 bases of 5' vector plus the first 1413 bases of Lunatic fringe.

PRELIMINARY FINDINGS

SUMMARY OF COMMON PARAMETERS:

96% of respondents used an ABI machine. There were 2 submissions from Licor machines and 1 from an ABI Model 310. A breakdown by machine type and gel length is shown below. 89% of respondents used dye terminator sequencing chemistry, 96% used FS enzyme and 81% reported using the ABI Prism kit. 68% used whole reactions, while 20% had switched to 1/2 reactions and 1 laboratory used 1/4 reactions. 72% still used some type of column for reaction cleanup, and 24% of those reused their columns.

PARTICIPATION: The number, type and WTR (well-to-read distance) of the sequencers used in this study.

RANKINGS: Approximately 50 bases were trimmed from the 5' end of every submitted sequence to eliminate variables in this area, since we were interested only in laboratories' ability to sequence through GC-rich regions. Each sequence was then compared with the known sequence, and its cumulative number of errors was determined. Miscalls, insertions, deletions and N's were all counted as errors. An empty error column indicates that there were too many errors to count.

For all machine types with a WTR greater than 24 cm, the sequences were sorted by the number of errors in the 0-600 column, then the 0-400 column and finally the 0-200 column. For the ABI 373A, which has a 24 cm WTR, the sequences were sorted by the 0-400 column and then the 0-200 column. Where more than one sequence has identical scores, the ranking order is random.

The Code column shows each participant's 4-digit identification number, preceded by N (not edited) or E (edited). Letters following the number indicate the reaction conditions: D = 5% DMSO only, AT = altered thermocycling conditions, DAT = DMSO plus AT. No letter indicates that the manufacturer's standard conditions were used. One participant used AC (altered chemistry), reported as a buffer modification. A 0.5 after a code indicates that half reactions were used. The machine type and WTR are given for each entry.

TOP 10: The sequences were sorted by code number (lab) and then by the 0-600 error column, and the best sequence from each lab was selected. The 10 best of these sequences were then analyzed out to 1000 bases and ranked by the longest read with no errors. The first number in the 0-800 column is the base at which the first error occurred. The second number is the cumulative number of errors out to 800 bases. A third number followed by * indicates that no further data were collected beyond that point.
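The scoring and ranking rules above can be sketched in Python. This is a minimal illustration and not the committee's software. It makes several assumptions, all labeled in the code. Errors come from a unit-cost alignment in which the trimmed read must align in full but may stop anywhere in the reference. An N never counts as a correct call. Errors are binned by reference coordinate. A count of `None` stands for "too many errors to count". The names `error_positions`, `window_counts`, `first_error_base` and `rank_key` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

INF = float("inf")


def _cost(read_base: str, ref_base: str) -> int:
    """0 for a correct call, 1 otherwise; an N never counts as correct."""
    return 0 if read_base == ref_base and read_base != "N" else 1


def error_positions(read: str, ref: str) -> Tuple[List[int], int]:
    """Align a trimmed read against the reference and locate its errors.

    Assumptions (not from the study): unit-cost edit distance, both sequences
    start at the same base, and the read may stop anywhere in the reference.
    Returns the sorted 0-based reference positions of every miscall, N,
    insertion and deletion, plus the reference position where the read ends.
    """
    read, ref = read.upper(), ref.upper()
    m, n = len(read), len(ref)
    # d[i][j] = fewest errors aligning read[:i] with ref[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + _cost(read[i - 1], ref[j - 1]),  # call vs base
                d[i - 1][j] + 1,  # insertion: extra base in the read
                d[i][j - 1] + 1,  # deletion: reference base missing from read
            )
    # The unread tail of the reference is free: pick the cheapest end column.
    j = min(range(n + 1), key=lambda col: d[m][col])
    ref_end, i = j, m
    errors: List[int] = []
    while i > 0 or j > 0:  # trace back one optimal alignment
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + _cost(read[i - 1], ref[j - 1]):
            if _cost(read[i - 1], ref[j - 1]):
                errors.append(j - 1)  # miscall or N
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            errors.append(j)  # insertion, charged to the next reference base
            i -= 1
        else:
            errors.append(j - 1)  # deletion
            j -= 1
    return sorted(errors), ref_end


def window_counts(errors: Sequence[int], limits: Sequence[int] = (200, 400, 600)) -> Tuple[int, ...]:
    """Cumulative error counts for the 0-200, 0-400 and 0-600 columns."""
    return tuple(sum(1 for p in errors if p < limit) for limit in limits)


def first_error_base(errors: Sequence[int]) -> Optional[int]:
    """1-based base number of the first error (TOP 10 column), or None if error-free."""
    return errors[0] + 1 if errors else None


def rank_key(counts: Tuple[Optional[int], Optional[int], Optional[int]], wtr_cm: float) -> Tuple[float, ...]:
    """Sort key from (0-200, 0-400, 0-600) counts; None ('too many to count') sorts last.

    Gels with WTR > 24 cm: 0-600, then 0-400, then 0-200.
    24 cm gels (ABI 373A): 0-400, then 0-200.
    """
    e200, e400, e600 = (INF if c is None else c for c in counts)
    return (e600, e400, e200) if wtr_cm > 24 else (e400, e200)
```

In use, one would compute `window_counts(error_positions(read, ref)[0])` for each submission and sort each machine group with `rank_key`. The dynamic-programming table needs memory proportional to read length times reference length, which is acceptable for reads of about 1000 bases against this 1500-base reference.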
The TOP 10 table also includes the parameters reported by each lab.

EFFECT OF REACTION CONDITIONS ON UNEDITED SEQUENCE ACCURACY: Analysis of ALL sequences, regardless of gel length (WTR), showing the effects of adding DMSO, using altered thermocycling conditions or combining both. The high number of errors in the 0-600 range may reflect the inclusion of data from 373A machines (24 cm WTR), which probably do not generate accurate sequence in the 400-600 base range.

EFFECT OF REACTION CONDITIONS ON EDITED SEQUENCE ACCURACY: Analysis of ALL sequences, regardless of gel length (WTR), showing the effects of adding DMSO, using altered thermocycling conditions or combining both. As with the unedited data, the high number of errors in the 0-600 range may reflect the inclusion of data from 373A machines (24 cm WTR), which probably do not generate accurate sequence in the 400-600 base range.

EDITING SIGNIFICANTLY IMPROVES SEQUENCE ACCURACY: Analysis of ALL sequences, regardless of condition or gel length (WTR), showing the effects of editing. Here too, the high number of errors in the 0-600 range may reflect the inclusion of data from 373A machines (24 cm WTR), which probably do not generate accurate sequence in the 400-600 base range.
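The pooled comparisons described in these figure captions depend on decoding the Code column into edited/unedited and reaction condition. Below is a minimal sketch of one way to do this. The exact code string layout (for example `E1234DAT 0.5`) is an assumption based on the legend. The decision to drop entries with too many errors to count (`None`) is also an assumption, and it biases the means downward. `parse_code`, `Entry` and `mean_errors_by_condition` are hypothetical names.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

# Assumed layout: N/E flag, 4-digit lab ID, optional condition letters, optional "0.5".
# DAT is listed before D and AT so the longest condition code wins.
CODE_RE = re.compile(r"^([NE])(\d{4})(DAT|AT|AC|D)?\s*(0\.5)?$")

CONDITIONS = {
    None: "standard",
    "D": "5% DMSO",
    "AT": "altered thermocycling",
    "DAT": "DMSO + AT",
    "AC": "altered chemistry",
}


class Entry(NamedTuple):
    edited: bool
    lab: str
    condition: str
    half_reaction: bool


def parse_code(code: str) -> Entry:
    """Decode a code such as 'E1234DAT 0.5' (assumed format)."""
    m = CODE_RE.match(code.strip())
    if m is None:
        raise ValueError(f"unrecognized code: {code!r}")
    flag, lab, cond, half = m.groups()
    return Entry(flag == "E", lab, CONDITIONS[cond], half is not None)


def mean_errors_by_condition(
    rows: Iterable[Tuple[str, Optional[int]]],
) -> Dict[Tuple[bool, str], float]:
    """Mean error count per (edited, condition) for one error column.

    Rows with None ('too many errors to count') are skipped (an assumption).
    """
    groups: Dict[Tuple[bool, str], List[int]] = defaultdict(list)
    for code, errors in rows:
        if errors is None:
            continue
        entry = parse_code(code)
        groups[(entry.edited, entry.condition)].append(errors)
    return {key: mean(values) for key, values in groups.items()}
```

A median, or a rank-based summary, would be less sensitive than the mean to the few very poor reads in each group.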

SAMPLE CHROMATOGRAM: An example of an unedited sequence showing the GC-rich nature of this template and the benefit of manual editing. This sample was run under standard conditions using dye terminator chemistry on an ABI 373S with a 48 cm WTR gel.

Note the N's, almost all of which could be called correctly by manual editing. In its non-edited version this sequence ranked #106, with 30 errors in the first 600 bases. The edited version of the same sequence ranked in the top 10.

PRELIMINARY CONCLUSIONS

Our initial analysis of the 230 sequences received supports several conclusions about reaction conditions and the effects of editing on a difficult template. The most important concerns editing. Manual editing of the sequence data improves, sometimes dramatically, the accuracy of a sequence. Most of the returned data came from ABI instruments and software and therefore reflect that system's strengths and weaknesses. This test sequence included stretches that are extremely difficult for the TaqFS dye terminator system to read correctly. However, the electropherogram patterns are reproducible and easily learned through experience, as the improvement in accuracy due to editing clearly shows. This is a distinct change from the earlier ABRF NAC study (4) on automated sequencing (especially with the Amplitaq dye terminators), which showed that editing did not always improve accuracy and in many instances decreased it. Enough labs submitted both edited and non-edited data that we should be able to make a statistically significant statement about editing.
Conclusions regarding the effects of altered reaction conditions are very preliminary. When the data from all laboratories are pooled, DMSO appears to improve the accuracy of base calling, sometimes significantly. This trend holds for all sequence intervals examined. The effectiveness of altered thermocycling is harder to assess because different labs used different protocols. Altered thermocycling will be analyzed further and the results incorporated into this study.
The rankings of the best sequences, and the conditions used to generate them, show that a variety of conditions can produce very good sequence. The ABI 373S, the ABI 377 and the Licor sequencers are all represented in the top 10, as are acrylamides from several vendors. Column cleanup does not seem essential for obtaining excellent sequence, and columns that are cleaned and repacked can be as effective as single-use columns. DMSO and/or altered thermocycling conditions were very helpful for some labs but provided only marginal improvement for others. As many studies of this type have shown, this is a multivariable problem, and no single solution will work in all labs. Each laboratory must determine what works best in its own environment.
We hope that these preliminary results will help member laboratories quickly identify methods for sequencing GC-rich templates. Further analysis by machine type, gel length, gel type and other factors may reveal stronger and/or new correlations.
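The paired edited/non-edited submissions could support a statistical statement through an exact two-sided sign test. This test compares the unedited and edited versions of the same sequence over the same error window. The sign test is an illustrative choice, not a method the committee states it used. A Wilcoxon signed-rank test would also take the size of each improvement into account. The function name `sign_test` and the toy counts below are hypothetical and are not study data.

```python
from math import comb
from typing import Sequence, Tuple


def sign_test(unedited: Sequence[int], edited: Sequence[int]) -> Tuple[int, int, float]:
    """Exact two-sided sign test on paired error counts.

    Returns (improved, worsened, p_value). Pairs with equal counts are dropped,
    as is usual for the sign test.
    """
    if len(unedited) != len(edited):
        raise ValueError("need one edited and one unedited count per sequence")
    improved = sum(1 for u, e in zip(unedited, edited) if e < u)
    worsened = sum(1 for u, e in zip(unedited, edited) if e > u)
    n = improved + worsened
    if n == 0:
        return improved, worsened, 1.0
    k = min(improved, worsened)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return improved, worsened, min(1.0, 2 * tail)
```

For example, `sign_test([30, 12, 8], [3, 2, 1])` returns `(3, 0, 0.25)`. With only three pairs, even a unanimous improvement is not significant, which is why a large number of paired submissions matters.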

ACKNOWLEDGEMENTS

REFERENCES

Send comments or suggestions to: Scottie Adams
