The 1993 ABRF DNA Sequencing Study



Last year, the Nucleic Acids Research Committee distributed a sample of an "unknown" double-stranded DNA template to ABRF members for sequencing. Members were asked to sequence this template by either the dye-terminator or dye-primer method (or both), depending upon their preference, using their own M13 -21 primer. The committee requested that study participants submit their results after automatic base calling (unedited data) and after manual inspection and correction (edited data). This report is intended to let study participants compare their results with the unknown sequence (Figure 1) and to provide preliminary results.

Because only two facilities used instrumentation made by manufacturers other than Applied Biosystems, preliminary analysis of the results was restricted to the ABI data. The committee received 83 usable data sets from 44 laboratories. Before analysis by the committee, these data sets were divided into four groups according to the sequencing method used (dye-labeled primers or dye-labeled terminators) and the status of the data (unedited or edited).

  1  GAGCGCGCGT AATACGACTC ACTATAGGGC GAATTGGAGC TCCACCGCGG
 51  TGGCGGCCGC TCTAGAACTA GTGGATCCCA GAGTTCTAGG CATGTGTTAG
101  GCACTCAAAA AACATCTGCT AAATGAATTA ATAAATACAT GCCTTTCAAA
151  ATAGAAGATT TACTAAGTTC TGGGGAGAGA ACACTTTATT TCATATATTG
201  GTACAGAACT ATCAATATTT TAGAGCTATA AATTATTGGC AAAAAATGGT
251  GAAAAGTAGG GAATTTAGAA CAAGACCTTC TGAGTTCCAA CCCAGCACCA
301  TCCCTTATTA GGTATACAAT CTTGAGCAAA TGACTAAGCC TCTTTGTGCC
351  TCTGTTTTCC AGTTGACATA ATAGAAATGA TAATAATACC CACCTGGCCG
401  GGCGCGGTGG CTCACGCCTG TAATCCTAGC ACTTTGGGAG GCCGAGGCGG
451  GTAGATCACC TGAGGTCAGG AGTTCAAGAC CAGCCTGACC AACATGGAGA
501  AACCCCGTCT CTACTAAAAA TTCAAAATTA GCTGGGCGTG GTGGCGGGTG
551  CCTGTAATCC CAGCTTCTCG GGAGACTGAG GCAGGAGAAT CGCTTGAACC
601  CGGGAGGCAG AGGTTGCAGT GAGCCGAAAT CGTGCCATTG CACTCCAGTC
651  TGGGCAACAA GAGCGAAACT CCGTCTCAAA AAAAATAAAA ATTAATAAAA
701  ATAATACCAA CCTTACAGGA TAATTGTGAG AATTAACTGA ATCAATTCAT
751  CGAAAGCCCC TAGAGCAGTA CTTACCACTT AGTACCTACT AAATAAATCT
801  TAGCAGCTGT TATTAGCTCT GA
Figure 1. The sequence of the "unknown" DNA template.

Each data set was aligned with the known template sequence, and the total number of insertions, deletions, substitutions, and no-calls at each base position was tabulated for each of the four groups. An overview of the data is presented in Figures 2-5, which plot the correct assignments at each position in the sequence. The number of data sets for each group varied and is shown in the figure legends. Interpretation of the dye-primer results requires caution because of the relatively small number of submissions. The sample set for dye-terminator results was much larger because of the wider popularity of this method.
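The tabulation step described above can be illustrated with a minimal sketch. It is not the committee's software (that was custom code from St. Jude); it only assumes that each alignment is available as two equal-length strings in which "-" marks a gap and "N" marks a no-call. The function names and the convention of attributing an insertion to the preceding template base are assumptions made for this example.

```python
from collections import Counter

def classify_columns(ref_aligned, read_aligned):
    """Yield (template_position, category) for each alignment column.

    Assumed input: two equal-length strings, '-' for a gap, 'N' for a
    no-call. Positions are 1-based template coordinates. An insertion is
    attributed to the template base that precedes it (0 if none).
    """
    if len(ref_aligned) != len(read_aligned):
        raise ValueError("aligned strings must have equal length")
    pos = 0
    for ref_base, read_base in zip(ref_aligned.upper(), read_aligned.upper()):
        if ref_base == "-":
            # Extra base in the read that is absent from the template.
            yield pos, "insertion"
            continue
        pos += 1
        if read_base == "-":
            yield pos, "deletion"
        elif read_base == "N":
            yield pos, "no_call"
        elif read_base == ref_base:
            yield pos, "correct"
        else:
            yield pos, "substitution"

def tabulate(alignments):
    """Sum the categories at each template position over many data sets."""
    table = {}
    for ref_aligned, read_aligned in alignments:
        for pos, category in classify_columns(ref_aligned, read_aligned):
            table.setdefault(pos, Counter())[category] += 1
    return table
```

Calling tabulate once per group (for example, on all unedited dye-terminator alignments) would give the per-position counts from which a plot of correct assignments could be drawn.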

The first 50 bases were difficult for both the dye-primer and dye-terminator methods (about 90% correct base calls). Between bases 51 and 100, dye-terminator sequencing produced about 99% correct base calls, while dye-primer sequencing still produced only about 90%. The poorer performance of the dye-primer method in this region can be attributed to fluorescent overload from the primer front. After this initial section, correct base calling improved to about 99% for both methods.
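A percentage such as "about 90% correct base calls" over a region can be computed from per-position counts of correct calls. The sketch below shows the arithmetic only; the function name, the dictionary layout, and the numbers in the usage note are hypothetical and are not taken from the study data.

```python
def percent_correct(correct_by_position, n_data_sets, start, end):
    """Percentage of correct base calls over template positions start..end.

    correct_by_position maps a 1-based template position to the number of
    data sets that called that base correctly (an assumed layout).
    Positions missing from the mapping count as zero correct calls.
    """
    if n_data_sets <= 0 or end < start:
        raise ValueError("need at least one data set and start <= end")
    correct = sum(correct_by_position.get(p, 0) for p in range(start, end + 1))
    possible = n_data_sets * (end - start + 1)
    return 100.0 * correct / possible
```

For instance, if 9 of 10 hypothetical data sets were correct at every position from 1 to 50, percent_correct(counts, 10, 1, 50) would return 90.0.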

Manual editing of the dye-primer data significantly increased the length of accurate sequence. The average length of accurate sequence for the unedited dye-primer data was about 380 bases, but this improved to about 480 bases after editing. However, it was somewhat surprising to find that manual editing did not have a similar effect with dye-terminator sequencing. In this case, both unedited and edited data sets produced good results up to about 325 bases. Beyond 350 bases, base deletions became particularly troublesome in the dye-terminator data.
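This report does not state how the length of accurate sequence was measured. Purely as an illustration, one possible definition is the longest stretch, counted from the first template base, over which the cumulative fraction of correct calls stays at or above a chosen threshold. The function name, the input layout, and the 99% default below are all assumptions made for this sketch.

```python
def accurate_length(correct_flags, min_accuracy=0.99):
    """Longest prefix whose cumulative accuracy is >= min_accuracy.

    correct_flags holds one boolean per template position, in order:
    True if the base was called correctly, False for any error
    (substitution, deletion, or no-call). This definition is an
    assumption for illustration, not the committee's measure.
    """
    best = 0
    n_correct = 0
    for length, ok in enumerate(correct_flags, start=1):
        n_correct += bool(ok)
        if n_correct / length >= min_accuracy:
            best = length
    return best
```

Applying such a function to the unedited and edited versions of the same data set, and averaging over a group, would give figures comparable in kind (though not necessarily in value) to the averages quoted above.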

A paper describing the complete results of this study is being prepared. In addition, the Nucleic Acids Research Committee intends to analyze the submitted data individually to produce a relative ranking. However, quantitative evaluation of this seemingly simple problem is difficult, and the final report has been delayed by the need to develop new algorithms and software. In the meantime, we have mailed individual alignments of each sequence to all participants who could be identified. Any participants who have not yet received an alignment should contact Al Smith.

The Nucleic Acids Research Committee is also planning an oligonucleotide synthesis field test for 1995. In addition to analysis by capillary electrophoresis, the effect of the oligonucleotide primer on the quality of DNA sequencing reactions will be evaluated. This testing will further supplement the data we have already collected on the performance capabilities of automated instrumentation in the real world.

Acknowledgment

The Nucleic Acids Research Committee would like to thank Dr. Eric Westin of Virginia Commonwealth University for making this sequence available prior to publication, the Bioinformatics Group, Center For Biotechnology, at the St. Jude Children's Research Hospital in Memphis for custom software and sequence alignments, and all the laboratories who participated in this study.




Created: 27th July 1995
Last modified: 27th July 1995