Created: 19th June 1998, last updated: 19th June 1998, © 1998 ABRF

TIPS Articles

Octamer Sequencing Technology:
Optimization Using Fluorescent Chemistry

Leslie B. Jones and Susan H. Hardin

Department of Biology and Biochemistry, and Institute of Molecular Biology,
University of Houston, Houston, TX


Introduction

A primer-walking sequencing strategy is commonly used to determine DNA sequence information. In this strategy, a primer is designed from a known sequence, synthesized, and used to extend sequence information into a previously unknown region. These steps are repeated until the entire sequence of interest is determined. The advantage of this strategy is that it is very methodical. The disadvantages are that it is not cost efficient (the synthesis of each oligonucleotide produces a ~1000-fold excess of primer) and that it is relatively slow (each new primer must be designed and synthesized). Both disadvantages would be eliminated by a comprehensive primer library. Such a library minimizes primer waste, since each library member can be used to prime multiple reactions, and allows immediate access to the next appropriate primer to extend the sequence information. Primer-walking DNA sequencing using primer libraries composed of 8-, 9-, or 10-mers (1), or of a subset of octamers or nonamers (2, 3), has been proposed.

In a previous report, we detailed the initial development of isotopic Octamer Sequencing Technology (OST) (4). Octamer oligonucleotides were identified as the shortest primers that produced cycle sequencing data and provided specificity in a sequencing reaction. We determined that priming efficiency was independent of both base order and the presence of a 'GC clamp' at the 3' end of the primer; this enabled us to design a primer library composed of frequently occurring octamer primers, rather than one constrained by sequence order considerations. The library was optimized to sequence cDNAs by envisioning octamer positions 1-3 and 4-6 as adjoining 'codons', and positions 7-8 as the first two bases of a third codon. Codon frequency information was used to eliminate primers from the library that 'encoded' infrequently used codons. This design produced a 1132-member library with enhanced coverage for sequencing cDNA templates that nevertheless retained exceptional coverage for sequencing genomic templates.
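The codon-based filter described above can be sketched as follows. This is a minimal illustration, not the procedure from reference 4: the function names are hypothetical, the set of rare codons is supplied by the caller (no codon frequencies are given in this article), and the rule for the partial third codon (reject only when every codon it could begin is rare) is an assumption.

```python
from typing import Set

BASES = "ACGT"

def split_octamer(octamer: str):
    """Split an octamer into two full 'codons' (positions 1-3 and 4-6)
    and the first two bases of a third codon (positions 7-8)."""
    if len(octamer) != 8:
        raise ValueError("expected an 8-base primer")
    return octamer[0:3], octamer[3:6], octamer[6:8]

def passes_codon_filter(octamer: str, rare_codons: Set[str]) -> bool:
    """Return False if the octamer 'encodes' an infrequently used codon.

    Assumption: the partial third codon is rejected only when all four
    codons it could start are in rare_codons.
    """
    first, second, partial = split_octamer(octamer.upper())
    if first in rare_codons or second in rare_codons:
        return False
    completions = [partial + base for base in BASES]
    return not all(codon in rare_codons for codon in completions)
```

A caller would pass a set of rare codons derived from a codon usage table for the organism of interest and keep only the octamers for which the function returns True.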

A pilot project using 32P resulted in an 86% reaction success rate (4). It is important to note, however, that if the signal from an isotopic reaction was weak, the gel was re-exposed for a longer period of time to generate high-quality sequence data. This report details the transfer of OST from isotopic cycle sequencing to fluorescent dye-terminator cycle sequencing on an ABI PRISM 377 Automated Sequencer.

 

Methods

Sequencing Methods. Automated sequencing reactions were performed using 4 µL of ABI PRISM™ Dye Terminator Cycle Sequencing Ready Reaction Kit Premix (or 2 µL of ABI BigDye Terminator Premix) containing AmpliTaq DNA Polymerase, FS, according to the manufacturer's directions, except that 12.5 to 50 pmol of a 75% GC octamer and 200 ng of dsDNA template were added into a 10-µL reaction (25 to 50 pmol octamer for optimal results). The reactions were cycled on a GeneAmp PCR System 9600 as per the manufacturer's instructions, except that they were annealed at the indicated temperature (from 20°C to 60°C, with 40°C identified as the optimal annealing temperature) for one minute and were cycled for 99 rounds. Sequencing reactions were ethanol precipitated, pellets were resuspended in 3.5 µL of loading buffer, 1.5 µL was loaded onto a sequencing gel, and the data were collected by an ABI PRISM 377 DNA Sequencer.

Library Design and Analysis. The library was designed as described (4), except that the starting population of primers contained 75% GC octamers (7,168 sequences). We developed procedures for tracking and screening individual octamers using Sequencher 3.0 (GeneCodes, Inc.) and the Wisconsin Package (Genetics Computer Group) analysis software. A database consisting of 485 complementary octamer sequences (970 octamer primers) was constructed, and individual octamer sequences within the library were aligned with various DNA sequences. Using Sequencher, the location of octamers identified by this comparison was graphically presented on the sequence, using alignment settings that require a 100% match and an 8-base overlap. If an octamer primer was present more than once in a sequence, Sequencher displayed only the position of the first match. Because it was critical to identify all matches in the sequence, the Wordsearch program (Wisconsin Package) was used for the analysis of all template DNA sequences. After the positions of the octamer matches were identified, the information was exported to a Microsoft Excel spreadsheet for further analysis.
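The size of the starting population and the need to report every match can be illustrated with a short Python sketch. It is not the Sequencher or Wordsearch procedure used here; the function names are hypothetical. An octamer with 75% GC has exactly 6 G/C bases, so the pool size is C(8,6) x 2^6 x 2^2 = 7,168, which the enumeration below reproduces.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_count(seq) -> int:
    """Count G and C bases in a string or tuple of bases."""
    return sum(base in "GC" for base in seq)

def octamers_with_gc(n_gc: int = 6):
    """Enumerate all octamers with exactly n_gc G/C bases (6 of 8 = 75% GC)."""
    return ["".join(p) for p in product("ACGT", repeat=8) if gc_count(p) == n_gc]

def find_all_matches(template: str, octamer: str):
    """Return the 0-based start of every exact match, overlapping ones included."""
    positions = []
    start = template.find(octamer)
    while start != -1:
        positions.append(start)
        start = template.find(octamer, start + 1)
    return positions
```

To search both strands of a template, `find_all_matches` can be called once with the octamer and once with `reverse_complement(octamer)`. Library primers #245 (GGAGGAGC) and #739 (GCTCCTCC) in Table 1 are an example of such a complementary pair.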

 

Results

Of the 14 primers from the original 50% GC primer library that were assayed for their ability to prime fluorescent dye-terminator cycle sequencing reactions, only 5 produced sequence information, resulting in an unacceptable 35.7% reaction success rate. To increase this rate, the impact of the following changes was examined: increasing the primer GC content, increasing the number of times the reactions were cycled, increasing the amount of primer added to the reaction, and altering the annealing temperature. Most of these changes improved the reaction success rate, and their analysis is detailed below.

Increase G/C Content to 75%. The GC content of the primer was increased from 50% to 75% to increase the stability of the primer-template duplex. The effect of this change was assayed by sequencing two human cDNA projects, the 'X' and 'M' sequencing projects (Table 1). GC-rich octamers were designed using a modified primer-walking method: newly determined sequence information was scanned for the occurrence of candidate 75% GC octamers; the primer approximately 75 bases from the 3' end of the sequence read was identified, synthesized, and used to extend the sequence information. This process was continued until double-stranded sequence information was obtained. This change improved the success rate of OST to approximately 74%, essentially doubling the rate observed using 50% GC octamers.
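The modified primer-walking step described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, and the distance is measured from the 3' end of the candidate octamer to the 3' end of the read, which the text does not specify.

```python
def candidate_octamers(read: str, n_gc: int = 6):
    """Yield (start, octamer) for each 8-base window with exactly n_gc G/C bases.

    Windows containing ambiguous calls (anything other than A, C, G, T) are skipped.
    """
    read = read.upper()
    for start in range(len(read) - 7):
        window = read[start:start + 8]
        if set(window) <= set("ACGT") and sum(b in "GC" for b in window) == n_gc:
            yield start, window

def pick_next_primer(read: str, target_from_end: int = 75):
    """Pick the candidate lying closest to target_from_end bases from the
    3' end of the read. Returns (start, octamer), or None if no candidate exists."""
    best = None
    for start, octamer in candidate_octamers(read):
        bases_downstream = len(read) - (start + 8)  # bases between primer and read end
        score = abs(bases_downstream - target_from_end)
        if best is None or score < best[0]:
            best = (score, start, octamer)
    return None if best is None else (best[1], best[2])
```

Because the primer is taken directly from the newly read strand, extension from it continues past the end of the existing read.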

Table 1. Initial sequencing projects completed using octamer primers and fluorescent dye-terminator chemistry.

M Project*

Primer Name  Primer Sequence  # Bases Read  Collection Mode (hr)  Library Primer #
For          CGA CGG CC       453           3.5                   -
For-H        CGA CGG CC       668           8.0                   -
Rev          5' GGA AAC AG    454           3.5                   -
6-130M       CTG CCC TG       665           7.0                   -
6-151M       CTC CCT GG       674           7.0                   466
6-151M-H     CTC CCT GG       711           8.0                   466
6-294M       CAG GGC AG       642           7.0                   -
6-315M       CCA GGG AG       724           7.0                   960
6-315M-H     CCA GGG AG       632           8.0                   960
6-203M       GGG ACC TG       650           8.0                   480
6-10MR       GCC ACA GG       450           3.5                   859
6-150X       CTC CCT CC       498           7.0                   355
6-85M        GCC TCC AG       498           7.0                   329
6-21M        GCT GGC AC       655           8.0                   -
6-11MF       GAG TCC CC       616           7.0                   664
6-30MF       GTC CCC TG       413           3.5                   -
6-15/10M     CCT GTG GC       0             7.0                   365
6-33MR       GGG TCC TG       0             3.5                   207
6-310M       CAG GAC CC       0             8.0                   701
6-104M       CTC CCC AG       0             8.0                   -
M Project TOTAL bases: 9,403 (1,578 insert)

X Project

Primer Name  Primer Sequence  # Bases Read  Collection Mode (hr)  Library Primer #
For          CGA CGG CC       513           3.5                   -
Rev          5' GGA AAC AG    498           3.5                   -
6-275X-H     GGA GGA GC       559           8.0                   245
6-439X-H     GCT CCT CC       653           8.0                   739
6-81X        CCT GGC TC       520           3.5                   360
6-81X-H      CCT GGC TC       531           8.0                   360
6-245X       GAG CCA GG       643           8.0                   854
6-150X       CTC CCT CC       650           7.0                   355
6-387X       CAG AGC CC       601           7.0                   -
6-35X        GCC AGC TC       304           3.5                   780
6-35X-H      GCC AGC TC       441           8.0                   780
6-340X       CCC ATC CC       654           8.0                   -
6-50X        GAG GGG CT       550           7.0                   -
6-214X       AGC CCC TC       652           8.0                   -
6-273X       GCT GTG CC       602           8.0                   733
6-248X       GGG CCT GT       0             7.0                   -
6-88X        CTG GTG GG       0             7.0                   929
6-266X       GCT TCC CC       0             3.5                   187
6-223X       GGG CTC TG       0             7.0                   -
6-176X       GGG ATG GG       0             8.0                   -
X Project TOTAL bases: 8,371 (2,157 insert)

*Two human cDNA clones containing inserts of 1,578 bases (M Project) or 2,157 bases (X Project) were completely sequenced using octamer primers: thirty 75% GC primers, one 87.5% GC primer, and one 50% GC primer. The primer names and sequences are shown. If a primer name contains an '-H' suffix, the sequencing reaction was performed in half volume (10 µL). The number of bases read using the indicated length of data collection is displayed. A primer identification number is indicated if the primer is included in the 75% GC optimized octamer library. A dash (-) indicates that the primer was not included in the optimized library. Note that primer 6-150X (library primer #355) produced sequence information in both sequencing projects.

This improved reaction success rate prompted the design of a 75% GC octamer library. Because, once again, no particular sequence design motif was associated with a successful octamer-primed reaction (Table 1), the criteria outlined above were used to design the 75% GC library. Not all of the primers used to complete the X and M projects were included in the 75% GC octamer library (Table 1). Although we predict that any octamer primer composed of 75% GC will produce high-quality data using fluorescent dye-terminator OST methods, the primers included in the 75% GC octamer library are predicted to offer increased flexibility in primer choice. For example, the coverage of the 75% GC library was analyzed on the human p53 genomic and coding sequences (P. Chumakow, V.P. Almazov, and J.R. Jenkins, GenBank accession no. X54156; Ref. 5). The 970-member, 75% GC octamer library produced closer primer spacing in both genomic (33 ± 36 bases) and coding sequences (18 ± 18 bases), compared to primer spacing in genomic (39 ± 34 bases) and coding sequences (38 ± 34 bases) for the original 1132-member, 50% GC library (Table 2; Ref. 4). This improved primer spacing increases flexibility in choosing the next octamer to extend the sequence information.

The largest gaps between primers for the new library were 418 bases and 123 bases in genomic and coding sequences, respectively (Table 2), compared to 227 bases and 204 bases for the 50% GC octamer library (4). Although this genomic region is 50% GC overall, it contains AT-rich regions; and, in fact, the 418-base gap between primers contains AT-rich DNA. These gaps are distances between primer pairs and, since primers in the library have complements to facilitate double-stranded sequence determination, each gap should be covered in a typical sequencing run.

Table 2. Coverage analysis of 75% GC optimized octamer primer library.

Target DNA*  Base Pairs  %GC  Avg Dist (bases)  Largest Gap
p53 genomic  20,303      50   33 ± 36           418 bp
p53 coding   1,182       57   18 ± 18           123 bp

*Summary of 75% GC octamer library on the p53 genomic and coding sequences. The number of base pairs and the percent GC composition in both the genomic and coding regions of the p53 DNA sequences are shown. To indicate variability in distance between primer pairs, the average (Avg Dist) distances between primer hits in these DNAs are shown with the average of the absolute deviations of these distances from their mean. The largest gap between pairs of primers is indicated for each DNA.
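The summary statistics defined in this footnote (average distance, average absolute deviation from the mean, and largest gap) can be computed as in the following sketch. The function name is hypothetical, and the exact way hit positions were measured (for example, start-to-start distances) is an assumption.

```python
def spacing_summary(hit_positions):
    """Return (mean gap, mean absolute deviation of the gaps, largest gap)
    for a collection of primer hit positions on one template."""
    positions = sorted(hit_positions)
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if not gaps:
        raise ValueError("need at least two primer hits")
    mean_gap = sum(gaps) / len(gaps)
    # Average of the absolute deviations of the gaps from their mean
    mean_abs_dev = sum(abs(g - mean_gap) for g in gaps) / len(gaps)
    return mean_gap, mean_abs_dev, max(gaps)
```

A deviation as large as the mean itself, as in Table 2, indicates that the hits are clustered: many short gaps are offset by a few long ones.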

Increasing the Number of Cycles and the Amount of Octamer. To optimize OST using the 75% GC primer library, signal intensity and data accuracy of reactions subjected to extended cycling regimes or increasing amounts of octamer primer were examined. Samples were examined after 25, 50, or 99 reaction cycles using either 12.5, 25, or 50 pmol of octamer primer (Figure 1). A linear increase in signal intensity was observed with increased cycling (Figure 1B), suggesting that the DNA polymerase was synthesizing DNA throughout the extensive cycling regime. Given this linear increase in signal intensity, the increased read length, and data accuracy after 99 cycles (Figure 1C), and our goal of having each octamer produce robust sequence data, all subsequent reactions were standardized to this extended cycling regime.

Figure 1. Analysis of the effect of increased cycling and primer amounts on data generated in octamer sequencing reactions using ABI PRISM dye-terminator cycle sequencing chemistry and AmpliTaq DNA polymerase, FS. A double-stranded DNA vector containing a 2,157-bp human cDNA insert (the 'X' project) served as the sequencing template. Sequence data was collected by an ABI PRISM 377 Automated Sequencer using a 3.5-hr collection mode. (A) Shown are DNA sequencing ladders produced by octamer #780 (5'-GCCAGCTC-3') using a 40°C annealing temperature, the indicated number of cycles, and the indicated amount of primer. (B) Graphical representation of ABI reaction signal strength (summary of A, C, G, T values). (C) Graphical representation of UNEDITED data accuracy. (Left y-axis) The black bars represent the number of errors contained within 400 bases (beginning 63 bases from the 3' end of primer #780). Errors were identified by comparing the newly determined sequence data with existing double-stranded sequence information. The white bars represent the number of ambiguous bases (N's) present in the UNEDITED sequence information. (Right y-axis) The line connects data points that correspond to the length of high quality sequence information associated with the indicated reaction condition.


Unedited data accuracy was 99% - 99.5% over 400 bases at the assayed primer concentrations (Figure 1). Further increasing the amount of primer had little effect on the intensity or accuracy of the reaction (data not shown). The amount of primer that routinely produced high quality data for a variety of different octamers was 25 pmol per 10 µL of reaction.

Optimal Octamer Annealing Temperature. The ability to process all sequencing reactions in parallel is an important aspect of OST. Therefore, it was important to identify the optimal temperature cycling profile at which all octamers efficiently and specifically prime a reaction. Reactions were assembled and cycled using either 20°C, 30°C, 35°C, 40°C, 45°C, 50°C, 55°C, or 60°C as the annealing temperature, loaded onto an automated sequencer, and analyzed for signal intensity (Figure 2B) and data accuracy (Figure 2C).

 

Figure 2. Analysis of the effect of altered annealing temperatures on sequence intensity and data quality. (A) Octamers were used to prime dye-terminator cycle sequencing chemistry from a double-stranded human cDNA sequencing template. Octamers #780 and #355 produced sequence data from the 'X' project; however, octamer #355 was also used as a sequencing primer in the 'M' human cDNA project. (B) Graphical representation of ABI reaction signal strength (summary of A, C, G, T values). (C) Graphical representation of UNEDITED data accuracy. Sequence data was collected by an ABI PRISM 377 Automated Sequencer using a 3.5-hr collection mode. The black bars represent the number of errors contained within 400 bases (beginning either 45 bases from the 3' end of octamer #780 or 63 bases from the 3' end of octamer #355). Errors were identified by comparing the newly determined sequence data with existing double-stranded sequence information. The white bars represent the number of N's present in the UNEDITED sequence information. (D) Chromatograms generated from reactions cycled using the 40°C annealing temperature with 25 pmol of octamer primer #780 (top panel) or #355 (bottom panel). The highlighted bases are 245 bases (#780) or 263 bases (#355) from the 3' ends of the primers.


The number of base errors and the number of ambiguous bases (N's) for these reactions indicate that octamers are able to prime reactions reasonably efficiently at each temperature assayed (Figure 2C). Even at 60°C, the highest temperature assayed, DNA sequence information was produced. Initially, this was surprising, because 60°C was above the calculated melting temperature for these primers. These results may be explained if the enzyme stabilized the primer-template duplex to promote extension from the primer. Nevertheless, data accuracy (base errors and ambiguous base calls) at this elevated temperature was not optimal. Likewise, increased misincorporations or ambiguous bases were observed using reduced annealing temperatures (20°C and 30°C), suggesting that additional factors, possibly including reduced fidelity of incorporation or increased priming from secondary sites, contributed to a reduction in data accuracy (Figure 2).
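To give a sense of scale for the calculated melting temperature mentioned above, the sketch below applies the Wallace rule, a common rule of thumb for short oligonucleotides (Tm = 2°C per A/T plus 4°C per G/C). The text does not state which calculation was actually used, so this choice is an assumption, and the function name is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C).

    Intended only as a rough guide for short oligonucleotides.
    """
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    if at + gc != len(primer):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc
```

Under this rule, any 75% GC octamer (6 G/C and 2 A/T bases), including #780 (GCCAGCTC) and #355 (CTCCCTCC), has an estimated Tm of 28°C, well below the 60°C annealing temperature discussed above.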

The analysis of primer #780 (5'-GCCAGCTC-3') was of particular interest (Figure 2, Left) because two 7/8 matches with this primer occur in the human X cDNA template. These closely matched sequences, 5'-ACCAGCTC-3' and 5'-TCCAGCTC-3', have a single mismatch with #780 at the 5' position, yet their presence in the template did not prevent the octamer from preferentially priming the reaction from the perfectly matched target site at each temperature. However, data accuracy using this primer was highest at the 40°C annealing temperature (99% over 400 bases), and declined if the annealing temperature was either decreased or increased (Figure 2C, Left). We suspect that complementary factors may account for this observation: at lower temperatures the increased duration of primer-template association at a secondary binding site (< 7/8 matched sites) compensated for the decreased polymerase activity, whereas at elevated temperatures the increased polymerase activity compensated for the decreased duration of primer-template association at a secondary binding site. These observations, coupled with others (6), were used to identify 40°C as the optimal annealing temperature for all subsequent OST reactions. Chromatograms generated from reactions cycled using the 40°C annealing temperature with 25 pmol of either octamer primer #780 or #355 are shown (Figure 2D).
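Near-matches such as the two 7/8 sites described above can be located with a simple mismatch scan. The sketch below is an illustration only; the function names are hypothetical, and it searches a single strand of the template.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def near_matches(template: str, primer: str, max_mismatches: int = 1):
    """Return (start, site, mismatches) for every template window that
    differs from the primer at no more than max_mismatches positions."""
    k = len(primer)
    hits = []
    for start in range(len(template) - k + 1):
        site = template[start:start + k]
        mismatches = hamming(site, primer)
        if mismatches <= max_mismatches:
            hits.append((start, site, mismatches))
    return hits
```

With `max_mismatches=1`, a hit with 0 mismatches is the intended priming site and any hit with 1 mismatch is a potential secondary site of the 7/8 kind discussed here.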

Using the 75% Octamer Library to Sequence a 5-kb Genomic Template. Double-stranded sequence information was determined for the 4,993-bp upstream region of the Drosophila melanogaster timeless gene ('Re2' project) using 39 members of the 75% GC octamer library and 6 non-library primers (7). The success rate for the OST reactions used to complete the Re2 project was approximately 73% (40 OST successes/55 OST reactions). We observed a correlation between the inability of a primer to produce sequence information and an increased potential for hairpin formation within the 100 bases surrounding the octamer priming site (7). Therefore, to minimize reaction failure, template DNAs should be analyzed to identify potential hairpin structures. Candidate octamers located within 20 bases of predicted hairpins (stronger than -5 kcal/mol) should not be chosen.
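The hairpin-avoidance rule above can be expressed as a simple filter. The sketch assumes that hairpin predictions (positions and free energies) come from a separate secondary-structure tool; the `Hairpin` record, the function name, and the way distance is measured (gap between the primer site and the hairpin, 0 if they overlap) are all hypothetical choices made for illustration.

```python
from typing import List, NamedTuple

class Hairpin(NamedTuple):
    start: int      # 0-based first base of the predicted hairpin
    end: int        # 0-based last base of the hairpin (inclusive)
    delta_g: float  # predicted free energy in kcal/mol (more negative = stronger)

def near_strong_hairpin(primer_start: int, hairpins: List[Hairpin],
                        primer_len: int = 8, window: int = 20,
                        threshold: float = -5.0) -> bool:
    """Return True if the priming site lies within `window` bases of a
    hairpin whose free energy is stronger (more negative) than `threshold`."""
    primer_end = primer_start + primer_len - 1
    for hp in hairpins:
        if hp.delta_g >= threshold:
            continue  # hairpin is not stronger than the threshold
        # Distance between the two intervals; 0 when they overlap
        gap = max(hp.start - primer_end, primer_start - hp.end, 0)
        if gap <= window:
            return True
    return False
```

Candidate octamers for which the function returns True would be skipped in favor of the next available library primer.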

 

Conclusions

Conditions are identified in which 75% GC octamers prime fluorescent dye-terminator cycle sequencing reactions. A library containing 970 octamer primers provides sufficient coverage to sequence both genomic and cDNA templates. Given the current success rate of approximately 75%, the spacing between successful primings on the template might be expected to increase from 33 ± 36 bases to approximately 44 ± 48 bases on genomic templates and from 18 ± 18 bases to approximately 24 ± 24 bases in coding region templates. OST works most efficiently in conjunction with the octamer library; however, the library is not required to perform an octamer-primed sequencing reaction. The successful incorporation of fluorescent dye-terminator sequencing chemistry into the OST sequencing strategy, coupled with the design of and access to the 75% GC octamer library, are critical factors that make possible a rapid, cost-effective "closed-loop" DNA sequencing system.
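The adjusted spacing figures quoted above are consistent with dividing the library spacing by the reaction success rate (for example, 33 / 0.75 = 44). The sketch below reproduces that arithmetic; treating the adjustment as a simple division is an inference from the numbers, and the function name is hypothetical.

```python
def expected_spacing(mean_gap: float, deviation: float, success_rate: float):
    """Scale the mean spacing and its deviation by 1 / success_rate."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return mean_gap / success_rate, deviation / success_rate

# Genomic template: 33 and 36 bases at a 75% success rate
genomic = expected_spacing(33, 36, 0.75)
# Coding template: 18 and 18 bases at a 75% success rate
coding = expected_spacing(18, 18, 0.75)
```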

 

Acknowledgments

The authors thank Drs. Paul Hardin and Bill Widger for careful reading of the manuscript and Zhong Chen for computer assistance. Supported by NIH grant R29-HG01151 to S.H.H. and the Department of Biology and Biochemistry at the University of Houston.

 

References

1. Studier, FW. A strategy for high-volume sequencing of cosmid DNAs: random and directed priming with a library of oligonucleotides. Proc. Natl. Acad. Sci. USA. 1989;86:6917-6921.

2. Siemieniak, DR and Slightom JL. A library of 3342 useful nonamer primers for genome sequencing. Gene. 1990;96:121-124.

3. Burbelo, PD and Iadarola MJ. Rapid plasmid DNA sequencing with multiple octamer primers. BioTechniques. 1994;16:645-650.

4. Hardin, SH, Jones LB, Homayouni R, McCollum JL. Octamer-primed cycle sequencing: design of an optimized primer library. Genome Research. 1996;6:545-550.

5. Chumakow, PM, Almazov VP, Jenkins JR. 1990. GenBank accession X54156.

6. Ying, J, Bradley RK, Jones LB, Colbert DT, Smalley RE, and Hardin SH. DNA replication templates stabilized by guanine quartets. 1998; (submitted).

7. Jones, LB and Hardin SH. Octamer-primed cycle sequencing using dye-terminator chemistry. Nucleic Acids Research. 1998; (in press).

