ABRF WORKSHOP - DNA Sequencing



Betsy Nanthakumar, Genetics Resources Core Facility (GRCF), The Johns Hopkins University.


With the advent of automated DNA sequencers the number of DNA sequencing facilities has grown tremendously. However, DNA sequencing is the only major core facility service that is enzyme based, and this introduces a whole new class of problems for this type of service. The purpose of this workshop was to discuss some problem areas in the day-to-day operation of a DNA sequencing facility and to present some solutions to these problems. The items discussed included quality of plasmid and PCR templates, sequencing primers, sequence analysis and data management. In addition, Sandy Spurgeon of Applied Biosystems presented work done on sequencing difficult templates.

The DNA sequencing service at the GRCF currently serves over 200 principal investigators and uses three Applied Biosystems 373A DNA sequencers. We primarily use the Taq dye-terminator cycle sequencing chemistry with custom unlabeled sequencing primers made in the GRCF. We have found that Taq is a much less forgiving enzyme than those commonly used for manual sequencing (e.g. T7 polymerase), and hence the quality of the DNA templates and primers presented for sequencing is of utmost importance. Templates are provided to the GRCF by the individual labs which use the service, and we have had to spend a considerable amount of time advising users on template preparation. However, everyone concerned (CORE staff and users) usually agrees that the time is well spent. The information presented below represents what we have found at the GRCF to give the most consistent results in our service.

Quality of Plasmid Templates.

Problems associated with plasmid DNA templates seem to fall into two main categories - DNA quantitation and contamination with different agents. DNA that is poorly quantitated can give two types of sequencing results. Too much DNA can result in a weak signal whereas too little DNA may result in no signal at all. The range of template concentration over which Taq polymerase works optimally is relatively narrow (2-3 fold). We routinely require that plasmid templates be quantitated by absorbance measurements at 260 nm before the template is submitted. Estimating DNA concentration by ethidium staining on an agarose gel has proven to be inconsistent and may be incorrect by several fold.
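The absorbance-based quantitation described above can be sketched as a small calculation. This is only an illustration, not the GRCF's procedure: it assumes the common conversion that an A260 of 1.0 (1 cm path length) corresponds to about 50 ug/mL of double-stranded DNA, and the function names and the 1 ug target mass used in the example are hypothetical.

```python
# Assumption (not stated in the workshop report): 1.0 A260 unit at a 1 cm
# path length corresponds to roughly 50 ug/mL of double-stranded DNA.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate the stock dsDNA concentration from an A260 reading.

    dilution_factor is the fold dilution of the stock in the cuvette
    (e.g. 50.0 if 2 uL of stock was diluted to 100 uL).
    """
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("a260 must be >= 0 and dilution_factor must be > 0")
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def volume_for_mass_ul(target_ug, conc_ug_per_ml):
    """Volume of stock (in uL) that contains target_ug of DNA."""
    if conc_ug_per_ml <= 0:
        raise ValueError("concentration must be > 0")
    return target_ug / conc_ug_per_ml * 1000.0


if __name__ == "__main__":
    conc = dsdna_concentration_ug_per_ml(a260=0.1, dilution_factor=50.0)
    # The 1 ug target below is a hypothetical amount, not a GRCF requirement.
    print(f"stock concentration: {conc:.0f} ug/mL")
    print(f"volume for 1 ug: {volume_for_mass_ul(1.0, conc):.1f} uL")
```

Because the optimal template range is only 2-3 fold wide, a calculation of this kind is mainly useful for catching gross errors before a sample is submitted.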

Contamination of plasmid templates with RNA or protein can affect the sequencing results adversely (partly by causing incorrect quantitation). We recommend that an absorbance ratio (260/280 nm) be determined before submission of the sample and that the 260/280 ratio be approximately 1.8. Reagents commonly used for DNA preparation may be detrimental to sequencing, depending on how much effort is spent removing them. We have seen problems with phenol, CsCl and PEG. It should be noted that DNA preparation methods that use these agents can yield quality template for sequencing; however, we have found that the results can be very inconsistent depending on the user. Therefore, to users who are having problems we strongly recommend the use of Qiagen (Qiagen Inc.) columns for template preparation (mini columns seem to give the most consistent results). These columns do not require the use of organics and routinely give quality DNA when used properly. With some users we have also observed good results with other similar preparation methods, such as the Magic Mini prep (Promega). Finally, contamination with EDTA will result in problems with Taq sequencing reactions, since Taq DNA polymerase requires magnesium for optimal activity and EDTA chelates it. We have observed problems with DNA templates in 10 mM Tris/HCl, pH 8, 1 mM EDTA (TE) buffer. Since this is the most common buffer used to resuspend DNA, it can pose quite a problem with your users. We suggest that DNA submitted for sequencing be resuspended in sterile water or T(1/10)E.
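The 260/280 purity check recommended above can be sketched as follows. The target ratio of 1.8 comes from the text; the tolerance of 0.1 and the function names are assumptions made for illustration only. The interpretation of the flags follows the usual rule of thumb that protein (or phenol) lowers the ratio and RNA raises it.

```python
TARGET_RATIO = 1.8       # recommended 260/280 ratio from the text
DEFAULT_TOLERANCE = 0.1  # assumption: acceptable deviation, not from the text


def a260_a280_ratio(a260, a280):
    """Return the A260/A280 absorbance ratio."""
    if a280 <= 0:
        raise ValueError("a280 must be > 0")
    return a260 / a280


def purity_flag(a260, a280, target=TARGET_RATIO, tolerance=DEFAULT_TOLERANCE):
    """Classify a template by its 260/280 ratio."""
    ratio = a260_a280_ratio(a260, a280)
    if ratio < target - tolerance:
        return "low: possible protein or phenol contamination"
    if ratio > target + tolerance:
        return "high: possible RNA contamination"
    return "ok"


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    for a260, a280 in [(0.90, 0.50), (0.75, 0.50), (1.05, 0.50)]:
        ratio = a260_a280_ratio(a260, a280)
        print(f"{ratio:.2f} -> {purity_flag(a260, a280)}")
```

A ratio in range does not rule out the other contaminants mentioned above (CsCl, PEG, EDTA), which are not detected by this measurement.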

Quality of PCR Templates.

The ability to directly sequence double-stranded PCR products has been greatly enhanced by the Taq cycle sequencing method. Approximately 40-50% of the templates submitted to the GRCF for sequencing are PCR products. The major problem we have observed with directly sequencing PCR products is the presence of primer-dimers. These small side reaction products are sometimes not visible by ethidium staining on agarose gels, but their presence soon becomes known from the results of a sequencing reaction: a very strong signal for approximately 50 to 60 bases (due to sequencing of the primer-dimers) followed by a drastic drop off of signal. The small primer-dimer products appear to be preferentially used as template in the sequencing reaction. To prevent this problem, we ask that PCR products be isolated by gel electrophoresis in order to separate them from the primer-dimer products. HPLC is another method which works well but has the disadvantage of requiring more time. With our users we have run into consistency problems with glass bead products (such as Gene Clean (Bio 101) or Qiaex (Qiagen Inc.)) used to isolate DNA from agarose gels. The main problem is residual NaI, which interferes with the sequencing reaction. Hence, we usually recommend that the isolated product (in a gel slice) be brought to the CORE, where we perform an electroelution to extract the DNA for sequencing.

Sequencing primers.

The quality of the primers used for DNA sequencing is just as important as the quality of the DNA template. We have seen three main problems with sequencing primers. First, dimer or hairpin formation within a primer may affect your sequencing results, especially if the 3' end is involved. We employ a primer design program (Oligo - National Biosciences) which predicts these structures when designing sequencing primers. Second, in cycle sequencing reactions the Tm of the primer is very important. Since the sequencing reactions never go below 50 degrees C, we recommend that primers used for sequencing do not have Tm's below 50 degrees C. The final problem we have observed is that traces of ammonium hydroxide can interfere with sequencing reactions. Since this is the reagent used for cleavage and deprotection of synthetic oligonucleotides, it needs to be removed before the primer is used in a sequencing reaction. We desalt our primers through G25 columns to remove NH4OH and other trace impurities resulting from cleavage and deprotection.

In addition, with users who are having problems with primers either made in their lab or from other facilities, we recommend that primers be designed, analyzed and made in the CORE facility.
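The Tm guideline and the concern about 3' end structures can be illustrated with a rough screening sketch. This is not how the Oligo program works: the Tm estimate below uses the simple Wallace rule (2 degrees C per A/T, 4 degrees C per G/C), which is only a crude approximation for short oligonucleotides, and the 3' end check (with its hypothetical 4-base window) merely looks for a stretch within the primer that could pair with its own 3' end. Only the 50 degrees C minimum comes from the text.

```python
MIN_TM_C = 50  # minimum primer Tm recommended in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _validated(primer):
    """Upper-case the primer and reject anything other than A, C, G, T."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return seq


def wallace_tm(primer):
    """Rough Tm estimate in degrees C using the Wallace rule (an assumption)."""
    seq = _validated(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return _validated(seq).translate(_COMPLEMENT)[::-1]


def three_prime_self_complementary(primer, window=4):
    """True if the last `window` bases could pair with a stretch of the primer.

    A hit suggests possible hairpin or primer-dimer formation involving the
    3' end. The window size is a hypothetical choice.
    """
    seq = _validated(primer)
    tail = seq[-window:]
    return reverse_complement(tail) in seq


if __name__ == "__main__":
    primer = "ACGTACGTACGTACGTAC"  # hypothetical primer
    tm = wallace_tm(primer)
    print(f"Tm estimate: {tm} C, meets minimum: {tm >= MIN_TM_C}")
    print("3' end self-complementary:", three_prime_self_complementary(primer))
```

A dedicated primer design program remains the better tool; a check like this is only a first-pass filter for primers designed elsewhere.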

Sequence Analysis and Data management.

The DNA sequencing service at the GRCF also provides sequence analysis, contig assembly, and editing services to our users. We have found that this can be a major bottleneck in our turnaround time, especially contig assembly. There are only a few programs available for these applications which also allow you to directly view the automated sequence chromatogram data on the screen. We are currently using two such programs for these tasks. We have found that Seqed (Applied Biosystems) is very useful in looking for point mutations and for heterozygote analysis from PCR products. For contig assembly, we are currently using a new program on the market - Sequencher (Gene Codes). With this program we successfully assembled a 24.5 Kb genomic DNA sequence from 18 overlapping clones (227 sequencing runs) into a preliminary consensus sequence in a single contig in 90 minutes. Editing disagreeing and ambiguous bases took about four additional hours. This same project had taken weeks to assemble in Seqed, in several overlapping contigs. Both programs run on a Macintosh computer and require no external hardware. The Inherit System (Applied Biosystems) is another good choice for sequence assembly and analysis but has the disadvantage of requiring a Sparc station and is also extremely costly, putting it out of the reach of small core facilities.

Once a sequencing facility is up and running it quickly becomes apparent that data management can be an overwhelming task. We presented a relational database which we have programmed at the GRCF using the Macintosh based program Fourth Dimension (ACI US). This database allows us to keep track of DNA sequencing records and billing as well as DNA synthesis, microsatellite analysis and miscellaneous service records.

Sequencing difficult templates (Sandy Spurgeon, ABI).

The quality of automated sequence data can be affected by many factors such as the purity and sequence of the primer, the quality and quantity of template and aspects of technique in preparing the samples. In addition, the actual sequence of the template can affect the final result. Templates which have long homopolymer regions, very AT rich or GC rich base compositions or very AT or GC rich regions, regions of strong secondary structure, or long repeats can all be difficult to sequence. In this talk a variety of approaches to solving problems with difficult templates were discussed. With cDNA clones that have long homopolymer A regions in the template, noisy data may be observed after the homopolymer region, particularly when sequencing with Taq. We have used primers that are complementary to the homopolymer region, with a mixture of three bases on the 3' end to "anchor" the primer at the junction, to obtain unambiguous data beyond such a region. Templates with a very GC rich base composition (greater than 65%) frequently give very weak signals when sequenced with Taq terminators under standard conditions. By increasing the denaturation temperature to 98 degrees C, increasing the amount of enzyme from 4 to 8-16 units, increasing the number of cycles to 30 and using NEB Vent polymerase buffer, such GC rich templates could be sequenced successfully. Another modification that was useful with some templates was to add 5% DMSO to the reaction, with a 95 degree C denaturation temperature. These modifications were also useful for sequencing templates with regions of strong secondary structure. Templates with short di- and trinucleotide repeats (6-10 repeats) are not generally difficult, except for very GC rich repeats. However, it can be difficult to sequence longer repeats that involve hundreds of bases. Finally, some templates may yield a better result if a different sequencing chemistry is used, such as Taq dye primers or Sequenase dye terminators.

This work was done at Applied Biosystems by Margaret Galvin and Sandy Spurgeon using templates provided by 373 users.
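Where some template sequence is already known, the features described in this section can be screened for ahead of time. The sketch below is illustrative only and is not the method used at Applied Biosystems. The 65% GC threshold comes from the text; the function names and the 20-base primer length are hypothetical, and the choice of anchor bases (the three bases other than the one that pairs with the homopolymer) is an assumption about what "a mixture of three bases" means.

```python
import re

GC_RICH_THRESHOLD = 0.65  # "greater than 65%" from the text

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("sequence must not be empty")
    return (s.count("G") + s.count("C")) / len(s)


def longest_homopolymer(seq):
    """Return (base, length) of the longest single-base run ('', 0 if empty)."""
    runs = re.finditer(r"(.)\1*", seq.upper())
    best = max(runs, key=lambda m: len(m.group(0)), default=None)
    if best is None:
        return ("", 0)
    return (best.group(1), len(best.group(0)))


def anchored_primers(homopolymer_base="A", run_length=20):
    """Primers complementary to a homopolymer, each with one 3' anchor base.

    Assumption: the anchor is any base other than the one that pairs with
    the homopolymer, so the primer cannot slide within the run.
    """
    pairing_base = _COMPLEMENT[homopolymer_base.upper()]
    anchors = [b for b in "ACGT" if b != pairing_base]
    return [pairing_base * run_length + anchor for anchor in anchors]


if __name__ == "__main__":
    template = "GGCCGGCCATAAAAAAAAAAAAGC"  # hypothetical sequence
    print("GC rich:", gc_fraction(template) > GC_RICH_THRESHOLD)
    print("longest run:", longest_homopolymer(template))
    print("anchored primers:", anchored_primers("A", run_length=20))
```

In practice the three anchored primers would be used as a mixture; they are listed separately here only to make the anchor bases explicit.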




Created: 11th September 1995
Last modified: 11th September 1995