Created: 11th December 1997, last updated: 11th December 1997, © 1997 ABRF

 

The Functional Analysis Of Genomes:

Recent Research In The Laboratory Of Dr. Ronald Davis

Ronald J. Sapolsky and Elizabeth A. Winzeler

Stanford University School of Medicine

As a director of a genome center producing millions of bases of DNA sequence annually, Dr. Ron Davis of Stanford University has always looked far beyond simple strings of A, C, G and T. His varied interests seek to address several basic questions. For what purpose will the sequence be used? Are there more effective ways to use the sequence data? What methods can be developed for adding functional information to the sequence data? How can these new methods harness novel massively parallel technologies of biological analysis?

Genomic sequence data does not exist in a vacuum. Genes encode proteins and other nucleic acid products which function in a complex environment both within and between cells. Sequence information outside genes may regulate genes, as well as encode structural and regulatory elements for chromosomal maintenance and replication. Understanding the relationship between biological sequence information, the function of the biomolecules it encodes, and the phenotype of the organism will provide a clearer and more comprehensive picture of life processes at the biochemical level.

However, even the simplest of cells may encode thousands of gene products, encompassing millions of gene and protein functional interactions. In the past, genes and their products have been studied one at a time or in simple systems both in vivo and in vitro. Such a piecemeal approach falls short in the effort to catalogue, as a whole, all the features of this colossal web of interacting cellular functions. Clearly, new methodologies are required that carry out the bulk of the analysis in parallel. In this review we discuss some of the research projects ongoing in the laboratory of Dr. Ron Davis, including the development of array-based technology as a tool for functional analysis of genomes in general, the functional analysis of the yeast genome and, finally, the construction of genetic maps for different plant species.

Array-Based Hybridization Strategies for Expression Analysis

The amount of messenger RNA produced from a gene, and the time and circumstances under which the mRNA is produced, can provide information about the function of an uncharacterized gene identified in large-scale sequencing projects. For example, if expression of a gene is induced after exposure to a DNA-damaging agent, then one may reasonably hypothesize that the gene is involved in some aspect of DNA repair. The data can also provide information about potential drug targets. Unfortunately, new genes are being sequenced at a rate that far outpaces the rate at which each gene's expression can be monitored using traditional methods such as Northern blotting, especially when a large number of conditions are examined. To overcome this obstacle, several different technologies have emerged recently that allow gene expression to be monitored in parallel (1, 2, 3). The most promising of these is array-based hybridization (4, 5). In this massively parallel method, miniature ordered collections of nucleic acid probes are synthesized on or attached to a solid surface. These arrays can contain from 1,000 to 100,000 single- or double-stranded probes ranging in length from 20 to several thousand nucleotides. After the array is manufactured, fluorescently labeled RNA or DNA is hybridized to the array. The hybridization is measured at each probe location by scanning with a modified confocal microscope. Since the location of each probe on the array is known, the expression level of the corresponding target RNA can be determined.

Mark Schena (a postdoctoral fellow in the laboratory of Dr. Davis) and Dari Shalon first demonstrated the use of ordered arrays of amplified cDNAs to probe gene expression in Arabidopsis thaliana in 1995 (3). Dr. Schena PCR-amplified cDNAs from an ordered Arabidopsis cDNA library and spotted each individual amplicon onto a glass microscope slide using the robotic microarraying device developed by Dr. Shalon, then a graduate student in the laboratory of Dr. Pat Brown (5). Messenger RNA was then obtained from one plant tissue and labeled with a fluorescent marker, while RNA from different tissue was labeled with a different fluorophore. The two samples were co-hybridized to the array. By measuring the amount of the two different fluorescent signals at each location on the array, they were able to determine which genes were differentially expressed between the two tissue samples.
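The two-color comparison described above can be illustrated with a short sketch. It is not the analysis used in the cited work: the function names, the twofold threshold, the intensity floor and the example intensities are all assumptions made for illustration, and real data would first need background subtraction and normalization between the two fluorescent channels.

```python
import math

def log2_ratios(channel_a, channel_b, floor=1.0):
    """Return log2(A/B) for every spot.

    Intensities below `floor` are clamped so that an empty spot
    never produces log(0) or a division by zero.
    """
    ratios = {}
    for spot, a in channel_a.items():
        b = channel_b[spot]
        ratios[spot] = math.log2(max(a, floor) / max(b, floor))
    return ratios

def differentially_expressed(ratios, threshold=1.0):
    """Spots whose absolute log2 ratio is at least `threshold` (1.0 = twofold)."""
    return sorted(spot for spot, r in ratios.items() if abs(r) >= threshold)

# Hypothetical intensities for three spots, one dictionary per fluorophore.
tissue_1 = {"cDNA_1": 800.0, "cDNA_2": 120.0, "cDNA_3": 50.0}
tissue_2 = {"cDNA_1": 200.0, "cDNA_2": 110.0, "cDNA_3": 400.0}

ratios = log2_ratios(tissue_1, tissue_2)
print(differentially_expressed(ratios))
```

Working with the logarithm of the ratio treats a fourfold induction and a fourfold repression symmetrically, which is why a single absolute threshold can be applied to both.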

Previously uncharacterized genes that are differentially regulated can be identified when cDNA from random libraries is incorporated into the microarray (6). Using a 1046-element array of unknown composition, Dr. Schena hybridized RNA isolated from human T-cells that had been treated with heat shock or phorbol esters. The identity of differentially expressed genes was determined by sequencing those cDNAs showing differential hybridization. Both known and unknown genes were revealed by this assay. In another novel application of array-based technology, Renu Heller, now at Roche Bioscience, showed that array-associated expression profiles can be used to fingerprint different disease states. The mRNA expression patterns from tissues with different types of inflammatory disease were distinctive (7).

Complete Genome Expression Analysis in Yeast

While microarray technology offers an efficient method for studying gene expression in a large number of clones, its potential was only fully reached when the complete genome sequence of the yeast Saccharomyces cerevisiae was released in April, 1996. This compact 12 Mb genome, encoding about 6000 open reading frames (ORFs), was the first eukaryotic genome fully sequenced. By arraying the entire set of open reading frames encoded by the yeast genome, expression analysis could be carried to completion. Genetic regulatory networks might be revealed and all members of gene families that respond in similar fashion to a common environmental stimulus could be identified. It would also be possible to predict which genes might show false hybridization patterns because highly homologous regions of the genome would be known. Since Dr. Davis has had a long-standing interest in this model organism and contributed to the sequencing of its genome (8, 9), it was not surprising that he chose to devote considerable attention to the functional analysis of the Saccharomyces genome.

Though amplification and microarraying of DNA from cloned cDNA libraries can be accomplished by using primers to common vector regions, arraying the yeast genome directly from the sequence information presented a more formidable challenge. There are about 6000 open reading frames in the yeast genome, so a set of 12,000 unique oligonucleotide primers (one pair per ORF) would be required to systematically amplify in vitro every ORF directly from the yeast genome. The development of the AMOS 96-well oligonucleotide synthesizer (10) made this feasible. The automated multiplex oligonucleotide synthesizer (AMOS) was developed at the Stanford Genome Center to permit high-throughput synthesis of oligonucleotides at a fraction of the commercial price. Deval Lashkari and Joe DeRisi (of the Brown lab) designed primer pairs based on the newly acquired yeast sequence information that would permit the amplification of one-third of the ORFs. The primers were synthesized and used in amplification, and the PCR products were microarrayed (11). The arrays were then probed with fluorescently labeled RNA isolated from yeast cultured under a variety of growth conditions. By this method, genes were classified by function based on their patterns of expression (e.g., whether or not they showed induction after heat shock or cold shock).

While the yeast genome has now been completely arrayed using this microarraying method (J. DeRisi, personal communication), members of the Davis laboratory interested in full-genome yeast expression analysis have begun using yeast high-density oligonucleotide arrays, designed at Affymetrix and synthesized by light-directed photolithography (12). These manufactured arrays contain collections of 20 or more 25-nucleotide oligonucleotide probes per annotated ORF in the yeast genome. Based on the genomic sequence, the probes were selected to be as unique as possible relative to each other and to have good hybridization properties (13). When fluorescently labeled RNA is hybridized to these arrays, the average signal intensity for all probes complementary to a given gene is reproducible and quantitative. Low-abundance yeast mRNAs, present at less than one copy per 10 cells, can be detected. Using this tool, laboratory members have been identifying all genes that are induced during sporulation, all genes that are differentially regulated in different yeast mutant strains, and genes that are transcriptionally silenced. There is also ongoing work to determine the decay rates for all mRNAs in the yeast genome.
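The idea of summarizing each gene by the average signal of its probes can be sketched as follows. This is a minimal illustration rather than the Affymetrix analysis procedure: the function name and the example intensities are assumptions, the ORF names serve only as labels, and background correction and the handling of mismatch controls are left out.

```python
from collections import defaultdict
from statistics import mean

def gene_signals(probe_intensities):
    """Average the intensities of all probes belonging to each ORF.

    `probe_intensities` is an iterable of (orf_name, intensity) pairs,
    one pair per probe on the array.
    """
    by_gene = defaultdict(list)
    for orf, intensity in probe_intensities:
        by_gene[orf].append(intensity)
    return {orf: mean(values) for orf, values in by_gene.items()}

# Hypothetical readings: two probes for each of two ORFs.
readings = [
    ("YFL039C", 900.0), ("YFL039C", 1100.0),
    ("YAL001C", 40.0), ("YAL001C", 60.0),
]
signals = gene_signals(readings)
print(signals)
```

Averaging over 20 or more probes per gene dampens the effect of any single probe that hybridizes poorly or cross-hybridizes, which is one reason a redundant probe set can give reproducible values.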

Functional Analysis of Yeast Deletion Mutations

Despite decades of work, experimental data has been acquired for less than half of the genes encoded by the yeast genome. Characterizing gene expression patterns for each gene in the yeast genome is clearly a first step toward assigning function to the sequence of the yeast genome. However, much more can be done. One classical approach for assigning function is mutational analysis. A gene is either inactivated (by insertions or deletions in the coding region) or rendered conditionally inactive. The resulting mutant is then examined under a number of different selective growth conditions to see how it differs phenotypically from the parent. For example, if a yeast gene-deletion strain shows increased sensitivity to UV light, one might hypothesize that the protein encoded by that gene is involved in DNA damage repair. In yeast, genes can be easily deleted or replaced. Gene replacement cassettes can be generated by PCR, requiring as little as 30 bases of flanking homology to target recombination to either side of the gene of interest. Since yeast has a finite set of about 6000 genes, one may thoroughly assess the function of gene sequences by generating a deletion strain for every annotated ORF, followed by examination of each strain under different selective conditions. One problem with this approach is that, to obtain unique and useful data about the different deletion mutants, it will be necessary to examine gene function in as many specific growth conditions as possible. Testing each deletion separately under each condition would be a task as time-consuming as generating the mutant strains themselves.

Dan Shoemaker, a graduate student in the Davis lab, recently proposed a solution to this problem: pooling all 6000 yeast deletion strains and testing the entire collection in parallel under different selective conditions (14). To distinguish each of the 6000 deletions, a 20-bp tag sequence is co-introduced with the replacement cassette as the deletion is generated. Each tag, which is designed to be as distinct as possible from the other tags and to have good hybridization properties, serves as a strain identifier, allowing the deletion strains to be pooled and the selections to be performed en masse. To monitor the presence of the strains in the culture, the tags are PCR-amplified using fluorescent primers complementary to common sites flanking the tags in every construct. Then, the labeled amplicons are hybridized to a high-density oligonucleotide array containing the tag complements at defined positions. The resulting hybridization signal is quantitative: its intensity reflects the abundance of a particular tag, and thus of a particular deletion strain, in the population. In a test case examining deletions for eleven ADE and TRP auxotrophic markers, the strains were grown together as a pool in minimal media supplemented with everything but either adenine or tryptophan. With adenine omitted, the strains carrying deletions for genes in the adenine biosynthetic pathway disappeared from the pooled culture after 8 generations; when tryptophan was omitted, the TRP mutants disappeared (14).
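A minimal sketch of how pooled tag signals might be compared before and after selection is given below. It is not the published analysis: the strain labels, signal values, function names and the fourfold cutoff are assumptions chosen only to mirror the adenine-omission test case, and each strain's signal is expressed as a share of the total so that the two hybridizations can be compared.

```python
def normalize(signals):
    """Convert raw tag signals to each strain's share of the total signal."""
    total = sum(signals.values())
    return {tag: value / total for tag, value in signals.items()}

def depleted_strains(before, after, min_fold_drop=4.0):
    """Strains whose share of the pooled signal fell by at least `min_fold_drop`.

    A strain with no remaining signal after selection is always reported.
    """
    share_before = normalize(before)
    share_after = normalize(after)
    depleted = []
    for strain, s0 in share_before.items():
        s1 = share_after.get(strain, 0.0)
        if s1 == 0.0 or s0 / s1 >= min_fold_drop:
            depleted.append(strain)
    return sorted(depleted)

# Hypothetical tag signals for a four-strain pool grown without adenine.
start = {"ade1": 100.0, "ade2": 100.0, "trp1": 100.0, "trp3": 100.0}
minus_adenine = {"ade1": 5.0, "ade2": 8.0, "trp1": 190.0, "trp3": 197.0}

print(depleted_strains(start, minus_adenine))
```

Because every strain is read from the same hybridization, adding more strains to the pool adds tags to the array but not additional cultures, which is the source of the method's parallelism.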

A large project is underway to extend this 'deletion-tag' analysis to the entire genome. Tagged PCR cassettes are generated at Stanford using oligonucleotide primers that are designed and synthesized using the AMOS system. Groups of 96 replacement cassettes, each cassette designed to knock out a particular gene, are then sent to members of a consortium of eight remote laboratories. After the deletions are generated, data about the construction of each strain is entered into a database at Stanford over the World-Wide Web. The characterized and confirmed deletion strains are returned to Stanford, where they will be analyzed in a pooled batch using the array-based technology. Gene function data and deletion strains will be made available to the greater yeast community at frequent intervals during the period of study. Since the inception of the project in early 1997, tagged replacement cassettes have been generated for about 1200 of the 6000 yeast genes and about 400 deletion strains have been made. The complete set of tagged deletion strains should be available to researchers in three years. Large-scale analysis projects are already underway.

Protein-Protein Interactions in Yeast

While knowing when a gene is expressed and what happens when a gene is inactivated may provide clues about a gene's function, this information may not be enough to determine its biological role. For example, a gene may be essential for growth and may be expressed at significant levels under all conditions examined. Another way to classify gene products is through their associations with other biochemically characterized proteins. For example, one might postulate that a protein interacting with proteins in the spliceosome is itself involved in splicing. One method for identifying protein-protein interactions is the yeast two-hybrid system (15). However, two problems are often observed with the two-hybrid system: too many positives are identified, and the screen is often not carried to saturation. If efficient methods were developed for analyzing the output of two-hybrid screens, it would be worthwhile to screen every gene in the yeast genome for interaction with every other gene. Towards this end, Ray Cho, a graduate student with Dr. Davis, used the Affymetrix whole-genome yeast arrays to characterize the nucleic acid output of a yeast two-hybrid screen. After performing a screen designed to identify proteins that interact with actin, positive clones were PCR-amplified, labeled and hybridized to the full-genome Affymetrix chips. Many known actin-interactors, as well as novel genes, were identified in a few hours' time (Ray Cho et al., submitted).

Functional Analysis of Non-coding Regions: Identifying Replication Origins in Yeast

In addition to assigning function to genes in the sequence, function should be determined for relevant non-coding regions. Only about half of the yeast genome is used to encode protein. Among the non-coding features on yeast chromosomes are the origins of DNA replication, which occur about every 25-50 kb (16). Despite decades of work, only a handful of the origins have been identified using laborious nonparallel methods. While all origins conserve an 11-base consensus sequence necessary for function, this sequence is not sufficient and other determinants have not been well characterized. Other unanswered questions include: what determines an origin's timing; what is the role of transcription in replication timing; what is the role of chromatin architecture in replication?

In collaboration with Affymetrix and the Brewer-Fangman group at the University of Washington, researchers in the Davis lab are mapping yeast origins of replication. Using a variation of the classic Meselson-Stahl experiment, it is possible to enrich for yeast origins of replication (17). Yeast cells can be grown in isotopically heavy media and arrested at the start of S phase. The cells are then transferred to isotopically light media and released from the cell cycle block. The light isotope is initially incorporated only at origins of replication and this newly replicated fraction can be isolated from the unreplicated DNA by density gradient centrifugation. The DNA fraction enriched for origin activity can be labeled with a fluorescent marker and hybridized to the yeast oligonucleotide high-density arrays. Preliminary data on the location of every origin in the yeast genome has been produced and is in good agreement with data obtained for chromosome V using conventional methods (16). Furthermore, these arrays should prove sensitive enough to detect twofold changes in gene copy number in the course of the cell cycle.

Genetic Diversity and Genome Function

Function can be assigned to sequence through the analysis of different wild isolates of an organism. Are there genes that are present only in strains adapted to living under specialized conditions? S. cerevisiae can be isolated from many different environments: for example, the skin of a grape, a packet of baker's yeast at the grocery store, unfiltered beer, or even the lung of an AIDS patient. Analyzing the complete genotype of many different strains, to determine which genes they have in common and which genes are particular to one strain but not another, may provide functional information about those genes. For example, by hybridizing labeled DNA from one strain of yeast to a microarray containing amplified genes from the sequenced strain, Lashkari et al. identified genes that were deleted in the test strain (11). Normally, S. cerevisiae is nonvirulent, but occasional pathogenic strains are observed which cause disease in immunocompromised individuals (18, 19). By hybridizing DNA from these nonstandard strains to Affymetrix full yeast-genome arrays, a handful of genes have been identified that are missing in the pathogenic strains. The long-term goal is to map the genes required for virulence in the pathogenic clinical isolates and to determine whether these genes are missing or just inactive in the laboratory reference strain.

Genetic Mapping with High-Density Arrays

In the absence of a wholly sequenced genome, genetic mapping remains a time-honored method for relating the physical locations of genes to their function, assayed as phenotypic traits in a population. Traits that distinguish affected from unaffected individual organisms can be linkage mapped to physical and genetic markers distributed throughout the genome. Gel-based physical markers such as biallelic restriction enzyme site variations (as in RFLPs) and multiallelic simple repeats of sequence (as in VNTRs and in microsatellite DNA) have been used in both the earliest and more recent genetic maps of important experimental eukaryotes, such as man and mouse (20, 21). Most recently, attention has been given to single-nucleotide polymorphisms (SNPs): simple biallelic markers, detected by comparative DNA sequencing of several individual genomes, that exhibit base variation at a fixed location with a high minor allele frequency (e.g., >25%). Phenotypic traits can thus be mapped in linkage or association studies to distinctive alleles in a large, well-distributed, consistently behaving set of SNPs. However, in order to locate genes involved in multigenic as well as single-gene traits, one would wish to type individuals against a very large set of thousands of SNPs in a fast, reliable and non-labor-intensive (i.e., non-gel-based) assay.

To achieve this goal, high-density oligonucleotide arrays may be used to detect the genotype of SNPs via the hybridization of labeled target, amplified from the genome using primers that tightly flank the SNP. A small number of DNA probes (e.g., 20-40 probes), perfectly and imperfectly complementary to the sequence at and around the SNP base, may be used as a detection block to redundantly and thoroughly test which alleles are present in an individual. For two alleles A and B, the presence of A alone, B alone and both A and B together indicate, respectively, the genotypes AA, BB and AB for that SNP. With current high densities of oligonucleotide synthesis via photolithography, thousands of such SNP detection blocks can be synthesized within a 1.28 cm × 1.28 cm area and assayed in a massively parallel manner. Eric Lander's group at MIT, in conjunction with Affymetrix, has begun to collect and test thousands of human SNPs for use in mapping biologically relevant genetic traits (22, 23). In a similar vein, Bertrand Lemieux, a York University professor visiting Stanford, has collected the sequences of hundreds of SNPs from Brassica (cabbage) and maize (corn) for use in a DNA chip-based mapping system. Ron Sapolsky at the Stanford Genome Center, working with Dr. Lemieux and Affymetrix, has developed the labeling and hybridization assay for the genotypic detection and subsequent mapping of these plant genetic markers.
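The genotype rule stated above (A alone, B alone, or both) can be written down as a small sketch. It is illustrative only and not the assay's actual calling procedure: it assumes that the signals from the probes matching each allele have already been summed, and the function name, the minimum-signal cutoff and the heterozygote band are hypothetical values.

```python
def call_genotype(signal_a, signal_b, min_signal=100.0, het_band=(0.3, 0.7)):
    """Call "AA", "BB" or "AB" from summed allele-specific probe signals.

    Returns None (no call) when the combined signal is too weak to trust.
    `het_band` gives the range of the A-allele fraction treated as heterozygous.
    """
    total = signal_a + signal_b
    if total < min_signal:
        return None
    fraction_a = signal_a / total
    low, high = het_band
    if fraction_a > high:
        return "AA"
    if fraction_a < low:
        return "BB"
    return "AB"

# Hypothetical summed signals for four individuals at one SNP.
for a, b in [(950.0, 40.0), (30.0, 900.0), (500.0, 450.0), (20.0, 30.0)]:
    print(a, b, call_genotype(a, b))
```

Returning no call for weak signals, rather than forcing a genotype, reflects the purpose of the redundant detection block: an ambiguous block should be flagged rather than scored.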

SNPs from higher eukaryotes may be determined by methods other than comparative DNA sequencing. Peter Oefner, of Stanford Genetics, has developed a selection technique for SNPs based on the separation of heteroduplexed DNA (formed because of the sequence variation present at the relevant allele) by high-pressure liquid chromatography (HPLC). Graduate student Ray Cho has used HPLC to purify genomic fragments, amplified from Arabidopsis thaliana, that show polymorphic variation. The fragments have been sequenced and the SNP sequences have been prepared for layout on an Affymetrix chip bearing detection blocks for these Arabidopsis genetic markers.

Acknowledgments

 

We thank members of the Davis group for permission to cite unpublished data. Elizabeth Winzeler was supported by the John Wasmuth Fellowship in genomic analysis (HG00185-01).

References

1. V. E. Velculescu, et al., Cell 88, 243-51 (1997).

2. P. Liang, A. B. Pardee, Science 257, 967-71 (1992).

3. M. Schena, D. Shalon, R. W. Davis, P. O. Brown, Science 270, 467-70 (1995).

4. A. C. Pease, et al., Proc Natl Acad Sci U S A 91, 5022-6 (1994).

5. D. Shalon, S. J. Smith, P. O. Brown, Genome Res 6, 639-45 (1996).

6. M. Schena, et al., Proc Natl Acad Sci U S A 93, 10614-9 (1996).

7. R. A. Heller, et al., Proc Natl Acad Sci U S A 94, 2150-5 (1997).

8. F. S. Dietrich, et al., Nature 387, 78-81 (1997).

9. H. Bussey, et al., Nature 387, 103-5 (1997).

10. D. A. Lashkari, S. P. Hunicke-Smith, R. M. Norgren, R. W. Davis, T. Brennan, Proc Natl Acad Sci U S A 92, 7912-5 (1995).

11. D. Lashkari, et al., Proc Natl Acad Sci U S A In press (1997).

12. L. Wodicka, H. Dong, M. Mittmann, M. Ho, D. Lockhart, Nature Biotechnology 15, 1359-67 (1997).

13. D. Lockhart, et al., Nature Biotechnology 14, 1675-80 (1996).

14. D. D. Shoemaker, D. A. Lashkari, D. Morris, M. Mittmann, R. W. Davis, Nat Genet 14, 450-6 (1996).

15. S. Fields, O. Song, Nature 340, 245-6 (1989).

16. S. Tanaka, Y. Tanaka, K. Isono, Yeast 12, 101-13 (1996).

17. R. M. McCarroll, W. L. Fangman, Cell 54, 505-13 (1988).

18. J. H. McCusker, K. V. Clemons, D. A. Stevens, R. W. Davis, Infect Immun 62, 5447-55 (1994).

19. J. H. McCusker, K. V. Clemons, D. A. Stevens, R. W. Davis, Genetics 136, 1261-9 (1994).

20. E. S. Lander, Science 274, 536-9 (1996).

21. G. D. Schuler, et al., Science 274, 547-62 (1996).

22. S. Fodor, Science 277, 393-4 (1997).

23. E. Pennisi, Science 272, 1736-8 (1996).

