Darrell J. Doyle The Institute for Genomic Research
The Eighth International Genome Sequencing and Analysis Conference was held at Hilton Head Island, SC from October 5-8, 1996. This meeting marked a significant event in genome sciences. For the first time, complete DNA sequences were available for representatives from all three biological kingdoms: genome sequences for the prokaryotes Hemophilus influenzae, Mycoplasma genitalium, Helicobacter pylori, and the unicellular blue-green cyanobacterium Synechocystis; the archaeon Methanococcus jannaschii; and the eukaryote Saccharomyces cerevisiae have been completed and deposited in the public databases. At the time of the meeting, 4.475 Mb of the 4.65 Mb Escherichia coli genome was also available.
In the near future, complete DNA sequences are expected for other bacteria including Treponema pallidum, Streptococcus, and Neisseria, and other Archaea including Sulfolobus and Archaeoglobus fulgidus. The complete genomic sequencing of the organisms causing tuberculosis and malaria and of other obligate intracellular parasites like Rickettsia and Chlamydia is progressing at a significant pace. The Rickettsia project is of great interest because this organism is on the same phylogenetic branch as the primitive prokaryotic ancestor to mitochondria.
Complete genome sequences are also expected in the near future for important model organisms used to study complex biological processes such as cell-cell interactions and development. Over 40% of the 100 Mb Caenorhabditis elegans genome sequence is now known, and the entire sequence is expected in 1998. Similarly, the sequence of the 120 Mb genome of Drosophila melanogaster is expected by the year 2001. Commitments have been made toward obtaining the complete genomic sequence of the plant Arabidopsis and of mammals, including mice and humans.
This meeting marked a turning point in genomic research: with the availability of several complete genome sequences, future emphasis is likely to shift from high-throughput complete DNA sequencing to a systematic study of gene products: their structure, function, regulation, evolution, integration in biological pathways, and comparative analysis.
At this meeting the entire 3,573,470 bp genome sequence of Synechocystis sp. strain PCC6803 was reported. This blue-green unicellular cyanobacterium is capable of oxygenic photosynthesis and, like chloroplasts, has thylakoid membranes. Its genome contains 3,168 open reading frames (orfs). Of these, only 145 had been previously identified, 126 were related to genes already known to have photosynthetic functions, another 1,597 were similar to other known and hypothetical genes, leaving 1,426 orfs (45% of the total) with no similarity to any sequences in the databases.
Helicobacter pylori is a flagellated, pathogenic bacterium that may be the major cause of peptic ulcers. Analysis of its genome sequence revealed the presence of a family of 11 orfs containing sequences with motifs for secretory proteins not present in the databases. Interestingly, this analysis may have identified a transport system that conceivably could function to export pathogenic molecules.
Arabidopsis thaliana is a model plant species with more than 100 Mb of DNA organized on 5 chromosomes. In a pilot project, a group of 21 European Union laboratories has sequenced 1.5 Mb of the 17.5 Mb chromosome 4 and about 0.5 Mb of other regions of the genome. To begin the complete sequencing of all five chromosomes, the number of European Union laboratories working on the Arabidopsis project will be expanded and integrated with programs in the United States and Japan.
In contrast to the genome of Arabidopsis, most plant genomes, like those of rice (Oryza sativa) and maize, are exceptionally large; for example, rice has a genome size of 430 Mbp on 12 chromosomes and about 30,000 genes. A loose affiliation of scientists in Japan, China, Korea, and the United States is developing molecular maps and markers for rice, as well as systems for transformation and regeneration. To date there are about 1,500 markers, or about one marker per 200 kb of rice DNA.
Comparative maps have been constructed for rice, wheat, barley, oat, and maize, and identification and cloning of agriculturally important genes is progressing rapidly. An expressed sequence tag (EST) approach is being used as a first step toward genome sequencing in maize. The objective is to construct 200,000 maize ESTs to identify genes already in the databases and any new genes; 45,000 of these ESTs are already in hand. Plant EST projects are also ongoing for Arabidopsis, pine, and Brassica. Little in common has been found among the ESTs for different plants, but only a relatively small number of ESTs have been available for comparison.
Much interest and excitement was generated at the meeting by the complete genomic sequence and initial analyses of the yeast Saccharomyces cerevisiae genome. A consortium of 96 laboratories in the European Union and groups in Canada, Japan, the United Kingdom, and the United States participated in this project. It was initiated in October 1989 and completed in April 1996 at a cost of about 40 million dollars. About 50% of the 12 Mbp genome was sequenced by about 12 of the laboratories.
Several physical and chemical features of the yeast genome are noteworthy. The genome is compact and has a gene density of about one orf for every 2 kb of sequence (6,000 genes). Overall the genome is AT-rich with clusters of GC-rich sequences and only a few, relatively short introns. There are large sequence duplications. Interestingly, about half of the human proteins that are known to play a role in disease have homologues in the yeast genome. Also, between one-third and one-half of yeast orfs do not match sequences in the databases and so represent "orphan" gene products that have escaped the attention of geneticists, biochemists, and molecular biologists.
New approaches and strategies as described by Ronald W. Davis and his colleagues at Stanford University will be needed to assess the biological functions of these orphan genes. A high-density microarray system containing every orf is being developed, where each orf is PCR-amplified and arrayed in high-density format on a glass slide using a robotic device. A single slide can contain 8,000 orfs, and quantitative expression can be assessed by hybridizing fluorescently labeled yeast cDNA to the array and measuring the signals with a scanning laser. A PCR targeting strategy has also been developed to generate deletion strains that are uniquely tagged with 20 bp sequences that can be analyzed by the microarray strategy. With these tagged sequences, thousands of deletion strains growing under different physiologic conditions can be pooled and analyzed in parallel.
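The pooled analysis of tagged deletion strains can be illustrated with a minimal sketch. It is not the Stanford group's software; the function name `tag_log_ratios`, the strain labels, the signal values, and the pseudocount are all hypothetical and chosen only for illustration. The sketch assumes that each strain's tag yields one hybridization signal per pool, normalizes each pool to its total signal, and reports a log2 ratio between a treated pool and a control pool.

```python
import math


def tag_log_ratios(control, treated, pseudocount=1.0):
    """Return log2(treated / control) for each tag.

    `control` and `treated` map a tag (one per deletion strain) to its
    hybridization signal in that pool. Each pool is normalized to its own
    total signal so that pools of different overall intensity are comparable.
    The pseudocount (an assumption of this sketch) avoids division by zero
    when a strain drops out of a pool entirely.
    """
    n_tags = len(control)
    control_total = sum(control.values()) + pseudocount * n_tags
    treated_total = sum(treated.get(tag, 0.0) for tag in control) + pseudocount * n_tags

    ratios = {}
    for tag, control_signal in control.items():
        treated_signal = treated.get(tag, 0.0)
        control_fraction = (control_signal + pseudocount) / control_total
        treated_fraction = (treated_signal + pseudocount) / treated_total
        ratios[tag] = math.log2(treated_fraction / control_fraction)
    return ratios


# Hypothetical signals for three tagged deletion strains grown as one pool.
control_pool = {"strain_A": 100.0, "strain_B": 100.0, "strain_C": 100.0}
treated_pool = {"strain_A": 100.0, "strain_B": 100.0, "strain_C": 10.0}

for tag, ratio in sorted(tag_log_ratios(control_pool, treated_pool).items()):
    print(f"{tag}: log2 ratio = {ratio:+.2f}")
```

In this toy input, the strain whose tag signal falls in the treated pool receives a negative log ratio, which is the kind of readout that would flag a deleted gene as important under that growth condition.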
Over one-third of yeast orfs encode putative membrane proteins containing 1 to 14 transmembrane domains. These include families of ATP-independent transporters for sugars, amino acids, hydrophobic drugs, nitrogen bases, carboxylic acids, purines, potassium, urea, sulfates, peptides, and several unknown transport functions. Some of these proteins are associated with resistance to multiple drugs, some are transport ATPases, and 20 have no homologues in the databases. However, many of the identified transporters in yeast are similar to proteins identified in mammals, including humans, emphasizing the potential value of yeast as a model for more complex eukaryotic organisms.
Efforts are underway to sequence the genome of Schizosaccharomyces pombe, the fission yeast, which is distantly related to Saccharomyces cerevisiae. Investigators at the Sanger Center have completed 1 Mb of sequence on chromosome 1 of this organism. Comparative studies between pombe and cerevisiae should provide useful information about the genes responsible for the different lifestyles of these two eukaryotic organisms.
A two-dimensional polyacrylamide gel electrophoretic proteome analysis of Hemophilus influenzae polypeptides from organisms exposed to different antimicrobial agents or protein synthesis inhibitors allowed detection of about 500 of the 1,700 Hemophilus polypeptides by Coomassie Blue staining or 650 polypeptides by 35S-methionine labeling of cells. Peptide mass fingerprinting by laser desorption mass spectrometry or analysis by amino acid composition also proved to be powerful tools for identifying polypeptides separated by isoelectric point and molecular weight on the acrylamide gels.
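The idea behind peptide mass fingerprinting can be sketched in a few lines. This is an illustration under stated assumptions, not the method used by the presenters: it assumes digestion with trypsin (cleavage after K or R, but not before P), uses standard monoisotopic residue masses, and compares neutral peptide masses within a fixed tolerance. The function names, the example sequence, and the tolerance are hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(protein):
    """Split a protein after each K or R that is not followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide in daltons."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


def fingerprint_matches(observed_masses, protein, tolerance=0.5):
    """Count observed masses that fall within `tolerance` Da of a predicted peptide."""
    predicted = [peptide_mass(p) for p in tryptic_peptides(protein)]
    return sum(
        any(abs(observed - mass) <= tolerance for mass in predicted)
        for observed in observed_masses
    )


# Hypothetical candidate sequence and hypothetical observed masses.
candidate = "AKRPGKMSTEELR"
observed = [peptide_mass("AK"), peptide_mass("MSTEELR"), 999.0]
print(tryptic_peptides(candidate))
print(fingerprint_matches(observed, candidate))
```

In practice each spot's observed masses would be scored against the predicted digests of every orf in the genome, and the best-scoring orf taken as the likely identity of the polypeptide.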
Much information was presented at the meeting on the development of systems to increase the efficiency of high-throughput DNA sequencing, including advances in capillary array electrophoresis, the synthesis of oligonucleotide arrays via ink jet technology, and robotic systems for gel loading.
Improvements in sequencing chemistry, in robotic and other labor-saving systems, and in software for base-calling, assembly, and analysis of sequence data were reported. We expect that in the next year there will be a windfall of new sequence information. An almost unimaginable explosion based on the biological, physiological, and evolutionary implications of new information is also forthcoming. The program and abstracts listing speakers and poster presenters for the Eighth International Genome Sequencing and Analysis Conference are published in Microbial and Comparative Genomics, Volume 1, Number 3.
The author may be contacted at The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, E-mail: djdoyle@tigr.org.