THE BOOK OF LIFE:
READING THE SEQUENCE OF HUMAN DNA
September 1998
http://www.nhgri.nih.gov/NEWS/Finish_sequencing_early/cracking_the_code.html
Finding a single gene in the huge morass of DNA that makes up the human genome--some 3 billion base pairs of it--requires a set of powerful tools. The Human Genome Project has developed three types of tools to make gene hunts and other genetic analyses faster, cheaper, and practical for almost any scientist to do. These tools include genetic maps (also called linkage maps), physical maps, and DNA sequence--a detailed description of the order of the nucleotide bases in DNA. Indeed, a major goal of the Human Genome Project is to sequence the entire length of human DNA.
The sequence of bases--the chemical building blocks that make up the DNA strand--contains the instructions for everything a cell does, from conception until death. If the letters representing the 3 billion bases that make up the human genome were printed out in books, and the books were stacked one on top of the other, they would reach as high as the Washington Monument. The ultimate goal of the Human Genome Project is to read the order, letter by letter, of those 3 billion bases. Changes in the spelling of the DNA letters can increase your chances of developing an illness, protect you from getting sick, or predict the way your body will handle medicines. Once scientists can read the DNA instruction book, they will be able to understand and treat diseases better.
The federally funded Human Genome Project began sequencing the DNA of laboratory organisms in 1990, while fine-tuning the strategy that would eventually be used to sequence the larger, more complex human genome. The genomes of many organisms have now been completely sequenced, including those of the bacterium E. coli and baker’s yeast. The end of 1998 will mark the completion of the first genome sequence from a multi-celled animal, the roundworm Caenorhabditis elegans.
Scientists will compare the human sequence to that of smaller organisms to help them understand how genes function and to piece together a comprehensive look at how genes work as an integrated system in a cell. Humans and yeast, for example, share a number of similarities in their genetic makeup. For one, many regions of yeast DNA contain stretches that very closely resemble stretches of human DNA. These similarities tell scientists the genes in those regions play a critical role in cell function in both species, or they would have been lost during the 1 billion years of evolution that separate yeast and humans. Some of these critical processes include DNA copying and repair of damaged DNA, protein synthesis and transport across membranes, and control of metabolic processes. In cancer research, yeast has emerged as an important model for studying control of cell division.
The first methods for sequencing DNA were developed in the mid-1970s. At that time, scientists used a series of chemical reactions to sequence only a few base pairs per year, not enough to take on a single gene of several thousand bases, much less the entire human genome. When the Human Genome Project began in 1990, few laboratories had sequenced even 100,000 bases, and the cost of doing so was more than $10 per base pair.
Since then, technology improvements and automation have increased speed and dramatically lowered cost to the point where individual genes are sequenced routinely. Now, machines read the sequence quickly, but they still can only read short DNA fragments at a time. So, using a strategy referred to as "shotgun" sequencing, the text of each page of those books stacked as tall as the Washington Monument is randomly cut into small fragments. These fragments are small enough for sequencing machines to read. But to get long stretches of DNA, you must then re-assemble these sequenced fragments back into sentences, paragraphs, chapters, and books. For the most part, sophisticated computer programs perform the re-assembly of the millions of pieces of this giant puzzle.
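The cut-and-reassemble idea can be illustrated with a toy program. What follows is a minimal Python sketch, not the software used by the Human Genome Project: the 24-letter sequence, the function names (shotgun, overlap, greedy_assemble), the fragment length, and the minimum overlap are all hypothetical choices made for illustration. The fragments are cut at regular offsets rather than truly at random so that the example is repeatable, and the pieces are rejoined by repeatedly merging the two with the longest matching overlap (a simple "greedy" approach).

```python
import random

# Hypothetical 24-letter "genome", chosen so that every 4-letter stretch is unique.
GENOME = "ATGGCGTACCTGAAGTCCGATTAG"

def shotgun(sequence, frag_len, step, seed=0):
    """Cut overlapping fragments from the sequence and shuffle their order."""
    last_start = len(sequence) - frag_len
    starts = list(range(0, last_start, step)) + [last_start]
    frags = [sequence[s:s + frag_len] for s in starts]
    random.Random(seed).shuffle(frags)  # the original order is thrown away
    return frags

def overlap(a, b, min_len):
    """Length of the longest suffix of a that is also a prefix of b (0 if too short)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=4):
    """Merge the best-overlapping pair again and again until nothing overlaps."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining pieces stay separate
        merged = frags[best_i] + frags[best_j][best_len:]
        rest = [f for k, f in enumerate(frags)
                if k not in (best_i, best_j) and f not in merged]
        frags = rest + [merged]
    return frags

fragments = shotgun(GENOME, frag_len=10, step=4)
contigs = greedy_assemble(fragments)
print(contigs == [GENOME])
```

This works on the toy sequence because no 4-letter stretch occurs twice in it. Real DNA contains repeated stretches and reads contain errors, which is part of why the assembly programs used in practice are far more elaborate than this sketch.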
Sequencing of the human genome began in earnest in 1996, two years earlier than expected. In the beginning, this new initiative tested strategies for full-scale sequencing of human DNA. Scientists have been guided by information from these pilot studies as they now ramp up to full-scale sequencing of human DNA. Encouraged by this experience, they now are ready to move up the timetable for completion of the human sequence by a full two years. They intend to produce the first fully completed, highly accurate reference sequence of the human genome by the end of 2003, the year that marks the 50th anniversary of the discovery of the structure of DNA by James Watson and Francis Crick. Researchers also expect to pass another important milestone by 2001, when they will have a useful "working draft" of the sequence.
In the United States, the National Institutes of Health and the Department of Energy will sequence 60-70% of the human genome. The rest of the human genome will be sequenced by the Sanger Centre in England, funded by the Wellcome Trust, and through other sequencing centers around the world. The public genome project employs shotgun sequencing of DNA fragments that have been carefully mapped and stored. This process makes re-assembling the sequenced fragments to reflect their original position in the genome easier and more accurate, because the general location of the fragment in the genome is known. To return to the book analogy, it is much easier to re-assemble the text if all the fragments are known to come from a single page. Scientists periodically encounter DNA fragments that are particularly difficult to sequence. Because all the fragments have been mapped and stored, a scientist can return to the difficult spots after most of the genome has been sequenced and assembled to work on closing the gaps and "finishing."
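The "single page" point can be made concrete with a small example. The Python sketch below is illustrative only and rests on an assumption the text does not spell out: that one common source of re-assembly trouble is a stretch of DNA that appears in more than one place. The two "pages," their fragments, and the function names are hypothetical. The sketch asks which fragments could follow a given fragment, first looking only at fragments from the same mapped page and then at fragments pooled from both pages.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is also a prefix of b (0 if too short)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def successors(read, pool, min_len=6):
    """Fragments in the pool whose beginning matches the end of the given fragment."""
    return [r for r in pool if r != read and overlap(read, r, min_len) > 0]

# Hypothetical fragments from two mapped "pages" that share the stretch GGATCCGG.
pages = {
    "page_A": ["TTACGGATCCGG", "GGATCCGGTTGA"],
    "page_B": ["AGCAGGATCCGG", "GGATCCGGACCG"],
}

read = "TTACGGATCCGG"  # a page_A fragment that ends in the shared stretch

# Mapped: only fragments from the same page are candidates.
same_page = successors(read, pages["page_A"])

# Unmapped: fragments from every page are pooled together.
pooled = [r for reads in pages.values() for r in reads]
all_pages = successors(read, pooled)

print(len(same_page), "candidate(s) within the page")
print(len(all_pages), "candidate(s) in the pooled fragments")
```

Within its own page the fragment has a single possible continuation, while in the pooled set a fragment from the other page fits equally well, so the overlap alone cannot decide between them.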
The international sequencing community, whose goal is to complete the human DNA sequence by the end of 2003, has agreed to a policy of releasing data every 24 hours into a free, publicly accessible database. More than 10 percent of the human sequence now is available in a public database and more than half of that is already "finished." DNA sequence derived by Human Genome Project laboratories is stored in databases scientists can freely access. In the United States, GenBank (http://www.ncbi.nlm.nih.gov), run by the National Center for Biotechnology Information, serves as the public repository of sequence information.
Two efforts recently have been announced in the private sector to carry out human genome sequencing. These projects also employ a type of shotgun sequencing, but differ from the public effort in several significant ways. First, the strategy, called "whole-genome shotgun sequencing," employs fragments that have not been previously mapped. Because the scientist does not know where in the morass of 3 billion base pairs a particular fragment might belong, the task of re-assembling the fragments becomes far more difficult and prone to error. This difficulty in re-assembly will inevitably lead to gaps in the sequence, some of which may occur over DNA regions with great biological significance. Second, when a scientist encounters a fragment that is particularly difficult to sequence, he or she cannot return to the fragment later because it has not been mapped and stored for future study. A product containing gaps left by these unsequenced fragments is incomplete for many disease research purposes. Finally, release of sequence data from private sector efforts will occur less frequently or not at all. These ventures also maintain the right to patent the most biologically important gene data, thereby restricting the free access of other researchers to this vital information. The stated intention of one of these ventures is to release data on a quarterly basis, creating the possibility of synergy with the public effort. The NIH and DOE welcome such initiatives and look forward to cooperating with all parties that can contribute to more rapid public availability of the human genome sequence.
The ultimate Human Genome Project task of sequencing all 3 billion base pairs in the human genome will provide scientists with a virtual blueprint of a human being. With the sequence in hand, researchers can begin to "read" the information in the genes and understand how genes function. From there, researchers can start to unravel biology’s most complicated processes: how a baby develops from a single cell; how genes coordinate the functions of tissues and organs; how disease predisposition occurs; and perhaps, even, how the human brain works.
NHGRI Office of Communications, 31 Center Drive, Building 31, Room 4B09, MSC 2152, Tel: 301.402.0911, FAX: 301.402.2218 http://www.nhgri.nih.gov
Data Release and Access Principles and Policy
The human genome, the common heritage of all humanity, is arguably the most valuable dataset the biomedical research community has ever known. It holds long-sought secrets of human development, physiology, and medicine.
The highest priority of the International Human Genome Sequencing Consortium is ensuring that sequencing data from the human genome is available to the world's scientists rapidly, freely and without restriction.
Since the sequencing phase of the Human Genome Project (HGP) began five years ago, all of the data generated by participants has been deposited in publicly available databases every 24 hours.
http://www.ncbi.nlm.nih.gov/Genbank/
http://www.ebi.ac.uk/embl/index.html
http://www.ddbj.nig.ac.jp/
Translating the text of the human genome into practical applications that will alleviate suffering is one of the greatest challenges facing humankind. This mission will require the work of tens of thousands of scientists throughout the world. No scientist wanting to advance this cause should be denied the opportunity to do so for lack of access to raw genomic data. Delaying the release of either unfinished or finished genomic DNA sequence data serves no scientific or societal purpose.