Organization of the Human Genome
Author:
Cohn, R. D., Scherer, S. W., & Hamosh, A.
Source:
Thompson & Thompson Genetics and Genomics in Medicine
Edition and pages:
9th ed., pp. 10-13
2025-10-29
Chromosomes are not just a random collection of different types of genes and other DNA sequences. Regions of the genome with similar characteristics tend to be clustered together, and the functional organization of the genome reflects its structural organization and sequence. Some chromosome regions, or even whole chromosomes, are high in gene content (“gene rich”), whereas others are low (“gene poor”) (Fig. 1). The clinical consequences of abnormalities of genome structure reflect the specific nature of the genes and sequences involved. Thus, abnormalities of gene-rich chromosomes or chromosomal regions tend to be much more severe clinically than similar-sized defects involving gene-poor parts of the genome.

Fig. 1. Size and gene content of the 24 human chromosomes. The dotted diagonal line corresponds to the average density of genes in the genome, approximately 6.7 protein-coding genes per megabase (Mb). Chromosomes that are relatively gene rich are above the diagonal and trend to the upper left. Chromosomes that are relatively gene poor are below the diagonal and trend to the lower right. (Based on data from European Bioinformatics Institute and Wellcome Trust Sanger Institute: Ensembl release 70, January 2013. Available from http://www.ensembl.org, v37.)
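The comparison against the diagonal in Fig. 1 amounts to dividing a gene count by a chromosome size and checking the result against the genome-wide average. The following is a minimal sketch of that calculation. The function names and the two example inputs are hypothetical placeholders chosen for illustration, not values for any real chromosome; only the 6.7 genes per Mb average comes from the caption.

```python
# Average density quoted in the Fig. 1 caption (protein-coding genes per Mb).
AVERAGE_DENSITY = 6.7


def gene_density(gene_count: int, size_mb: float) -> float:
    """Protein-coding genes per megabase for one chromosome or region."""
    return gene_count / size_mb


def classify(gene_count: int, size_mb: float,
             average: float = AVERAGE_DENSITY) -> str:
    """Label a chromosome relative to the genome-wide average density."""
    density = gene_density(gene_count, size_mb)
    return "gene rich" if density > average else "gene poor"


# Hypothetical inputs for illustration only (NOT real chromosome data).
examples = {
    "hypothetical chromosome A": (1200, 60.0),   # genes, size in Mb
    "hypothetical chromosome B": (400, 130.0),
}

for name, (genes, size_mb) in examples.items():
    density = gene_density(genes, size_mb)
    print(f"{name}: {density:.1f} genes/Mb, {classify(genes, size_mb)}")
```

Real gene counts and chromosome sizes would need to be taken from a genome annotation release such as the Ensembl data cited in the caption.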
As a result of knowledge gained from the Human Genome Project, it is apparent that the organization of DNA in the human genome is both more varied and more complex than was once appreciated. Of the billions of base pairs of DNA in any genome, fewer than 1.5% actually encode proteins. This portion of the genome is referred to as the exome. Regulatory elements that influence or determine patterns of gene expression during development or in tissues were believed to account for only approximately 5% of additional sequence, although more recent analyses of chromatin characteristics suggest that a much higher proportion of the genome may provide signals that are relevant to genome functions. Only approximately half of the total linear length of the genome consists of so-called single-copy or unique DNA, that is, DNA whose linear order of specific nucleotides is represented only once (or at most a few times) in the entire genome. This concept may appear surprising to some, given that there are only four different nucleotides in DNA. But consider even a tiny stretch of the genome that is only 10 bases long; with four types of bases, there are over 1 million possible sequences. Although the order of bases in the genome is not entirely random, any particular 16-base sequence would be predicted by chance alone to appear only once in any given genome.
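The counting argument above can be checked with a few lines of arithmetic. This is a back-of-envelope sketch under two stated assumptions that are not in the text: a round genome size of about 3.1 billion base pairs, and a fully random sequence with all four bases equally likely. The function names are illustrative.

```python
# Assumed round figure for the genome size; the text says only "billions".
GENOME_SIZE_BP = 3_100_000_000


def possible_sequences(k: int) -> int:
    """Number of distinct DNA sequences of length k (4 choices per position)."""
    return 4 ** k


def expected_occurrences(k: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Expected count of one specific k-base sequence on one strand,
    assuming a random sequence with equal base frequencies."""
    start_positions = genome_size_bp - k + 1
    return start_positions / possible_sequences(k)


for k in (10, 16):
    print(f"k={k}: {possible_sequences(k):,} possible sequences, "
          f"expected occurrences of one of them: {expected_occurrences(k):.2f}")
```

Under these assumptions, a specific 10-base sequence would be expected thousands of times, whereas a specific 16-base sequence would be expected roughly once or less, because 4^16 (about 4.3 billion) exceeds the assumed genome size.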
The rest of the genome consists of several classes of repetitive DNA and includes DNA whose nucleotide sequence is repeated, either identically or with some variation, hundreds to millions of times in the genome. Whereas most (but not all) of the estimated 20,000 protein-coding genes in the genome (see box earlier in this chapter) are represented in single-copy DNA, sequences in the repetitive DNA fraction contribute to maintaining chromosome structure and are an important source of variation between different individuals; some of this variation can predispose to pathologic events in the genome, as we will see in Chapters 5 and 6.
Single-Copy DNA Sequences
Although single-copy DNA makes up at least half of the DNA in the genome, much of its function remains a mystery because, as mentioned, sequences actually encoding proteins (i.e., the coding portion of genes) constitute only a small proportion of all the single-copy DNA. Most single-copy DNA is found in short stretches (several kilobase pairs or less), interspersed with members of various repetitive DNA families. The organization of genes in single-copy DNA is addressed in depth in Chapter 3.
Repetitive DNA Sequences
Several different categories of repetitive DNA are recognized. A useful distinguishing feature is whether the repeated sequences (“repeats”) are clustered in one or a few locations or whether they are interspersed with single-copy sequences along the chromosome. Clustered repeated sequences constitute an estimated 10% to 15% of the genome and consist of arrays of various short repeats organized in tandem in a head-to-tail fashion. The different types of such tandem repeats are collectively called satellite DNAs, so named because many of the original tandem repeat families could be separated by biochemical methods from the bulk of the genome as distinct (“satellite”) fractions of DNA.
Tandem repeat families vary with regard to their location in the genome and the nature of the sequences that make up the array. In general, such arrays can stretch several million base pairs or more in length and constitute up to several percent of the DNA content of an individual human chromosome. Some tandem repeat sequences are useful tools in clinical cytogenetic analysis. Long arrays of repeats based on repetitions (with some variation) of a short sequence such as a pentanucleotide are found in large genetically inert regions on chromosomes 1, 9, and 16 and make up more than half of the Y chromosome. Other tandem repeat families are based on somewhat longer basic repeats. For example, the α-satellite family of DNA is composed of tandem arrays of an approximately 171-bp unit, found at the centromere of each human chromosome, which is critical for attachment of chromosomes to microtubules of the spindle apparatus during cell division.
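The head-to-tail arrangement of a tandem array can be illustrated with a short string-matching sketch. It counts the longest run of exact back-to-back copies of a repeat unit. The function name, the flanking sequence, and the example pentanucleotide are arbitrary choices for illustration, and the sketch assumes perfect copies, whereas real arrays show some variation between units.

```python
def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of exact, back-to-back copies of `unit`
    found anywhere in `sequence`."""
    if not unit:
        raise ValueError("unit must be non-empty")
    best = 0
    for start in range(len(sequence) - len(unit) + 1):
        copies = 0
        position = start
        # Walk forward one unit at a time while the next copy matches.
        while sequence.startswith(unit, position):
            copies += 1
            position += len(unit)
        best = max(best, copies)
    return best


# Illustrative array: six head-to-tail copies of an example pentanucleotide
# embedded in arbitrary flanking sequence (not a real genomic sequence).
unit = "GGAAT"
array = "ACGT" + unit * 6 + "TTGCA"
print(longest_tandem_run(array, unit))  # 6
```

The same function would apply unchanged to a longer unit, such as a 171-bp monomer, although tolerating mismatches between copies would require an alignment-based method rather than exact matching.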
In addition to tandem repeat DNAs, another major class of repetitive DNA in the genome consists of related sequences that are dispersed throughout the genome rather than clustered in one or a few locations. Although many DNA families meet this general description, two in particular warrant discussion because together they make up a significant proportion of the genome and because they have been implicated in genetic conditions. Among the best-studied dispersed repetitive elements are those belonging to the so-called Alu family. The members of this family are approximately 300 bp in length and are related to each other, although not identical in DNA sequence. In total, there are more than 1 million Alu family members in the genome, making up at least 10% of human DNA. A second major dispersed repetitive DNA family is called the long interspersed nuclear element (LINE, sometimes called L1) family. LINEs are up to 6 kb in length and are found in approximately 850,000 copies per genome, accounting for nearly 20% of the genome. Both of these families are plentiful in some regions of the genome but relatively sparse in others: regions rich in GC content tend to be enriched in Alu elements but depleted of LINE sequences, whereas the opposite is true of more AT-rich regions of the genome.
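The copy numbers and genome fractions quoted for the Alu and LINE families can be cross-checked with simple arithmetic. This sketch assumes a round genome size of about 3.1 billion base pairs, which is not a figure from the text, and uses the lower-bound or approximate values quoted above; the function names are illustrative.

```python
# Assumed round figure for the genome size (not stated in the text).
GENOME_SIZE_BP = 3_100_000_000


def genome_fraction(copies: int, mean_length_bp: float,
                    genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Fraction of the genome occupied by `copies` elements of a given mean length."""
    return copies * mean_length_bp / genome_size_bp


def implied_mean_length(copies: int, fraction: float,
                        genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Mean element length (bp) implied by a copy number and a genome fraction."""
    return fraction * genome_size_bp / copies


# Alu: 1 million copies of about 300 bp each.
alu_fraction = genome_fraction(1_000_000, 300)
# LINE: about 850,000 copies accounting for about 20% of the genome.
line_mean_bp = implied_mean_length(850_000, 0.20)

print(f"Alu elements: about {alu_fraction:.1%} of the genome")
print(f"LINE elements: implied mean copy length about {line_mean_bp:.0f} bp")
```

Under these assumptions the Alu figures agree with a fraction of roughly 10%, and the implied mean LINE copy length comes out well below 6 kb, which is consistent with the wording "up to 6 kb": many copies must be shorter than a full-length element.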
Repetitive DNA and Disease. Both Alu and LINE sequences have been implicated as the cause of mutations in hereditary disease. At least a few copies of the LINE and Alu families generate copies of themselves that can integrate elsewhere in the genome, occasionally causing insertional inactivation of a medically important gene. The frequency of such events causing genetic disease in humans is unknown, but they may account for as many as 1 in 500 mutations. In addition, aberrant recombination events between different LINE repeats or Alu repeats can also be a cause of variants in some genetic diseases.
An important additional type of repetitive DNA found in many different locations around the genome includes sequences that are duplicated, often with extraordinarily high sequence conservation. Duplications involving substantial segments of a chromosome, called segmental duplications, can span hundreds of kilobase pairs and account for at least 5% of the genome. When the duplicated regions contain genes, genomic rearrangements involving the duplicated sequences can result in the deletion of the region (and the genes) between the copies and thus give rise to disease (see Chapters 5 and 6).
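The outcome described above, loss of the region between two duplicated copies, can be pictured with a toy model in which a chromosome is a list of labeled segments. This is only an illustration of the end result under simplifying assumptions (two copies in the same orientation on one chromosome, one copy retained after the rearrangement); the segment names and the function are hypothetical, and the mechanism itself is left to the chapters cited.

```python
def delete_between_copies(segments: list[str], duplicate: str) -> list[str]:
    """Return the segment order after a rearrangement between the first two
    copies of `duplicate`: one copy is kept, and everything between the two
    copies (plus the second copy) is lost."""
    first = segments.index(duplicate)
    second = segments.index(duplicate, first + 1)  # ValueError if no second copy
    return segments[:first + 1] + segments[second + 1:]


# Hypothetical chromosome: two genes lie between two copies of a duplication.
chromosome = ["geneA", "DUP", "geneB", "geneC", "DUP", "geneD"]
print(delete_between_copies(chromosome, "DUP"))
# ['geneA', 'DUP', 'geneD']
```

In this toy example, geneB and geneC are deleted along with one copy of the duplication, while the flanking genes are retained.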