Conserved DNA sequences allow researchers to understand the essential genes and proteins that organisms share. Knowing how genes relate to biological systems allows scientists to home in on their specific properties and thus exploit the differences between organisms. This matters because it offers innovative approaches for combating pathogens. Understanding the evolutionary relationship between two organisms, such as a pathogen and its host, is essential in creating a coordinated attack against infection.
The Central Dogma of Molecular Biology describes how information flows from DNA to RNA and then to protein; the latter being the business end of the organism-level transaction. As DNA sequencing rises in resolution and falls in cost, wider applications for genomics are found. Agriculture, biotechnology, pharmacology and global human standards of living all look to improve as a direct effect of comparative genomics.
Comparative genomics can be loosely defined as the large-scale comparison of genomes in order to understand the biology of individual genomes and to extract general principles that apply to groups of genomes. This definition might seem difficult to comprehend at first. To better understand this definition, one can dissect it. When we compare genomes to each other, we assume that biological sequences, structures, and functions are shared across different organisms.
The genomes of virtually all living organisms are composed of deoxyribonucleic acid (DNA). DNA is found in a double helix configuration constructed from the association of its four building blocks – adenine (A), thymine (T), cytosine (C) and guanine (G). A DNA chain consists of sequences that code for different proteins as well as regulatory sequences that turn the expression of genes off and on.
Once a given genome has been sequenced and uploaded to a database, comparisons of the general features, such as the number of genes and where those genes reside on chromosomes, can be made.
Polymerase Chain Reaction (PCR) is now widely used as a technique to amplify segments of chromosomal DNA or complementary DNA (DNA that is derived from reverse transcription of RNA). In order to perform PCR, the sequences of the target region must be known.
Oligonucleotide primers are designed as short single-stranded DNA fragments that are complementary to the target region sequences. A sample of DNA is denatured through high heat, which separates its two strands. The sample is then cooled so that the primers can anneal to their complementary sequences. DNA polymerases can then attach to the 3’ end of the designed primers and synthesize new strands of DNA.
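The primer logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a primer design tool: the function names (`reverse_complement`, `find_amplicon`) and the short sequences are made up for this example, and real primer design also has to consider melting temperature, specificity and secondary structure.

```python
# Illustrative sketch only: the sequences are invented, not real genomic data.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_amplicon(template: str, forward: str, reverse: str):
    """Return the stretch of `template` bounded by the two primers, or None.

    `template` is one strand written 5'->3'. The forward primer has the same
    sequence as that strand; the reverse primer anneals to that strand, so
    its reverse complement is what appears in `template`.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None  # forward primer site not present
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None  # reverse primer site not present downstream
    return template[start:end + len(reverse_site)]


template = "TTGACCATGGCTAGCTAGGATCCAAGT"
amplicon = find_amplicon(template, forward="ACCATGG", reverse="TGGATCCT")
print(amplicon)
```

The sketch also shows why the target sequences must be known in advance: without them, neither primer site can be located on the template.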
Genome size does not always correlate with advanced evolutionary standing, nor does the number of genes reveal the size of a genome. This is demonstrated by comparing the genomes of two completely sequenced organisms, Arabidopsis thaliana and Drosophila melanogaster. Arabidopsis has 8 million fewer nucleotide base pairs than Drosophila, but it has 12,000 more genes. In fact, the small Arabidopsis plant has a number of genes similar to our own.
Chromosome-level comparisons glean insight into the physical co-location of genes in two organisms. This kind of information is provided by higher resolution techniques applied to DNA sequences. The physical co-location, or synteny, is vital information that researchers use to gain clues about similarities in the groupings of genes. Synteny is generally conserved for genes with related functions, an observation that points to common evolutionary ancestry.
Human and rodent chromosomes are a brilliant illustration of synteny; scientists have marked the existence of reciprocal syntenic groups on human chromosome 20 and mouse chromosome 2, respectively. The arrangement of the genetic information is conserved along this entire block. Conversely, changes to the human and rodent genomes over millions of years are reflected in the rearrangement of certain genes.
Homologous genetic material, i.e., DNA that is similar between two organisms, allows scientists to compare blocks of genes. Many of the enzymes of intermediary metabolism among related organisms have conserved genes that are similarly arranged on chromosomes. Using computational sequence analysis, researchers identify similar regions of chromosomes that correspond to regulatory sequences as well as to the gene loci that determine the function of the encoded protein.
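The idea of conserved gene order can be made concrete with a small sketch. The function below is an assumption-laden toy, not an established synteny tool: it assumes each gene has already been assigned a one-to-one ortholog identifier, that each identifier occurs once per genome, and it only detects runs in the same orientation. The name `syntenic_blocks` and the gene labels are hypothetical.

```python
def syntenic_blocks(order_a, order_b, min_genes=2):
    """Return runs of genes that are adjacent and in the same order in both lists.

    `order_a` and `order_b` are gene identifiers listed in chromosomal order.
    Assumes one-to-one orthologs that appear at most once per list.
    """
    position_in_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []

    def close_run():
        # Keep the run only if it is long enough to count as a block.
        if len(current) >= min_genes:
            blocks.append(list(current))
        current.clear()

    for gene in order_a:
        if gene not in position_in_b:
            close_run()  # no ortholog in genome B breaks the run
            continue
        if current and position_in_b[gene] == position_in_b[current[-1]] + 1:
            current.append(gene)  # still collinear with the previous gene
        else:
            close_run()
            current.append(gene)
    close_run()
    return blocks


# Hypothetical gene orders: two blocks are conserved but have swapped places.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
genome_b = ["g5", "g6", "g7", "g9", "g1", "g2", "g3"]
print(syntenic_blocks(genome_a, genome_b))
```

In this toy input the order within each block is conserved, while the relative position of the blocks differs, which mirrors the combination of conservation and rearrangement described above.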
Benefits of Comparative Genomics
Understanding the similarities and differences between the genes of diverse organisms allows us to study potential cures for diseases that have eluded us for hundreds of years. The knowledge of comparative genomics is at the forefront of rational drug design because many pharmaceuticals act to inhibit enzymes that are produced from genes.
Mycobacterium tuberculosis (MTB) is the causative agent of tuberculosis in humans, who are its only known reservoir. MTB was the cause of the “White Plague” of the 17th and 18th centuries in Europe, a time period when approximately 100% of the European population was infected with the microorganism, and 25% of all adult deaths were attributed to the disease.
Today, tuberculosis (TB) is the leading bacterial cause of death and affects over one-third of the entire world population.
The primary route of transmission of TB is through aerosols, when individuals carrying the disease cough active droplets containing the infectious bacteria. These active droplets can remain infective for hours.
MTB is a nonmotile rod-shaped bacterium that is an obligate aerobe. Complexes of MTB are always found in the oxygen-rich upper lobes of the lungs, where they act as facultative intracellular parasites infecting macrophages. The aim of researchers is to understand various components of the mycobacterial cell wall, the metabolic pathways that lead to the biosynthesis of these constituents (in this case sterols) and how these pathways can be inhibited.
Isoprenoids are vital to several core bacterial cellular functions. Mycobacteria produce isoprenoids via a metabolic route that differs from the one used by humans: the methylerythritol phosphate (MEP) pathway. Humans produce cholesterol via the mevalonate (MEV) pathway, which synthesises HMG-CoA from acetyl-CoA.
Regardless of the route that is taken to generate the start and repeat unit, a common characteristic of the isoprenoids is that they are derived from two basic 5-carbon molecules: isopentenyl diphosphate (IPP) and its isomer dimethylallyl diphosphate (DMAPP). In the MEP pathway, these 5-carbon building blocks are generated from pyruvate and glyceraldehyde-3-phosphate.
Subsequent reactions convert MEP to IPP and DMAPP for the generation of the three essential groups of compounds that arise from isoprenoids with respect to mycobacteria – sterols, ubiquinones and dolichols. Sterols provide membrane stability as a primary function. Ubiquinones are electron carriers and the major component of bacterial aerobic respiratory chains.
Understanding the similarities and differences between the genomes of these organisms is at the forefront of the rational drug design for eradicating tuberculosis. Because these organisms possess different pathways for the synthesis of the same vital compound, one pathway can be chemically attacked without blocking any of the enzymes in the pathway of the other. In this case, the enzymes of the mycobacterial MEP pathway can be targeted without interfering with cholesterol biosynthesis in the patient.
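The selection logic in this paragraph amounts to a set comparison: keep the pathogen enzymes that have no counterpart in the host. The sketch below illustrates that idea only. The enzyme lists are simplified textbook lists for the two pathways, the function name `candidate_targets` is hypothetical, and matching by name is an assumption made for brevity; in practice the comparison would rest on sequence and structural homology between the two genomes.

```python
# Simplified textbook enzyme lists, used here purely for illustration.
MEP_PATHWAY = ["DXS", "DXR", "IspD", "IspE", "IspF", "IspG", "IspH"]
MEV_PATHWAY = [
    "acetoacetyl-CoA thiolase",
    "HMG-CoA synthase",
    "HMG-CoA reductase",
    "mevalonate kinase",
    "phosphomevalonate kinase",
    "mevalonate diphosphate decarboxylase",
]


def candidate_targets(pathogen_enzymes, host_enzymes):
    """Return pathogen enzymes with no same-named counterpart in the host.

    Name matching is a stand-in for a real homology search (assumption).
    The order of the pathogen list is preserved.
    """
    host = set(host_enzymes)
    return [enzyme for enzyme in pathogen_enzymes if enzyme not in host]


targets = candidate_targets(MEP_PATHWAY, MEV_PATHWAY)
print(targets)
```

Because the two pathways share no enzymes in these lists, every MEP enzyme comes back as a candidate, including DXR.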
Indeed, fosmidomycin is a phosphonic acid compound used by researchers to inhibit deoxyxylulose phosphate reductoisomerase (DXR), the enzyme that catalyses the committed step of the MEP pathway in MTB. As the understanding of the genome and proteome of MTB advances, further-developed chemical compounds may one day lead to the eradication of tuberculosis globally.
Transposable elements move from one location to another. They may be duplicated, or excised and inserted elsewhere.
Long interspersed elements (LINEs, about 6,000 bp)
- 21% of the human genome
- Transpose themselves
- Retrotransposons: Contain reverse transcriptase
Short interspersed elements (SINEs, about 300 bp)
- Nested in LINEs
- Use LINE machinery
- Interrupt genes
Long terminal repeats
- Reverse transcriptase
- No longer have machinery to move
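The two modes of movement mentioned above, excision and reinsertion versus duplication, can be illustrated with a toy string model. This is only a sketch: the function names are hypothetical, the "genome" is a short made-up string, and the element is marked as `[TE]` so that it is easy to spot. Real transposition involves enzymes, target-site preferences and, for retrotransposons, an RNA intermediate, none of which is modelled here.

```python
def cut_and_paste(genome: str, element: str, target: int) -> str:
    """Excise the first copy of `element` and reinsert it elsewhere.

    `target` is an index into the genome after the element was removed.
    """
    start = genome.find(element)
    if start == -1:
        raise ValueError("element not found in genome")
    remaining = genome[:start] + genome[start + len(element):]
    return remaining[:target] + element + remaining[target:]


def copy_and_paste(genome: str, element: str, target: int) -> str:
    """Leave the original copy in place and insert a new copy at `target`."""
    if element not in genome:
        raise ValueError("element not found in genome")
    return genome[:target] + element + genome[target:]


genome = "AAAA[TE]CCCCGGGG"
moved = cut_and_paste(genome, "[TE]", target=8)
duplicated = copy_and_paste(genome, "[TE]", target=12)
print(moved)
print(duplicated)
```

After the first call the element count stays at one and only its position changes. After the second call the genome carries two copies, which is how elements that move by duplication can come to occupy a large share of a genome.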
Practice Questions on Comparative Genomics
The correct answers can be found below the references.
- Which of the following statements is true about modern, “high-throughput”, automated DNA sequencing?
  - It requires gel electrophoresis with four separate lanes, one for each of the four DNA nucleotides.
  - Chain termination occurs using 2’, 3’-dideoxynucleotide triphosphates with a fluorescent tag.
  - Chain termination occurs using 3’, 5’-dideoxynucleotide triphosphates.
  - Chain termination occurs using 3’-deoxynucleotide triphosphates.
  - No DNA primer is required.
- Which of the following is true about PCR (polymerase chain reaction)?
  - 20 cycles of PCR will result in amplification of the target sequence by about 20-fold.
  - PCR amplification, as described in the text, involves the use of fluorescent-tagged dNTP analogues.
  - The introduction of thermo-stable RNA polymerases represents a significant advance in PCR technology.
  - The high-temperature portion of a PCR cycle is designed to enhance the template/primer-annealing step.
  - PCR requires the presence of dNTPs, DNA polymerase, the target DNA as a template and two primers.
- Which of the following statements is true?
  - All protein families accumulate changes, on an evolutionary time scale, at very similar rates.
  - The observation that only arginine and lysine residues are found at a specific position in the primary structure of all proteins in a single protein family suggests that a negative charge at this position is essential for this protein to function.
  - Sickle cell anemia results from a single point mutation that replaces the valine found in the native protein by a glutamate.
  - Typically, genetic open reading frames contain two sections: one that provides the genetic information for the primary structure of the protein coded by this gene, and a second one that encodes detailed folding directions which determine the native tertiary structure.
  - The observation that only phenylalanine and tyrosine residues are found at a specific position in the primary structure of all proteins in a single protein family suggests that an aromatic side chain at this position is essential for this protein to function.