Sunday, February 15, 2009


Deoxyribonucleic acid (DNA) is a nucleic acid —usually in the form of a double helix— that contains the genetic instructions specifying the biological development of all cellular forms of life, and most viruses. DNA is a long polymer of nucleotides and encodes the sequence of the amino acid residues in proteins using the genetic code, a triplet code of nucleotides.
In complex eukaryotic cells such as those from plants, animals, fungi and protists, most of the DNA is located in the cell nucleus. By contrast, in simpler cells called prokaryotes, including the eubacteria and archaea, DNA is not separated from the cytoplasm by a nuclear envelope. The cellular organelles known as chloroplasts and mitochondria also carry DNA.
DNA is often referred to as the molecule of heredity as it is responsible for the genetic propagation of most inherited traits. In humans, these traits can range from hair colour to disease susceptibility. During cell division, DNA is replicated and can be transmitted to offspring during reproduction. Lineage studies can be done based on the facts that the mitochondrial DNA only comes from the mother, and the male Y chromosome only comes from the father.
Every person's DNA, their genome, is inherited from both parents. The mother's mitochondrial DNA together with twenty-three chromosomes from each parent combine to form the genome of a zygote, the fertilized egg. As a result, with certain exceptions such as red blood cells, most human cells contain 23 pairs of chromosomes, together with mitochondrial DNA inherited from the mother.


Space-filling model of a section of DNA molecule
DNA Under electron microscope
This section presents an introductory and therefore incomplete overview of DNA.
Genes can be loosely viewed as the organism's "cookbook" or "blueprint";
A strand of DNA contains genes, areas that regulate genes, and areas that either have no function, or a function which we do not (yet) know; also see last bullet point in this section for the difference between DNA and RNA;
DNA is organized as two complementary strands, running head-to-tail (antiparallel), with hydrogen bonds between them that can be "unzipped" like a zipper, separating the strands; contrary to a common misconception, DNA is not a single molecule, but rather a pair of molecules joined by these bonds;
DNA is a chain of chemical "building blocks", called "bases", of which there are four types: these can be abbreviated A, T, C, and G. Each base can only "pair up" with one single predetermined other base: A+T, T+A, C+G and G+C are the only possible combinations; that is, an "A" on one strand of double-stranded DNA will "mate" properly only with a "T" on the other, complementary strand;
Uracil (U) takes the place of T in RNA and, unusually, in the DNA of a few viruses, notably the PBS1 phage;
The allowable base components of nucleic acids can be polymerized in any order giving the molecules a high degree of uniqueness;
DNA is an acid because of the phosphate groups between successive deoxyribose sugars. This is the primary reason why DNA has a negative charge;
The "polarity" of each pair is important: A+T is not the same as T+A, just as C+G is not the same as G+C (note that "polarity" as such is never used in this context -- it's just a suggestive way to get the idea across);
For each given base, there is just one possible complementary base, so naming the bases on the conventionally chosen side of the strand is enough to describe the entire double-strand sequence;
The genetic information contained in a strand of DNA is determined by the sequence of bases along its length;
The cell begins DNA replication by forcibly unzipping the DNA double strand down the middle, and then recreates the "other half" of each new single strand by exposing each half to a mixture of the four bases. An enzyme makes a new strand by finding the correct base in the mixture and pairing it with the original strand. In this way, the base on the old strand dictates which base will be on the new strand, and the cell ends up with an extra copy of its DNA.
Mutations are simply chemical imperfections in this process: a base is accidentally skipped, inserted, or incorrectly copied, or the chain is trimmed, or added to; many basic mutations can be described as combinations of these accidental "operations". Mutations can also occur through chemical damage (through mutagens), light (UV damage), or through other more complicated gene swapping events.
DNA molecules that act as enzymes have been made in laboratories, but none has so far been found in living organisms.
In addition to the traditionally viewed duplex form, DNA can also adopt triplex and quadruplex forms. In these forms, Hoogsteen base pairing comes into play instead of Watson-Crick base pairing.
DNA differs from ribonucleic acid (RNA) by having a sugar 2-deoxyribose instead of ribose in its backbone. This is the basic chemical distinction between RNA and DNA.
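The pairing rule in the list above (A with T, C with G) means that one strand fully determines the other. The following is a minimal Python sketch of that idea; the names `PAIR`, `complement` and `reverse_complement` are illustrative choices, not anything defined in this article, and the sketch assumes the input contains only the letters A, T, C and G.

```python
# Each base pairs with exactly one partner (assumed alphabet: A, T, C, G).
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for every position, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction.

    The two strands run head-to-tail, so the partner strand is the
    complement read backwards.
    """
    return complement(strand)[::-1]
```

Because the mapping is its own inverse, applying `reverse_complement` twice gives back the original strand, which is why naming the bases on one conventionally chosen strand is enough to describe the whole double strand.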

DNA pairing

DNA base pairing
The paired bases are joined by hydrogen bonds. This image shows the normal base pairing, and also how, on rare occasions, wrong pairing can happen when thymine goes into its enol form or cytosine goes into its imino form.

DNA in practice

DNA in crime
Forensic scientists can use DNA located in blood, semen, skin, saliva or hair left at the scene of a crime to identify a possible suspect, a process called genetic fingerprinting or DNA profiling. In DNA profiling the relative lengths of sections of repetitive DNA, such as short tandem repeats and minisatellites, are compared. DNA profiling was developed in 1984 by English geneticist Alec Jeffreys, and was first used to convict Colin Pitchfork in 1988 in the Enderby murders case in Leicestershire, England. Many jurisdictions require convicts of certain types of crimes to provide a sample of DNA for inclusion in a computerized database. This has helped investigators solve old cases where the perpetrator was unknown and only a DNA sample was obtained from the scene (particularly in rape cases between strangers). This method is one of the most reliable techniques for identifying a criminal, but is not always perfect, for example if no DNA can be retrieved, or if the scene is contaminated with the DNA of several possible suspects.

DNA in computation
Despite its biological origins, DNA plays an important role in computer science, both as a motivating research problem and as a method of computation in itself, called DNA computing.
As a simple example, research on string searching algorithms, which find an occurrence of a sequence of letters inside a larger sequence of letters, was motivated by DNA research, where it is used to find specific sequences of nucleotides in a large sequence. In other applications like text editors, even simple algorithms for this problem usually suffice, but DNA sequences cause these algorithms to exhibit near-worst-case behaviour due to their small number of distinct characters.
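To make the point about small alphabets concrete, here is a hedged sketch of the simplest string-searching algorithm, instrumented to count character comparisons. The function name `naive_search` and the comparison counter are assumptions added for illustration; this is not a description of any particular tool used in DNA research.

```python
def naive_search(text: str, pattern: str) -> tuple[int, int]:
    """Return (index of the first match or -1, number of character comparisons).

    This is the plain "try every starting position" algorithm.
    """
    comparisons = 0
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        matched = 0
        while matched < m:
            comparisons += 1
            if text[start + matched] != pattern[matched]:
                break  # mismatch: give up on this starting position
            matched += 1
        if matched == m:
            return start, comparisons
    return -1, comparisons
```

On ordinary text most starting positions fail on the first character, so the comparison count stays close to the length of the text. On a repetitive nucleotide sequence, for example searching for "AAAT" in a long run of "A", almost every position matches several characters before failing, and the count approaches the text length times the pattern length.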
Databases have also been strongly motivated by DNA research, which requires special tools for storing and manipulating DNA sequences. Databases specialized for this purpose are called genomic databases, and have a number of unique technical challenges associated with the operations of approximate matching, sequence comparison, finding repeating patterns, and homology searching.
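As one illustration of approximate matching, the sketch below reports every position where a pattern occurs with at most a given number of mismatched bases. It uses Hamming distance (substitutions only), which is an assumption made for brevity; real genomic databases typically also handle insertions and deletions. The function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def approximate_matches(text: str, pattern: str, max_mismatches: int) -> list[int]:
    """Return start indices where pattern matches text with few enough mismatches."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if hamming(text[i:i + m], pattern) <= max_mismatches
    ]
```

With `max_mismatches` set to 0 this reduces to exact matching; raising it lets near-identical sequences, such as variants of the same gene, be found as well.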
In 1994, Leonard Adleman of the University of Southern California made headlines when he discovered a way of solving the directed Hamiltonian path problem, an NP-complete problem, using tools from molecular biology, in particular DNA. The new approach, dubbed DNA computing, has potential advantages over traditional computers in power use, space use, and efficiency, due to its ability to highly parallelize the computation (see parallel computing), although retrieving the answers involves considerable laboratory work. A number of other problems, including simulation of various abstract machines, the boolean satisfiability problem, and the bounded version of the Post correspondence problem, have since been analyzed using DNA computing.
Due to its compactness, DNA also has a theoretical role in cryptography, where in particular it allows unbreakable one-time pads to be efficiently constructed and used.
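The directed Hamiltonian path problem asks whether a path exists that starts at one given vertex, ends at another, and visits every vertex exactly once. The sketch below solves it by generating candidate paths and filtering out the invalid ones. In a test tube the candidates would be formed in parallel; here they are checked one at a time, so this only illustrates the generate-and-filter idea. The function name and the example graph in the note are made up for illustration and are not taken from Adleman's experiment.

```python
from itertools import permutations


def hamiltonian_paths(n_vertices: int, edges: list[tuple[int, int]],
                      start: int, end: int) -> list[tuple[int, ...]]:
    """Return all directed paths from start to end that visit every vertex once."""
    edge_set = set(edges)
    found = []
    # Generate every ordering of the vertices (the "candidate" paths).
    for order in permutations(range(n_vertices)):
        # Filter 1: keep candidates with the right endpoints.
        if order[0] != start or order[-1] != end:
            continue
        # Filter 2: keep candidates whose consecutive vertices are joined by an edge.
        if all((a, b) in edge_set for a, b in zip(order, order[1:])):
            found.append(order)
    return found
```

For a four-vertex graph with edges 0→1, 1→2, 2→3, 0→2 and 1→3, the only path from 0 to 3 through every vertex is 0, 1, 2, 3. The number of candidates grows factorially with the number of vertices, which is why massive parallelism is attractive for this kind of problem.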

DNA in historical and anthropological study
DNA research is also used to follow the course of human populations over time.
DNA evidence is also being used to try to identify the Ten Lost Tribes of Israel [2] [3].
DNA has also been used to look at fairly recent issues of family relationships, such as establishing some manner of familial relationship between the descendants of Sally Hemings and the family of Thomas Jefferson.

Molecular structure
Comparisons between DNA and single stranded RNA with the diagram of the bases showing.
Although sometimes called "the molecule of heredity", DNA macromolecules as people typically think of them are not single molecules. Rather, they are pairs of molecules, which entwine like vines to form a double helix (see the illustration at the right).
Each vine-like molecule is a strand of DNA: a chemically linked chain of nucleotides, each of which consists of a sugar (deoxyribose), a phosphate and one of five kinds of nucleobases ("bases"). Because DNA strands are composed of these nucleotide subunits, they are polymers.
The diversity of the bases means that there are five kinds of nucleotides, which are commonly referred to by the identity of their bases. These are adenine (A), thymine (T), uracil (U), cytosine (C), and guanine (G). U is rarely found in DNA except as a result of chemical degradation of C, but in some viruses, notably PBS1 phage DNA, U completely replaces the usual T in its DNA. Similarly, RNA usually contains U in place of T, but in certain RNAs such as transfer RNA, T is always found in some positions. Thus, the only true difference between DNA and RNA is the sugar, 2-deoxyribose in DNA and ribose in RNA.
In a DNA double helix, two polynucleotide strands can associate through the hydrophobic effect and pi stacking. Specificity of which strands stay associated is determined by complementary pairing. Each base forms hydrogen bonds readily to only one other -- A to T and C to G -- so that the identity of the base on one strand dictates the strength of the association; the more complementary bases exist, the stronger and longer-lasting the association.
The cell's machinery is capable of melting, or dissociating, a DNA double helix and using each DNA strand as a template for synthesizing a new complementary strand, which is nearly identical to the template's previous partner. Errors that occur in the synthesis are known as mutations. The polymerase chain reaction (PCR) mimics this process in vitro, in a nonliving system.
Because pairing causes the nucleotide bases to face the helical axis, the sugar and phosphate groups of the nucleotides run along the outside; the two chains they form are sometimes called the "backbones" of the helix. In fact, it is chemical bonds between the phosphates and the sugars that link one nucleotide to the next in the DNA strand.

Sequence role
Within a gene, the sequence of nucleotides along a DNA strand defines a messenger RNA sequence, which in turn defines a protein that an organism is liable to manufacture or "express" at one or several points in its life using the information of the sequence. The relationship between the nucleotide sequence and the amino-acid sequence of the protein is determined by simple cellular rules of translation, known collectively as the genetic code. The genetic code is made up of three-letter 'words' (termed codons), each formed from a sequence of three nucleotides (e.g. ACT, CAG, TTT). The codons are copied into messenger RNA and then read by transfer RNA, with each codon corresponding to a particular amino acid. There are 64 possible codons (4 bases in each of 3 positions, 4^3 = 64). Of these, 61 encode the 20 amino acids, so most amino acids have more than one possible codon. The remaining three are 'stop' or 'nonsense' codons signifying the end of the coding region, namely the UAA, UGA and UAG codons.
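The codon arithmetic can be checked with a short Python sketch. It builds the standard genetic code as a lookup table and translates a DNA coding sequence codon by codon. Two assumptions are worth flagging: the table is written in the DNA alphabet (T instead of U, so the stop codons appear as TAA, TGA and TAG), and the compact 64-letter string is just a conventional way of listing the standard code in T, C, A, G order. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for all 64 codons in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each three-letter codon to its amino acid (standard genetic code).
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, stopping at the first stop codon.

    Trailing bases that do not fill a whole codon are ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

The table has 4 × 4 × 4 = 64 entries, three of which are stops. The example codons from the text, ACT, CAG and TTT, map to threonine (T), glutamine (Q) and phenylalanine (F) respectively.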
In many species, only a small fraction of the total sequence of the genome appears to encode protein. For example, only about 1.5% of the human genome consists of protein-coding exons. The function of the rest is a matter of speculation. It is known that certain nucleotide sequences specify affinity for DNA binding proteins, which play a wide variety of vital roles, in particular through control of replication and transcription. These sequences are frequently called regulatory sequences, and researchers assume that so far they have identified only a tiny fraction of the total that exist. "Junk DNA" represents sequences that do not yet appear to contain genes or to have a function. The reasons for the presence of so much non-coding DNA in eukaryotic genomes and the extraordinary differences in genome size ("C-value") among species represent a long-standing puzzle in DNA research known as the "C-value enigma".
Some DNA sequences play structural roles in chromosomes. Telomeres and centromeres typically contain few (if any) protein-coding genes, but are important for the function and stability of chromosomes. Some genes are "RNA genes", whose products are RNA rather than protein (see tRNA and rRNA). Some RNA genes code for transcripts that function as regulatory RNAs (see siRNA) that influence the function of other RNA molecules. The intron-exon structure of some genes (such as immunoglobulin and protocadherin genes) is important for allowing alternative splicing of pre-mRNA, which allows several different proteins to be made from the same gene. Some non-coding DNA represents pseudogenes that can be used as raw material for the creation of new genes with new functions. Some non-coding DNA provides hot-spots for duplication of short DNA regions; such sequence duplication has been the major form of genetic change in the human lineage (see evidence from the Chimpanzee Genome Project). Exons interspersed with introns allow for "exon shuffling" and the creation of modified genes that might have new adaptive functions. Large amounts of non-coding DNA are probably adaptive in that they provide chromosomal regions where recombination between homologous portions of chromosomes can take place without disrupting the function of genes. Some biologists such as Stuart Kauffman have speculated that non-coding DNA may modify the rate of evolution of a species.
Sequence also determines a DNA segment's susceptibility to cleavage by restriction enzymes, the quintessential tools of genetic engineering. The position of cleavage sites throughout an individual's genome determines one kind of an individual's "DNA fingerprint".
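How cleavage sites give rise to a fingerprint can be sketched as follows: find every occurrence of an enzyme's recognition sequence, cut there, and record the fragment lengths. The sketch models a single strand only and uses the enzyme EcoRI (recognition sequence GAATTC, cut after the first G) as the example; the function names and the short test sequences in the note are invented for illustration.

```python
def cut_positions(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return the indices at which the sequence is cut.

    cut_offset is the number of bases into the recognition site at which the
    enzyme cuts (1 for EcoRI, which cuts between G and AATTC).
    """
    positions = []
    i = sequence.find(site)
    while i != -1:
        positions.append(i + cut_offset)
        i = sequence.find(site, i + 1)
    return positions


def fragment_lengths(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return the lengths of the fragments left after cutting at every site."""
    bounds = [0] + cut_positions(sequence, site, cut_offset) + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]
```

A single base change inside a recognition site removes that cut, so two individuals whose sequences differ at such a position yield different sets of fragment lengths. For instance, "AAGAATTCTTTTGAATTCGG" is cut twice by EcoRI, while "AAGAATTCTTTTGAGTTCGG" is cut only once.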