
DNA Testing: An Introduction For Non-Scientists
An Illustrated Explanation
by DONALD E. RILEY, Ph.D.
University of Washington
(Copyright 1998)
The explanation of DNA testing that follows is intended as an introduction to the subject for those who may have limited backgrounds in biological science. While basically accurate, this explanation involves liberal use of illustration and, in some cases, over-simplification. Although intended to be informative, this is a brief and incomplete explanation of a complex subject. The author suggests consulting the scientific literature for more rigorous details and alternative views.
DNA is the material that governs inheritance of eye color, hair color, stature, bone density and many other human and animal traits. DNA is a long but narrow string-like object. A one-foot-long string or strand of DNA is normally packed into a space roughly equal to a cube one millionth of an inch on a side. This is possible only because DNA is a very thin string.
Our body's cells each contain a complete sample of our DNA. One cell is roughly equal in size to the cube described in the previous paragraph. There are muscle cells, brain cells, liver cells, blood cells, sperm cells and others. Basically, every part of the body is made up of these tiny cells and each contains a sample or complement of DNA identical to that of every other cell within a given person. There are a few exceptions. For example, our red blood cells lack DNA. Blood itself can be typed because of the DNA contained in our white blood cells.
Not only does the human body rely on DNA but so do most living things including plants, animals and bacteria.
A strand of DNA is made up of tiny building blocks. There are only four different basic building blocks. Scientists usually refer to these using four letters, one for each building block. The letters are: A, T, G, and C. These four letters are short nicknames for more complicated chemical names, but the letters (A, T, G and C) are used much more commonly than the chemical names, so the latter will not be mentioned here. Another way of referring to the building blocks or letters is to call them bases.
For example, to refer to a particular piece of DNA, we might write: AATTGCCTTTTAAAAA. This is a perfectly acceptable way of describing a piece of DNA. Someone with a machine called a DNA synthesizer could actually synthesize the same piece of DNA from the information AATTGCCTTTTAAAAA alone.
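Because a piece of DNA can be written as a plain string of letters, it is also easy to handle on a computer. The following is a minimal sketch in Python that counts the bases in the example sequence. The function name count_bases is made up for this illustration and is not part of any standard tool.

```python
from collections import Counter


def count_bases(sequence: str) -> dict:
    """Count how many times each base (A, T, G, C) appears in a DNA string."""
    unknown = set(sequence) - set("ATGC")
    if unknown:
        # Anything other than the four letters is not a DNA base.
        raise ValueError(f"not a DNA base: {sorted(unknown)}")
    counts = Counter(sequence)
    # Report the four bases in a fixed order, including bases that are absent.
    return {base: counts[base] for base in "ATGC"}


example = "AATTGCCTTTTAAAAA"
print(len(example), count_bases(example))
```

The same string that a person reads on the page is all the information the program needs, just as it is all the information a DNA synthesizer needs.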
The sequence of bases (letters) can code for many properties of the body's cells. The cells can read this code. Some DNA sequences encode important information for the cell. Such DNA is called, not surprisingly, "coding DNA." Our cells also contain much DNA that doesn't encode anything that we know about. If the DNA doesn't encode anything, it is called non-coding DNA or sometimes, "junk DNA."
The DNA code, or genetic code as it is called, is passed through the sperm and egg to the offspring. A single sperm cell contains about three billion bases consisting of A, T, G and C that follow each other in a well-defined sequence along the strand of DNA.
Both coding and non-coding DNAs may vary from one individual to another. These DNA variations can be used to identify people or at least distinguish one person from another.
What is a Locus?
A locus (with a hard "c", LOW-KUS) is simply a location in the DNA. The plural of locus is loci (with a soft "c", pronounced LOW-SI). Again, the DNA is a long string-like object. Until it is extracted from the cells and purified, the long string of human DNA is tightly folded into bundles called chromosomes.
The illustration shows roughly how a pair of chromosomes might look in a microscope. We have 23 pairs of chromosomes. The illustration shows chromosome 4, which contains a locus called "GYPA." This locus is useful for forensic DNA testing because it is polymorphic, which means that it may take different forms (i.e., have different sequences of bases) in different chromosomes. Each of the possible forms is called an allele. Locus GYPA has two alleles, called A and B. In the illustration, one of the two chromosomes has allele A and the other allele B. A person with these chromosomes would be said to have genotype AB for the GYPA locus.
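The idea that a genotype is just the pair of alleles carried on the two chromosomes can be sketched in a few lines of Python. The function names here are invented for the illustration. The sketch lists every genotype a person could have at a locus with a given set of alleles.

```python
from itertools import combinations_with_replacement


def genotype(allele_1: str, allele_2: str) -> str:
    """Name a genotype from the alleles on the two chromosomes of a pair.

    The letters are sorted so that "B" + "A" and "A" + "B" get the same name.
    """
    return "".join(sorted([allele_1, allele_2]))


def possible_genotypes(alleles) -> list:
    """List every genotype that can be formed from the alleles of one locus."""
    pairs = combinations_with_replacement(sorted(alleles), 2)
    return [genotype(a, b) for a, b in pairs]


# GYPA has two alleles, A and B, so three genotypes are possible.
print(possible_genotypes(["A", "B"]))
```

A locus with two alleles allows three genotypes (AA, AB and BB). A locus with three alleles allows six.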
Currently, there are two main types of forensic DNA testing. They are RFLP analysis and PCR-based analysis.
Generally, RFLP analysis requires larger amounts of DNA and the DNA must be undegraded. Crime-scene evidence that is old or that is present in small amounts is often unsuitable for RFLP testing. Warm moist conditions may accelerate DNA degradation rendering it unsuitable for RFLP in a relatively short period of time.
PCR testing often requires less DNA than RFLP testing and the DNA may be partially degraded, more so than is the case with RFLP. However, PCR still has sample size and degradation limitations. PCR tests are extremely sensitive to contaminating DNA at the crime scene and within the test laboratory. During PCR, contaminants may be amplified up to a billion times their original concentration. Contamination can influence PCR results, particularly in the absence of proper handling techniques and proper controls for contamination.
RFLP ANALYSIS EXPLAINED IN EASY TERMS
RFLP DNA testing has four basic steps:
Using the same probe and enzyme, the test lab will perform these same steps for many people. The length of the target DNA fragments for each person will be recorded in a database. The distribution of fragment sizes in the database provides a rough idea of how rare or common a fragment of any given length is in a particular population. The commonness of a given size of DNA fragment is called a population frequency.
This figure illustrates the RFLP process. First, a restriction enzyme cuts the DNA strand into thousands of fragments of nearly all possible sizes. The fragments are then separated by size on a gel through the process known as electrophoresis. The DNA at this point is invisible in the gel unless it is stained with a dye. A replica of the gel's DNA is made on something called a blot (also called a Southern blot) or membrane. The blot is then probed with (mixed with) a special preparation of DNA that recognizes a specific DNA sequence or locus. Often, the probe is a radioactively labeled DNA sequence (represented by the object labeled * in the figure above). Excess probe is washed off the blot, then the blot is laid onto X-ray film. The radiation from the probes exposes the film, producing dark bands. The bands indicate the sizes of the fragments (alleles) for the locus within each sample. The film is now called an "autorad." The band sizes are measured by comparing them with a "ladder" produced by DNA fragments of known sizes. A match may be declared if two samples have RFLP band sizes that are all within 5% of one another in size.
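The 5% matching rule can be written out as a small Python sketch. Several details are assumptions made for the illustration and are not taken from any laboratory's protocol: bands are paired up after sorting by size, the 5% is measured against the average of the two band sizes being compared, and the band sizes shown are invented.

```python
def bands_match(sample_a, sample_b, tolerance=0.05):
    """Return True if every band in one sample has a partner of similar size
    in the other sample.

    Assumptions for this sketch: both samples must show the same number of
    bands, bands are paired smallest-to-smallest, and "within 5%" is measured
    relative to the average size of the two bands in a pair.
    """
    if len(sample_a) != len(sample_b):
        return False
    for size_a, size_b in zip(sorted(sample_a), sorted(sample_b)):
        average = (size_a + size_b) / 2
        if abs(size_a - size_b) > tolerance * average:
            return False
    return True


# Invented band sizes, in base pairs.
evidence = [2000, 5000]
suspect_1 = [5100, 2040]   # both bands close to the evidence bands
suspect_2 = [2000, 5600]   # second band differs by far more than 5%

print(bands_match(evidence, suspect_1))
print(bands_match(evidence, suspect_2))
```

Note that a declared match means only that the band sizes cannot be told apart within the measurement tolerance. How rare such a match is depends on the population data.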
For RFLP analysis to be reliable, all complex steps of the analysis must be carefully controlled. Databases must be large, meaning they include many people; they must be representative of the relevant population group. Because of the complexities of populations, databases must be interpreted with extreme care. For example, DNA fragment sizes rare in one population may be more common in other populations. Further, sub-populations or populations within populations must be considered.
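A rough sketch of how a population frequency might be read from a database is given below, in Python. The two databases are invented and far too small to be meaningful, and the use of a 5% window around the fragment size is an assumption made for the illustration. Real databases and the methods for counting fragments in them are considerably more involved.

```python
def population_frequency(fragment_size, database, tolerance=0.05):
    """Fraction of fragments in a database that fall within a window
    (here 5% of the queried size) around the given fragment size."""
    if not database:
        raise ValueError("database is empty")
    window = tolerance * fragment_size
    hits = sum(1 for size in database if abs(size - fragment_size) <= window)
    return hits / len(database)


# Hypothetical fragment sizes (in base pairs) recorded for two populations.
population_1 = [1500, 2000, 2050, 3100, 4000, 4100, 5200, 6000, 7500, 9000]
population_2 = [1950, 2000, 2020, 2080, 2090, 3000, 4000, 5000, 6000, 7000]

print(population_frequency(2000, population_1))
print(population_frequency(2000, population_2))
```

With these made-up numbers, a fragment of about 2000 base pairs turns up in 2 of 10 entries in the first database and 5 of 10 in the second. The same fragment would therefore look much rarer if the wrong database were consulted.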
PCR is an acronym for "polymerase chain reaction." This term applies to a wide variety of different DNA tests that differ in reliability and effectiveness. The reliability of each kind of PCR test needs independent verification. PCR itself doesn't accomplish DNA typing; it only increases the amount of DNA available for typing.
Within the DNA string, some areas are the same for all humans. These areas are said to be "conserved." Other areas tend to vary across people. These areas are said to be polymorphic or variable. The variable regions (V) are usually interspersed among the conserved or constant regions (C) as shown in the figure below.
In forensic DNA tests, PCR is used to identify and copy one or more variable regions of DNA. The tests use "primers" (indicated by small arrows below) that identify constant regions adjacent to the variable region of interest. The primers are short pieces of DNA (similar to the probes used for RFLP analysis) that are complementary to their target sequences. The primers serve as the starting points for copying of variable regions of the DNA sequence. The actual copying occurs in a test tube which is placed in a device called a thermal cycler. The thermal cycler goes through a series of heating and cooling cycles. In each cycle, each DNA fragment from the target region is duplicated. As the cycles continue, the quantity of target DNA increases exponentially.
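The exponential growth can be made concrete with a short Python sketch. It assumes ideal conditions in which every fragment is duplicated in every cycle (an efficiency of 1.0). Real reactions fall short of this, and the efficiency parameter and function names are introduced here only for illustration.

```python
def copies_after(cycles, starting_copies=1, efficiency=1.0):
    """Number of target copies after a given number of cycles.

    An efficiency of 1.0 means perfect doubling in every cycle; 0.8 would
    mean that 80% of the fragments are duplicated in each cycle.
    """
    return starting_copies * (1 + efficiency) ** cycles


def cycles_to_reach(target_copies, starting_copies=1, efficiency=1.0):
    """Smallest number of cycles needed to reach at least target_copies."""
    if efficiency <= 0:
        raise ValueError("efficiency must be greater than zero")
    cycles = 0
    copies = starting_copies
    while copies < target_copies:
        copies *= 1 + efficiency
        cycles += 1
    return cycles


print(copies_after(10))                 # ten perfect doublings of one molecule
print(cycles_to_reach(1_000_000_000))   # cycles needed to pass one billion
```

Under perfect doubling, ten cycles turn one molecule into about a thousand, and thirty cycles turn it into about a billion.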
PCR Contamination
PCR copies DNA efficiently if the initial DNA is in good condition. A single DNA entity (molecule) can become millions or billions of DNA molecules in about three hours. In this way, PCR is similar to what happens when a clinical infection occurs. Clinicians have known for many years that a single germ (bacterial cell or virus) contaminating a wound can produce a massive infection if untreated. Similarly, a DNA molecule can contaminate (infect) a PCR and become a significant problem. The ability of small amounts of DNA to produce false and misleading results is well known within the research community.
Prevention of false results involves the use of carefully applied controls and techniques. As described later, such controls and techniques can sometimes detect contamination but cannot guarantee that contamination hasn't influenced the results.
The DQA1 Test (also known as DQ alpha)
After the DNA from a sample is amplified through the use of PCR, it can be typed. The most widely used typing procedure involves a commercial kit called the PM plus DQA1 typing kit. This kit examines six genetic loci. All six are copied in the initial PCR. The products from this reaction are then placed onto two separate typing strips. One strip is for DQ alpha and the other types the remaining five loci.
There are several steps in a DQ alpha PCR test:
3. The amplified DNA is next treated with a variety of probes that are bound to a blot. (Compare RFLP: in RFLP, the target DNA is bound to the blot and the probe DNA is added. For the DQ alpha dot blot, the probe DNAs are bound to a small blot strip and the target DNA is added.) Each probe is found in a specific "dot" on the blot strip (which is known as a "dot blot"). A chemical reaction causes the dot to darken and become noticeable when DNA containing a particular allele (variant) is present.
The pattern of dots that appears on the dot blot indicates which probes the amplified DNA bound to, and thus allows the DNA type of the sample, also called a genotype, to be inferred.
DQ alpha typing strips look like this before any types are obtained:
The invisible dot to the right of the number 1 has a DNA probe for the 1-allele (variation) for DQ alpha. The invisible dot to the right of the 2 has a DNA probe for the 2-allele and so on. The 1-allele itself has variations, the 1.1, 1.2 and 1.3 subtypes, also called alleles. Notice that the typing strip has no specific dot or probe for the 1.2 subtype. Also, the typing strip can't distinguish between the 4.2 and 4.3 subtypes and there is a single dot for these. It is quite possible that there exist DQ alpha alleles that would be undetected by the typing strip and alleles that may be further subtypes of the alleles that the strip does detect.
Here are some examples of how the strips are read:
Example 1:

Example 2:

Example 3:

Example 4:

This last example (Example 4) brings up an important issue with DQ alpha typing. The 1.2 allele is actually the second most common allele in most populations. This means there will be frequent situations in which the 1.2 allele may be present but undetected, as in the last example. An obvious question is: Why not just have a specific probe for the 1.2 allele? The answer is that the typing strip already maximizes the probing of a relatively short stretch of DNA. That is, the DQ alpha locus itself is only about 240 base pairs long. The multiple probe typing strip was probably about the best that could be done in terms of detecting multiple alleles of this small locus in a single typing step.
DQ alpha is often the first PCR test that a forensic lab adopts. The term "DQ alpha" is even sometimes erroneously used interchangeably with the term "PCR." Actually, the DQ alpha system is quite different from the majority of PCR applications in the scientific community. This will be explained in more detail below.
Polymarker (PM)
The PM portion of the PM plus DQA1 kit involves 5 genetic loci in addition to DQ alpha. These additional loci are named for historical reasons. The 5 loci are LDLR, GYPA, HBGG, D7S8 and GC. Each of these represents a distinct location or locus in the DNA. These loci have rather simple allelic variations compared to DQ alpha. For example, there are only two LDLR alleles detected by the system, allele A and allele B. The same is true for GYPA and D7S8. The loci HBGG and GC each have three alleles, labeled A, B and C. Thus, reading PM typing strips is fairly simple, at least on the surface. Here are some examples:

The manufacturer recommends a lower limit of input DNA for PM plus DQA1 typing. The reason for this lower limit (2 nanograms, or ng, of DNA) is the possibility of missing alleles if the amount of input DNA is too low. If there is too little DNA, the system may detect some alleles and not others. This process is sometimes called "allelic dropout" or differential amplification. It can be a problem with this technique, particularly with low amounts of DNA or degraded DNA. The original User's Guide for the DQ alpha test kit discussed this problem although it did not precisely define the conditions under which alleles may "drop out."
The DQA1 and PM test strips contain control dots (the "C dot" on the DQA1 strip and the "S dot" on the PM strip) that are designed to indicate whether sufficient DNA is present to "light up" dots on the test strip. The potential for missing an allele increases if the control dots cannot be seen or are faint because there may be insufficient DNA present to pass the threshold of detection for all alleles. The following example illustrates how a DNA profile of one person might change to that of another due to failure to detect an allele.
Failure to detect alleles under certain circumstances is a theoretical possibility and was demonstrated for DQ alpha in the original User's Guide. In theory, the loss of alleles is due to what is called the "stochastic effect." In addition to the stochastic effect, a PCR phenomenon called "differential amplification" may play a role when input DNA amounts are low, when input DNA is extensively degraded and possibly at other times.
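The effect of a missed allele on a reported profile can be sketched in Python. In this sketch a PM profile is written as five types separated by slashes, in the order LDLR/GYPA/HBGG/D7S8/GC. That notation, the function name and the example profile are conventions chosen for the illustration. The sketch does not model why an allele drops out; it only shows what the strip would appear to say if one did.

```python
LOCI = ["LDLR", "GYPA", "HBGG", "D7S8", "GC"]


def apparent_profile(true_profile: str, dropped: dict) -> str:
    """Profile that would be read from the strip if some alleles go undetected.

    true_profile: five types such as "AB/AB/AC/AB/AB", in the order of LOCI.
    dropped: maps a locus name to the single allele that failed to show up.
    """
    apparent = []
    for locus, true_type in zip(LOCI, true_profile.split("/")):
        detected = sorted(set(true_type) - {dropped.get(locus)})
        if not detected:
            apparent.append("??")              # no dot lit at this locus
        elif len(detected) == 1:
            apparent.append(detected[0] * 2)   # one dot reads as AA, BB or CC
        else:
            apparent.append("".join(detected))
    return "/".join(apparent)


true_profile = "AB/AB/AC/AB/AB"
print(apparent_profile(true_profile, {}))
print(apparent_profile(true_profile, {"LDLR": "B"}))
```

With the B allele at LDLR undetected, a person whose true LDLR type is AB appears to be type AA, which is the type of a different person.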
PM plus DQA1 is frequently used on mixed DNA samples from two or more people. The following example illustrates some of the ambiguity that can arise if interpretations are not cautious:
Obviously, since two of the loci (HBGG and GC) show three alleles, the sample was a mixture of at least two people. The problem here is that any two people can be included as contributing to the mixture. The typing strip is saturated, meaning every dot that can be lit up is lit up. A poorly recognized limitation of the PM strip is that it is very easily saturated. For example, two people of types AB/AA/AB/BB/BC (person 1) and AB/BB/AC/AA/AA (person 2) could, when their DNAs are mixed, produce the pattern in the example. In fact, there are almost limitless combinations of 2 types that could produce the pattern. There are also many combinations of two people that would lead to a typing strip lacking one or two dots. Finally, there are many mixtures that may mimic a single source of DNA. For example:
The profile in this example could have come from a single person whose profile was AB/AB/AC/AB/AB. Alternatively, two people of types AA/AB/AC/AA/BB and BB/AB/AA/BB/AA, if mixed, could produce the profile. There are many other possible combinations of people who, when their DNAs are mixed, could produce the profile. In fact, the only individuals excluded are those possessing the HBGG B allele or the GC C allele, assuming that the typing strip is reliably detecting all the alleles present. Extreme caution should be used when there is a possibility of a DNA mixture. It is arguable whether the system should be used with mixtures at all.
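The mixture reasoning above can be checked with a short Python sketch. Profiles are written as five slash-separated types in the order LDLR/GYPA/HBGG/D7S8/GC, and the function names are invented for the illustration. The sketch assumes, as the text does, that the strip reliably detects every allele present. It ignores dot intensity, which an analyst might also consider.

```python
LOCI = ["LDLR", "GYPA", "HBGG", "D7S8", "GC"]
# Alleles the PM strip can detect at each locus, as described in the text.
DETECTABLE = {"LDLR": "AB", "GYPA": "AB", "HBGG": "ABC", "D7S8": "AB", "GC": "ABC"}


def alleles_by_locus(profile: str) -> dict:
    """Turn "AB/AA/AB/BB/BC" into {"LDLR": {"A", "B"}, "GYPA": {"A"}, ...}."""
    types = profile.split("/")
    if len(types) != len(LOCI):
        raise ValueError("expected one type per PM locus")
    return {locus: set(t) for locus, t in zip(LOCI, types)}


def mixture_pattern(*profiles: str) -> str:
    """Dots that light up when the DNA of several people is mixed."""
    lit = {locus: set() for locus in LOCI}
    for profile in profiles:
        for locus, alleles in alleles_by_locus(profile).items():
            lit[locus] |= alleles
    return "/".join("".join(sorted(lit[locus])) for locus in LOCI)


def is_saturated(pattern: str) -> bool:
    """True if every dot that can light up on the strip is lit."""
    lit = alleles_by_locus(pattern)
    return all(lit[locus] == set(DETECTABLE[locus]) for locus in LOCI)


def cannot_be_excluded(person: str, pattern: str) -> bool:
    """True if every allele the person carries appears in the pattern."""
    lit = alleles_by_locus(pattern)
    carried = alleles_by_locus(person)
    return all(carried[locus] <= lit[locus] for locus in LOCI)


saturated = mixture_pattern("AB/AA/AB/BB/BC", "AB/BB/AC/AA/AA")
print(saturated, is_saturated(saturated))

mimic = mixture_pattern("AA/AB/AC/AA/BB", "BB/AB/AA/BB/AA")
print(mimic)
print(cannot_be_excluded("AB/AB/AB/AB/AB", mimic))  # carries HBGG B
```

The first mixture lights every dot, so no one can be excluded from it. The second gives AB/AB/AC/AB/AB, which is indistinguishable from a single person of that type, and only people carrying HBGG B or GC C are excluded.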