Haplotype DNA evidence
A Y-chromosome or a mitochondrion carries genetic information that is useful for identification or
for kinship problems such as paternity attribution, but it has to be treated somewhat differently
from the more typical nuclear (= autosomal) DNA profile, for many reasons:
- The genetic rules are simpler: the trait is either known to be passed or known
not to be passed (depending on the sexes involved) to each offspring; there are no
choices or 50% probabilities of transmission as with nuclear DNA.
May 2018, note added 5-10 years after initially writing this web
page: This page discusses only how to handle the complication of
relatives and possible mutation. It doesn't deal with the difficult
questions of evaluating what I loosely call "Pr(haplotype)" below, or
we'd have many further reasons to list:
- Several markers are linked, i.e. physically chained and inherited together, so they
must be considered as a unit. No recombination. The product rule doesn't apply at all.
- Autosomal STR allele matching probabilities or frequencies can reasonably be
estimated as sample frequencies; for Y haplotypes that is far from true.
- When using population data to estimate the significance of an autosomal STR
allele match, it's close to adequate to consider only the allele in question.
For Y by contrast, it's important to look at the entire database.
- Putting confidence intervals on a matching probability is mathematically
ignorant. However, while it's a fairly harmless error in the autosomal domain,
it's a crippling mistake when dealing with Y haplotype matching.
- Obviously all men are related. Forgetting that when dealing with Y misses the
fundamental point that nearly all Y matching is from identity by
descent. That's not so with a single STR locus as in the autosomal domain.
- "Theta" — the chance of two alleles or haplotypes being IBD —
in autosomal practice is a minor adjustment to the matching chance.
With Y haplotypes it's the main thing; it's a good approximation to the matching
chance and anything else is a minor adjustment.
- Explicit models — a careful mathematical approach laying out premises
and deriving results from them — can be overlooked in autosomal practice
without tragedy. In Y haplotype practice they are vital, and their persistent
absence from all early papers and many recent ones results in nonsensical
recommendations and practice.
- Geographical clustering is a vital concern with Y haplotypes; in autosomal
work not so much.
- Y-haplotype in identification and paternity
I discuss here basic principles in using Y-haplotype information for identity or paternity.
Suppose suspect and crime stain have the same Y-chromosome haplotype. That result is normal
and expected (i.e. 100%) if the suspect is the donor; it is the probability of seeing the haplotype
among random men if the suspect is a random man.
The strength of the evidence is therefore simply expressed as matching odds (or equivalently
as a likelihood ratio) of
matching odds = 1 / Pr(haplotype).
- Paternity ordinary case
Typically father and son share a Y-haplotype, just as if the son were a crime stain. Therefore
in the typical case the equation above also gives the paternity index:
PI = 1 / Pr(haplotype).
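The two formulas above amount to the same one-line calculation. Here is a minimal Python sketch of it; the function name and the example probability of 0.01 are illustrative assumptions only, and how to arrive at a defensible Pr(haplotype) is a separate question that this sketch does not address.

```python
def matching_odds(pr_haplotype: float) -> float:
    """Return 1 / Pr(haplotype): the matching odds for a stain match,
    or the paternity index when mutation is ignored."""
    if not 0.0 < pr_haplotype <= 1.0:
        raise ValueError("Pr(haplotype) must lie in the interval (0, 1]")
    return 1.0 / pr_haplotype


# Hypothetical input: a haplotype judged to have probability 0.01.
example_lr = matching_odds(0.01)
print(example_lr)
```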
- Paternity mutation
Of course that's not 100% true; there are mutations. Available data supports that the
mutation rates and behavior for STR loci on the Y-chromosome are typical for the genome; so around
μ=1/400 per locus per generation for single step mutations, but with a lot of variation
depending on the locus.
Suppose a man M has Y-haplotype which we call YM and a boy C has the type
YC which differs from YM by a single step
at just one locus.
Obviously, mutation cannot be ignored in this case. Since μ
is the probability of any mutation, but nearly all (90-95%) STR mutations are one-step
and expansion and contraction are about equally common, to a reasonable approximation
the probability to mutate in either direction between YC
and YM is μ/2.
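To see how rough the μ/2 approximation is, here is a small Python sketch. It assumes, as stated above, that a fraction of 90-95% of mutations are single-step and that expansion and contraction are equally likely; the function name and the parameter `one_step_fraction` are my own labels for illustration.

```python
def directional_step_rate(mu: float, one_step_fraction: float = 1.0) -> float:
    """Chance per generation of a single-step mutation in one given direction.

    mu                -- probability of any mutation at the locus
    one_step_fraction -- share of mutations that are single-step
                         (1.0 reproduces the simple mu / 2 approximation)
    """
    return mu * one_step_fraction / 2.0


mu = 1 / 400                                   # typical per-locus rate quoted above
approx = directional_step_rate(mu)             # the mu / 2 approximation
refined = directional_step_rate(mu, 0.95)      # with 95% of mutations single-step
print(approx, refined)
```

The two results differ by only a few percent, which is small compared with the locus-to-locus variation in μ itself.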
There are several possible approaches. We use the notation PI for the
paternity index, and
PI = X/Y, where
X = Prob(observed haplotypes | F father of C) and
Y = Prob(observed haplotypes | F unrelated to C).
To evaluate Y, we can write
Y = mc, where m = Pr(a random man has type YM) and c = Pr(a random man has type YC).
X is a little more problematic.
- child-centric approach
The child has YC, inherited from his
father. A mutation between
YC and YM may
have occurred, with probability μ/2.
Therefore, given that a child is type c the probability is
approximately μ/2 that his father is type
X = cμ/2 and
LR = X/Y = X/cu = 3μ/2u.
It remains to estimate u.
- father-centric approach
In a symmetrical way we could begin with the alleged father, and
obtain instead the formula
LR = 3μ/2c.
- Which approach is right? How to estimate c and/or u?
Deep questions. What is right depends on such things as what you think
the population database represents: grandfather's generation?
the child's? If the population were in drift and mutation equilibrium,
then I suppose all methods would give the same answer.
- Pragmatic estimate of the Y-haplotype evidence
Note that all formulas are equivalent if c = u.
Therefore to be conservative let's take the uncle-centric view and
Hence LR = 3 × 0.009 / (2 × (2/171)) = 1.15.
The meaning of this neutral result is that the chance to see so
rare a haplotype by mutation is about the same as the chance to see it
at random in an unrelated individual.
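The arithmetic can be checked with a few lines of Python. This sketch assumes the calculation has the form 3μ/(2p), with μ = 0.009 and a haplotype probability p = 2/171, which is how I read the numbers above; the function and variable names are illustrative only.

```python
from fractions import Fraction


def lr_one_step_mismatch(mu, pr_haplotype):
    """LR of the form 3*mu / (2 * Pr(haplotype)), as used in the text."""
    return 3 * mu / (2 * pr_haplotype)


mu = Fraction(9, 1000)   # 0.009, the mutation rate used in the example
p = Fraction(2, 171)     # haplotype probability used in the example
lr = lr_one_step_mismatch(mu, p)
print(lr, float(lr))
```

Exact fractions are used so that no rounding creeps in; the result is 4617/4000, about 1.15.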
- Approach to "frequencies"
The frequency of an unobserved trait is impossible to know. Fortunately frequency isn't the question.
My papers on rare haplotypes offer several approaches.
Bottom line: simple counting (but add 1) is very conservative.
A pretty accurate method that is not complicated is also given. June 2009
- Mitochondrial ID of a sister
Body B and reference sister S have mitotypes mTB
and mTS which differ only by one base, perhaps a mutation.
What is the LR supporting the hypothesis that B is the lost sister of S?
- Formulate mitochondria LR for identification
LR = X/Y where
X = Pr(sisters have types mTB and mTS),
Y = Pr(randomly chosen people have types mTB and mTS).
The difficulty in evaluating X is that we don't know which sister's mitotype represents a mutation.
We can take a mother-centric approach, considering that their common mother had one type or the other and assuming
that one of the sisters' mitotypes is the product of a mutation. Then there are two possibilities for the mother's type, so
X = Pr(mother=mTB and S is a product of mutation)
+Pr(mother=mTS and B is a product of mutation)
= Pr(mTB)μ + Pr(mTS)μ
= μ[Pr(mTB) + Pr(mTS)].
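As a sketch of the numerator just derived, the Python function below adds the two mother-centric scenarios. The mutation rate and the two mitotype probabilities in the example are made-up placeholder values, not estimates from any database, and the function name is my own.

```python
def sisters_numerator(mu: float, pr_b: float, pr_s: float) -> float:
    """X = mu * [Pr(mTB) + Pr(mTS)].

    Scenario 1: mother has mTB and S carries the mutation  -> pr_b * mu
    Scenario 2: mother has mTS and B carries the mutation  -> pr_s * mu
    """
    mother_is_b = pr_b * mu
    mother_is_s = pr_s * mu
    return mother_is_b + mother_is_s


# Hypothetical inputs chosen only to show the calculation.
x_value = sisters_numerator(mu=0.001, pr_b=0.01, pr_s=0.02)
print(x_value)
```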
- Probability of type mTB
As part of the answer we will need mitochondrial random matching probabilities.
As an example evaluate
b = Pr(random person = mTB | mTB observed in body).
Let's say mTB occurs x times in our reference database of N
mitotypes. Per AlleleProbability.htm,
b < (x+1)/(N+1)
(where the occurrences of +1 represent conditioning on the casework observation).
We can improve on that by accepting the logic of the "κ method" according to
which b = λ(x+1)/(N+1).
Here λ is a factor less than one and independent of x.
(Specifically λ = 1 – κ, i.e. λ is the proportion of the N database mitotype
observations that are not singletons.)
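A minimal Python sketch of both estimates follows. It takes κ to be the share of the N database observations that are singletons, exactly as worded above; whether the casework observation should also be counted when computing κ is not settled here, so treat that as an assumption. The toy database and all names are invented for illustration.

```python
from collections import Counter


def mitotype_probability(database, observed):
    """Return (kappa_estimate, counting_bound) for the observed mitotype.

    counting_bound = (x + 1) / (N + 1)
    kappa_estimate = lambda * (x + 1) / (N + 1), with lambda = 1 - kappa
    """
    counts = Counter(database)
    n = len(database)
    x = counts[observed]                      # 0 if the type is not in the database
    singletons = sum(1 for k in counts.values() if k == 1)
    kappa = singletons / n                    # share of observations that are singletons
    lam = 1.0 - kappa
    counting_bound = (x + 1) / (n + 1)
    return lam * counting_bound, counting_bound


# Toy database of N = 10 observations; types C..G are singletons, so kappa = 0.5.
toy_db = ["A", "A", "A", "B", "B", "C", "D", "E", "F", "G"]
b_seen, bound_seen = mitotype_probability(toy_db, "A")   # x = 3
b_new, bound_new = mitotype_probability(toy_db, "Z")     # x = 0, never seen
print(b_seen, bound_seen, b_new, bound_new)
```

Because λ does not depend on x, the κ estimate is always the same fixed fraction of the simple counting bound.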
- Solving the LR expression
What Y is
Evaluating Y is easy. Under the assumption that B and S
are just unconnected random observations,
Y = Pr( ... )
ugh! Can NOT condition on the B & S types!