Y-chromosome analysis

identity
paternity – ordinary case
paternity – mutation

child-centric approach
father-centric approach

Which is right?
Pragmatic estimate

Approach to "frequencies"
Mitochondrial ID of a sister Feb2014

Probability of type mT_B
Formulate mitochondria LR for identification
Solving the LR expression

Analysis of Y-haplotype information in a kinship case
Forensic mathematics home page
Comments are welcome (see home page for email)

Haplotype DNA evidence

A Y-chromosome or a mitochondrion carries genetic information that is useful for identification or for kinship problems such as paternity attribution, but it has to be treated somewhat differently from the more typical nuclear (= autosomal) DNA profile, for many reasons:

The genetic rules are simpler – the trait is either known to be passed or known not to be passed (depending on the sexes involved) to each offspring; there are no choices or 50% probabilities of transmission as with nuclear DNA.
May2018, note added 5-10 years after initially writing this web page: This page discusses only how to handle the complication of relatives and possible mutation. It doesn't deal with the difficult questions of evaluating what I loosely call "Pr(haplotype)" below, or we'd have many further reasons to list:
Several markers are linked, i.e. physically chained and inherited together, so they must be considered as a unit. No recombination. The product rule doesn't apply at all.
Autosomal STR alleles matching probabilities or frequencies can reasonably be estimated as sample frequencies; for Y haplotypes that is far from true.
When using population data to estimate the significance of an autosomal STR allele match, it's close to adequate to consider only the allele in question. For Y by contrast, it's important to look at the entire database.
Putting confidence intervals on a matching probability is mathematically ignorant. However, while it's a fairly harmless error in the autosomal domain, it's a crippling mistake when dealing with Y haplotype matching probabilities.
Obviously all men are related. Forgetting that when dealing with Y misses the fundamental point that nearly all Y matching is from identity by descent. That's not so with a single STR locus as in the autosomal situation.
"Theta" — the chance of two alleles or haplotypes being IBD — in autosomal practice is a minor adjustment to the matching chance. With Y haplotypes it's the main thing; it's a good approximation to the matching chance and anything else is a minor adjustment.
Explicit models (a careful mathematical approach laying out premises and deriving results from them) can be overlooked in autosomal practice without tragedy. In Y haplotype practice they are vital, and their persistent absence from all early papers and many recent ones results in nonsensical recommendations and practice.
Geographical clustering is a vital concern with Y haplotypes; in autosomal work not so much.

Y-haplotype in identification and paternity

Identity

The strength of the evidence is therefore simply expressed as matching odds (or equivalently as a likelihood ratio) of

matching odds = 1 / Pr(haplotype).

Paternity – ordinary case

PI = 1 / Pr(haplotype).

Paternity – mutation

Suppose a man M has Y-haplotype which we call Y_M and a boy C has the type Y_C which differs from Y_M by a single step at just one locus.

Obviously, mutation cannot be ignored in this case. Let μ be the probability of any mutation at the locus. Nearly all (90-95%) STR mutations are one-step, and expansion and contraction are about equally common, so to a reasonable approximation the probability of mutating in a given direction between Y_C and Y_M is μ/2.
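To see how rough the μ/2 approximation is, here is a minimal Python sketch. The function name, the `one_step_fraction` parameter and the value of μ are illustrative assumptions, not figures from any case; the default fraction of 1.0 reproduces the μ/2 shortcut, and a value in the 0.90-0.95 range applies the one-step share explicitly.

```python
def one_step_rate_per_direction(mu, one_step_fraction=1.0):
    """Approximate chance of a one-step mutation in one given direction.

    mu: overall per-locus mutation probability per generation.
    one_step_fraction: share of mutations that are one-step
        (the text quotes roughly 0.90-0.95); 1.0 gives the mu/2 shortcut.
    Expansion and contraction are treated as equally likely.
    """
    return mu * one_step_fraction / 2.0


mu = 0.003  # hypothetical mutation rate, for illustration only
approx = one_step_rate_per_direction(mu)                           # mu/2 shortcut
refined = one_step_rate_per_direction(mu, one_step_fraction=0.92)  # assumed share
print(approx, refined)
```

The two values differ by less than 10%, which is small next to the uncertainty in the haplotype probabilities discussed below.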

There are several possible approaches. We use the notation PI for the paternity index, and
PI = X/Y, where
X = Prob(observed haplotypes | F father of C) and
Y = Prob(observed haplotypes | F unrelated to C).

To evaluate Y, we can write
Y = mc where
m=Prob(Y_M) and
c=Prob(Y_C).

X is a little more problematic.

child-centric approach

Y_C

Y_M

Hence
X = c•μ/2 and
LR = X/Y = X/cu = 3μ/2u.

It remains to estimate u.

father-centric approach

3μ/2c

Which approach is right? How to estimate c and/or u?

Pragmatic estimate of the Y-haplotype evidence

Note that all formulas are equivalent if c = u. Therefore to be conservative let's take the uncle-centric view and take c=2/171.

Hence LR = (3 • 0.009) / (2 • (2/171)) ≈ 1.15.

The meaning of this neutral result is that the chance to see so rare a haplotype by mutation is about the same as the chance to see it at random in an unrelated individual.
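The arithmetic of the pragmatic estimate can be checked with a few lines of Python. This is only a sketch: the function name is made up, and the values 0.009 and 2/171 are simply the numbers that appear in the calculation above, plugged into the 3μ/2c form.

```python
from fractions import Fraction


def pragmatic_lr(mu, c):
    """LR = 3*mu / (2*c), the form used in the pragmatic estimate."""
    return 3 * mu / (2 * c)


mu = Fraction(9, 1000)  # 0.009, read off the arithmetic in the text
c = Fraction(2, 171)    # haplotype probability taken in the text
lr = pragmatic_lr(mu, c)
print(lr, float(lr))    # exact fraction and decimal value, about 1.15
```

Using exact fractions avoids any rounding question: the ratio works out to 4617/4000, which rounds to 1.15.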

Approach to "frequencies"

My papers on rare haplotypes offer several approaches. Bottom line: simple counting (but add 1) is very conservative. A pretty accurate method that is not complicated is also given. June 2009

Mitochondrial ID of a sister

Feb2014

mT_B

mT_S

Formulate mitochondria LR for identification

LR = X/Y where
X = Pr(sisters have types mT_B and mT_S),
Y = Pr(randomly chosen people have types mT_B and mT_S).

The difficulty in evaluating X is that we don't know which sister's mitochondrial type represents a mutation. We can take a mother-centric approach, considering that their common mother had one type or the other and assuming that the other sister's type is the product of a mutation. Then there are two possibilities for the mother's type, so

X = Pr(mother=mT_B and S is a product of mutation) + Pr(mother=mT_S and B is a product of mutation)
  = Pr(mT_B)μ + Pr(mT_S)μ
  = μ[Pr(mT_B) + Pr(mT_S)].
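As a small illustration of the mother-centric numerator, the sketch below just evaluates μ[Pr(mT_B) + Pr(mT_S)]. The function name and all three input values are hypothetical, chosen only to show the shape of the calculation.

```python
def sisters_numerator(pr_b, pr_s, mu):
    """X = mu * (Pr(mT_B) + Pr(mT_S)): the mother had one type or the other,
    and the other sister's type arose by mutation."""
    return mu * (pr_b + pr_s)


# Hypothetical inputs, for illustration only.
pr_b = 0.01  # assumed Pr(mT_B)
pr_s = 0.02  # assumed Pr(mT_S)
mu = 0.005   # assumed mutation probability
x_numerator = sisters_numerator(pr_b, pr_s, mu)
print(x_numerator)
```

Note that X is symmetric in the two types, as it should be, since the model does not say which sister carries the mutation.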

Probability of type mT_B

b=Pr(random person = mT_B | mT_B observed in body B).

Let's say mT_B occurs x times in our reference database of N mitotypes. Per AlleleProbability.htm, b < (x+1)/(N+1) (where the occurrences of +1 represent conditioning on the casework observation of mT_B).

We can improve on that by accepting the logic of the "κ method" according to which b = λ(x+1)/(N+1). Here λ is a factor less than one and independent of x. (Specifically λ = 1 – κ, where κ is the proportion of the N database mitotype observations that are singletons, i.e. λ is the proportion that are not singletons.)
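Here is a minimal Python sketch of the two estimates just described: the counting bound (x+1)/(N+1) and the κ-method value λ(x+1)/(N+1). It assumes, following the description above, that κ is the number of singleton observations divided by N. The function name and the toy database are hypothetical.

```python
from collections import Counter


def mitotype_estimates(database, query):
    """Return (counting_bound, kappa_estimate) for the query mitotype.

    counting_bound = (x + 1) / (N + 1)
    kappa_estimate = lambda * (x + 1) / (N + 1), with lambda = 1 - kappa
    and kappa = (singleton observations) / N   [assumption from the text]
    """
    counts = Counter(database)
    n = len(database)
    x = counts[query]  # 0 if the type was never seen in the database
    counting_bound = (x + 1) / (n + 1)
    singletons = sum(1 for k in counts.values() if k == 1)
    lam = 1 - singletons / n
    return counting_bound, lam * counting_bound


# Toy database of 10 observations: 5 of them are singletons, so lambda = 0.5.
db = ["A", "A", "A", "B", "B", "C", "D", "E", "F", "G"]
bound, kappa_est = mitotype_estimates(db, "C")
print(bound, kappa_est)
```

Because λ does not depend on x, the κ-method value is always the counting bound scaled by the same factor, whatever type is queried.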

Solving the LR expression

What Y is

Y = Pr( ... )
ugh! Can NOT condition on the B & S types!
