Table of contents

Haplotype DNA evidence
  1. Y-chromosome analysis
    1. identity
    2. paternity – ordinary case
    3. paternity – mutation
      1. child-centric approach
      2. father-centric approach
    4. Which is right?
    5. Pragmatic estimate
  2. Approach to "frequencies"
  3. Mitochondrial ID of a sister Feb2014
    1. Probability of type mTB
    2. Formulate mitochondria LR for identification
    3. Solving the LR expression

Analysis of Y-haplotype information in a kinship case
Forensic mathematics home page
Comments are welcome (see home page for email)

Haplotype DNA evidence

A Y-chromosome or a mitochondrion carries genetic information that is useful for identification or for kinship problems such as paternity attribution, but it has to be treated somewhat differently from the more typical nuclear (= autosomal) DNA profile, for many reasons:
  1. The genetic rules are simpler – the trait is either known to be passed or known not to be passed (depending on the sexes involved) to each offspring; there are no choices or 50% probabilities of transmission as with nuclear DNA.

    May2018, note added 5-10 years after initially writing this web page: This page discusses only how to handle the complication of relatives and possible mutation. It doesn't deal with the difficult questions of evaluating what I loosely call "Pr(haplotype)" below, or we'd have many further reasons to list:

  2. Several markers are linked, i.e. physically chained and inherited together, so they must be considered as a unit. No recombination. The product rule doesn't apply at all.
  3. Autosomal STR allele matching probabilities or frequencies can reasonably be estimated as sample frequencies; for Y haplotypes that is far from true.
  4. When using population data to estimate the significance of an autosomal STR allele match, it's close to adequate to consider only the allele in question. For Y by contrast, it's important to look at the entire database.
  5. Putting confidence intervals on a matching probability is mathematically ignorant. However, while it's a fairly harmless error in the autosomal domain, it's a crippling mistake when dealing with Y haplotype matching probabilities.
  6. Obviously all men are related. Forgetting that when dealing with Y misses the fundamental point that nearly all Y matching is from identity by descent. That's not so with a single STR locus as in the autosomal situation.
  7. "Theta" — the chance of two alleles or haplotypes being IBD — in autosomal practice is a minor adjustment to the matching chance. With Y haplotypes it's the main thing; it's a good approximation to the matching chance and anything else is a minor adjustment.
  8. Explicit models (a careful mathematical approach laying out premises and deriving results from them) can be overlooked in autosomal practice without tragedy. In Y haplotype practice they are vital, and their persistent absence from all early papers and many recent ones results in nonsensical recommendations and practice.
  9. Geographical clustering is a vital concern with Y haplotypes; in autosomal work not so much.

  1. Y-haplotype in identification and paternity
  2. I discuss here basic principles in using Y-haplotype information for identity or paternity.

    1. Identity
    2. Suppose suspect and crime stain have the same Y-chromosome haplotype. That result is normal and expected (i.e. 100%) if the suspect is the donor; it has the probability of seeing the haplotype among random men if the suspect is a random man.

      The strength of the evidence is therefore simply expressed as matching odds (or equivalently as a likelihood ratio) of

      matching odds = 1 / Pr(haplotype).

    3. Paternity – ordinary case
    4. Typically father and son share a Y-haplotype, just as if the son were a crime stain. Therefore in the typical case the equation above also gives the paternity index:

      PI = 1 / Pr(haplotype).
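As a minimal sketch of the two formulas above (identity and ordinary paternity share the same expression), the Python function below takes Pr(haplotype) as an input. The function name and the example value are illustrative assumptions; how to arrive at Pr(haplotype) is a separate question not addressed by the code.

```python
def matching_odds(pr_haplotype: float) -> float:
    """Matching odds (equivalently LR or PI) for a shared haplotype: 1 / Pr(haplotype)."""
    if not 0.0 < pr_haplotype <= 1.0:
        raise ValueError("Pr(haplotype) must lie in (0, 1]")
    return 1.0 / pr_haplotype


# Hypothetical matching probability, chosen only to illustrate the call.
example_pr = 0.002
print(matching_odds(example_pr))
```

The same call serves for the paternity index in the ordinary (no mutation) case.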

    5. Paternity – mutation
    6. Of course that's not 100% true; there are mutations. Available data supports that the mutation rates and behavior for STR loci on the Y-chromosome are typical for the genome; so around μ=1/400 per locus per generation for single step mutations, but with a lot of variation depending on the locus.

      Suppose a man M has Y-haplotype which we call YM and a boy C has the type YC which differs from YM by a single step at just one locus.

      Obviously, mutation cannot be ignored in this case. Since μ is the probability of any mutation, but nearly all (90-95%) STR mutations are one-step and expansion and contraction are about equally common, to a reasonable approximation the probability to mutate in either direction between YC and YM is μ/2.

      There are several possible approaches. We use the notation PI for the paternity index, and
      PI = X/Y, where
      X = Prob(observed haplotypes | M father of C) and
      Y = Prob(observed haplotypes | M unrelated to C).

      To evaluate Y, we can write
      Y = mc where
      m = Prob(YM) and
      c = Prob(YC).

      X is a little more problematic.

      1. child-centric approach
      2. The child has YC, inherited from his father. A mutation between YC and YM may have occurred, with probability μ/2. Therefore, given that a child is type YC, the probability is approximately μ/2 that his father is type YM.

        X = cμ/2 and
        LR = X/Y = X/(mc) = μ/(2m).

        It remains to estimate m.

      3. father-centric approach
      4. In a symmetrical way we could begin with the alleged father (so that X = mμ/2), and obtain instead the formula
        LR = μ/(2c).
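The two approaches can be compared side by side. The sketch below assumes only the definitions given above (Y = mc, with a numerator of cμ/2 in the child-centric view and mμ/2 in the father-centric view); the function names and the values of m and c are hypothetical, chosen for illustration.

```python
def lr_child_centric(mu: float, m: float) -> float:
    """X = c*mu/2 and Y = m*c, so LR = mu / (2*m); c cancels."""
    return mu / (2.0 * m)


def lr_father_centric(mu: float, c: float) -> float:
    """X = m*mu/2 and Y = m*c, so LR = mu / (2*c); m cancels."""
    return mu / (2.0 * c)


mu = 1 / 400   # per-locus, per-generation rate quoted in the text
m = 0.01       # hypothetical Pr(YM)
c = 0.02       # hypothetical Pr(YC)

print(lr_child_centric(mu, m))   # depends only on the man's haplotype probability
print(lr_father_centric(mu, c))  # depends only on the child's haplotype probability
```

The two results coincide exactly when m = c, which is why the choice between the approaches matters only when the two haplotype probabilities are estimated differently.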

    7. Which approach is right? How to estimate c and/or m?
    8. Deep questions. What is right depends on such things as what you think the population database represents – grandfather's generation? the child's? If the population were in drift and mutation equilibrium, then I suppose all methods would give the same answer.

      1. Pragmatic estimate of the Y-haplotype evidence
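      2. Note that the formulas are equivalent if the two haplotype probabilities are equal. Therefore, to be conservative, let's take the uncle-centric view and take c = 2/171. (For an uncle rather than a father, three transmissions separate the two haplotypes, so μ/2 becomes approximately 3μ/2.)
        Hence LR = (3 × 0.009) / (2 × 2/171) = 1.15.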

        The meaning of this neutral result is that the chance to see so rare a haplotype by mutation is about the same as the chance to see it at random in an unrelated individual.
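The arithmetic of the pragmatic estimate can be checked directly. The sketch below only reproduces the figures quoted above (the factor 3, the value 0.009, and c = 2/171) using exact fractions; the variable names are illustrative and no new data are introduced.

```python
from fractions import Fraction

factor = 3                      # factor 3 as it appears in the text
rate = Fraction(9, 1000)        # 0.009, as quoted in the text
c = Fraction(2, 171)            # haplotype probability taken in the text

lr = factor * rate / (2 * c)    # 3 * 0.009 / (2 * (2/171))

print(lr)                       # exact value as a fraction
print(round(float(lr), 2))      # rounded to two decimals
```

An LR this close to 1 is what the text calls a neutral result.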

  3. Approach to "frequencies"
  4. The frequency of an unobserved trait is impossible to know. Fortunately frequency isn't the question. Probability is.

    My papers on rare haplotypes offer several approaches. Bottom line: simple counting (but add 1) is very conservative. A pretty accurate method that is not complicated is also given. June 2009

  5. Mitochondrial ID of a sister
  6. Feb2014
    Body B and reference sister S have mitotypes mTB and mTS which differ only by one base, perhaps a mutation. What is the LR supporting the hypothesis that B is the lost sister of S?

  7. Formulate mitochondria LR for identification
  8. LR = X/Y where
    X = Pr(sisters have types mTB and mTS),
    Y = Pr(randomly chosen people have types mTB and mTS).

    The difficulty in evaluating X is that we don't know which sister's mitotype represents a mutation. We can take a mother-centric approach, considering that their common mother had one type or the other and assuming that one of the sisters' mitotypes is a mutation. Then there are two possibilities for the mother's type, so

    X = Pr(mother=mTB and S is a product of mutation) + Pr(mother=mTS and B is a product of mutation)
       = Pr(mTB)μ + Pr(mTS)μ
       = μ[Pr(mTB) + Pr(mTS)].
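The numerator X can be written out term by term. This is a minimal sketch of the mother-centric sum above; the function name and all three input values are hypothetical, and μ here stands for whatever mutation probability is appropriate for the mitochondrial difference in question.

```python
def sister_numerator(mu: float, pr_mtb: float, pr_mts: float) -> float:
    """X = mu * [Pr(mTB) + Pr(mTS)], summed over the two possible mother types."""
    mother_is_mtb = pr_mtb * mu   # mother had mTB, S carries the mutation
    mother_is_mts = pr_mts * mu   # mother had mTS, B carries the mutation
    return mother_is_mtb + mother_is_mts


# Hypothetical inputs, for illustration only.
mu = 0.001
pr_mtb = 0.004
pr_mts = 0.010
print(sister_numerator(mu, pr_mtb, pr_mts))
```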

    1. Probability of type mTB
    2. As part of the answer we will need mitochondrial random matching probabilities. As an example evaluate

      b=Pr(random person = mTB | mTB observed in body B).

      Let's say mTB occurs x times in our reference database of N mitotypes. Per AlleleProbability.htm, b < (x+1)/(N+1) (where the occurrences of +1 represent conditioning on the casework observation of mTB).

      We can improve on that by accepting the logic of the "κ method" according to which b = λ(x+1)/(N+1). Here λ is a factor less than one and independent of x. (Specifically λ = 1 – κ, i.e. λ is the proportion of the N database mitotype observations that are not singletons.)
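Both estimates can be computed from a list of database mitotypes. The sketch below assumes κ is taken as the fraction of the N database observations that are singletons, following the parenthetical above; the function name and the toy database are hypothetical and stand in for a real reference database.

```python
from collections import Counter


def mitotype_probability(database, observed):
    """Return (kappa-method estimate, counting bound) for Pr(random person = observed).

    counting bound : (x + 1) / (N + 1)
    kappa method   : lambda * (x + 1) / (N + 1), with lambda = 1 - kappa
    """
    counts = Counter(database)
    n = len(database)
    x = counts[observed]                      # occurrences of the type in the database
    singletons = sum(1 for k in counts.values() if k == 1)
    kappa = singletons / n                    # assumed: share of observations that are singletons
    lam = 1.0 - kappa
    bound = (x + 1) / (n + 1)
    return lam * bound, bound


# Toy database of 8 observations: A x3, B x2, and three singletons.
toy_db = ["A", "A", "A", "B", "B", "C", "D", "E"]
print(mitotype_probability(toy_db, "B"))   # type seen twice in the database
print(mitotype_probability(toy_db, "Z"))   # type never seen in the database
```

Because λ is below one whenever the database contains singletons, the κ-method value is never larger than the simple counting bound.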

    3. Solving the LR expression
    4. What Y is

      Evaluating Y is easy. Under the assumption that B and S are just unconnected random observations,

      Y = Pr( ... )
      ugh! Can NOT condition on the B & S types!
