Careful Formulation of a Likelihood Ratio Statement based on Anecdotal Evidence

Table of contents

  1. The problem
    1. Background
    2. (anecdotal) example
  2. Analysis
    1. What is the evidence?
    2. What is the LR?
    3. Do the math
    4. Conclusion
      1. Interpretation
      2. An issue of bias
  3. Exercises
    1. Age coincidence
  4. LR shortfall or requisite prior

  1. The problem
    1. Background
    2. A DNA profile as evidence linking two things (i.e. suspect to crime scene) is a situation well suited to formulation of a likelihood ratio (LR) because the DNA profile is
      1. well defined — we know what the profile is and hence what a "match" means, and
      2. nicely quantifiable — most importantly we can compute with reasonable accuracy the probability of the random occurrence of a profile
      Other kinds of evidence are not so nice. Anecdotal evidence (non-scientific evidence) in particular is likely to be deficient with respect to both of the above criteria. Still, I think it can sometimes be usefully treated in a Bayesian way. Inevitably we will have to guess the strength of the evidence partly through intuition, but there is still scope for exacting analysis: it is important to formulate accurately the questions to be answered, so that at least we apply our subjective intuition to the correct questions.

        ProBusqueda

        In our ProBusqueda work, we have many cases of young adults who were lost to their families in the chaos of military fighting in El Salvador, lost their birth identities (BI), were adopted, and now as young adults (YA) wish to reconnect with their birth family. Sometimes DNA is sufficient to make a confident link between YA and BI, but sometimes it falls short. There is always other connecting evidence of various kinds, and we would like to do the impossible and quantify it, so as to decide whether the non-DNA evidence is sufficient to make up the shortfall between the DNA evidence and a confident identification.

    3. (anecdotal) example
    4. Young adopted adult "Carter" bears scars.
      Family of birth name/missing child "Nestor" mentions neck scars from vaccination & insect bites.

  2. Analysis
    1. What is the evidence?
    2. Is the evidence that
      Carter and Nestor both have scars?

      Let's be more precise. The evidence is that

      Carter has scars and relatives report that the infant Nestor had scars.

      More carefully, let's describe the evidence as EC & EN, where
      EC = Carter has scars of a certain description S.
      EN = Nestor's family describes scars of a certain description S' (similar to, but certainly not identical to, S).

    3. What is the LR?
    4. Then the LR = X/Y where

      X = Pr(EC & EN | Carter is Nestor)
      Y = Pr(EC & EN | Carter is not Nestor).

    5. Do the math
    6. Note: EC doesn't depend on the relationship, so Pr(EC | Carter is Nestor) = Pr(EC | Carter is not Nestor) = Pr(EC).
      (On "doesn't depend on the relationship": the way people describe Carter is not biased by his true history, especially if that history is unknown.)

      Consider X. There are two ways to apply the identity Pr(F & G) = Pr(F)Pr(G|F). In this case I choose an infant-centric formulation by letting EC play the role of F:
      X = Pr(EC & EN | Carter is Nestor)
      = Pr(EC | Carter is Nestor) Pr(EN | Carter is Nestor & EC)
      = Pr(EC) Pr(EN | Carter is Nestor & EC).

      And Y:
      Y = Pr(EC & EN | Carter is not Nestor)
      = Pr(EC | Carter is not Nestor) Pr(EN | Carter is not Nestor & EC)
      = Pr(EC) Pr(EN | Carter is not Nestor & EC).

      Dividing X by Y, the common factor Pr(EC) cancels, so LR = Pr(EN | N*) / Pr(EN | ~N*), where
      N* = Nestor is a childhood version of Carter, a person who bears scars S.
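
The cancellation of Pr(EC) can be checked numerically. The following is a minimal sketch; all three probabilities are made-up values chosen for illustration, not estimates from the case, and the function name is hypothetical.

```python
def likelihood_ratio(pr_ec, pr_en_given_same, pr_en_given_diff):
    """LR = X / Y, with X and Y factored as in the derivation above."""
    x = pr_ec * pr_en_given_same   # X = Pr(EC) * Pr(EN | Carter is Nestor & EC)
    y = pr_ec * pr_en_given_diff   # Y = Pr(EC) * Pr(EN | Carter is not Nestor & EC)
    return x / y

# Hypothetical inputs: the same two conditional probabilities for EN,
# paired with two very different guesses for Pr(EC).
lr_rare = likelihood_ratio(0.05, 0.30, 0.02)    # scars like S assumed rare
lr_common = likelihood_ratio(0.60, 0.30, 0.02)  # scars like S assumed common

print(lr_rare, lr_common)  # the two values agree up to floating-point rounding
```

Whatever value is guessed for Pr(EC), the ratio is unchanged, which is why only the two conditional probabilities of EN need to be assessed.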

      Note on "apply the identity": this is the mathematical rule for the probability of a conjunction in terms of the probabilities of the constituents.

      Example: The probability that a person is F=over fifty and G=unemployed can be computed if you know

      • Pr(F) = proportion of people over fifty
      • Pr(G|F) = proportion of unemployed among those over fifty.
      Now just multiply.

      Alternatively, it could be calculated from

      • Pr(G) = proportion of unemployed
      • Pr(F|G) = proportion of over-fifties among those unemployed.
      Now multiply.
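
Both routes can be illustrated with a small table of counts. This is only a sketch: the population of 1000 and the counts below are invented for the example, and exact fractions are used so the two products can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical counts, invented for illustration.
population = 1000
over_fifty = 300          # F
unemployed = 100          # G
both = 30                 # F and G

# Route 1: Pr(F) * Pr(G | F)
pr_f = Fraction(over_fifty, population)
pr_g_given_f = Fraction(both, over_fifty)
via_f = pr_f * pr_g_given_f

# Route 2: Pr(G) * Pr(F | G)
pr_g = Fraction(unemployed, population)
pr_f_given_g = Fraction(both, unemployed)
via_g = pr_g * pr_f_given_g

# Direct count of the conjunction, for comparison.
direct = Fraction(both, population)

print(via_f, via_g, direct)  # 3/100 3/100 3/100
```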

    7. Conclusion
    8. Let's rephrase:
      Evidence EN = Relatives remember and report scars S' for the child Nestor
      hypothesis H1 = the child is the same person as the adult who now has scars S
      hypothesis H0 = the child is a random person
      LR = Pr(EN | H1) / Pr(EN | H0),
      or in words: How many times more probable is EN when H1 is true than when H0 is true?

      1. Interpretation
      2. The LR is thus a comparison of these two probabilities:
        Pr(EN | H1) = The probability that relatives of a person with scars S would report childhood scars S'
        versus
        Pr(EN | H0) = The probability that relatives of some random infant would recall childhood scars S'.
        No doubt the first probability is larger than the second one, hence LR>1. But unless the scars are fairly striking – obviously unusual – or the coincidence between the descriptions S and S' very strong, I would be reluctant to conclude that LR is very large.
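
Since both probabilities have to be guessed, it can help to see how much the LR moves across a range of plausible guesses. The sketch below uses purely hypothetical values for the two probabilities; none of them comes from the case.

```python
from itertools import product

# Hypothetical subjective guesses, for illustration only.
numerator_guesses = [0.3, 0.6]      # candidate values of Pr(EN | H1)
denominator_guesses = [0.05, 0.2]   # candidate values of Pr(EN | H0)

# LR for every combination of guesses.
lr_grid = {
    (p1, p0): p1 / p0
    for p1, p0 in product(numerator_guesses, denominator_guesses)
}

for (p1, p0), lr in lr_grid.items():
    print(f"Pr(EN|H1)={p1:.2f}  Pr(EN|H0)={p0:.2f}  LR={lr:.1f}")
```

With these made-up inputs the LR ranges from about 1.5 to about 12, and it is the guess for the denominator, i.e. how common such reported scars are among random infants, that moves it most.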

      3. An issue of bias
      4. It's not clear, from the story I've presented, how the testimony from the relatives came about. It could be that the relatives spoke of the scars S' after being prompted in some way. That is, suppose that the actual evidence is a biased version EN' of EN:
        Evidence EN' = Relatives remember and report scars S' for the child Nestor after being given a suggestion in the form of information or a picture of Carter.
        It may be that the bias makes the denominator much larger, i.e. that Pr(EN' | H0) >> Pr(EN | H0). (Maybe the numerator too, but probably less so.) A very high percentage of children might be described as having scars if the survey takes the form of a leading question. The way the information is collected may have a big effect, and consequently the actual LR from the anecdote may be neither very big nor very helpful in concluding identity.
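
The effect of prompting can be sketched with numbers. The probabilities below are assumptions made up for illustration: prompting is taken to double the numerator but to multiply the denominator by ten.

```python
def lr(pr_given_h1, pr_given_h0):
    """Likelihood ratio of the evidence under H1 versus H0."""
    return pr_given_h1 / pr_given_h0

# Hypothetical values for the unprompted report EN.
lr_unprompted = lr(0.40, 0.04)

# Hypothetical values for the prompted report EN': the numerator rises
# somewhat, the denominator rises a great deal.
lr_prompted = lr(0.80, 0.40)

print(f"unprompted LR ~ {lr_unprompted:.0f}, prompted LR ~ {lr_prompted:.0f}")
```

Under these assumptions the evidence drops from an LR of about 10 to an LR of about 2, even though the relatives' report sounds the same in both cases.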

  3. Exercises
    1. Age coincidence
    2. As an exercise, consider the following data:
      Hypothesis H1 = the child is the same person as the adult who believes EC
      Hypothesis H0 = the child is a random person

      We have some survey data based on 698 missing children:

      If Carter is not Nestor, we can assume that his possible birth identities are represented by some part of the sample of 698. Depending on how much vagueness about dates and ages we accept, there are from 19 to 114 out of those 698 that are consistent with his belief. Therefore the LR supporting identity based on this data is

      6 ≤ LR ≤ 36.
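
The bounds can be reproduced as follows. This sketch assumes that the evidence is certain under H1, i.e. Pr(evidence | H1) = 1, and that under H0 its probability is the fraction of the 698 sampled children who are consistent with Carter's belief. The function name is hypothetical.

```python
SAMPLE_SIZE = 698  # missing children in the survey

def age_lr(consistent_count, sample_size=SAMPLE_SIZE):
    """LR = 1 / (consistent_count / sample_size), assuming Pr(evidence | H1) = 1."""
    return sample_size / consistent_count

lr_low = age_lr(114)   # generous tolerance on dates and ages
lr_high = age_lr(19)   # strict tolerance on dates and ages

print(f"{lr_low:.1f} <= LR <= {lr_high:.1f}")  # 6.1 <= LR <= 36.7
```

Rounding both bounds down gives the range 6 to 36 quoted above.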

  4. LR shortfall or requisite prior
  5. Let's consider the above analyses in this context: Can we do it? Suppose we figure Then to bring LRother ≥ 2564, we need to believe that

    LRscar ≥ 2564 / (2 × 12) ≈ 107. That's believable, but it's not obvious! Might depend on the details.
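
The arithmetic can be written out as a short sketch. The target of 2564 and the factors 2 and 12 are taken from the text above; treating the individual LRs as multiplying together presumes the pieces of evidence are independent, and the variable names are hypothetical.

```python
import math

TARGET_LR_OTHER = 2564      # required combined LR from the non-DNA evidence
other_factors = [2, 12]     # LRs already assumed for the other non-DNA evidence

# If the individual LRs multiply, the scar evidence must supply the rest.
required_lr_scar = TARGET_LR_OTHER / math.prod(other_factors)

print(f"LRscar must be at least about {required_lr_scar:.0f}")  # about 107
```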

