Careful Formulation of a Likelihood Ratio Statement based on Anecdotal Evidence

The problem
1. Background
2. (anecdotal) example
Analysis
Exercises
1. Age coincidence
LR shortfall or requisite prior

The problem

Background

DNA evidence is convenient because it is
well defined: we know what the profile is and hence what a "match" means, and
nicely quantifiable: most importantly, we can compute with reasonable accuracy the probability of the random occurrence of a profile.

Other kinds of evidence are not so nice.

For such evidence, it's important to formulate the questions to be answered accurately, so that at least we apply our subjective intuition to the correct questions.

ProBusqueda

(anecdotal) example

Analysis

What is the evidence?

YA Carter and BI Nestor both have scars?

Let's be more precise. The evidence is that

YA Carter has scars and relatives report that the infant BI Nestor had scars.

More carefully, let's describe the evidence as E_C & E_N, where

E_C = YA Carter has scars of a certain description S.
E_N = BI Nestor's family describes scars of a certain description S' (similar to, but certainly not identical to, the description S).

What is the LR? It is LR = X / Y, where

X = Pr(E_C & E_N | Carter is Nestor)
Y = Pr(E_C & E_N | Carter is not Nestor).

Do the math

Note: E_C doesn't depend on the relationship, so Pr(E_C | Carter is Nestor) = Pr(E_C | Carter is not Nestor) = Pr(E_C).

Consider X. There are two ways to apply the identity Pr(F & G) = Pr(F)Pr(G|F). In this case I choose an adult-centric formulation by letting E_C play the role of F:

X = Pr(E_C & E_N | Carter is Nestor)
= Pr(E_C | Carter is Nestor) Pr(E_N | Carter is Nestor & E_C)
= Pr(E_C) Pr(E_N | Carter is Nestor & E_C).
(adult-centric in that the probability of the YA appearance is evaluated unconditionally — Pr(E_C) — and the probability of the BI appearance is considered conditionally on the YA appearance)

And Y:

Y = Pr(E_C & E_N | Carter is not Nestor)
= Pr(E_C | Carter is not Nestor) Pr(E_N | Carter is not Nestor & E_C)
= Pr(E_C) Pr(E_N | Carter is not Nestor & E_C).

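Dividing X by Y, the common factor Pr(E_C) cancels, so LR = X / Y = Pr(E_N | N*) / Pr(E_N | ~N*), where
N* = Nestor is a childhood version of Carter, a person who bears scars S (that is, "Carter is Nestor & E_C"), and ~N* = Nestor is not a childhood version of Carter.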
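A quick numerical check of the cancellation: X and Y share the factor Pr(E_C), so their ratio does not depend on it. The probabilities below are invented purely for illustration; nothing in the analysis supplies them, and the function name is hypothetical.

```python
import math


def lr_from_joint(p_ec, p_en_same, p_en_diff):
    """LR = X / Y computed from the joint probabilities.

    p_ec      -- Pr(E_C), the same under both hypotheses
    p_en_same -- Pr(E_N | Carter is Nestor & E_C)
    p_en_diff -- Pr(E_N | Carter is not Nestor & E_C)
    """
    x = p_ec * p_en_same  # X = Pr(E_C & E_N | Carter is Nestor)
    y = p_ec * p_en_diff  # Y = Pr(E_C & E_N | Carter is not Nestor)
    return x / y


# Hypothetical conditional probabilities, for illustration only.
P_EN_SAME = 0.6
P_EN_DIFF = 0.01

for p_ec in (0.5, 0.05, 0.001):
    lr = lr_from_joint(p_ec, P_EN_SAME, P_EN_DIFF)
    # Whatever Pr(E_C) is, X / Y equals the ratio of the conditional probabilities.
    assert math.isclose(lr, P_EN_SAME / P_EN_DIFF)
    print(f"Pr(E_C) = {p_ec}: LR = {lr:.1f}")
```

Changing Pr(E_C) rescales X and Y equally, which is why only the conditional probabilities of E_N matter.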

Note on "apply the identity": Pr(F & G) = Pr(F)Pr(G|F) is the mathematical rule for the probability of a conjunction in terms of the probabilities of the constituent events.
Example: The probability that a person is F=over fifty and G=unemployed can be computed if you know

Pr(F) = proportion of people over fifty

Pr(G|F) = proportion of unemployed among those over fifty.
Now just multiply.
Alternatively, it could be calculated from

Pr(G) = proportion of unemployed
Pr(F|G) = proportion of over-fifties among those unemployed.
Now multiply.
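The two routes can be checked on a small made-up population. The counts below are hypothetical, chosen only so the arithmetic is easy to follow; exact fractions avoid any floating-point noise.

```python
from fractions import Fraction

# Hypothetical population of 1,000 people; all counts are invented.
N_TOTAL = 1000
N_OVER_FIFTY = 300            # F
N_UNEMPLOYED = 80             # G
N_OVER_FIFTY_UNEMPLOYED = 30  # F & G

# First route: Pr(F) * Pr(G|F)
pr_f = Fraction(N_OVER_FIFTY, N_TOTAL)
pr_g_given_f = Fraction(N_OVER_FIFTY_UNEMPLOYED, N_OVER_FIFTY)
via_f = pr_f * pr_g_given_f

# Second route: Pr(G) * Pr(F|G)
pr_g = Fraction(N_UNEMPLOYED, N_TOTAL)
pr_f_given_g = Fraction(N_OVER_FIFTY_UNEMPLOYED, N_UNEMPLOYED)
via_g = pr_g * pr_f_given_g

# Both routes give the proportion who are over fifty AND unemployed.
assert via_f == via_g == Fraction(N_OVER_FIFTY_UNEMPLOYED, N_TOTAL)
print(via_f, via_g)  # prints 3/100 3/100
```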

Conclusion

Evidence E_N	= relatives remember and report scars S' for the child Nestor
Hypothesis H₁	= the child is the same person as the adult who now has scars S
Hypothesis H₀	= the child is a random person
LR	= Pr(E_N | H₁) / Pr(E_N | H₀).

or in words:

How many times more probable is E_N when H₁ is true, than when H₀ is true?

Interpretation

The LR is thus a comparison of these two probabilities:

	Pr(E_N | H₁)	= The probability that relatives of a person with scars S would report childhood scars S'
versus
	Pr(E_N | H₀)	= The probability that relatives of some random infant would recall childhood scars S'.

An issue of bias

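We must be careful to distinguish E_N from a weaker kind of evidence, E_N':

Evidence E_N' = relatives remember and report scars S' for the child Nestor after being given a suggestion in the form of information or a picture of Carter.

Under H₀, E_N' can be much more probable than E_N, because a suggestion makes it easier for relatives of a random infant to "remember" similar scars. That is, Pr(E_N' | H₀) can be much larger than Pr(E_N | H₀), which weakens the LR.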

Exercises

Age coincidence

E_N: BI Nestor was born in 1977 and disappeared in 1982.
E_C: YA Carter's belief that he was born about 1978 and was adopted in 1982

The question is the LR for E_C: how many times more probable is E_C when H₁ is true than when H₀ is true?

We have some survey data based on 698 missing children:

19 children aged 5 disappeared in 1982. (That's 1 in 36 of the missing.)
58 children aged 4–6 disappeared in 1982. (1 in 12 of the missing.)
114 children aged 4–6 disappeared in 1981-1983. (1 in 6 of the missing.)

If Carter is Nestor (H₁), his belief is about what we would expect, so Pr(E_C | H₁) is close to 1. If Carter is not Nestor (H₀), we can assume that his possible birth identities are represented by some part of the sample of 698. Depending on how much vagueness about dates and ages we accept, from 19 to 114 of those 698 are consistent with his belief, so Pr(E_C | H₀) is between 19/698 and 114/698. Therefore the LR supporting identity based on this data is

6 ≤ LR ≤ 36.

Note: "represented" doesn't mean Carter would be among the 698 in the sample, but only that, as the sample represents the 30,000 or so missing children, the missing children with Carter's age and time of disappearance are proportionally represented in it.
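The bounds can be reproduced from the survey counts. This sketch takes Pr(E_C | H₁) = 1, which is what the text's figures imply (698/114 ≈ 6.1, and 698/19 ≈ 36.7, which the text gives as 36); the dictionary labels are just descriptive names.

```python
from fractions import Fraction

SAMPLE_SIZE = 698  # missing children in the survey

# Children in the sample consistent with Carter's belief, under
# increasingly vague readings of "born about 1978, adopted in 1982".
consistent = {
    "aged 5, disappeared 1982": 19,
    "aged 4-6, disappeared 1982": 58,
    "aged 4-6, disappeared 1981-1983": 114,
}

lrs = {}
for reading, count in consistent.items():
    pr_given_h0 = Fraction(count, SAMPLE_SIZE)  # Pr(E_C | H0)
    pr_given_h1 = Fraction(1)                   # Pr(E_C | H1), taken as about 1
    lrs[reading] = pr_given_h1 / pr_given_h0
    print(f"{reading}: LR = {float(lrs[reading]):.1f}")
```

The stricter the reading of "consistent", the fewer children qualify and the larger the LR.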

LR shortfall or requisite prior

We estimate 30000 missing children altogether. Therefore we start with a "baseline prior probability" of 1/30000 that Carter=Nestor.
We would like to maintain a standard of 99.9% (posterior) probability supporting an identity in order to assume it.
Since a 99.9% probability is roughly 1000:1 odds, a 1/30000 probability is roughly 1:30000 odds, and
(posterior odds) = (prior odds) × LR
the necessary total evidence LR_T that we require must satisfy
LR_T = (posterior odds) ÷ (prior odds) ≥ 1000 ÷ (1/30000) = 30,000,000.
Considering the total evidence LR_T to be composed of two factors, scientific (meaning DNA) and anecdotal (everything else), we have

LR_T = LR_DNA × LR_other
= 11,700 × LR_other
≥ 30,000,000, so we need
LR_other ≥ 30,000,000/11,700 = 2564.
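The arithmetic of the last two steps, as a short sketch using the same rounded odds as the text (exact odds of 999:1 and 1:29999 would change the result by only about 0.1%).

```python
from fractions import Fraction

# Policy and inputs as stated in the text (odds rounded as the text rounds them).
POSTERIOR_ODDS = Fraction(1000)            # 99.9% posterior, about 1000:1
BASELINE_PRIOR_ODDS = Fraction(1, 30000)   # about 1 in 30,000 missing children
LR_DNA = Fraction(11700)

# (posterior odds) = (prior odds) * LR_T, so LR_T must reach this value:
lr_total_needed = POSTERIOR_ODDS / BASELINE_PRIOR_ODDS

# LR_T = LR_DNA * LR_other, so the non-DNA evidence must supply the rest:
lr_other_needed = lr_total_needed / LR_DNA

print(f"LR_T needed:     {float(lr_total_needed):,.0f}")
print(f"LR_other needed: {float(lr_other_needed):,.1f}")
```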

There are two equivalent ways to look at this requirement: "LR shortfall" thinking and "requisite prior" thinking.

LR shortfall thinking

LR_other ≥ 2564 means that the DNA evidence leaves us with an LR shortfall of 2564.

Can we do it? Suppose we figure

LR_other = LR_sex × LR_age × LR_scar, where
LR_sex = 2 (see the note below), since Nestor was male, as would be expected if he is Carter
LR_age = 12 (say), from the age and date discussion above
LR_scar represents the evidence from the coincidence about scarring.

Then to bring LR_other ≥ 2564, we need to believe that

LR_scar ≥ 2564 / (2 × 12) = 107.

We could say that the LR shortfall before consideration of the scar is 107.

Is LR_scar ≥ 107? That's believable, but it's not obvious! It might depend on the details.
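The scar requirement depends on the value chosen for LR_age. This sketch recomputes it for the text's working value of 12 and for the ends of the 6 to 36 range found earlier; the function name is hypothetical.

```python
from fractions import Fraction

LR_OTHER_NEEDED = Fraction(30_000_000, 11_700)  # about 2564, the DNA shortfall
LR_SEX = 2


def lr_scar_needed(lr_age):
    """Smallest LR_scar with LR_sex * LR_age * LR_scar >= LR_other needed."""
    return LR_OTHER_NEEDED / (LR_SEX * lr_age)


# LR_age = 12 is the text's working value; 6 and 36 bound the age-data range.
for lr_age in (6, 12, 36):
    print(f"LR_age = {lr_age:2d}: need LR_scar >= {float(lr_scar_needed(lr_age)):.0f}")
```

With the most generous age LR the scar needs to contribute only about 36; with the least generous, about 214.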

LR_sex: why it is 2

Evidence E = the gender of BI is the same as the gender of YA.

LR	= Pr(E | BI = YA) / Pr(E | BI unrelated to YA)
	= 1/(proportion of the population with the gender of YA), since the numerator is 1
	= 1/½ (suppose)
	= 2.

The "requisite prior" thinking means to consider the "other" evidence before the DNA, and wrap it into the prior odds.

First, consider how large the prior odds must be in order for the DNA evidence to be sufficient.
Our policy is to require (posterior probability) ≥ 99.9%, i.e. (posterior odds) ≥ 1000.
Since we know LR_DNA = 11,700, and (posterior odds) = (prior odds) × LR, the requisite prior odds for identification given LR_DNA = 11,700 are
(requisite prior odds) = (posterior odds) / LR_DNA
(requisite prior odds) = 1000 / 11700
(requisite prior odds) = 1/11.7
Starting from the baseline prior = 1/30,000, can we justify that requisite prior?
sex

1/30000 is the odds prior to considering the sex data, and LR_sex=2.
(odds posterior to sex)=(odds prior to sex) × LR_sex
(odds posterior to sex)=(1/30000)×2=1/15000

age

1/15000 is the odds posterior to sex, and prior to considering age. Say LR_age=12.
(odds posterior to age)=(odds prior to age) × LR_age
(odds posterior to age)=(1/15000)×12=1/1250

scar

1/1250 is the odds prior to considering the scar.
(odds posterior to scar, prior to DNA)=(odds prior to scar) × LR_scar.
We must have (odds posterior to scar, prior to DNA) ≥ (requisite prior) = 1/11.7, so
(odds prior to scar) × LR_scar ≥ 1/11.7, i.e.
1/1250 × LR_scar ≥ 1/11.7, which entails
LR_scar ≥ 1250/11.7 = 107 as before.
That is, given our policy and our other assumptions, we need LR_scar ≥ 107 to declare the identification.
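The requisite-prior route as a sketch: update the baseline odds one item of anecdotal evidence at a time and see what the scar must supply. It lands on the same 107 as the LR-shortfall route, since both just rearrange the same product of odds and LRs.

```python
from fractions import Fraction

POSTERIOR_ODDS = Fraction(1000)  # policy: about 99.9%
LR_DNA = Fraction(11700)

# Requisite prior odds: the odds we must reach before the DNA is considered.
requisite_prior_odds = POSTERIOR_ODDS / LR_DNA  # = 1/11.7

# Update the baseline odds with the anecdotal evidence, one item at a time.
odds = Fraction(1, 30000)  # baseline prior odds
odds *= 2                  # after sex: 1/15000
odds *= 12                 # after age: 1/1250 (LR_age = 12, "say")

# The scar evidence must close the remaining gap.
lr_scar_needed = requisite_prior_odds / odds
print(f"odds before scar: {odds}, need LR_scar >= {float(lr_scar_needed):.0f}")
```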
