The littlest database
The problem:
Suppose you want to estimate allele frequencies
for some DNA locus. How big should the database be? Sometimes N=100
individuals (200 alleles) is suggested as a practical size. But
surely N=99 will do almost as well. And if that is so, why not N=98?
And so on. Naturally the utility gradually diminishes as N becomes
smaller. But for what value of N does the utility disappear
completely? What is the absolutely smallest database that is any use
at all? And what use is it?

N=0 can be useful. Suppose that analysis of a crime stain reveals two
alleles, PQ. If a PQ suspect turns up, there is a definite amount of
evidence against him, even with no information about frequencies at
all. Reason: the alleles P and Q have some (unknown) frequencies in
the population; call them p and q. Now,
(i) $p = \tfrac12 + (p - \tfrac12)$, and $p + q \le 1$, so

(ii) $q \le 1 - p = \tfrac12 - (p - \tfrac12)$.

Hence, multiplying (i) and (ii) together,

$$2pq \le 2\left(\tfrac14 - (p - \tfrac12)^2\right) \le \tfrac12,$$
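i.e. at most half the population is PQ (the PQ genotype frequency
being 2pq under Hardy-Weinberg proportions). If we can get the same
result at 10 independent loci, then at most (½)^10 = 1/1024 of the
population matches the stain, and the suspect is narrowed down to
that small group. Not bad for no database at all!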
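As a rough numerical check of the argument, here is a small Python sketch. The function names are hypothetical, chosen only for illustration. It evaluates the bound 2(¼ − (p − ½)²) from (i) and (ii) on a grid of p values using exact fractions, brute-forces 2pq itself over every grid pair with p + q ≤ 1, and then compounds a per-locus bound of ½ over 10 loci. Like the argument above, it assumes Hardy-Weinberg proportions (PQ frequency = 2pq) and, for the multi-locus figure, independent loci.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def het_frequency_bound(p: Fraction) -> Fraction:
    """Upper bound on 2pq when allele P has frequency p and p + q <= 1.

    Multiplying (i) p = 1/2 + (p - 1/2) and (ii) q <= 1/2 - (p - 1/2)
    gives 2pq <= 2 * (1/4 - (p - 1/2)**2).
    """
    return 2 * (Fraction(1, 4) - (p - HALF) ** 2)

def multilocus_bound(per_locus_bound: Fraction, loci: int) -> Fraction:
    # Assumes the loci are independent, so per-locus match frequencies multiply.
    return per_locus_bound ** loci

# Allele frequencies 0, 0.01, ..., 1 as exact fractions.
grid = [Fraction(k, 100) for k in range(101)]

# The per-locus bound never exceeds 1/2; it reaches 1/2 only at p = 1/2.
worst = max(het_frequency_bound(p) for p in grid)

# Direct check: 2pq stays under the bound (and hence under 1/2) whenever p + q <= 1.
assert all(2 * p * q <= het_frequency_bound(p) <= HALF
           for p in grid for q in grid if p + q <= 1)

print(worst)                          # 1/2
print(multilocus_bound(HALF, 10))     # 1/1024
```

Exact fractions are used instead of floats so that the comparison with ½ and the final 1/1024 are not blurred by rounding.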
Comments? Questions? Disputes?