What's wrong with the "exclusion probability"
Revised and extended 4 November 1997
I'll try to explain why the likelihood ratio L is the right statistic to give in a paternity case, and why the power of exclusion should not be given in addition, or instead.
A = Probability of exclusion (the probability, given the mother and child results, that a non-father would be excluded from paternity by this set of tests)
W = Probability of paternity
L = Paternity index = X/Y, where
X = probability of observing the M, C, and AF types assuming paternity,
Y = probability of observing the M, C, and AF types assuming non-paternity.
For the sake of argument I will assume a 50% prior probability of paternity. On another day I would argue against that assumption, or any assumption, but for today it will make the discussion simpler, and will not do any harm.
Under that assumption, W=L/(L+1) and L=W/(1-W). Either of W or L can be computed from the other one. Thus, they convey the same information. So, just for today, I won't argue that one is better or more appropriate than the other.
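These conversions are simple arithmetic; here is a minimal sketch in Python (the value of L is made up for illustration):

```python
def w_from_l(l):
    """Probability of paternity W from paternity index L, assuming a 50% prior."""
    return l / (l + 1)

def l_from_w(w):
    """Paternity index L recovered from W; the inverse conversion."""
    return w / (1 - w)

# Round-tripping shows the two statistics carry the same information.
l = 19.0                     # a made-up paternity index
w = w_from_l(l)              # 0.95
assert abs(l_from_w(w) - l) < 1e-9
```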
On the other hand, A contains less information than W or L. For a given paternity case, it would be silly to give A when L is available (and it is).
I claim that A contains less information. Let me explain exactly how. Let's assume that the man is not excluded -- otherwise there is no need for statistics at all.
Usually the "evidence" means:
1. blood types for M
2. blood types for C
3. blood types for AF.
From this information, anybody can easily infer
4. AF is not excluded.
The point is that L is a summary of the information in 1, 2, and 3, (and therefore also includes 4) whereas A is a summary of the weaker information only of 1, 2, and 4. (4 is a little bit weaker mostly because it doesn't tell whether the man has one or two possibly paternal alleles in common with the child.)
Suppose I am the lab director, and my assistant who does the lab work refuses to tell me 3, but only gives 1, 2, and 4. Under this limited definition of "evidence" I could compute a likelihood ratio LA and a probability of paternity WA, which turns out (see Morris in Walker 1983) to be:
LA = 1/(1-A),
WA = LA/(1+LA) = 1/(2-A).

(Note: WA is normally about equal to W, since they estimate the same thing, but from slightly different versions of the evidence.)
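The conversion from A to LA and WA is likewise mechanical; a small sketch, with the value of A made up:

```python
def la_from_a(a):
    """Likelihood ratio based only on the fact of non-exclusion: LA = 1/(1-A)."""
    return 1 / (1 - a)

def wa_from_a(a):
    """Corresponding probability of paternity: WA = LA/(1+LA) = 1/(2-A)."""
    la = la_from_a(a)
    return la / (1 + la)

a = 0.998                    # a made-up power of exclusion
print(la_from_a(a))          # ≈ 500
print(wa_from_a(a))          # ≈ 0.998; WA is close to A when A is near 1
```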
And a little algebra shows that
WA = 1/(2-A) = A + (1-A)²/(2-A),
which shows that WA is also about equal to A. This explains why W and A are about the same.
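The identity is easy to confirm numerically:

```python
# Check WA = 1/(2-A) = A + (1-A)**2/(2-A) over a grid of values of A.
for i in range(100):
    a = i / 100
    lhs = 1 / (2 - a)
    rhs = a + (1 - a) ** 2 / (2 - a)
    assert abs(lhs - rhs) < 1e-12
```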
If I have no choice, then I will report WA or LA as my statistical summary of the evidence (or I can report A, which has the same information).
But, if my lab tech later relents and tells me the whole story, then of course I should make the best computation I can with the increased evidence, and that is W or L. I would not also include WA or LA or A in the report, for the same reason that the company financial officer, when he gives the annual financial statement, would not include an earlier draft version that he prepared based on tentative and incomplete information.
An analogy: suppose what I want to know is how high a person can reach, and suppose I can learn two statistics:
(i) The height of the person from ground to shoulder.
(ii) The length of the person's arm.
So the total, (i)+(ii), tells me how high the person can reach. This is the best statistic, and is analogous to L or W. The first statistic, (i), by itself may be helpful -- it is like A. But if you can know (i)+(ii) there is no advantage at all in also knowing (i).
By careless use of language, people often refer to a test result as an "exclusion" if the result is inconsistent with paternity.
This is careless because obviously excluding the man from paternity is a decision that is made by people on the basis of evidence; it isn't the evidence itself.
The distinction is material, not just semantics. Because of the possibility of a mutation, most laboratories typically won't issue an opinion of "paternity excluded" unless there are at least two tests with results inconsistent with paternity. (The correct point of view is of course likelihood ratios, not counting "exclusions" at all.)
For that reason, the statistic as normally quoted is dishonest. When a lab claims "The exclusion chance is 99.8%", it has invariably made a calculation that reflects the possibility of one or more inconsistent results. The true chance to exclude is smaller because, in at least some of the cases with only one inconsistent result, the lab would in fact not issue an opinion of "exclusion."
That is, when you state #1, then even if you don't write #2 and #3 they are implied; otherwise #1 is non-probative and therefore misleading. In the single-inconsistency case, #2 is false. Therefore it would be dishonest to cite the RMNE.
Now, I realize that #2 is correct in the sense that a single inconsistency does not in fact constitute disqualification for paternity. But the logical problem exists because the calculation in #1 is made on the assumption that it does. In other words, it is a bogus calculation. We could, I suppose, substitute a different (more complicated but do-able) computation RMNE* which computes the fraction of random non-fathers who would have zero OR ONE inconsistency. That would be a little more honest, though still unsatisfactory.
To begin with, you'd have to use the RMNE* calculation (the calculation based on tolerating one inconsistency) all the time, including for those cases where the man happens to have no inconsistencies.
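To make the RMNE* idea concrete, here is a sketch under two assumptions that are mine, not the text's: the loci are independent, and e[i] is the probability that a random non-father shows an inconsistency at locus i (the numbers are invented):

```python
from math import prod

def rmne_star(e):
    """Fraction of random non-fathers showing zero or one inconsistent test,
    given per-locus probabilities e[i] of an inconsistency (loci assumed
    independent)."""
    p_zero = prod(1 - ei for ei in e)
    p_one = sum(
        ei * prod(1 - ej for j, ej in enumerate(e) if j != i)
        for i, ei in enumerate(e)
    )
    return p_zero + p_one

e = [0.3, 0.25, 0.4, 0.35]   # invented per-locus inconsistency probabilities
print(rmne_star(e))          # larger than the zero-tolerance RMNE, prod(1 - ei)
```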
The "exclusion" statistic is also an unreal and artificial statistic in that it has at least one bizarre and unreasonable property, as follows.
Suppose the child and mother are both type Q, and consider the possibility that an alleged father is type R. In the old days we used to call this situation an "indirect exclusion" or "apparent opposite homozygosity." The word "apparent" expresses the idea that there may conceivably be a "blank" or silent or unobservable allele O in the genetic system under consideration.
Note that, depending on whether or not you believe in the existence of blank alleles, there are two different formulas for the "probability of exclusion":
| | probability of exclusion |
|---|---|
| no blank allele | 1 - q² - 2q(1-q), or equivalently 1 - q(2-q) |
| blank allele | 1 - h - 2q(1-q) |
Now, what is peculiar about the above situation is that the actual frequency of the blank allele occurs nowhere in either formula. If you believe there is no blank allele at all, you use the first formula. If you get word that there is even a single blank allele in the world, even two continents away, then you switch to the second formula.
Consequently, the value of A is a discontinuous function of the blank allele frequency. This is an ordinary situation in pure mathematics, but is an impossible situation in nature. Nature is never discontinuous.
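The discontinuity is easy to exhibit. In this sketch q and h are invented numbers, and the point is only that the computed value of A jumps the moment the blank-allele frequency leaves zero, even though that frequency enters neither branch:

```python
def exclusion_prob(q, h, blank_freq):
    """'Probability of exclusion' in the indirect-exclusion situation.
    The formula switches the instant blank_freq leaves zero, yet blank_freq
    itself appears in neither branch -- hence the discontinuity."""
    if blank_freq == 0:
        return 1 - q**2 - 2*q*(1 - q)    # no blank allele; equals (1-q)**2
    return 1 - h - 2*q*(1 - q)           # blank allele believed to exist

q, h = 0.1, 0.05                         # invented numbers
print(exclusion_prob(q, h, 0.0))         # ≈ 0.81
print(exclusion_prob(q, h, 1e-9))        # ≈ 0.77: a jump, for any nonzero blank frequency
```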
Therefore the "exclusion probability" is not a description of nature.
The ideas above are pretty old hat, which I understood perfectly well in 1982 (Brenner 1983). In recent years several sources, such as NRC I, the court in the OJ Simpson case, the FBI, and R. Chakraborty, have recommended using the probability of exclusion to summarize the evidence in such cases. I am a little embarrassed to admit that I didn't realize immediately that the issues and reasoning in the mixed stain situation are exactly the same as the familiar reasoning for the paternity situation that I have described above, and that therefore it is a bad idea for exactly the same reasons.
Here's a comparison of various kinds of problems, and the inappropriateness of using the exclusion probability in each case: