This essay is based on my September 3 talk, and proceedings contribution, for the 2011 ISFG conference in Vienna, Austria.
Keywords: forensic mathematics; forensic statistics; DNA mixture; rare haplotype; refereeing; misconduct
The paradigm should be: State the problem, formulate it mathematically, state premises (inevitably including a model since this is applied mathematics), justify the premises (i.e. validate the model), derive the result.
I tried to follow the mathematical tradition in my recent paper on rare haplotype evidence and (as I modestly define) the fundamental problem of forensic mathematics [FP]. The main problem is to find the evidentiary value of a previously unseen haplotype linking suspect to crime scene. Expressed mathematically it comes down to a conditional probability that a random innocent person will match. For the evidential likelihood ratio (the reciprocal of that probability) I derive the expression LR=n/(1-κ) where n=size of reference database after extending it with the crime scene type, and κ=proportion of that database that is singletons, which derivation is valid under a stated modelling condition. Then I show that the modelling condition holds, hence the formula is valid, for a wide range of theoretical populations which encompasses the plausible range of real populations.
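As a minimal sketch of the expression LR=n/(1-κ), the following Python computes n and κ from a list of reference haplotypes after extending it with the crime scene type. The function name, the representation of haplotypes as strings, and the toy database are illustrative assumptions, not taken from [FP]; the result is meaningful only where the modelling condition stated in [FP] holds.

```python
from collections import Counter


def rare_haplotype_lr(reference_haplotypes, crime_scene_haplotype):
    """Return LR = n / (1 - kappa) for a previously unseen crime scene haplotype.

    n     = size of the reference database after adding the crime scene type
    kappa = proportion of that extended database made up of singletons
    """
    if crime_scene_haplotype in reference_haplotypes:
        # The formula addresses only a haplotype not seen before.
        raise ValueError("crime scene haplotype is already in the database")

    extended = list(reference_haplotypes) + [crime_scene_haplotype]
    n = len(extended)

    counts = Counter(extended)
    singletons = sum(1 for c in counts.values() if c == 1)
    kappa = singletons / n

    if singletons == n:
        # Every entry is a singleton, so 1 - kappa is zero.
        raise ValueError("kappa equals 1; the expression is undefined")

    return n / (1 - kappa)


# Toy database of 7 haplotypes (hypothetical labels), crime scene type "E".
database = ["A", "A", "B", "C", "C", "C", "D"]
lr = rare_haplotype_lr(database, "E")
print(lr)  # n = 8, singletons B, D, E give kappa = 3/8, so LR = 12.8
```

Note that the LR exceeds n whenever the database contains singletons, so a large singleton proportion strengthens the evidence beyond the naive 1-in-n count.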
No doubt my paper isn’t a perfect example of the paradigm, nor is it a unique example. But disappointingly, contrary examples abound.
A paper which simply gives a recipe for calculation without any stated justification (let alone careful justification) is professionally deficient. We find mixture [B-mix], [S-mix] and rare haplotype papers [S-Yhap] of this sort. Also, there is a prevalent style of sophisticated writing that is entertaining but literally pointless: e.g. “an attractive point of view” (Is it attractive to the innocent suspect who is the victim of the entertainment?), or “An alternative framework is to suppose ... a prior distribution that is conveniently taken to be Beta” [BKW]. Is convenience of statisticians a valid criterion for deciding who goes to jail?
In the ‘90s the exclusion method for mixtures was simple: If a suspect is “included” then report RMNE, calculated per-locus as the squared sum of the allele frequencies for alleles observed above 100rfu or so. No one actually wrote down the model but the formula is simple enough that it can be reverse-engineered to deduce what the model must be: The formula assumes that all alleles of a donor will be conspicuous (e.g. >100rfu) and “included” means all of one’s alleles are conspicuous in the mixture. Obviously this is an absurd model. That it survived and was popular and accepted for years – perhaps still – proves the importance of explicitly writing down models and explicitly deriving and justifying the consequences. With nothing written down, nothing wrong is written down and errors are less obvious.
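To make the reverse-engineered model concrete, here is a minimal Python sketch of the per-locus recipe and of the inclusion rule it implies. The function names, locus names, and allele frequencies are hypothetical, and multiplying across loci assumes independence between loci, which the text above does not state.

```python
from math import prod


def locus_inclusion_probability(conspicuous_allele_freqs):
    """Squared sum of the frequencies of the alleles seen above threshold.

    This is the chance that both alleles of a random person fall within the
    conspicuous set, which is exactly what the implied model calls "included".
    """
    return sum(conspicuous_allele_freqs) ** 2


def rmne(mixture):
    """Combine the per-locus values by multiplication (assumes independent loci)."""
    return prod(locus_inclusion_probability(f) for f in mixture.values())


def is_included(genotype, conspicuous_alleles):
    """The implied rule: every allele of the person is conspicuous in the mixture."""
    return all(allele in conspicuous_alleles for allele in genotype)


# Hypothetical two-locus mixture: frequencies of the conspicuous alleles only.
mixture = {
    "locus_1": [0.1, 0.2, 0.2],  # sums to 0.5, squared 0.25
    "locus_2": [0.3, 0.1],       # sums to 0.4, squared 0.16
}
print(rmne(mixture))  # approximately 0.25 * 0.16 = 0.04

# A donor with an allele that dropped below threshold counts as "not included".
print(is_included(("12", "14"), {"12", "13"}))  # False
```

Written out this way, the assumption is hard to miss: a true donor whose allele falls below the threshold is treated as excluded, and nothing in the recipe accounts for that possibility.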
The recent appearance of a refined “exclusion” method [B-mix] [S-mix] suggests that RMNE enthusiasts have woken up to the folly and unfairness of the original approach. They have not, though, woken up to the importance of models, let alone to justifying the shibboleths of the RMNE faith such as “easy to understand!”, “needs no assumption about number of contributors!”, “conservative!” These papers give no mathematical analysis at all; only recipes which apparently we are supposed to trust.
And my point isn’t that the method fails. My point is that the adherents of a method have a positive responsibility to show why it works. They have not done a respectably professional scientific job if they don’t explain coherently. Otherwise the rest of us – reader, analyst in the laboratory, judge, and accused – ought to be suspicious of the validity of the method. And I am.
prima facie implausible —
The frequency surveying approach is founded on the tempting intuition that the frequency of a haplotype is correlated with the frequency of its mutational neighbors. "Neighbors" are defined by assuming single-step mutations, a good model. The evolutionary model, though, is tragically unstated. The evolutionary intuition may be that neighboring haplotypes replenish one another by repeated mutations. That is reasonable for a one- or two-locus haplotype, but when the number of loci is large (even seven loci, let alone 17 as is usual today) convergent mutation is very unusual, and I do not see how the replenishing phenomenon can be a significant influence compared to the effect of genetic drift. But the unstated model implicitly assumes the opposite, that replenishment dwarfs drift.
Compared to that, the fact that the weighting formula doesn't follow from the mutational model, or from any imaginable model, is a secondary objection. (The W formula, summing N/distance, comes from thin air, not mathematics. Why not N/distance^2 or N/e^distance? Random guesswork.) Even if there were no drift, the model would be wrong in giving vastly too much weight to non-immediate neighbors.
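The arbitrariness of the weighting can be illustrated with a small Python sketch. It assumes only what is said above, that W sums N/distance over neighboring haplotypes; the neighbor counts, the data layout, and the function name are hypothetical and are not taken from the frequency surveying papers.

```python
import math

# Hypothetical neighbors of a target haplotype: (mutational distance, count N).
neighbors = [(1, 5), (2, 20), (3, 60)]


def weighted_sum(neighbors, kernel):
    """Sum N * kernel(distance) over all neighbors."""
    return sum(n * kernel(d) for d, n in neighbors)


# Three equally unjustified choices of how weight falls off with distance.
kernels = {
    "N/distance": lambda d: 1 / d,
    "N/distance^2": lambda d: 1 / d ** 2,
    "N/e^distance": lambda d: math.exp(-d),
}

totals = {name: weighted_sum(neighbors, k) for name, k in kernels.items()}

for name, total in totals.items():
    # Share of the total contributed by neighbors more than one step away.
    immediate = weighted_sum([(d, n) for d, n in neighbors if d == 1], kernels[name])
    print(name, round(total, 3), round(1 - immediate / total, 3))
```

With these made-up counts the three kernels give quite different totals, and under N/distance most of the weight comes from non-immediate neighbors. Nothing in the sketch says which kernel, if any, is right; that would have to come from a stated evolutionary model.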
Finally, and equally mysterious, [BKW] falsely asserts that a simultaneous publication, [Ysurvey], validates an alternative approach, “frequency surveying.” There is nothing about validation in [Ysurvey]. An author of [Ysurvey] says that it includes no validation (M Anderson, pers comm). The method seems [FP] prima facie implausible. An author of [BKW] recalls that the assertion was added at the behest of the referee (B Weir, pers comm). If so, the referee overlooked confused reasoning, an algebraic blunder and numerous factual errors in [BKW] but managed to join as anonymous co-author with a mistaken claim that could be important in a judicial setting. A judge might be falsely reassured with unjust effect. Or the defense, armed with the true and unflattering story, might argue convincingly that a breakdown in the peer-review process amounts to a shoddy lack of professionalism about statistical methods in forensic genetics as a whole. That’s harsh reasoning from one incident, but is the conclusion wrong?
Explain the problem first in words. This step is difficult, but it is important because it gives reader and writer alike something to latch onto.
It requires careful thought to find the right words to say exactly what you are trying to do. If those words are missing, that doesn't indicate mere carelessness. Rather, take it as a sign of vague and fuzzy thinking, and a warning of more to come.
If the words are carefully chosen it will be straightforward to give mathematical expression to the evidential value that is the goal.
I can imagine a reasonable forensic mathematics paper that gives a good argument for some method without giving a model, just based on statistical analysis of some experiments for example. But it would always be much better with a model. The model provides insight into why the suggested idea works, and it guides understanding into the range of circumstances under which the idea is valid.
By the way, the concern in particular for the innocent suspect isn't simply because we care more about them. It is because fairness to the innocent is almost certain, for a subtle but pervasive philosophical reason, to be more difficult than fair evaluation of evidence when the suspect is really guilty. A model is necessarily an idealization, an approximation to reality that omits some details. Omitting details means omitting evidence. For example, in the case of DNA mixture evidence, a simple model doesn't worry about signal intensity. When the suspect is really the donor to a mixture, the intensity information will in practice nearly always be further evidence so ignoring it is doing him a favor, being "conservative." But for an innocent suspect the tendency is exactly the opposite: the more we look at details the more we are likely to see things revealing the truth. That may be why it is very hard to find a simple method — a method based on a very much simplified model — that is fair to the innocent.
The above points seem pretty obvious; it is hard to imagine anyone arguing that it is ok to omit stating the problem. To see the vital importance of stating premises and giving a model, look at any paper which does not do so and see immediately how chaotic and difficult to evaluate it is. Even for a quite simple paper I find that as I lay out the mathematics explicitly, I am forced to refine my results and conclusions. So I am convinced that slapdash papers — i.e. most papers — are riddled with unsoundness. Those criticized in this essay and many (most?) others are nothing more than guesswork, and the more closely the guesses are examined the more it is apparent that shoddy reasoning leads not just to poorly supported conclusions, but to poor conclusions.
[B-mix] Budowle B, Onorato AJ, Callaghan TF, Della Manna A, et al, Mixture interpretation: defining the relevant features for guidelines for the assessment of mixed DNA profiles in forensic casework, J Forensic Sci 2009; 54(4): 810–21
[S-mix] SWGDAM Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories §3.5. Interpretation of DNA Typing Results for Mixed Samples at
[S-Yhap] SWGDAM Y-chromosome Short Tandem Repeat (Y-STR) Interpretation Guidelines. (§5.3. The basis for the haplotype frequency estimation is the counting method.) at
[BKW] Buckleton JS, Krawczak M, Weir BS, The interpretation of lineage markers in forensic DNA testing, FSI Genetics 5 (2011) 78-83
[Ysurvey] Willuweit S, Caliebe A, Anderson MM, et al, Y-STR frequency surveying method: a critical reappraisal, FSI Genetics 5 (2011) 84-90