Forensic mathematics glossary
Not really a glossary; just a collection of miscellaneous words,
not even including the most important ones (just use a search for those).
In fact the intent isn’t even definitions, often just comments or links.
- avuncular index
- is a name I coined
for the likelihood ratio that supports a tested man to be the uncle of a
tested child rather than unrelated to the child.
The AI is useful to help settle the question of paternity when
the DNA profiles are consistent with paternity at nearly all loci but
“inconsistent” at a few loci.
In that situation it may be either that
- The man is the father (the “inconsistencies” being mutations), or
- The man is the uncle (or other close relative of the father).
These are conflicting explanations: in one case the man is the father, in the other
case he is not. Comparing the PI (paternity index) and AI
can help in choosing between the possibilities.
- The term “inconsistency” is often used to mean genotypes for
alleged father (AF), child, and perhaps mother, at a particular locus that are inconsistent with
paternity barring mutation (and perhaps barring null alleles; usage
varies). For example the pattern child=(14,17), AF=(15) is “inconsistent”.
- Obviously an “inconsistency” isn’t
literally inconsistent with paternity because mutation is
always a possibility.
- Moreover for many relationships other than parentage, for example
two siblings (or grandparent and grandchild),
all genotypes are possible even barring mutation,
hence the “inconsistency” concept makes no sense at all.
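The allele-sharing test behind the word “inconsistency” can be sketched in a few lines. This is only an illustration of the two-person case (child and AF, no mother), barring mutation and null alleles. The function name `is_inconsistent` and the tuple representation of genotypes are assumptions made for the example.

```python
def is_inconsistent(child, alleged_father):
    """Return True when child and alleged father share no allele at a locus.

    Genotypes are tuples of allele labels; a homozygote may be written
    with one entry, e.g. (15,), or two, e.g. (15, 15).
    Mutation and null alleles are deliberately ignored.
    """
    return set(child).isdisjoint(alleged_father)


# The pattern from the text: child=(14,17), AF=(15)
print(is_inconsistent((14, 17), (15,)))     # True
print(is_inconsistent((14, 17), (15, 17)))  # False
```

A check like this only flags loci for attention; as noted above, a flagged locus is not literally inconsistent with paternity.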
- baseline prior probability
- consanguineous mating: see incest
- disaster identification
- exclude, exclusion
- frequently misused word. Consider “We excluded the man of paternity because of three exclusions.”
See What’s wrong with the “exclusion probability” for a start.
- The concept, while appealing, is dubious as well. Don’t get me started.
- The meaning of “contributor” to a mixture
depends on the meaning of mixture.
- Contributor to a mixture consisting of
DNA molecules (mixture substance) is best defined
in vague terms such as “someone who contributed a
‘significant’(?) amount of DNA”.
- Contributor in the sense of
mixture data is also hard to define in simple terms,
but it must relate to how including a (known or unknown) DNA profile
affects the computed likelihood of the mixture data.
- frequency spectrum
- The probability distribution of (allele or haplotype) frequencies that occur.
For example, for YFiler haplotypes
the spectrum shows a very high probability of haplotype frequencies between 0.0001 and 0.0002
(many haplotypes have population frequencies in this range), a much lower probability of
haplotype frequencies between 0.0005 and 0.0006.
Theory or data may suggest an expected frequency spectrum, which can then be regarded as
prior probabilities for the frequencies of allelic types or haplotypes. Such a prior can be used
along with Bayes’ theorem and a sample reference database of allelic types,
to infer allele probabilities.
Brenner’s Law is a statement about the frequency spectrum for
forensic STR loci.
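As a rough illustration of combining a prior with a reference database, here is a minimal sketch. It assumes a symmetric Dirichlet prior, which is a convenient textbook choice and not the frequency spectrum this entry describes. The function name, the `alpha` and `n_types` parameters, and the counts are all hypothetical.

```python
from collections import Counter


def posterior_allele_probabilities(sample, n_types, alpha=1.0):
    """Posterior mean allele probabilities under a symmetric Dirichlet prior.

    sample  : list of allele labels observed in the reference database
    n_types : assumed total number of allelic types at the locus
    alpha   : assumed prior pseudo-count per type (hypothetical choice)
    Returns a function mapping an allele label to its inferred probability.
    """
    counts = Counter(sample)
    total = len(sample) + alpha * n_types

    def prob(allele):
        # Counter gives 0 for alleles never seen in the sample.
        return (counts[allele] + alpha) / total

    return prob


sample = [14] * 30 + [15] * 50 + [17] * 20   # made-up database of 100 alleles
prob = posterior_allele_probabilities(sample, n_types=10)
print(prob(15))   # (50 + 1) / 110
print(prob(16))   # an unseen allele still gets (0 + 1) / 110
```

The point of the sketch is only that the inferred probability differs from the raw sample frequency, most visibly for types the database has never seen.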
- The terms prosecutor’s fallacy
and defense fallacy were, I think, invented by
UC Irvine law Prof. William Thompson.
- Ebenezer, the father of Judy, is alleged to be the father of Judy’s child.
What modification to the normal paternity calculation is appropriate?
Answer — None, normally. So long as the testing
uses unlinked co-dominant markers (like standard forensic STRs) and both the mother
and the alleged father are tested (normal trio paternity), the fact of the adults’
relationship is irrelevant.
- mass identification;
mass disaster identification
- a DNAVIEW speciality.
- likelihood ratio (LR)
- The central concept of Forensic Mathematics.
The way to quantify forensic — or any — evidence.
Any other method is either equivalent to a likelihood ratio or is nonsense.
The framework we consider is trying to judge between two possible hypotheses. A typical pair of hypotheses would be:
- The suspect is the donor of the rape kit semen, versus the suspect is a random man;
- The tested body B is the missing relative of family F, versus B and F are unrelated.
Evidence, i.e. information or data such as DNA profiles, may be better explained by
one of the hypotheses than by the other.
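In symbols, the likelihood ratio is the probability of the evidence under the first hypothesis divided by its probability under the second. A minimal sketch, with made-up probabilities and a hypothetical function name:

```python
def likelihood_ratio(p_evidence_given_h1, p_evidence_given_h2):
    """How many times better H1 explains the evidence than H2 does."""
    return p_evidence_given_h1 / p_evidence_given_h2


# Made-up numbers: the evidence is certain if the suspect is the donor,
# and has probability one in a million if the donor is a random man.
lr = likelihood_ratio(1.0, 1e-6)
print(lr)
```

An LR above 1 means the evidence is better explained by the first hypothesis; below 1, by the second.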
- The word mixture is used carelessly and ambiguously
in forensic DNA.
Software analyzes mixture data.
- Mixture is more usefully defined as
“a combination of one or more”
than as “two or more”.
- Mixture might refer to either of
- mixture substance —
mixture of biological material, DNA
- mixture data —
lists and numbers (typically an annotated EPG) obtained from processing mixture substance
Implication: Since the term
“ground truth” of a mixture
is a description of mixture substance,
it has limited relevance to mixture data.
In particular, analysis by Mixture Solution
or other software that (for example) estimates number of contributors,
is not right or wrong according to whether it agrees with ground truth.
It’s a tortuous route from the moment DNA is deposited as
mixture substance to the final version of mixture data.
It’s not a paradox that the best possible analysis of mixture data,
the most correct analysis, sometimes differs from the ground truth about something else.
- paternity index
- a quaint synonym for likelihood ratio used in the context of paternity.
- JS Mill explained it well.
- probability vs "frequency"
Allele probability and allele frequency are two different things.
Usually people say allele “frequency” when they mean allele probability.
See Why the quotes on “frequency”.
- prior probability
A probability summarizing the value of the evidence prior to inclusion of
(for our purposes) the DNA (i.e. scientific) evidence.
The prior probability is therefore — at least in principle —
a subjective assessment of anecdotal and other evidence that cannot be comfortably
quantified. It’s worth distinguishing several particular situations:
- criminal case
The prior probability is the evidential value of evidence (such as testimony, documents,
demeanor) that the DNA analyst doesn’t even hear, and which in any case is the
responsibility of the judge or jury to assess.
Therefore it is clearly wrong for an expert witness to intrude on the prerogative of the court by
making any prior probability assumption.
I don’t see anything wrong though with advising the court about how mathematics works,
such as by a picture or a chart of examples.
- (civil) paternity case
In principle the above applies to any court action. But
this essay tries to take a realistic view.
- mass disaster identification
Links: WTC and tsunami
When the object of the identification is humanitarian, decision making is
typically, in practice, deferred to the scientists.
In that case, two useful concepts are:
- baseline prior probability
If n people are equally missing and all bodies are indistinguishable
from one another,
then the prior probability for any particular identity is 1/n.
This baseline prior can be a reasonable starting point in realistic
scenarios as well.
- requisite prior
Suppose a DNA likelihood ratio has been determined, and a posterior
probability threshold for identification has been agreed. Then I define the
requisite prior as that prior which is just sufficient for declaring
identification.
Example: Suppose n=1000 missing,
LR=80000 supporting corpse V to be missing person Jim Jones, and that the
agreed policy is to declare identification when the probability is at
least 99.9%.
Using the baseline prior probability of 1/1000, the posterior
probability is only 98.8% and the threshold is not achieved.
A prior of about 1/81, 12-fold larger than
the baseline prior, is the requisite prior to obtain 99.9%.
Maybe there is some quite useful non-DNA evidence supporting the ID,
for example the body V has a surgical scar
approximately like one Jim Jones is known to have had,
and the stature is about right as well. It would be hard to estimate the
exact evidential value of those coincidences, but we don’t have to.
It is sufficient to judge that they are worth at least
LR=12, the shortfall from the DNA evidence.
another example and discussion
of requisite prior
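The arithmetic of the Jim Jones example can be checked with a short sketch using Bayes’ theorem in odds form. The function names are hypothetical; the numbers (n=1000, LR=80000, 99.9% posterior) are those of the example.

```python
def posterior_probability(prior, lr):
    """Bayes' theorem in odds form: posterior odds = prior odds * LR."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1.0 + posterior_odds)


def requisite_prior(lr, threshold):
    """Smallest prior probability that reaches the posterior threshold."""
    needed_odds = threshold / (1.0 - threshold) / lr
    return needed_odds / (1.0 + needed_odds)


n, lr, threshold = 1000, 80000, 0.999
baseline = 1.0 / n

print(round(posterior_probability(baseline, lr), 3))   # 0.988
rp = requisite_prior(lr, threshold)
print(round(1 / rp))         # 81, i.e. a requisite prior of about 1/81
print(round(rp / baseline))  # 12, the shortfall relative to the baseline prior
```

The last figure is the factor the non-DNA evidence has to supply, which is why judging the scar and stature to be worth at least LR=12 is enough.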
- Everyone is related, so why do we say for example “The suspect is either the father,
or is unrelated to the child” to describe the
alternative hypotheses in a paternity question? Two answers:
- It’s hard to find an accurate wording that is simple;
- The fiction of literally being unrelated is sometimes a premise of our computational model.
(The paternity formula 1/2q rests on a premise of unrelatedness.)
All that said, when I or anyone says “unrelated” there is a good chance that what we really
mean is “randomly selected, with no more and no less than a random chance to have any
particular close or distant relationship”.