# Forensic mathematics glossary

Not really a glossary; just a collection of miscellaneous words, not even including the most important ones (just use a search for those). In fact the intent isn’t even definitions, often just comments or links.
avuncular index (AI)
is a name I coined for the likelihood ratio that supports a tested man being the uncle of a tested child rather than unrelated to the child.

The AI is useful to help settle the question of paternity when the DNA profiles are consistent with paternity at nearly all loci but “inconsistent” at a few loci. In that situation it may be either that

1. The man is the father (the “inconsistencies” being mutations), or
2. The man is the uncle (or other close relative of the father).
These are conflicting explanations: in one case the man is the father, in the other he is not. Comparing the PI (paternity index) and the AI can help in choosing between the possibilities.

inconsistent, inconsistency
The term “inconsistency” is often used to mean a set of genotypes for alleged father (AF), child, and perhaps mother, at a particular locus, that is inconsistent with paternity barring mutation (and perhaps barring null alleles; usage varies). For example the pattern child=(14,17), AF=(15) is “inconsistent”.
Notes:
• Obviously an “inconsistency” isn’t literally inconsistent with paternity because mutation is always a possibility.
• Moreover for many relationships other than parentage, for example two siblings (or grandparent and grandchild), all genotypes are possible even barring mutation, hence the “inconsistency” concept makes no sense at all.
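
The pattern in the example can be checked mechanically. Here is a minimal Python sketch, assuming genotypes are written as tuples of allele numbers with a homozygote allowed as a single allele, as in AF=(15). The function names are mine, and mutation and null alleles are deliberately ignored, which is exactly why the result deserves its quotation marks.

```python
def paternal_candidates(child, mother=None):
    """Alleles the child could have received from the father, barring mutation."""
    child = tuple(child)
    if len(child) == 1:
        child = child * 2  # homozygote written as a single allele
    a, b = child
    if mother is None:
        # No mother tested: either child allele might be the paternal one.
        return {a, b}
    maternal = set(mother)
    candidates = set()
    if a in maternal:
        candidates.add(b)  # mother gave a, so father must have given b
    if b in maternal:
        candidates.add(a)  # mother gave b, so father must have given a
    # An empty set means the mother herself shares no allele with the child.
    return candidates


def is_inconsistent(child, alleged_father, mother=None):
    """True if the alleged father carries none of the candidate paternal alleles."""
    return not (paternal_candidates(child, mother) & set(alleged_father))


print(is_inconsistent((14, 17), (15,)))                     # the example pattern
print(is_inconsistent((14, 17), (15, 17)))                  # shares allele 17
print(is_inconsistent((14, 17), (14, 15), mother=(14, 16))) # trio case
```

A check like this only counts loci. It says nothing about how strongly the data support paternity, which is a job for a likelihood ratio.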
baseline prior probability

consanguineous mating: see incest

disaster identification

exclude, exclusion
frequently misused word. Consider “We excluded the man of paternity because of three exclusions.” See What’s wrong with the “exclusion probability” for a start.
The concept, while appealing, is dubious as well. Don’t get me started.

contributor
The meaning of “contributor” to a mixture depends on the meaning of mixture.
1. Contributor to a mixture consisting of DNA molecules (mixture substance) is best defined in vague terms such as “someone who contributed a ‘significant’(?) amount of DNA”.
2. Contributor in the sense of mixture data is also hard to define in simple terms, but it must relate to how including a (known or unknown) DNA profile affects the computed likelihood of the mixture data.
frequency spectrum
The probability distribution of (allele or haplotype) frequencies that occur. For example, for YFiler haplotypes the spectrum shows a very high probability of haplotype frequencies between 0.0001 and 0.0002 (many haplotypes have population frequencies in this range), and a much lower probability of haplotype frequencies between 0.0005 and 0.0006.

Theory or data may suggest an expected frequency spectrum, which can then be regarded as prior probabilities for the frequencies of allelic types or haplotypes. Such a prior can be used, along with Bayes’ theorem and a sample reference database of allelic types, to infer allele probabilities.
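
To make the idea concrete, here is a minimal Python sketch of that kind of inference. It rests on assumptions of mine rather than on any particular published method: the spectrum is simplified to a few discrete candidate frequencies, the database is treated as a simple random sample, and the numbers in the toy spectrum are invented for illustration. The function name is hypothetical.

```python
def infer_allele_probability(spectrum, k, n):
    """Posterior over candidate frequencies for a type seen k times in n samples.

    spectrum: dict mapping candidate frequency -> prior probability
              (a discretized stand-in for a frequency spectrum).
    Returns (posterior dict, posterior mean frequency).
    """
    # Bayes' theorem: posterior is proportional to prior times likelihood.
    # The binomial coefficient is the same for every candidate, so it
    # cancels in the normalization and is omitted.
    weights = {
        f: prior * (f ** k) * ((1 - f) ** (n - k))
        for f, prior in spectrum.items()
    }
    total = sum(weights.values())
    posterior = {f: w / total for f, w in weights.items()}
    mean = sum(f * p for f, p in posterior.items())
    return posterior, mean


# Invented toy spectrum: most types are rare, a few are more common.
toy_spectrum = {0.001: 0.9, 0.01: 0.1}

# A type never seen in a database of 100 profiles.
posterior, allele_probability = infer_allele_probability(toy_spectrum, k=0, n=100)
print(posterior, allele_probability)
```

In this toy case, not seeing the type shifts belief further toward the rarer candidate frequency, while the inferred probability stays above zero, which a raw sample count of 0/100 would not.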

Brenner’s Law is a statement about the frequency spectrum for forensic STR loci.

fallacy
The terms prosecutor’s fallacy and defense fallacy were, I think, invented by UC Irvine law Prof. William Thompson.

incest
Ebenezer, the father of Judy, is alleged to be the father of Judy’s child. What modification to the normal paternity calculation is appropriate?
Answer — None, normally. So long as the testing uses unlinked co-dominant markers (like standard forensic STRs) and both the mother and the alleged father are tested (normal trio paternity), the fact of the adults’ relationship is irrelevant.

mass identification; mass disaster identification
a DNA•VIEW speciality.

likelihood ratio (LR)
The central concept of Forensic Mathematics. The way to quantify forensic — or any — evidence. Any other method is either equivalent to a likelihood ratio or is nonsense.

The framework we consider is trying to judge between two possible hypotheses. A typical pair of hypotheses would be:

1. The suspect is the donor of the rape kit semen, versus the suspect is a random man;
2. The tested body B is the missing relative of family F, versus B and F are unrelated.

Evidence, i.e. information or data such as DNA profiles, may be better explained by one of the hypotheses than by the other.
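
“Better explained” has a precise meaning: the likelihood ratio is the probability of the evidence if the first hypothesis is true, divided by its probability if the second is true. A minimal Python sketch, with hypothetical probabilities chosen only for illustration:

```python
def likelihood_ratio(p_evidence_given_h1, p_evidence_given_h2):
    """How many times more probable the evidence is under H1 than under H2."""
    if p_evidence_given_h2 <= 0:
        raise ValueError("the evidence must be possible under the second hypothesis")
    return p_evidence_given_h1 / p_evidence_given_h2


# Hypothetical numbers: the evidence has probability 0.5 if the first
# hypothesis is true and 0.05 if the second is true.
lr = likelihood_ratio(0.5, 0.05)
print(lr)  # about 10: the evidence is ten times better explained by H1
```

A value above 1 favors the first hypothesis, a value below 1 favors the second, and a value of 1 means the evidence does not discriminate between them.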

mixture
The word mixture is used carelessly and ambiguously in forensic DNA.
• Mixture is more usefully defined as “a combination of one or more contributors” than as “two or more”.
• Mixture might refer to either of
1. mixture substance — mixture of biological material, DNA
2. mixture data — lists and numbers (typically an annotated EPG) obtained from processing mixture substance.
Software analyzes mixture data.

Implication: Since the term “ground truth” of a mixture is a description of mixture substance, it has limited relevance to mixture data. In particular, analysis by Mixture Solution or other software that (for example) estimates number of contributors, is not right or wrong according to whether it agrees with ground truth. It’s a tortuous route from the moment DNA is deposited as mixture substance to the final version of mixture data. It’s not a paradox that the best possible analysis of mixture data, the most correct analysis, sometimes differs from the ground truth about something else.

paternity index
a quaint synonym for likelihood ratio used in the context of disputed paternity

probability
• JS Mill explained it well.
• vs "frequency"
Allele probability and allele frequency are two different things. Usually people say allele “frequency” when they mean allele probability. See Why the quotes on “frequency”.

• prior probability
A probability summarizing the value of the evidence prior to inclusion of (for our purposes) the DNA (i.e. scientific) evidence. The prior probability is therefore — at least in principle — a subjective assessment of anecdotal and other evidence that cannot be comfortably quantified. It’s worth distinguishing several particular situations:
1. criminal case
The prior probability is the evidential value of evidence (such as testimony, documents, demeanor) that the DNA analyst doesn’t even hear, and which in any case is the responsibility of the judge or jury to assess. Therefore it is clearly wrong for an expert witness to intrude on the prerogative of the court by making any prior probability assumption. I don’t see anything wrong though with advising the court about how mathematics works, such as by a picture or a chart of examples.
2. (civil) paternity case
In principle the above applies to any court action. But this essay tries to take a realistic view.
3. mass disaster identification

When the object of the identification is humanitarian, decision making is typically deferred almost entirely to the scientists. In that case, two useful concepts are:

1. baseline prior probability
If n people are equally missing and all bodies are indistinguishable from one another, then the prior probability for any particular identity is 1/n. This baseline prior can be a reasonable starting point in realistic scenarios as well.
2. requisite prior
Suppose a DNA likelihood ratio has been determined, and a posterior probability threshold for identification has been agreed. Then I define the requisite prior as that prior which is just sufficient for declaring identity.

Example: Suppose n=1000 missing, LR=80000 supporting corpse V to be missing person Jim Jones, and that the agreed policy is to declare identification when the probability is at least 99.9%. Using the baseline prior probability of 1/1000, the posterior probability is only 98.8% and the threshold is not achieved. A prior of about 1/81, 12-fold larger than the baseline prior, is the requisite prior to obtain 99.9%.

Maybe there is some quite useful non-DNA evidence supporting the ID, for example the body V has a surgical scar much like one Jim Jones is known to have had, and the stature is about right as well. It would be hard to estimate the exact evidential value of those coincidences, but we don’t have to. It is sufficient to judge that they are worth an LR of at least about 12.5, the shortfall from the DNA evidence.
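
The arithmetic of the Jim Jones example can be reproduced with Bayes’ theorem in odds form (posterior odds = prior odds × LR). The following Python sketch uses the numbers from the example. The function names are mine, not part of any software mentioned here.

```python
def posterior_probability(prior, lr):
    """Update a prior probability with a likelihood ratio, via odds."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1 + posterior_odds)


def requisite_prior(lr, threshold):
    """Smallest prior that reaches the posterior threshold given the LR."""
    threshold_odds = threshold / (1 - threshold)
    prior_odds = threshold_odds / lr
    return prior_odds / (1 + prior_odds)


n = 1000           # number of missing people
lr = 80000         # DNA likelihood ratio
threshold = 0.999  # agreed posterior probability for declaring identity

baseline_prior = 1 / n
posterior = posterior_probability(baseline_prior, lr)  # about 0.988
needed_prior = requisite_prior(lr, threshold)          # about 1/81

# Extra likelihood ratio that non-DNA evidence must supply when starting
# from the baseline prior: threshold odds divided by the odds achieved.
baseline_odds = baseline_prior / (1 - baseline_prior)
shortfall_lr = (threshold / (1 - threshold)) / (baseline_odds * lr)  # roughly 12.5

print(round(posterior, 4), round(1 / needed_prior, 1), round(shortfall_lr, 1))
```

Working in odds keeps each step a single multiplication or division. The conversion back to a probability is only needed at the end.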

unrelated
Everyone is related, so why do we say for example “The suspect is either the father, or is unrelated to the child” to describe the alternative hypotheses in a paternity question? Two answers:
1. It’s hard to find an accurate wording that is simple;
2. The fiction of literally being unrelated is sometimes a premise of our computational model. (The paternity formula 1/(2q) rests on a premise of unrelatedness.)
All that said, when I or anyone says “unrelated” there is a good chance that what we really mean is “randomly selected, with no more and no less than a random chance to have any particular close or distant relationship”.