Mathematics takes a holiday
Mathematics takes a holiday — Y (Powerpoint from February 2018 AAFS meeting, Seattle)
Forensic mathematics home page
Comments are welcome (see home page for email)
Since SWGDAM's recommendations carry a lot of weight in the United States, the formula has unfortunately been assumed by some casework DNA analysts to solve a problem that it does not solve, leading to confusion in court and sometimes wrong rulings.
In a longer talk I would explain why it's more than useless — it's wrong.
The formula in question is the so-called "theta" formula from "SWGDAM Y-STR Interpretation Guidelines – Approved 01/09/14". The main culprit is section 10.3:
10.3 It is recognized that population substructure exists [note 1] for Y-STR haplotypes. Studies with current population databases have shown that multi-locus θ values are very small for most populations, with the magnitude of the value being inversely proportional to the number of Y-STR loci. Theta (θ) is used in the following equation for the match probability:
Eq. 3    Pr(A|A) = θ + (1 − θ) p_A
where A is the haplotype of interest and Pr(A|A) is the probability of observing haplotype A given that it has already been seen once in another individual of the same subpopulation. p_A is the profile probability which can be estimated by the counting method, with sampling uncertainty being accommodated by using the upper confidence limit for the estimate of p_A.
That's nonsense as population genetics and it's bad writing as well. Notably, there is no guidance as to when the formula should be used. I'm aware of several recent cases involving sexual assault on a reservation, that is, in an area where the population is mainly people from a particular Native American tribe. There are no population samples for individual tribes, though, only some pooled data across Native Americans generally. I cannot much blame a few lab analysts for imagining that the formula allows them to use the pooled data in a tribe-specific context; of course they imagine the formula must be good for something, and if not that, what? But a simple argument, explained in my 12-minute talk, shows that it is impossible to extrapolate from the pooled data anything at all about tribal Y haplotype matching probability.
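To see what the quoted text asks an analyst to compute, here is a minimal sketch of the arithmetic only; it is not a defense of the formula. The guideline text quoted above says only "upper confidence limit", so the one-sided Clopper-Pearson bound used here is an assumption, and the database size, match count, and θ value are made-up numbers for illustration.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def upper_confidence_limit(x, n, alpha=0.05):
    """One-sided upper bound on a proportion after x hits in n samples.

    Assumption: Clopper-Pearson bound, found by bisection on p such that
    P(X <= x | p) = alpha. The quoted guideline does not name the method.
    """
    if x >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid  # CDF still above alpha, so the bound lies at a larger p
        else:
            hi = mid
    return hi

def eq3(theta, p_a):
    """Eq. 3 as quoted: Pr(A|A) = theta + (1 - theta) * p_A."""
    return theta + (1 - theta) * p_a

# Hypothetical inputs, not taken from any real database or from SWGDAM's table.
n_database = 4000   # haplotypes in the pooled database
x_matches = 1       # times haplotype A was seen in it
theta = 0.001       # illustrative theta value

p_a = upper_confidence_limit(x_matches, n_database)
print(p_a, eq3(theta, p_a))
```

Whatever the database says, the result of Eq. 3 can never fall below θ, so for a rare haplotype the answer is largely decided by where the θ value came from.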
Prof. Bruce Weir, commenting in the question time after my talk, seemed skeptical. However, we sorted out the disagreement later. Apparently he doubted me because he was incredulous that SWGDAM was really doing what they in fact do: they calculate something they call theta without any tribe-specific data. But by the next week he wrote:
I agree that the SWGDAM theta values do not apply to populations within any of the five major ethnic groups listed in the SWGDAM Guidelines.
Thus we have agreement on one critical point: There is no justification from SWGDAM or otherwise for Y haplotype calculations, in a tribal context, such as the lab analysts did.
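The following toy calculation is not the argument from the talk, just one illustration with invented numbers of why a pooled frequency cannot pin down a tribal one. It assumes a pooled sample in which a hypothetical tribe X contributes 5% of the men, and compares two imaginary worlds that would produce the same pooled frequency for a haplotype.

```python
from math import isclose

def pooled_frequency(weights, freqs):
    """Haplotype frequency in a pooled sample: the weighted mean of the
    per-group frequencies, weighted by each group's share of the pool."""
    return sum(w * f for w, f in zip(weights, freqs))

# Hypothetical shares of the pooled sample: [tribe X, all other tribes].
weights = [0.05, 0.95]

# Two invented worlds for the frequency of the same haplotype.
world_1 = [0.20, 0.00]  # common in tribe X, absent elsewhere
world_2 = [0.01, 0.01]  # equally rare everywhere

pooled_1 = pooled_frequency(weights, world_1)
pooled_2 = pooled_frequency(weights, world_2)

# Same pooled frequency, yet the frequency inside tribe X differs 20-fold.
print(isclose(pooled_1, pooled_2), world_1[0] / world_2[0])
```

The pooled data are identical in both worlds, so no formula fed only the pooled data can tell them apart.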
The story of how Eq. 3 came to be promoted by SWGDAM and how it has survived until now is shameful. It originated as guesswork and survives through a lack of curiosity so concerted as to amount to stonewalling.
The guesswork consists of noting a similar formula for autosomal loci published by Balding and Nichols, and by bad analogy and a leap of faith (no one ever thought to check whether Balding and Nichols' derivation of the formula could be repurposed for the Y domain) supposing that about the same formula should apply to Y.
For now I'll give one example of stonewalling since it's mentioned in the slides of the talk. Before my February 2018 talk I wrote an email to the SWGDAM chairman. As far as I'm aware this is the only interaction we've ever had.
Good morning Mr. Onorato,
I'm preparing to give a talk on SWGDAM's mathematical suggestions for Y haplotype evidence evaluation at the forthcoming AAFS meeting, about 5 weeks from now. I'm especially interested in the theta ideas in section 10.3, but haven't been able to track down clear sources for the information and claims. To be fair with my criticism I'm writing to you as representing the committees to give the committee an opportunity to justify the published recommendations.
Can you cite for me information about theta as used in the SWGDAM recommendation? I see discussion that it's analogous to autosomal ideas, but that's vague without an explicit math paper. In particular, what is the justification or provenance of Eq 3?
Who is responsible for the writing in the section?
How exactly is theta calculated? There is a table of theta values in the recommendation, but I was not able to trace back through the listed references to formulas or a computer program from which they come.
To give you some context, I have spoken several times with Bruce Weir on this subject. I look forward to any answers, or discussion, you or your colleagues can provide.
Charles Brenner, PhD
Consulting in Forensic Mathematics
A few days later:
Dr. Brenner.
Thank you for giving SWGDAM the opportunity to respond to your questions. In accordance with our normal business practices, I have reviewed your inquiry with the leadership of SWGDAM and discussed the specifics with the current lineage marker committee. In reviewing the information previously provided to you by members of the YSTR committee (both telephonically and through e-mail exchanges dating back to 2014), I believe you have all of information we can provide to aid you in making a fair and informed evaluation that will advance the proper application of this important DNA marker system.
Best regards. Tony.
For the record, the totality of my contacts on the matter with committee members was (a) a very brief call with Steve Myers in about 2014, in which I asked if he could talk about the formula and Steve pointed me to Bruce Weir; and (b) a few emails and a few in-person conversations with Bruce. Tony knew that only from my email; he didn't check with Bruce as well.
So no information at all is "all the information."
Note 1: The passive voice ("It is recognized") hides responsibility. Who is it that recognizes population substructure for Y haplotypes? No one who thinks does, at least not in anything like the sense for which it is being invoked here. The implicit point here is that Eq. 3 applies somehow to Y haplotypes by analogy with a similar autosomal formula that relates to population substructure for autosomal loci.
But the sense in which population substructure leads to such a formula for an autosomal locus is substructure induced by preferential mating. Y haplotypes don't mate, preferentially or otherwise. They clone, like bacteria. Bacteria don't prefer.