Realistic matching odds

An interesting article by Balding led me to the following line of thought.

Suspect related to donor

In a typical criminal DNA case, the matching odds calculation presented by the laboratory to the court assumes that the suspect is either the donor or is unrelated to the donor. Consequently the odds against a match at random are astronomical, such as 10^14 for a typical CODIS (i.e. United States) STR profile.

If instead the suspect can make a case that the donor might be his brother, the matching odds drop considerably, maybe to 10^6. The most authoritative recommendation in the US on this point states:

But what are "possible contributors"?

"Possible contributors"

Since the excerpt states that possible contributors should be tested, apparently they are considered to be rather definite people. But that puts the suspect in an unfair either/or situation:
  1. Either there is no chance at all that the suspect is related to the donor (unless by identity), or
  2. there is a very good chance.
  3. Nothing in between, such as a slight chance.
But common sense says there is always a slight chance. Maybe the suspect doesn't even know that he has a brother. Nonetheless, from the court's point of view he might. And even a slight chance changes the matching odds radically.

Intermediate view

The point is that the fair computation of matching odds would be some sort of weighted odds between the "unrelated random man" computation (e.g. 10^14) and the "brother or other close relative" computation (formulas 4.8 and 4.9, e.g. 10^5). Even a very slight weighting on the second possibility gives a result several orders of magnitude smaller than the "unrelated" computation.
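To see how strongly a small weight pulls the result down, here is a minimal Python sketch of a two-term mixture. The function name and the example weights are my own illustrative assumptions; only the two odds figures (10^5 and 10^14) come from the text above.

```python
def mixed_matching_odds(weight_relative, odds_relative=1e5, odds_unrelated=1e14):
    """Odds against a match when the donor is a close relative with
    probability weight_relative and an unrelated person otherwise.

    The weights below are hypothetical; the default odds are the
    illustrative figures quoted in the text.
    """
    # Average the match *probabilities* (1/odds), not the odds themselves.
    match_chance = (weight_relative / odds_relative
                    + (1.0 - weight_relative) / odds_unrelated)
    return 1.0 / match_chance


if __name__ == "__main__":
    for weight in (1e-2, 1e-4, 1e-6):
        odds = mixed_matching_odds(weight)
        print(f"weight {weight:g}: about 1 in {odds:.3g}")
```

Under these assumed numbers, even a one-in-a-million weight on the close relative keeps the mixed odds well below 10^14, because the relative's term is so much larger than the unrelated term.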

To reiterate: the "unrelated" computation is based on idealized, i.e. unrealistic, assumptions, such as the assumption that the suspect is either the donor or is completely unrelated to the donor. At the other unrealistic extreme, we imagine that the suspect is either the donor or the brother of the donor. A reasonable intermediate view, in my opinion, is this: assuming the suspect is not the donor, he is probably nearly unrelated to the donor, but there is some small chance that the donor is his close relative. A more relevant matching odds calculation is therefore some weighted average of the two numbers I mentioned (ok, and also of other intermediate numbers representing other relationships, more distant than brother).

Weighted average

Here's an example.
Degree of relationship        Prob. of relationship   Prob. relative matches   Chance of such match
sibling                       1/4096                  1/150000                 1/630e6
parent/child                  1/4096                  1/160e6                  1/640e9
uncle/nephew/halfsib          1/2048                  1/19e9                   1/40e12
1st cousin                    1/1024                  1/620e9                  1/630e12
1/32 allele sharing IBD       1/512                   1/5.9e12                 1/3e15
1/64 (=second cousin)         1/256                   1/23e12                  1/5.8e15
1/128                         1/128                   1/48e12                  1/6.1e15
1/256 (=third cousin)         1/64                    1/71e12                  1/4.6e15
1/512                         1/32                    1/88e12                  1/2.8e15
1/1024 (=fourth cousin)       1/16                    1/97e12                  1/1.6e15
1/2048                        1/8                     1/100e12                 1/820e12
1/4096 (=fifth cousin)        1/4                     1/110e12                 1/420e12
unrelated                     1/2                     1/110e12                 1/220e12

Cumulative weighted matching chance = 1/630e6
Matching odds calculated conventionally are 110e12 (1.1 × 10^14). Maybe 630 million is a more meaningful number. The estimate of 1/4096 as the relative chance that a brother is the culprit (assuming the suspect is not) is a completely arbitrary number. Notice, though, that the weighted average is dominated almost entirely by the "brother" term; the "random man" matching number is effectively irrelevant. The final result depends only on:
  1. the matching odds for siblings
  2. the conditional probability a sibling is the donor, given that the suspect is not.
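The weighted average can be checked with a short Python sketch. The row values are copied from the table; the variable and function names are my own, and this is not the DNA•VIEW implementation. Because the table entries are rounded, the sum reproduces the published 1/630e6 only approximately (it comes out on the order of 1 in 600 million).

```python
# (relationship, probability of relationship, probability the relative matches)
# Values are the rounded figures from the table above.
ROWS = [
    ("sibling",                1 / 4096, 1 / 150000),
    ("parent/child",           1 / 4096, 1 / 160e6),
    ("uncle/nephew/halfsib",   1 / 2048, 1 / 19e9),
    ("1st cousin",             1 / 1024, 1 / 620e9),
    ("1/32 IBD",               1 / 512,  1 / 5.9e12),
    ("1/64 (second cousin)",   1 / 256,  1 / 23e12),
    ("1/128",                  1 / 128,  1 / 48e12),
    ("1/256 (third cousin)",   1 / 64,   1 / 71e12),
    ("1/512",                  1 / 32,   1 / 88e12),
    ("1/1024 (fourth cousin)", 1 / 16,   1 / 97e12),
    ("1/2048",                 1 / 8,    1 / 100e12),
    ("1/4096 (fifth cousin)",  1 / 4,    1 / 110e12),
    ("unrelated",              1 / 2,    1 / 110e12),
]


def weighted_match_chance(rows):
    """Sum of (probability of relationship) x (match probability)."""
    return sum(prior * match for _, prior, match in rows)


def sibling_share(rows):
    """Fraction of the weighted total contributed by the sibling row."""
    _, prior, match = rows[0]
    return prior * match / weighted_match_chance(rows)


if __name__ == "__main__":
    total = weighted_match_chance(ROWS)
    print(f"weighted matching chance: about 1 in {1 / total:.3g}")
    print(f"share from sibling term: {sibling_share(ROWS):.2%}")
```

With these inputs the sibling row supplies over 99% of the total, which is why the other rows, including the "unrelated" one, barely affect the answer.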

The DNA•VIEW™ profile computation tool DNA odds offers the above computation as an option ("Uniqueness estimate").

