Kinship Analysis by DNA When There Are Many Possibilities

CH Brenner, Consulting in forensic mathematics, Berkeley

Abstract

Many of the bodies from the Swissair flight 111 disaster could be connected only indirectly – i.e. through relatives also on the flight – with reference samples of known identity. In some cases extended families perished, so that even when the probable members of the family had been identified as a group, there remained many permutations of individual identities to consider. Likelihood ratios provide a standard and general method for comparing two possibilities, and in principle one can decide among many possibilities by comparing every possible pair. However, such a brute force technique is cumbersome.

Eventually I hit upon a useful heuristic for comparing among multiple possibilities. It consists in arranging the possibilities in a diagram mathematically called a lattice, after which a small amount of work usually eliminates with near certainty all incorrect assignments of identity to body.

Applications where the method discussed may be useful include mass disasters, multiple graves (as in recent Balkan wars), and some complicated kinship, immigration, or inheritance problems.

Discussion

1. Likelihood ratios

When genetic typing results are used to decide between two possible ways that a set of people may be related, a likelihood ratio is the natural tool. Suppose, for example, that the genotypes are Mother=PS, Child=PQ, Man=RQ, and that the man either is the father or is unrelated to the child. Writing p, q, r, s for the population frequencies of the alleles P, Q, R, S, X=(2ps)(2qr)(1/4) is the probability of observing such types if the man is the father, and Y=(2ps)(2qr)(q/2) is the probability of observing such types if the man is unrelated. The ratio PI=X/Y=1/(2q) is the likelihood ratio favoring paternity.
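As a numerical check of the ratio PI=1/(2q), the following minimal sketch evaluates X, Y and PI with exact fractions. The allele frequencies and the function name paternity_likelihoods are illustrative assumptions, not values from any case.

```python
from fractions import Fraction

def paternity_likelihoods(p, q, r, s):
    """Return (X, Y, PI) for Mother=PS, Child=PQ, Man=RQ.

    p, q, r, s are the population frequencies of alleles P, Q, R, S.
    """
    mother = 2 * p * s                   # Pr(a random woman is PS)
    man = 2 * q * r                      # Pr(a random man is RQ)
    x = mother * man * Fraction(1, 4)    # child PQ if the man is the father
    y = mother * man * q / 2             # child PQ if the man is unrelated
    return x, y, x / y

# Hypothetical allele frequencies, chosen only for illustration.
p, q, r, s = Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(1, 4)
x, y, pi = paternity_likelihoods(p, q, r, s)
print(pi)  # 5/2, which equals 1/(2q) for q = 1/5
```

The genotype frequencies of the mother and the man appear in both X and Y, so they cancel and only q survives in the ratio.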

2. Likelihoods

assumption	father	uncle	unrelated
relative likelihood of evidence	X/Y	(X/Y + 1)/2	1

Suppose, however, that there are more than two scenarios to consider. Besides "father" or "unrelated," the man might also be an uncle. Then besides X and Y, we have Z=Pr(evidence|unclehood)=(X+Y)/2. So now there are several likelihood ratios to consider: X/Y, X/Z, Z/Y. In general there are (n² - n)/2 ratios when there are n scenarios. In such cases it is better simply to think in terms of the likelihoods separately, for there are only n of them. The best way is probably to divide each likelihood by the smallest one. Whether one divides or not, each likelihood of course represents the relative strength of the evidence for the corresponding possibility. In the avuncular case, the three relative likelihoods can be laid out side by side, as in the table above.

This is quite a feasible approach when there is a handful of possibilities – so long as the number of possibilities is small enough that one is willing to make a separate calculation for each possibility.
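The bookkeeping for a handful of scenarios can be sketched as follows. This is only an illustration: the value of q is hypothetical, the helper names are my own, and the genotype-frequency factor common to X, Y and Z is dropped because it cancels in every ratio.

```python
from fractions import Fraction

def relative_likelihoods(likelihoods):
    """Divide every likelihood by the smallest one."""
    base = min(likelihoods.values())
    return {name: value / base for name, value in likelihoods.items()}

def pairwise_ratio_count(n):
    """Number of distinct likelihood ratios among n scenarios."""
    return (n * n - n) // 2

q = Fraction(1, 5)      # hypothetical frequency of allele Q
X = Fraction(1, 4)      # father (common genotype factor omitted)
Y = q / 2               # unrelated
Z = (X + Y) / 2         # uncle

rel = relative_likelihoods({"father": X, "uncle": Z, "unrelated": Y})
for name, value in rel.items():
    print(name, value)
print(pairwise_ratio_count(3))  # 3 ratios: X/Y, X/Z, Z/Y
```

With these inputs the smallest likelihood is Y, so the output reproduces the pattern X/Y, (X/Y + 1)/2, 1.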

3. Myriad possibilities

But for a really complicated problem, some further simplification and systematization is desirable. While trying to confirm body part identifications from the September 1998 Swissair crash near Halifax, Nova Scotia, I developed an approach that I call the lattice method.

The method is illustrated by considering one of the complicated family identification problems that arose. Five members of the X__ family perished in the crash. The child Albon was not on the plane and was the one living reference. Among the DNA profiles from body parts recovered at the crash site, there were five that appeared to form a cluster of relationships including Albon. Further, based on the particular patterns, including amelogenin types, a tentative assignment was made of body parts/profiles to names. In the figure, the letter E represents Albon, and the other letters represent DNA profiles that are tentatively ascribed to people, as suggested by the position that the letter occupies in the family tree.

The favored set of tentative identifications, abbreviated GF_DCM, is the most likely possibility but not the only one. We set as a goal a likelihood ratio of at least 10⁶, when the best explanation is compared with the second best.

Many potential alternative explanations are conceivable. At a minimum, those combinations like ?F_DCM, meaning G is not Sylvie but is instead another, unrelated person, are consistent with the DNA evidence. The number of such combinations, obtained by omitting one or more of the letters G, F, D, C, or M from the diagram, is 2⁵ - 1 = 31. Besides that, it might be possible to exchange some pairs of letters, or to shuffle them around. There are hundreds of combinations to consider.
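The omission-only alternatives are easy to enumerate mechanically. The sketch below (function and variable names are my own) lists every string obtained from GF_DCM by replacing one or more letters with ?. It yields 2⁵ - 1 = 31 alternatives, or 32 if the favored assignment itself is counted.

```python
from itertools import combinations

FAVORED = "GF_DCM"  # "_" marks the position of E (Albon), the living reference

def omission_variants(favored):
    """All explanations obtained by replacing one or more letters with '?'."""
    positions = [i for i, ch in enumerate(favored) if ch != "_"]
    variants = []
    for k in range(1, len(positions) + 1):
        for omitted in combinations(positions, k):
            chars = list(favored)
            for i in omitted:
                chars[i] = "?"
            variants.append("".join(chars))
    return variants

variants = omission_variants(FAVORED)
print(len(variants))   # 31
print(variants[:5])    # the five single-omission alternatives
```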

Therefore, the linear method used above for father-uncle-unrelated is not attractive.

3.1. The lattice of possibilities

An alternative method is to arrange the various possibilities of body-person correspondences into a hierarchy. The diagram puts GF_DCM at the top level. On each lower level are those possibilities obtained by replacing one more letter with a ?, as shown by arrows. The diagram is a lattice, or partial ordering among explanations, where the criterion for ordering is whether more or less of the genetic data is explained.
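The structure of such a lattice can be generated programmatically. The following is a minimal sketch under my own naming (children, build_lattice), not the program used for the Swissair work: each arrow runs from an explanation to one obtained by replacing a single remaining letter with ?.

```python
def children(node):
    """Explanations one level down: replace one remaining letter with '?'."""
    return [node[:i] + "?" + node[i + 1:]
            for i, ch in enumerate(node) if ch not in "?_"]

def build_lattice(top):
    """Return (levels, arrows) for the omission lattice below `top`."""
    levels = [[top]]
    arrows = []
    while True:
        next_level = []
        for node in levels[-1]:
            for child in children(node):
                arrows.append((node, child))   # arrow from tail to head
                if child not in next_level:
                    next_level.append(child)
        if not next_level:
            break
        levels.append(next_level)
    return levels, arrows

levels, arrows = build_lattice("GF_DCM")
print([len(level) for level in levels])  # [1, 5, 10, 10, 5, 1]
print(len(arrows))                       # 80
```

Only the five arrows leaving the top node, plus a few below any weak link, carry explicitly computed labels in practice; the rest of the structure serves to organize the argument.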

3.2. A heuristic assumption

The diagram is useful because it is a lattice in another sense as well: any explanation in the diagram is better than those that lie below it along a path of arrows. This claim might be false for a single locus, but there is virtually no chance that it will fail across a combination of 10 or 14 loci (given that the top explanation in the diagram is consistent). Therefore we can be confident, although not mathematically assured, that the likelihood ratio between the explanations at the tail and head respectively of every arrow is always >1.

3.3. Computing likelihood ratios

For most of the arrows in the diagram, this is enough; no further computation is necessary. However, for each arrow efferent from the favorite explanation, an explicit likelihood ratio computation is necessary. I previously devised a computer program that makes kinship computations. Each arrow takes a minute or so to compute.

In the example shown for the X__ family, one of the top-level likelihood ratios is only 300. In practice we could improve this number by taking into account the "closed system" nature of the crash – if G is not Sylvie, who else could G be? (Additionally: If Sylvie is not G, where is she?) However, considering the X__ family in isolation, ?F_DCM is mildly plausible and the 10⁶ goal would be missed.

Further likelihood ratio calculations were then made along the arrows leading down from ?F_DCM, in order to ensure that there were no other competitor explanations, but there is no need (in this case) to calculate arrows further down the lattice than two levels, as shown in the diagram. To see why, consider the likelihood ratio comparing GF_DCM at the top with ??_D?? four levels down. It is obtained by multiplying together the labels (representing likelihood ratios) of each arrow along the path. That product is 3·10⁹ – already >10⁶ – after two terms, and since by the heuristic assumption the remaining terms must be >1, a final ratio above 10⁶ is assured.
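The stopping rule can be expressed as a short check. In this sketch the two arrow labels are placeholders, chosen only so that their product equals the 3·10⁹ quoted above; the individual labels belong to the diagram, which is not reproduced here. The function names are likewise my own.

```python
from math import prod

GOAL = 10**6  # target likelihood ratio against every competitor

def path_lower_bound(computed_labels):
    """Product of the arrow labels computed so far along one downward path.

    Under the heuristic assumption every uncomputed label exceeds 1,
    so this product is a lower bound on the likelihood ratio between
    the top explanation and any explanation further down the path.
    """
    return prod(computed_labels)

def path_settled(computed_labels, goal=GOAL):
    """True if no further arrows on this path need to be computed."""
    return path_lower_bound(computed_labels) > goal

# Placeholder labels for the first two arrows (assumed, not from the case data).
first_two = [300, 10**7]
print(path_lower_bound(first_two))   # 3000000000
print(path_settled(first_two))       # True
print(path_settled([300]))           # False: one arrow alone is not enough
```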

3.4. Permutations and mutations

A few permutations of the X__ family member candidates are consistent modulo one mutation. These combinations also fit into the lattice, but they are not derived from the favored explanation GF_DCM merely by omitting some subset of the people. As an example, consider the hypothesis G?M?D, which comes from GF_DCM by omitting a couple of people, and moreover trading the places of the mother and daughter M and D. By chance, it is within one mutation of being genetically consistent.

In graphical terms, "trading places" explanations are those that can't be reached from the top of the lattice by following arrows downward. They are connected to the lattice though, because for example G?M?D and GF_DCM have a common descendant G?_???. The lattice structure is helpful for proving that no "trading places" explanation is a good explanation. However, I shall not include examples, and I have omitted them from the example lattice diagram to avoid clutter.

Acknowledgments

Swissair identification team members Ron Fourney, George Carmody, Benoit Leclair, and Chantal Frégeau were particularly helpful during the development of these ideas.
