WTC DNA identification prospectus
Analysis of screening; Powerpoint presentation
Tsunami victim identification considerations
Forensic mathematics home page
|Hals' Willem Coymans|
Back at the hotel I had switched on the TV in time to see some of the live action and numerous replays of now-familiar events, accompanied by surprisingly little voice-over aside from a short period during which a smug BBC commentator analyzed the context and future significance of the goings-on until mercifully he was given the hook.
Wednesday morning we were slightly affected by the attack in the sense that we waded through security confusion when we flew out of Schiphol (Amsterdam) for London. Moreover, in considering the talk that I was scheduled to present in London, I hit on the idea to discuss the prospects for WTC identifications.* But not until I read my email message from Dr. Howard Baum of the Forensic Biology (i.e. DNA) section of the Office of the Chief Medical Examiner of New York did the possibility of my personal involvement with the disaster occur to me. The OCME has been using the Kinship module of DNAVIEW for several years, and it is already well established as a tool for disaster body identification. I shouldn't have needed Howard to remind me. His message was brief. He wrote, "We need help coping with the mass disaster in New York City."
Howard's immediate thought was the application of the Kinship program, but I thought I had even more to offer. There was going to be a lot of genetic data to manipulate, whose exact nature couldn't be predicted in advance. As one who has worked with computers since 1959, earned a doctorate in mathematics, and done dozens of practical or research projects involving DNA-relationship ideas and computations, I figure I am uniquely prepared to perform whatever manipulations and analysis might be necessary to wring information from the data.
Swissair identification paradigm
From earlier experience in disaster identification with the Swissair 111 crash, I assumed that there would be a necessary "screening" step in making the WTC identifications based on relatives; further, I could extrapolate that due to the larger scale of the WTC problem new complexities would need to be faced (links WTC prospectus and WTC Powerpoint above).
Some of the victims of the Swissair crash needed to be identified indirectly, by comparison with living (or dead) relatives. A two-step paradigm emerged:
The main difficulty that I foresaw emerging as the sizes of the two lists (the victim list and the family reference list) grow is the increasing incidence of "false positives." If both lists are small and some person C in the reference list has a brother who died, and some profile V in the victim list looks like a brother of C, it probably is. However, if the victim list has thousands of profiles, then for any given reference person C there will be dozens of victim profiles that coincidentally resemble C just as much as a typical true brother does. The expected number of false positives is proportional to the size of the victim list.
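The scaling argument above can be sketched in a few lines. This is only an illustration of the arithmetic, not the screening program itself: the function names and the coincidence rate of 1 in 1000 are assumptions chosen for the example, and the victim profiles are assumed to be independent of one another.

```python
def expected_false_positives(victim_list_size: int, coincidence_rate: float) -> float:
    """Expected number of unrelated victim profiles that resemble one reference person."""
    return victim_list_size * coincidence_rate


def chance_of_any_false_positive(victim_list_size: int, coincidence_rate: float) -> float:
    """Probability that at least one unrelated profile resembles the reference person,
    assuming the victim profiles are independent of one another."""
    return 1.0 - (1.0 - coincidence_rate) ** victim_list_size


# Hypothetical rate: 1 unrelated profile in 1000 passes the screen by coincidence.
ASSUMED_RATE = 1 / 1000

if __name__ == "__main__":
    for size in (200, 1000, 3000):
        print(
            size,
            expected_false_positives(size, ASSUMED_RATE),
            round(chance_of_any_false_positive(size, ASSUMED_RATE), 3),
        )
```

The expected count grows linearly with the victim list, while the number of true relatives of any one reference person stays fixed, which is why a screen that works for a small disaster can drown in false leads for a large one.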
Therefore I was sure that a simple sorting program of the sort that had been adequate for the Swissair identifications would not be very useful for the WTC disaster. Specifically, in my London talk on September 13, I hypothesized
Of course the first several of my estimates have proven to be quite far off. However, #4-8 are in the ballpark. The implication of estimate #7 is that for every 1000 victims, there will be about one who coincidentally resembles any given reference person to the same extent as does a true child. Thus, using individual parents as references to fish victims out of the rubble would result in more false leads than true ones. On the other hand, #5 implies that if a more sophisticated trolling operation is used, wherein two reference parents are simultaneously compared with each victim to accomplish a sort of
The upshot was, on October 2, the first of several trips to NY. Dr. Shaler organized a one-day meeting, a "summit of genetics experts" (Wall Street Journal), to discuss various problems and possible approaches for sorting through the inevitable masses of data. Five laboratories that were expected to do parts of the DNA analysis were represented: the city Office of the Chief Medical Examiner, NY State Police, Bode Technologies, Myriad, and Celera. Coincidentally, Myriad now included my old colleague Benoit, formerly of the RCMP. The FBI and I were also present to discuss software, as were Howard Cash and others from a company called GeneCodes that is contracted with the OCME to provide software. Finally, there were a few people from the NIJ (National Institute of Justice, which is another arm of the DOJ). Following introductory explanations by Bob Shaler in the morning, several of us presented our ideas about making the necessary victim-to-reference identifications. The afternoon was mostly discussions, and of course mostly rather general and oriented toward planning. From time to time, though, people inevitably succumbed to the temptation to discuss details even when it was obvious that there was neither time nor yet sufficient information to make a detailed discussion productive. When this happened, Chief Inspector Dale of the State Police patiently suggested, each time as if it were the first, that it would be appropriate to make general plans. I enjoyed that.
At some juncture, concerned that the plans might be steering toward an unnecessary and ponderous software project, I made a comment to the same effect as I have indicated above: that once I was able to get my hands on the data, I would quite quickly be able to produce the tentative identifications by myself. At this Howard Cash piped up, "Surely, Charles, even your work can stand a second opinion." I told him he had a fair point.
The three-day meeting ranged over a variety of topics. The one topic originally mentioned to me was the same one that Bob Shaler had already asked of me: to choose which screening program to use. To that end I put together a Powerpoint presentation to explain the difficulties and pitfalls as I foresaw and, by now, had computed.
In assessing the candidate screening programs, I had in mind several design requirements:
November 8, 2001
I went to the OCME lab in NYC for a scheduled two-day visit to deliver Mark I of my new software, collect some data, and try it out. Predictably, the first look at the data showed some small surprises: the format of the data was not exactly what I expected. I decided to stay around a few extra days to sort everything out. Thus, I was in New York for the astonishing incident of the following Monday.
November 10, 2001
The first effective run of the screening program poured out a sorted list of victim-family potential associations in about 5 minutes. The families on the top of the list figured to be genuine identifications; families with no strong resemblance to any victim sample would sort further down. Indeed, the first victim-family pair on the list had an extremely high score. Checking the actual profiles showed that this case concerned identical twins, as I expected since I'd been previously told that among the DNA identifications already made through "direct" references was a person who was the identical twin of a survivor.
Next on the list was another identical twin case. This one (or at any rate at least one of the two) was a new identification, the first I had found. The third candidate identification on the list was a case where the mother, the daughter, and a brother had presented themselves as references. The screening report only told me that two of these people bore a resemblance to the same victim. To confirm the identity of that victim, I needed to make a family-specific computation with the Kinship program to check that the entire assemblage was genetically consistent and numerically convincing.
It was. According to the kinship computation, it was either the right victim, or it was a one-in-twenty-billion coincidence. That's good enough to call it a confirmed identification.
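One common way to reason about a figure like this is the odds form of Bayes' rule: posterior odds equal the likelihood ratio times the prior odds. The sketch below is an illustration only. It assumes the one-in-twenty-billion figure can be read as a likelihood ratio of 2e10, and it assumes a hypothetical prior of 1 in 3000 that a given victim is the family's missing relative; neither number is taken from the Kinship program or from the criteria actually used.

```python
def posterior_probability(likelihood_ratio: float, prior_probability: float) -> float:
    """Odds form of Bayes' rule: posterior odds = likelihood ratio * prior odds."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Both values below are assumptions made for illustration.
ASSUMED_LIKELIHOOD_RATIO = 2e10   # reading "one in twenty billion" as a likelihood ratio
ASSUMED_PRIOR = 1 / 3000          # hypothetical prior: one victim among about 3000

if __name__ == "__main__":
    print(posterior_probability(ASSUMED_LIKELIHOOD_RATIO, ASSUMED_PRIOR))
```

Under these assumptions even a very small prior is overwhelmed by the genetic evidence, which is the sense in which such a result can be called confirmed.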
And so it went, easily, for the first thirty or so cases, that Saturday morning. Of the next twenty, some looked ok but the data was often insufficient to be confident according to the criteria we had established. A few looked to be spurious. After that, the list started to run dry. The data simply didn't exist to say who went with whom. More data was necessary, both from additional reference family members and from further recovered victims. Nor was I in a position to know which of the identities I had found might already be known from direct DNA references or from some other method.
We discussed the coding schemes that would be used for the airplane victims, and I considered what minor modifications would be needed. Plane crashes, unlike office environments, tend to include related people. It's important to know about them and to consider them in the analysis. Shaler and his colleagues were already thinking about how to improve on the WTC experience in collecting samples from relatives. It may seem a ghoulish observation, but an experienced disaster identification team was swinging into action.
I believe that every disaster is unique. Contrary to the hope of a few of the KADAP group, I don't believe it is practical or realistic to expect a "disaster identification" program to be a result of the current identification effort, or efforts. Useful tools, yes. Worthwhile experience, also.
Most of the AA587 victims were identified within a few weeks, which is sensational. Announcement of the final identification took about three months, as inevitably a few cases were delayed by special problems.
No new victims will be found. It will take a few weeks for the DNA profiles of the most recently excavated victim pieces to be reported. Once those have been checked against the reference materials, direct or indirect, already in hand, I expect a pause in the identifications. Further progress will depend mainly on success in the new DNA technologies that are being attempted, namely SNP's and mtDNA.
Last July Bruce Weir suggested that I submit an article on World Trade identifications to an edition of Theoretical Population Genetics (as the name implies, a rather high-brow scientific journal) that he was editing. He thought "it would have high interest," and so it proved. The article, completed with Bruce's considerable help and co-authorship, attracted what is by my standards a lot of interest. I gave an interview to a German radio station (in English) and a newspaper, to the children's science magazine Odyssey, and to Nature.com.
One point I made in the article was an estimate of the maximum number of bodies for which any DNA has been found. I did this probabilistically, trying to account for those victim fragments that produced any DNA at all. If we then make the optimistic assumption that all such fragments can eventually, through ultimately sophisticated DNA typing methods, be identified, as many as 2100 victims might eventually be identified. Perhaps a better way to put it is that no more than 2100 will be found. Dr. Hirsch, the NY Chief Medical Examiner, has expressed his commitment to continue with the work possibly for a long time, with a goal in mind of 2000 identifications. I agree that even that number seems very difficult.
Orchid Biosciences has developed a high-speed, high-throughput, largely automated method for SNP typing, which is being used for WTC samples.
For SNP's, the region of interest is a single nucleotide, which considerably reduces the fragment size. Moreover, the method by which multiple assays are achieved is entirely different, so there is no need for a spacer in the flanking region. Consequently SNP's may succeed where STR's fail when the DNA has degraded to the point where the typical fragment size is around 100 bp.
Over 5000 samples, some from victims, some from living relatives, have been typed. The typing panel attempts 70 loci. Given that many of the samples are degraded, as we already know from the mixed success of STR typing, there is mixed success in SNP typing. Often only a partial profile is obtained. However, there are certainly times when SNP typing is quite successful even though STR typing was not. For these cases the SNP technology is quite likely to provide new identifications. Additionally, there will be cases where both technologies are only moderately successful but the combination is just good enough.
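To illustrate how two moderately successful typings can be "just good enough" together: if the loci are treated as independent, per-locus likelihood ratios multiply, so their logarithms add. The sketch below is a hedged illustration only. Every number in it is made up, the threshold is hypothetical, and independence between the STR and SNP loci is an assumption that a real analysis would have to justify.

```python
import math
from typing import Iterable


def log10_combined_lr(per_locus_lrs: Iterable[float]) -> float:
    """Sum of log10 likelihood ratios, i.e. log10 of their product,
    valid only if the loci are treated as independent."""
    return sum(math.log10(lr) for lr in per_locus_lrs)


# Hypothetical decision threshold: a combined likelihood ratio of 10**10.
HYPOTHETICAL_THRESHOLD_LOG10 = 10.0

# Made-up per-locus likelihood ratios for one victim-family comparison.
str_lrs = [8.0, 12.0, 5.0, 20.0]   # partial STR profile: only four loci typed
snp_lrs = [1.8] * 25               # partial SNP profile: 25 of the attempted loci typed

if __name__ == "__main__":
    for label, lrs in (("STR only", str_lrs),
                       ("SNP only", snp_lrs),
                       ("combined", str_lrs + snp_lrs)):
        total = log10_combined_lr(lrs)
        print(label, round(total, 2), total >= HYPOTHETICAL_THRESHOLD_LOG10)
```

Each SNP carries little information on its own, so the value of the SNP panel comes from the number of loci that can still be typed in degraded material.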