damnum absque injuria

May 28, 2008

DNA and Guilt Redux

by Xrlq @ 6:24 pm

The following is the letter I intend to send to the L.A. Times regarding their piece earlier this month on the Puckett case. Any comments/critiques are welcome.

Dear Mr. Felch and Ms. Dolan:

I read with interest your May 3, 2008 article on crime-solving and DNA evidence, particularly in old cases where only partial matches are possible. In the murder trial of John Puckett, there was significant debate over whether the odds of Puckett’s innocence were 1 in 1.1 million, as the prosecutor maintained, or 1 in 3, as the defense argued. In fact, both numbers are almost certainly wrong.

Oddly enough, the reason both numbers are wrong was made clear in a Times article that ran only three days later, co-authored by one of you. In that May 6 article, you rightly noted the problems inherent in the “prosecutor’s fallacy,” in which the odds of an event occurring prospectively are mistaken for the odds that it has occurred, after the fact. To give an obvious illustration, the odds of anyone being struck by lightning in his lifetime are 3,000 to 1, and the odds of it happening on any given day are far more remote than that. However, once you encounter a corpse in an open field shortly after an electrical storm, the odds that he was struck by lightning rise dramatically. The reason is akin to the old joke “the odds of X are slim to none, and Slim just left town.” Before the incident, the odds that this unfortunate soul would be zapped were roughly 3,000 to 1, but after the fact, the vast majority of those other 2,999 scenarios would not have left him dead in the field, and can therefore be said to have “left town.” At this point, the only odds left to consider are those of alternative scenarios that also would have left the guy dead in the field.

Similarly, we know going into a cold database search like Puckett’s that there is a 1 in 3 chance that someone’s profile will randomly match the killer’s (or conversely, a 2 in 3 chance that no one’s will), and a 1 in n chance that the killer himself will be in the database, generating a not-so-random match of his own (and conversely, an (n – 1) in n chance that the killer will not be in the database). If the search generates exactly one match, as it did in Puckett’s case, we can say with certainty that there was either a random match or a true one, but not both. These two possibilities can be described as follows, with G (guilt) indicating a true match and I (innocence) indicating a random one:

P(G) = 2/3 x 1/n
P(I) = 1/3 x (n – 1)/n
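
The four ways such a search can turn out, including the two cases above, can be tabulated mechanically. The following Python sketch is illustrative only and assumes, as the letter does, that the database as a whole has a single 1 in 3 chance of producing one random match, that this is independent of whether the killer is in the database, and that there is never more than one random match; the function `search_outcomes` is a hypothetical name.

```python
from fractions import Fraction

def search_outcomes(n, p_random=Fraction(1, 3)):
    """Probabilities of every outcome of a cold database search.

    Assumes one random match occurs with probability p_random, the
    killer is in the database with probability 1/n, and the two events
    are independent (the simplified model used in the letter).
    """
    q = Fraction(1, n)  # chance the killer's profile is in the database
    return {
        "true match only (G)": (1 - p_random) * q,
        "random match only (I)": p_random * (1 - q),
        "no hits": (1 - p_random) * (1 - q),
        "two hits": p_random * q,
    }

outcomes = search_outcomes(2)
for label, prob in outcomes.items():
    print(f"{label}: {prob}")
print("total:", sum(outcomes.values()))  # the four outcomes are exhaustive
```

With n = 2, the G and I rows come out to 1/3 (i.e., 2/6) and 1/6, and the “no hits” and “two hits” rows, the scenarios that “left town,” account for the remaining 1/2.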

Which is more likely? That depends on the value of n. If the value happens to be 2, i.e., there was a 50-50 chance of Diana Sylvester’s killer being in the database, then the probabilities are as follows:

P(G) = 2/3 x 1/2 = 2/6
P(I) = 1/3 x (2 – 1)/2 = 1/6

The probabilities don’t add up to 1, but that’s OK; the other 3/6 represents the scenarios that “left town,” i.e., would have resulted in either no hits or more than one. After confining ourselves to the 3 in 6 scenarios consistent with the single match we got, we are left with 2 to 1 odds (a 1 in 3 chance) against Puckett having been an innocent who was randomly matched. If the odds of the killer being in the database are higher than that, then we can be more confident of Puckett’s guilt, but not necessarily to an extent that would overcome reasonable doubt. But if the original odds of the killer being in the database were 1 in 3, the same as the original odds of a random match, then innocence and guilt are equally probable. And if the odds are lower than 1 in 3 (not implausible, given that no sex criminal database was being maintained at all in 1972), then it is actually more likely that Puckett was a false match than a true one.
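
Discarding the scenarios that “left town” amounts to dividing P(I) by P(G) + P(I). A minimal Python sketch of that step, under the same simplifying assumptions as the formulas above (the function name is hypothetical), run for the three values of n the paragraph discusses:

```python
from fractions import Fraction

def p_innocent_given_one_match(n, p_random=Fraction(1, 3)):
    """Chance that a lone hit is a random (innocent) match.

    Uses P(G) = (1 - p_random) * 1/n and P(I) = p_random * (n - 1)/n,
    then conditions on the search having produced exactly one hit.
    """
    q = Fraction(1, n)
    p_guilt = (1 - p_random) * q
    p_innocence = p_random * (1 - q)
    # Keep only the single-hit scenarios and rescale them to sum to 1
    return p_innocence / (p_guilt + p_innocence)

for n in (2, 3, 4):
    print(n, p_innocent_given_one_match(n))
```

This yields 1/3 for n = 2, 1/2 for n = 3, and 3/5 for n = 4, so once the killer’s chance of being in the database drops below 1 in 3, a random match becomes the likelier explanation.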

The bottom line is this: until we find a reliable method of computing the likelihood that a given killer will or will not be in the database, it is impossible to tell how likely it is that a given match was obtained randomly. This may not matter much in cases of full DNA matches, where the prosecutor’s fallacy makes the odds of an innocent match seem like a trillion to one when in fact they are “only” a billion to one, or at worst “only” a million to one if the odds of the killer being in the database were unusually low. But it makes a huge difference in cases like Puckett’s, where the odds against a false match were only 2 to 1 to begin with, and we have no idea at all what the odds of a true match were.
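
The same conditioning works for any random-match probability, which shows why full and partial matches behave so differently. In this Python sketch, `p_random` is the prospective chance of a random match across the whole database and `q_in_db` is the prior chance that the killer is in it; the 1e-9 value for a full match is a hypothetical stand-in chosen only to echo the “billion to one” language above, not a figure from the case, and the function name is likewise hypothetical.

```python
def posterior_random_match(p_random, q_in_db):
    """Posterior chance that a lone hit is a random match.

    p_random: prospective chance of one random match across the database
    q_in_db:  prior chance that the killer is in the database
    """
    p_guilt = (1 - p_random) * q_in_db
    p_innocence = p_random * (1 - q_in_db)
    return p_innocence / (p_guilt + p_innocence)

# Partial match like Puckett's: a 1 in 3 chance of a random match
for q in (0.5, 1 / 3, 0.1):
    print(f"partial, q={q:.3f}: {posterior_random_match(1 / 3, q):.3f}")

# Hypothetical full match: a 1 in a billion chance of a random match
for q in (0.5, 0.1, 0.01):
    print(f"full,    q={q:.3f}: {posterior_random_match(1e-9, q):.2e}")
```

With `p_random = 1/3`, the answer swings from 1/3 to well over 1/2 as `q_in_db` falls from 1/2 to 1/10; with the hypothetical 1e-9 value it stays below one in a million even when `q_in_db` is only 1/100.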

UPDATE: Before I had the opportunity to send the message, I was cc’ed on a message from Jason Felch that was the subject of Patterico’s post today. I ended up sending this message instead:

Mr. Felch:

I am the XRLQ cited in the thread below, and was planning on sending you an email of my own but will respond here instead. While I have strongly disagreed with several of Patterico’s conclusions on the Puckett case, he’s absolutely correct on his point about the prosecutor’s fallacy. Whether we’re talking about the 1 in 1.1 million figure in the case of individuals or the 1 in 3 figure as to the database as a whole, neither figure tells us anything about the odds that a true or random match occurred in the past. Suppose that you and I were to walk in an open field the day after an electrical storm, and we encountered a corpse showing evidence consistent with electrocution. You suggest he was probably struck by lightning, to which I respond “Possibly, but the odds are against it. The odds of anyone getting struck by lightning even once in his lifetime are only 1 in 3,000, and the odds of getting struck on any particular day are even more remote than that.” Your response probably would (and certainly should) be “Forget the 1 in 3,000 figure. Most of the other 2,999 represent scenarios that would not have left the guy dead on the field.” It’s basically a new twist on the old joke about the odds of anything being “slim to none, and Slim just left town,” only in this case, it was everyone but Slim who left town, and an event that was once exceedingly improbable is now almost certain to have occurred. Once we sift through the data and exclude all the scenarios inconsistent with it, the odds of a lightning strike are still 1 in something, but that something has no meaningful relationship to the original 3,000. The only thing the original figure may be good for is comparing the original odds of this unfortunate individual being struck by lightning to the original odds of any other event that also would have left him dead on the field.
If we find exactly one alternative to lightning that was equally consistent with the evidence, and that scenario also originally faced 1 in 3,000 odds, the appropriate conclusion is that one of two highly improbable events certainly did occur; since both were equally probable to begin with, the odds that either one in particular occurred are now 50-50.
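
The arithmetic behind that conclusion is just renormalization: keep only the explanations consistent with the evidence and rescale their original probabilities so they sum to 1. A minimal Python sketch, assuming each surviving explanation fits the evidence equally well; the function name and the unnamed “other cause” are hypothetical:

```python
from fractions import Fraction

def renormalize(priors):
    """Rescale the prior probabilities of mutually exclusive explanations
    that all fit the evidence equally well, so that they sum to 1."""
    total = sum(priors.values())
    return {name: p / total for name, p in priors.items()}

# Hypothetical: lightning and one other cause, each 1 in 3,000 beforehand
posterior = renormalize({
    "lightning": Fraction(1, 3000),
    "other cause": Fraction(1, 3000),
})
print(posterior)
```

Each explanation ends up at 1/2, even though each started at 1 in 3,000.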

That’s basically the situation we have here. The entire database search yielded exactly one match, which was either a “true” match derived from the killer himself, or a “false” one derived from someone unrelated to him who randomly shares the 5 1/2 indicators tested. Nothing about the match itself screams out “I’m a true match” or “I’m a false/random one.” All we have is a match, which on its face is equally consistent with either possibility. We know that a “false” match originally stood 1 in 3 odds of occurring, while a “true” match stood 1 in x odds, where 1 in x is the likelihood that the killer was in the database to begin with. Which is more probable to have occurred, the 1 in 3 event or the 1 in x event?

Answer: without knowing the value of x, it’s impossible to tell. Without the ability to assess the likelihood of a “true” match, there is no way to determine whether any given match is more likely to be true or false. If the value of x happens to be 3, making true and false matches equally likely to occur, then we could say after the fact that the likelihood of Puckett’s guilt is only 1 in 2, or 50-50. If the odds of the killer being in the database were lower than 1 in 3, then it is more probable that Puckett was a false match than a true one. Only if the odds of the killer being in the database happened to be 50-50 would it make sense to cite either the 1 in 1.1 million or the 1 in 3 figure as the likelihood of anything after the fact.
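
The conclusion can also be checked by brute force. This rough Monte Carlo sketch in Python assumes the model from the first letter (an independent 1 in 3 chance of a random match and a 1 in x chance that the killer is in the database); the function name, trial count, and seed are arbitrary illustrative choices. It keeps only simulated searches with exactly one hit and reports what share of those hits were random.

```python
import random

def share_of_false_matches(x, trials=100_000, p_random=1 / 3, seed=1):
    """Estimate the share of single-hit searches that are random matches."""
    rng = random.Random(seed)
    single_hits = 0
    false_hits = 0
    for _ in range(trials):
        random_hit = rng.random() < p_random  # an unrelated profile matches
        true_hit = rng.random() < 1 / x       # the killer is in the database
        if random_hit != true_hit:            # exactly one hit: keep it
            single_hits += 1
            if random_hit:
                false_hits += 1
    return false_hits / single_hits

for x in (2, 3, 4):
    print(x, round(share_of_false_matches(x), 3))
```

The estimates should land close to 1/3, 1/2 and 3/5 for x = 2, 3 and 4, in line with the reasoning above.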

In sum, when Puckett’s attorney argued that the odds of Puckett having been randomly matched were “only” 1 in 3, she was actually being far too generous to the prosecution. She should have said that without knowing the odds of the killer being in the database, there is no way for the jury to assess the likelihood that Puckett was matched randomly, and therefore, the DNA evidence should not have been admitted at trial.

7 Responses to “DNA and Guilt Redux”

  1. Phelps Says:

    Too long to be published as written. Also, it includes math, which is too complicated for a journalist to understand, so they won’t bother to read it.

  2. nk Says:

    I think you were spot on with your lightning example. Otherwise, we have disagreed on this before, at Patterico’s. Daubert rule notwithstanding, this should not be admitted as evidence at trial. Possibly for probable cause in a request for a warrant. An investigative tool only.

  3. Xrlq Says:

    Not sure what our point of disagreement is. Since we can’t tell how likely the killer was to be in the database, we have no way of knowing whether a “true” match to the killer himself is any more likely to occur than a “false” one to an unrelated individual who randomly matches 5 1/2 of his DNA indicators, let alone by how much.

  4. nk Says:

    I misunderstood you, then. My argument at Patterico’s was that the potential for prejudice outweighed the probative value at trial. I thought you thought that it should be admitted.

  5. Xrlq Says:

    Initially, I did. Once I understood the prosecutor’s fallacy problem, I concluded that a partial DNA match has no (measurable) probative value, and should therefore be excluded. The only exception would be if someone could find a credible way of assessing the odds that the killer is in the database.

  6. James B. Shearer Says:

    Re: #3, #5

    Not sure why you think the test should not be admissible. Basically a positive result for the test greatly increases the chance that the suspect is guilty. If there was initially a reasonable chance (say 1% or more) that the suspect was guilty, then a positive result of a test with a one in a million false positive rate makes guilt almost certain. If the initial chance was very low (say one in a million), then a positive result raises the chance to the 50-50 area. Either way you need some initial estimate of the suspect’s guilt to interpret the test.

    As for estimating the odds that the killer is in the database, this can be done the same way you can estimate the chance that a dead wife was murdered by her husband: by looking at previous similar cases. This is not the only cold case run against a database; you can use the other results to estimate the chance of a true hit. Note that most such searches are run with more markers, so false hits are less likely.
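
The figures in this comment follow from Bayes’ rule. A minimal Python sketch, assuming a test that always flags a guilty suspect (sensitivity of 1) and using the comment’s illustrative 1% and one-in-a-million values; the function name is hypothetical:

```python
def posterior_guilt(prior, false_positive_rate, sensitivity=1.0):
    """Bayes' rule for a positive test result.

    prior: chance the suspect is guilty before the test
    false_positive_rate: chance an innocent suspect tests positive
    sensitivity: chance a guilty suspect tests positive (assumed 1 here)
    """
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)

print(posterior_guilt(0.01, 1e-6))  # reasonable initial chance of guilt
print(posterior_guilt(1e-6, 1e-6))  # very low initial chance of guilt
```

The first case comes out at roughly 0.9999 and the second at roughly 0.5, matching the comment’s “almost certain” and “50-50” descriptions.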

  7. Xrlq Says:

    James, while a positive result certainly does increase the chance that the suspect is guilty, the question is by how much, and whether a jury can be expected to assess it reasonably. Your reference to a “one in a million false positive rate,” when the actual odds of a false match in this instance were 1 in 3, further illustrates my point. Both numbers, however, are tainted by the prosecutor’s fallacy, which is why I’m arguing that even the 1 in 3 figure probably shouldn’t be admitted, as it too implies a level of ex post facto certainty that simply isn’t there.

    As to your suggestion for estimating the odds that the killer is in the database, I’m not sure how that is supposed to work unless you had a larger sample of killers from that time period. I do know that a 1972 killer would have had to make it into a database that didn’t even exist at the time. Robert Baker never made it into that database; who else didn’t, either? If I had to bet on the odds of the original killer being in that database, I’d put those odds just south of 1 in 3, making a false match more likely than a true one.

    “Note most such searches are run with more markers so false hits are less likely.”

    True, but even then we still need at least a ballpark figure for the likelihood of the killer being in the database. Without a decent idea of how likely both a true and a false match were, there is no way to assess which outcome is more likely to have occurred, or by how much. It may not be an issue in the case of a full match, assuming there wasn’t anything unusual about the database making it nearly impossible for the killer to be in there. But when you start with odds that look like 1 in 1.1 million and are really only 1 in 3, it doesn’t take a lot to overcome that.
