Tuesday, January 22, 2013

Doctors, Lawyers, and the Rev. Thomas Bayes

Consider the following scenario, which was given to students and staff at Harvard Medical School (reported in "Risk Assessment and Decision Analysis with Bayesian Networks," p. 27):
One person in 1,000 has a prevalence for a particular disease (e.g., breast cancer, HIV). There is a test for detecting this disease, and it is 100% accurate when a person has the disease and 95% accurate for people who don't (in other words, 5% of people who don't have the disease will be incorrectly diagnosed as having the disease -- known as a false-positive). If a randomly selected person tests positive, what is the probability that the person actually has the disease?
Almost half answered 95%, which is probably the most intuitive answer. As you might have already guessed, however, 95% is wrong. In fact, it is horribly wrong. The correct answer is less than 2%. How so, you ask? Because the probability of testing positive given that you don't have the disease is not the same as the probability of not having the disease given that you tested positive.

This can be illustrated relatively simply. To keep things manageable, assume a population of 1,000 people who are given the test. Of these, only one will have the disease, and he or she will test positive. However, approximately 50 (999 * 5%) who don't have the disease will also test positive, which means that 51 out of 1,000 people will test positive for the disease, but only one of those 51 will actually have the disease. So what's the probability that someone who tests positive will actually have the disease? One out of 51 (i.e., 1/51), which equals 1.96%. A far cry from 95%.
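The same count can be checked with a few lines of Python. This is only a minimal sketch: the function name and its parameter names are my own labels, not anything from the book, and the inputs are the figures from the scenario (a 1-in-1,000 base rate, a test that catches every true case, and a 5% false-positive rate).

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' rule.

    prior: share of the population that has the disease
    sensitivity: P(positive | disease)
    false_positive_rate: P(positive | no disease)
    """
    true_positives = prior * sensitivity
    false_positives = (1.0 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Harvard Medical School scenario: 1 in 1,000 prevalence, 5% false positives.
p = posterior_given_positive(prior=0.001, sensitivity=1.0, false_positive_rate=0.05)
print(f"{p:.4f}")  # prints 0.0196
```

The result is just under 2%, matching the one-in-51 count above.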

Why might this be important? Well, it might help doctors better communicate with their patients. Indeed, something very similar happened to the physicist Leonard Mlodinow ("The Drunkard's Walk: How Randomness Rules Our Lives"). In 1989 he received a call from his doctor, who told him that the chances were 999 out of 1,000 he'd be dead within a decade. Why? Because he had tested positive for HIV, and at the time the HIV test produced a false positive only 0.1% (i.e., 1/1000) of the time. However, Mlodinow's doctor failed to take into account the underlying probability: only one in 10,000 heterosexual, non-IV-drug-using white males who got tested were infected with HIV. When this probability is taken into account (as we did above with the Harvard Medical School scenario), the probability that Mlodinow actually had HIV was one in 11 (1/11) or 9.1%, again a far cry from the 99.9% that his doctor told him. It isn't hard to imagine that doctors all over the world make similar mistakes and cause their patients undue anxiety.
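For readers who want to see where the one in 11 comes from, here is the arithmetic written out. One assumption is mine and not stated above: that the test catches every true infection, so the lone infected person in 10,000 always tests positive.

```latex
P(\text{HIV} \mid +)
  = \frac{\frac{1}{10{,}000} \times 1}
         {\frac{1}{10{,}000} \times 1 + \frac{9{,}999}{10{,}000} \times \frac{1}{1{,}000}}
  = \frac{1}{1 + 9.999}
  \approx \frac{1}{11}
  \approx 9.1\%
```

In other words, for every 10,000 men in that group who were tested, about one true positive was accompanied by about ten false positives.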

Doctors aren't the only ones who get probabilities wrong. So do attorneys (and consequently juries), which sometimes leads to convictions (or acquittals) that are wrong. Take, for instance, the following scenario (from "Risk Assessment and Decision Analysis with Bayesian Networks," p. 28):
Imagine that a crime has been committed and that the criminal left a trace of blood at the scene. Assume that the blood type is such that only one in every 1,000 people has it, and that a suspect (Fred) who matches the blood type is put on trial. In court, the prosecutor argues that since the chance that an innocent person has the matching blood type is 1 in 1,000 and Fred has the matching blood type, the chance that Fred is innocent is just 1 in 1,000.
Is the prosecutor right? No. Imagine that there are 10,000 people who could have committed the crime. Of these, one is guilty, but approximately 10 others (9,999 * 0.1% ≈ 10) are innocent and still have the matching blood type. That means Fred is one of about eleven people who match, so the probability that he is guilty is one out of eleven (1/11) or 9.1%. Put differently, the probability that Fred is innocent is 90.9%, quite a bit more than the one in one thousand chance the prosecutor argued.
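The gap between the two probabilities the prosecutor confuses can be made concrete with a short Python sketch. The variable names are mine, and the sketch leans on two assumptions: the guilty person certainly matches the blood type, and before the blood evidence each of the 10,000 people is equally likely to be the culprit.

```python
# Assumed pool of people who could have committed the crime.
population = 10_000
# Share of people with the matching blood type.
match_rate = 1 / 1_000

guilty = 1                                  # exactly one culprit, who matches
innocent = population - guilty              # 9,999 innocent people
innocent_matches = innocent * match_rate    # about 10 innocent people who match

# What the prosecutor quoted: P(match | innocent).
p_match_given_innocent = match_rate

# What the jury needs: P(innocent | match).
p_innocent_given_match = innocent_matches / (innocent_matches + guilty)

print(f"P(match | innocent) = {p_match_given_innocent:.3f}")   # prints 0.001
print(f"P(innocent | match) = {p_innocent_given_match:.3f}")   # prints 0.909
```

The first number is the one in 1,000 the prosecutor cites; the second is the roughly 91% chance that a matching suspect picked from this pool is innocent.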

Reasoning such as this is known as the "prosecutor's fallacy," and it is unfortunately quite common, which means that some people are being imprisoned who shouldn't be and others are being set free who should be locked up. Mlodinow, for instance (pp. 118-120), tells the story of the UK's Sally Clark, who had two children die of SIDS (sudden infant death syndrome). After the second one died, she was arrested and charged with smothering her children. An expert witness estimated the odds of having two children die of SIDS at 73 million to one, and Sally Clark was convicted of murder. As it turned out, the expert was a little off. The real odds were 2.75 million to one, but the bigger problem was that the odds of two children dying of SIDS weren't compared to the odds of two children being murdered by their mother. The Royal Statistical Society and a mathematician later weighed in on the matter, and the mathematician demonstrated that two infants are 9 times more likely to die of SIDS than to be murder victims. As a consequence, Sally Clark was eventually set free.

So what does the Rev. Thomas Bayes have to do with all this? Well, the good Reverend was not only a Presbyterian minister but also a mathematician interested in probability. More precisely, he was interested in the probability of a particular event occurring given what we already know. Without going into great detail, the Rev. Bayes laid the groundwork for how we reason today with conditional probability, which I've attempted to illustrate with the examples above. In short, what Bayes was able to show is that we can't know the probability that a positive test for a disease is correct without knowing the underlying probability of someone having the disease. And we can't know if someone who matches the description of a person observed at a crime scene is guilty without knowing how common people matching that description are in the population.
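In modern notation, the rule is usually written as follows. The symbols are my own choice for this post: H stands for the hypothesis (has the disease, is guilty) and E for the evidence (a positive test, a matching blood type).

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)},
\qquad
P(E) = P(E \mid H)\, P(H) + P(E \mid \neg H)\, P(\neg H)
```

The term P(H) is the underlying (or "prior") probability that both the doctor and the prosecutor left out, and P(E | not H) is the false-positive rate they mistook for the answer.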

Bayes' article on the topic was not published in his lifetime. He left it (and other papers) to another minister, Richard Price, who was a friend of Benjamin Franklin, Thomas Jefferson, and John Adams. Price edited Bayes' paper and had it published in the Philosophical Transactions of the Royal Society (of which Bayes had been a member). Bayes' rule (as it has come to be known) was discovered independently by the Frenchman Pierre-Simon Laplace, and it has since been developed and applied in a number of areas. As one author has noted, Bayes' rule helped crack the Enigma code during WWII, locate missing subs, show that smoking causes lung cancer, and so on ("The Theory That Would Not Die"). It all seems so simple, but it took a Presbyterian minister to help sort it all out.
