
Saturday, April 4, 2015

Correlation is Not (Necessarily) Causation

We are a cause-seeking people. When events occur simultaneously or in succession to one another it is natural for us to connect them, to see a cause and effect relationship. "As the eighteenth-century Scottish philosopher David Hume famously noted, we find patterns in nature and look for the 'hidden springs and principles' that bring those patterns to life and let us navigate them. We can't help doing this. Nature made us so." (Scott Atran, Talking to the Enemy: Faith, Brotherhood, and the (Un)Making of Terrorists)

Seeing patterns and linkages between events has served us well. We've benefited from realizing that rain helps crops grow, falling trees can be dangerous, and charging rhinos can kill. However, sometimes we see patterns where there are none. It is helpful to remember that just because two things are correlated with one another doesn't mean that a cause-and-effect relationship exists between them. Consider, for instance, the following:
  • Ice cream sales and crime are positively associated with one another. When one increases, so does the other. Does this mean one causes the other? No. Instead, both are positively correlated with the weather. As temperatures rise, so do crime and ice cream sales.
  • Most car accidents occur within 25 miles of where the victims live. Does this mean that people drive better when they're on vacation? No. What it means is that we do most of our driving within 25 miles of where we live.
  • People who move to Florida are more likely to die than people who don't. Does this mean Florida is a more dangerous place to live than the rest of the United States? Probably not. But it does attract a disproportionate share of older Americans, who, of course, are more likely to die than younger Americans.
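The ice cream example can be illustrated with a small simulation. What follows is only a sketch under made-up assumptions: the temperature range, the slopes, and the noise levels are invented for illustration and are not real sales or crime data. In it, temperature drives both variables and neither affects the other, yet the two end up strongly correlated. Once the part of each variable explained by temperature is removed, the correlation should largely disappear.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: temperature is the only common cause.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
crime = [1.5 * t + rng.gauss(0, 10) for t in temperature]

# Raw correlation between the two "effects".
raw = pearson(ice_cream, crime)

# Correlation after controlling for temperature (a partial correlation).
controlled = pearson(
    residuals(ice_cream, temperature), residuals(crime, temperature)
)

print(f"raw correlation:              {raw:.2f}")
print(f"controlling for temperature:  {controlled:.2f}")
```

The first number should come out strongly positive and the second close to zero, which is the signature of a shared cause rather than a direct link.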
These are what we call "spurious" correlations, and they are all around us. Unfortunately, we are not always adept at separating the wheat (genuine correlations) from the chaff (spurious correlations). Take, for instance, the following graphs (from Steven Pinker, "The Better Angels of Our Nature"). Which one do you think was generated randomly? People often pick the one on the right, but it's actually the one on the left. What typically throws folks off is the clustering in the random graph: we expect random points to be spread evenly, but true randomness produces clusters.


Now consider the following plot of where V-1 and V-2 bombs hit during the German bombing of London. Many people thought the clustering of the hits indicated that the Germans were able to target specific areas in London, but a study by R. D. Clarke demonstrated that the clustering did not differ from what one would expect if the bombs had landed randomly.
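One way to see why random hits still look clustered is to scatter points at random over a grid and count how many land in each square. The sketch below does that and compares the counts with what a Poisson distribution predicts for purely random hits. The grid size, the number of hits, and the function names are illustrative assumptions, not details taken from this post, and this is not a reconstruction of Clarke's actual analysis.

```python
import math
import random
from collections import Counter


def simulate_hits(n_hits, grid_side, rng):
    """Drop n_hits uniformly at random on a grid; return hits per square."""
    counts = Counter()
    for _ in range(n_hits):
        square = (rng.randrange(grid_side), rng.randrange(grid_side))
        counts[square] += 1
    return counts


def squares_by_hit_count(counts, n_squares):
    """Map k -> number of squares that received exactly k hits."""
    dist = Counter(counts.values())
    dist[0] = n_squares - len(counts)  # squares that were never hit
    return dist


def poisson_expected(k, rate, n_squares):
    """Expected number of squares with exactly k hits if hits are random."""
    return n_squares * math.exp(-rate) * rate**k / math.factorial(k)


rng = random.Random(1)  # fixed seed so the run is repeatable
grid_side = 24          # illustrative: a 24 x 24 grid of squares
n_squares = grid_side * grid_side
n_hits = 537            # illustrative number of hits

counts = simulate_hits(n_hits, grid_side, rng)
dist = squares_by_hit_count(counts, n_squares)
rate = n_hits / n_squares  # average hits per square

print("hits  squares  poisson")
for k in range(max(dist) + 1):
    expected = poisson_expected(k, rate, n_squares)
    print(f"{k:>4}  {dist[k]:>7}  {expected:>7.1f}")
```

Even though every square is equally likely to be hit, a sizable number of squares should receive two or more hits while many receive none. To the eye that looks like targeting, but it is what chance alone produces.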


In other words, many people saw cause and effect where there was none. But this was to be expected. It's in our nature to do so.
