Fooled by correlation

One of the most important lessons from statistics is that correlation does not imply causation. A relationship between two things does not necessarily mean that one causes the other.

A classic example of this is ice cream sales and shark attacks. In the US these two things are strongly correlated: ice cream sales closely track the number of shark attacks. It is obvious that ice cream does not cause shark attacks, or vice versa. The explanation is that both increase when the weather gets warmer, as more people buy ice cream and more people swim in the sea.
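To see how a shared cause can produce this pattern, here is a minimal simulation sketch. All of the numbers are invented for illustration (they are not real sales or attack figures), and the names `temperature`, `ice_cream_sales` and `shark_attacks` are hypothetical. Both series are built from temperature plus independent noise, so neither can influence the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures in deg C (assumed values, not real data).
temperature = [rng.uniform(5, 35) for _ in range(5000)]

# Each series depends on temperature plus its own independent noise.
# Neither series appears in the formula for the other.
ice_cream_sales = [20 * t + rng.gauss(0, 60) for t in temperature]
shark_attacks = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

r_all = pearson(ice_cream_sales, shark_attacks)

# Keep only days with nearly the same temperature (24 to 26 deg C),
# which holds the common cause roughly fixed.
mild = [i for i, t in enumerate(temperature) if 24 <= t <= 26]
r_mild = pearson([ice_cream_sales[i] for i in mild],
                 [shark_attacks[i] for i in mild])

print(f"all days:  r = {r_all:.2f}")
print(f"mild days: r = {r_mild:.2f} ({len(mild)} days)")
```

Across all days the two series should look strongly correlated, while among days of similar temperature the correlation should be close to zero. This is only a toy model, but it shows why checking for a common driver is worth doing before reading anything causal into a correlation.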

Few people are likely to be fooled into thinking that the link between ice cream sales and shark attacks is causal, but that does not stop us from regularly mistaking correlation for causation in other contexts.

In his new book, How to Stay Smart in a Smart World, the German psychologist Gerd Gigerenzer explains that this is likely to become a growing problem with big data analytics. Algorithms are trained to look for correlations when mining big data to try to unearth connections and insights. But this can lead us down the wrong path.

Gigerenzer has some great examples of this in his book. He reports a correlation of 0.67 between the number of people who drowned by falling into a pool and the number of films Nicolas Cage appeared in (where 0 means no correlation and 1 means perfect correlation).

There is also a near-perfect correlation of 0.99 between the divorce rate in Maine and the per capita consumption of margarine in the US. So couples in Maine should watch out for their margarine consumption!

There is a serious point here, though: we need to be careful about claiming causation in economic data. A correlation between two datasets does not necessarily mean anything. The increase in the number and availability of datasets increases the likelihood of these types of errors, because the more pairs of series we compare, the more strong correlations will turn up by chance alone. Additional statistical work, narratives and common sense are vital to make sense of data and produce robust economic analysis.
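The effect of having more datasets can be illustrated with a rough sketch. It assumes each "dataset" is just ten random numbers, standing in for ten yearly observations, with no connection between any of them. The function name `strongest_spurious` and the chosen sizes are hypothetical, picked only for illustration. It reports the strongest correlation found between any two series as the number of series grows.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def strongest_spurious(n_series, n_points=10, seed=1):
    """Largest absolute correlation between any pair of unrelated series."""
    rng = random.Random(seed)
    # Every series is pure noise, so any correlation found is a coincidence.
    series = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_series)]
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))

# Compare a handful of datasets with a few hundred.
results = {n: strongest_spurious(n) for n in (5, 50, 300)}
for n, r in results.items():
    print(f"{n:>3} series: strongest correlation = {r:.2f}")
```

With only a few series the best match is usually unremarkable, but with hundreds of them some pair will almost always look strongly correlated even though every series is noise. Real economic series often trend over time, which tends to make chance matches even easier to find than in this simplified setup.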

Alex O’Byrne

Image: pxhere
