Image credit: Correlation, Causation, and Their Impact on AB Testing
Correlation is never evidence for causation – but, unfortunately, many scientific articles imply that it is. While paying lip service to the correlation-causation mantra, some (possibly many) authors end up arguing that their data is evidence for an effect based solely on the correlations they observe. This is one of the reasons for the replication crisis in science, where contradictory results are reported and findings cannot be replicated by other workers (see I don’t “believe” in science – and neither should you).
Career prospects, institutional pressure and the need for public recognition encourage scientists to publish poor-quality work that they then use to claim they have found an effect. The problem is that the public, the news media and even many scientists simply do not properly scrutinise published papers. In most cases they do not have the specific skills required to do so.
There is nothing wrong with doing statistical analyses and producing correlations. However, such correlations should only be used to suggest future, more meaningful and better-designed research such as randomised controlled trials (see Smith & Ebrahim 2002 – Data dredging, bias, or confounding. They can all get you into the BMJ and the Friday papers). They should never be used as “proof” of an effect, let alone used to argue that the correlation is evidence supporting regulations or to advise policymakers.
Hunting for correlations
However, researchers will continue to publish correlations and make great claims for them because they face powerful incentives to promote even unreliable research results. Scientific culture and institutional pressures demand that academic researchers produce publishable results. This pressure is so great that they will often clutch at straws to produce correlations even when the initial statistical analysis produces none. They end up “torturing the data.”
These days epidemiological researchers use large databases and powerful statistical software in their search for correlations. Unfortunately, this leads to data mining which, by suitable selection of variables, makes the discovery of statistically significant correlations easy. The data mining approach also means that the often-cited p-values are meaningless. P-values estimate the probability that the observed relationship occurred by chance and are often cited as evidence of the “robustness” of the correlations. But that probability is much greater when researchers check a whole range of variables, and this is not properly reflected in the individual p-values.
Where data mining occurs, even to a limited extent, researchers are simply attempting to make a silk purse out of a sow’s ear when they support their correlations merely by citing a p-value < 0.05, because these values are meaningless in such cases. The fact that so many of these authors ignore more meaningful results from their statistical analyses (like R-squared values, which indicate the extent to which the correlation “explains” the variation in the data) underlines their deceptive approach.
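To see why, consider a minimal sketch (purely illustrative, using simulated random data rather than any real epidemiological dataset): screen enough unrelated variable pairs and some will clear the p < 0.05 bar by chance alone, yet their R-squared values stay tiny.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_points, n_pairs = 200, 100

# Screen many pairs of purely random, unrelated variables and keep
# any that happen to reach "statistical significance".
spurious_hits = []
for i in range(n_pairs):
    x = rng.normal(size=n_points)
    y = rng.normal(size=n_points)   # no real relationship to x
    r, p = stats.pearsonr(x, y)
    if p < 0.05:
        spurious_hits.append((i, p, r**2))

for i, p, r2 in spurious_hits:
    print(f"pair {i}: p = {p:.3f}, R-squared = {r2:.3f}")
# Typically around 5 of the 100 random pairs come out "significant",
# yet each explains only a few percent of the variance.
```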
Poor statistical relationships
Consider the correlations below: two of the data sets are taken from a published paper, while the other four use random data provided by Jim Frost in his book Regression Analysis: An Intuitive Guide.
You can probably guess which correlations come from the real data (J and M) because there are so many more data points. All of these correlations have low p-values – but, of course, those selected from random data sets resulted from data mining, and their p-values are therefore meaningless because they are just a few of the many checked. Remember, a p-value < 0.05 means that the probability of a chance effect is one in twenty, and more than twenty variable pairs were checked in this random dataset.
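To put a rough number on that (a back-of-the-envelope calculation assuming twenty independent comparisons, each tested at the 5% level), the chance of turning up at least one spurious “significant” correlation is already close to two in three:

```python
# Chance of at least one false positive when checking k independent
# variable pairs, each tested at the 5% significance level.
k = 20
p_at_least_one = 1 - (1 - 0.05) ** k
print(f"{p_at_least_one:.2f}")   # about 0.64
```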
The other two correlations (J and M) are taken from Bashash et al (2017). The authors do not give details of how many other variables were checked in the dataset used, but it is inevitable that some degree of data mining occurred. So, again, the low p-values are probably meaningless.
J shows the correlation of General Cognitive Index (GCI) scores in children at age 4 years with maternal prenatal urinary fluoride, and M shows the correlation of children’s IQ at age 6–12 years with maternal prenatal urinary fluoride. The paper has been heavily promoted by anti-fluoride scientists and activists, yet none of the promoters have made a critical, objective analysis of the correlations reported. Paul Connett, director of the Fluoride Action Network, was merely supporting his anti-fluoride activist bias when he uncritically described the correlations as “robust.” They just aren’t.
There is a very high degree of scatter in both of these correlations, and the R-squared values indicate the regressions explain no more than about 3 or 4% of the variance in the data. Hardly something to hang one’s hat on, or to use to argue that policymakers should introduce new regulations controlling community water fluoridation or ban it altogether.
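A quick back-of-the-envelope illustration of what an R-squared of 3–4% means (using the approximate figures above, not values taken from the paper itself):

```python
# What an R-squared of roughly 0.03-0.04 implies.
for r_squared in (0.03, 0.04):
    r = r_squared ** 0.5          # corresponding correlation coefficient
    unexplained = 1 - r_squared   # share of the variance left unexplained
    print(f"R-squared = {r_squared:.2f}: r = {r:.2f}, "
          f"{unexplained:.0%} of the variance unexplained")
# R-squared = 0.03: r = 0.17, 97% of the variance unexplained
# R-squared = 0.04: r = 0.20, 96% of the variance unexplained
```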
In an effort to make their correlations look better, the authors superimposed confidence intervals on the graphs (see below). This Xkcd cartoon on curve fitting gives a cynical take on that. The grey areas in the graphs may impress some people, but they do not hide the wide scatter of the data points. The confidence intervals refer to estimates of the regression coefficient; when it comes to using the correlations to predict likely effects one must use the prediction intervals, which are very large (see Paul Connett’s misrepresentation of maternal F exposure study debunked). In fact, the estimated slopes in these graphs are meaningless when it comes to predictions.

Correlations reported by Bashash et al (2017). The regressions explain very little of the variance in the data and cannot be used to make meaningful predictions.
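The distinction between the two kinds of interval is easy to demonstrate. Here is a minimal sketch (using simulated data with a weak slope and large scatter, and the statsmodels library, not the study’s own dataset): the confidence interval describes uncertainty in the fitted line, while the much wider prediction interval describes where individual observations are expected to fall.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
exposure = rng.uniform(0, 3, n)                             # stand-in exposure variable
outcome = -0.4 * exposure + rng.normal(scale=2.0, size=n)   # weak slope, lots of scatter

fit = sm.OLS(outcome, sm.add_constant(exposure)).fit()

# Predict at a few exposure values and compare interval widths.
new_x = sm.add_constant(np.array([0.5, 1.5, 2.5]))
frame = fit.get_prediction(new_x).summary_frame(alpha=0.05)
print(frame[["mean",
             "mean_ci_lower", "mean_ci_upper",   # narrow confidence interval for the line
             "obs_ci_lower", "obs_ci_upper"]])   # much wider prediction interval
```

With this much scatter the prediction interval spans several units of the outcome even though the confidence band around the fitted line looks reassuringly tight.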
In critiquing the Bashash et al (2017) paper I must concede that at least they made their data available – the data points in the two figures. While they did not provide full or proper results from their statistical analysis (for example, they did not report R-squared values), the data do at least make it possible for other researchers to check their conclusions.
Unfortunately, many authors simply cite p-values, and perhaps confidence intervals for the estimated regression coefficient, without providing any data or images. This is frustrating for the intelligent scientific reader attempting to critically evaluate their claims.
Conclusions
We should never forget that correlations, no matter how impressive, do not mean causation. It is very poor science to suggest they do.
Nevertheless, many researchers resort to correlations they have managed to glean from databases, usually involving some degree of data mining, to claim they have found an effect and to get published. The drive to publish means that even very poor correlations get promoted and are used by ideologically driven or career-minded scientists, and by activists, to attempt to convince policymakers of their cause.
Image credit: Xkcd – Correlation
Remember, correlations are never evidence of causation.