Studies based simply on statistically significant relationships found by mining data from large databases are a big problem in the scientific literature. They are problematic because data mining, or worse, data dredging, easily produces relationships that are statistically significant but meaningless. And they are problematic because authors wishing to confirm their biases and promote their hypotheses conveniently forget the warning that correlation is not evidence for causation and go on to promote their relationships as proof of effects. Often they seem to be successful in convincing regulators and policymakers that their relationships are serious enough to result in regulations. Then there are the activists who don’t need convincing but will willingly and tiresomely promote these studies if they confirm their agendas.
Even random data can provide statistically significant relationships
The graphs below show the fallacy of relying only on statistically significant relationships as proof of an effect. They show linear regression results for a number of data sets. One data set is taken from a published paper – the rest use random data provided by Jim Frost in his book Regression Analysis: An Intuitive Guide.
All these regressions look “respectable.” They have low p values (less than the conventional 0.05 limit) and the R-squared values indicate they “explain” a large fraction of the variation in the data – up to 49%. But the regressions are completely meaningless for at least 7 of the 8 data sets because the data were randomly generated and have no relevance to real physical measurements.
This should be a warning that correlations reported in scientific papers may be quite meaningless.
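To make the point concrete, here is a minimal sketch of data dredging on pure noise. It is not the data from the graphs or from the book; the function names (`pearson_r`, `is_significant`, `dredge`) and the sample size of 20 points are my own assumptions for illustration. It regresses one random "outcome" on many unrelated random "predictors" and counts how many pass the conventional p < 0.05 test, using the standard t statistic for a correlation and the two-sided 5% critical value for 18 degrees of freedom (about 2.101).

```python
import math
import random

# Two-sided 5% critical value of Student's t with 18 degrees of freedom
# (matches the assumed sample size of 20 points below).
T_CRIT_DF18 = 2.101


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(x, y):
    """True if the simple linear regression of y on x has p < 0.05 (n = 20)."""
    r = pearson_r(x, y)
    t = r * math.sqrt((len(x) - 2) / (1 - r * r))
    return abs(t) > T_CRIT_DF18


def dredge(n_predictors=2000, n_points=20, seed=1):
    """Test one random outcome against many unrelated random predictors.

    Returns the number of 'significant' regressions and the largest
    R-squared found among them.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_points)]
    hits, best_r2 = 0, 0.0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_points)]
        if is_significant(predictor, outcome):
            hits += 1
            best_r2 = max(best_r2, pearson_r(predictor, outcome) ** 2)
    return hits, best_r2


if __name__ == "__main__":
    hits, best_r2 = dredge()
    print(f"{hits} of 2000 random predictors were 'significant'")
    print(f"largest R-squared among them: {best_r2:.2f}")
```

Roughly one in twenty of the random predictors should come out "significant", which is exactly what a 0.05 threshold allows by chance. Anyone free to search through enough variables, and to report only the ones that worked, will always find something that looks publishable.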
Can you guess which of the graphs is based on real data? It is actually graph E – published by members of a North American group currently publishing data which they claim shows community water fluoridation reduces child IQ. This was from one of their first papers, where they claimed childhood ADHD was linked to fluoridation (see Malin, A. J., & Till, C. 2015. Exposure to fluoridated water and attention deficit hyperactivity disorder prevalence among children and adolescents in the United States: an ecological association).
The group used this paper to obtain funding for subsequent research. They obviously promoted this paper as showing real effects – and so have the anti-fluoride activists around the world, including the Fluoride Action Network (FAN) and its director Paul Connett.
But the claims made for this paper, and its promotion, are scientifically flawed:
- Correlation does not mean causation. Such relationships in larger datasets often occur by chance – hell, they even occur with random data, as the figure above shows.
- Yes, the authors argue there is a biologically plausible mechanism to “explain” their association. But that is merely cherry-picking to confirm a bias, and there are other biologically plausible mechanisms they did not consider which would say there should not be an effect. The unfortunate problem with these sorts of arguments is that they are used to justify the findings as “proof” of an effect, in violation of the warning that correlation is not causation.
- There is the problem of correcting for confounders or other risk-modifying factors. While acknowledging the need for future studies considering other confounders, the authors considered their choice of socio-economic factors sufficient, and their peer reviewers limited their suggestion of other confounders to lead. However, when geographic factors were included in a later analysis of the data, the reported relationship disappeared.
Confounders often not properly considered
Smith & Ebrahim (2002) discuss this problem in an article – Data dredging, bias, or confounding. They can all get you into the BMJ and the Friday papers. The title itself indicates how the poor use of statistics and unwarranted promotion of statistical analyses can be used to advance scientific careers and promote bad science in the public media.
These authors say:
“it is seldom recognised how poorly the standard statistical techniques “control” for confounding, given the limited range of confounders measured in many studies and the inevitable substantial degree of measurement error in assessing the potential confounders.”
This could be a problem even for studies where a range of confounders is included in the analyses. But Malin & Till (2015) considered the barest minimum of confounders and didn’t include ones which would be considered important to ADHD prevalence. In particular, they ignored geographic factors, and these were shown to be important in another study using the same dataset. Huber et al (2015) reported a statistically significant relationship of ADHD prevalence with elevation. These relationships are shown in this figure.
Of course, this is merely another statistically significant relationship – not proof of a real effect and no more justified than the one reported by Malin and Till (2015). But it does show an important confounder that Malin & Till should have included in their statistical analysis.
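The way an omitted confounder manufactures a relationship can be sketched with synthetic data. To be clear, this is not the Malin & Till or Huber et al dataset, and every name and number below is an assumption chosen for illustration. A hypothetical confounder drives both the "exposure" and the "outcome", which have no direct link to each other. The sketch compares the crude regression slope, the slope after adjusting for the true confounder, and the slope after adjusting for a noisily measured version of it (the situation Smith & Ebrahim warn about). Adjustment is done by regressing the residuals on each other, which gives the same slope as a two-predictor regression.

```python
import random


def slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    slope, intercept = slope_intercept(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def adjusted_slope(control, exposure, outcome):
    """Slope of outcome on exposure after adjusting both for `control`."""
    slope, _ = slope_intercept(residuals(control, exposure),
                               residuals(control, outcome))
    return slope


def simulate(n=2000, seed=2):
    """Synthetic data: the confounder drives exposure and outcome.

    There is NO direct effect of exposure on outcome in this model.
    """
    rng = random.Random(seed)
    confounder = [rng.gauss(0, 1) for _ in range(n)]
    exposure = [-0.8 * c + rng.gauss(0, 1) for c in confounder]
    outcome = [-0.8 * c + rng.gauss(0, 1) for c in confounder]
    # A poorly measured version of the confounder (assumed noise level).
    measured = [c + rng.gauss(0, 1) for c in confounder]
    return confounder, measured, exposure, outcome


def compare_estimates():
    confounder, measured, exposure, outcome = simulate()
    crude, _ = slope_intercept(exposure, outcome)
    adjusted = adjusted_slope(confounder, exposure, outcome)
    noisy = adjusted_slope(measured, exposure, outcome)
    return crude, adjusted, noisy


if __name__ == "__main__":
    crude, adjusted, noisy = compare_estimates()
    print(f"crude slope (confounder ignored):        {crude:.2f}")
    print(f"adjusted slope (true confounder):        {adjusted:.2f}")
    print(f"adjusted slope (noisy confounder):       {noisy:.2f}")
```

In this toy model the crude slope is clearly non-zero even though the exposure does nothing, the slope adjusted for the true confounder is close to zero, and adjusting for the badly measured confounder removes only part of the spurious relationship.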
I did my own statistical analysis using the data sets of Malin & Till (2015) and Huber et al (2015) and showed (Perrott 2018) that, once geographic factors were included, there was no statistically significant relationship of ADHD prevalence with fluoridation as suggested by Malin & Till (2015). Their study was flawed and it should never have been used to justify funding for future research on the effect of fluoridation. Nor should it have been used by activists promoting an anti-fluoridation agenda.
But, then again, derivation of a statistically significant relationship by Malin & Till (2015) did get them published in the journal Environmental Health which, incidentally, has sympathetic reviewers (see Some fluoride-IQ researchers seem to be taking in each other’s laundry) and an anti-fluoridation Chief Editor – Philippe Grandjean (see Special pleading by Philippe Grandjean on fluoride). It also enabled the promotion of their research via institutional press releases, newspaper articles and the continual activity of anti-fluoridation activists. Perhaps some would argue this was a good career move!
OK, the faults of the Malin & Till (2015) study have been revealed – even though Perrott (2018) is studiously ignored by the anti-fluoride North American group, which has continued to publish similar statistically significant relationships between measures of fluoride uptake and measures of ADHD or IQ.
But there are many published papers – peer-reviewed papers – which suffer from the same faults and get similar levels of promotion. They are rarely subject to proper post-publication peer-review or scientific critique. But their authors get career advancement and scientific recognition out of their publication. And the relationships are promoted as evidence for real effects in the public media.
No wonder members of the public are so often confused by the contradictory reporting, the health scares of the week, that they are exposed to.
No wonder many people feel they can’t trust science.