p-hacking

Image credit: Quick Data Lessons: Data Dredging

Oh dear – another scientific paper claiming evidence of toxic effects from fluoridation. But a critical look at the paper shows that the authors relied on p-hacking, data dredging and motivated reasoning to derive their conclusions. And it was published in a journal shown to be friendly to such poor science.

The paper is:

Cunningham, J. E. A., McCague, H., Malin, A. J., Flora, D., & Till, C. (2021). Fluoride exposure and duration and quality of sleep in a Canadian population-based sample. Environmental Health, 1–10.

Data dredging

This study used data from a Canadian database – the Canadian Health Measures Survey. Databases with large numbers of variables tempt researchers to dredge for data or relationships which confirm their biases. Even though this approach robs the resulting p-values of their meaning, data dredging (or data mining) is quite common in epidemiological studies.

Cunningham et al (2021) looked for relationships using two separate measures of fluoride exposure and four different measures of possible sleep disturbance. They found a "statistically significant" (p<0.05) relationship between lower sleep duration and water fluoride, but no relationships for higher sleep duration, trouble sleeping or daytime sleepiness with either water fluoride or urinary fluoride. Their logistic regression results are summarised in this figure. (Error bars crossing an Odds Ratio value of 1.0 indicate that the relationship is not statistically significant, i.e. p>0.05.)
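The rule about error bars crossing 1.0 follows from simple arithmetic. In logistic regression the odds ratio is exp(β), and a Wald 95% confidence interval is exp(β ± 1.96·SE), so the interval excludes 1.0 exactly when the Wald p-value is below 0.05. The minimal Python sketch below uses made-up coefficients (not values from Cunningham et al. 2021) and assumes Wald intervals; the paper's software may have computed its intervals differently.

```python
import math

def odds_ratio_ci(beta, se, z=1.959964):
    """Turn a logistic-regression coefficient (log-odds) and its standard
    error into an odds ratio with a Wald 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def wald_p(beta, se):
    """Two-sided Wald p-value using the standard normal distribution."""
    return math.erfc(abs(beta / se) / math.sqrt(2))

# Hypothetical (beta, SE) pairs, NOT taken from the paper.
examples = {"CI excludes 1": (0.40, 0.18), "CI crosses 1": (0.20, 0.18)}

for label, (beta, se) in examples.items():
    or_point, lower, upper = odds_ratio_ci(beta, se)
    print(f"{label}: OR={or_point:.2f} (95% CI {lower:.2f} to {upper:.2f}), p={wald_p(beta, se):.3f}")
```

Because the interval and the p-value come from the same β and SE, "the error bar crosses 1.0" and "p > 0.05" are two ways of saying the same thing.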

Of the 8 relationships investigated only 1 was statistically significant.

I discussed the problem of p-hacking in Statistical manipulation to get publishable results.

With a large dataset, one can inevitably find relationships that satisfy the p<0.05 criterion, because this p-value is meaningless when multiple relationships are considered. One can even find such "statistically significant relationships" when random datasets are investigated (see Science is often wrong – be critical, I don't "believe" in science – and neither should you, The promotion of weak statistical relationships in science and Can we trust science). Once multiple relationships are investigated, the chance of finding accidental relationships is much greater than the 1 in 20 signified by the p<0.05 value.

So, one of the 8 relationships above satisfied the p<0.05 criterion when considered alone. But as one of multiple tests, the probability of finding such a relationship purely by chance is much greater than 1 in 20.
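The arithmetic behind "much greater than 1 in 20" is straightforward if, as a simplifying assumption, the tests are treated as independent: the chance of at least one false positive among k tests is 1 - 0.95^k. This minimal Python sketch also shows the Bonferroni-corrected threshold, one common (and conservative) way of compensating for multiple tests; it is not a method the paper's authors used.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n independent
    tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold that keeps the family-wise error rate
    at or below alpha (Bonferroni correction)."""
    return alpha / n_tests

for k in (1, 8, 20):
    print(f"{k:>2} tests: P(at least one p < 0.05 by chance) = {familywise_error_rate(k):.2f}")

print(f"Bonferroni threshold for 8 tests: p < {bonferroni_threshold(8)}")
```

For 8 tests this gives about a 1 in 3 chance of at least one spurious "significant" result, and a corrected per-test threshold of p < 0.00625. The paper's measures are correlated with each other, so the independence assumption is only an approximation.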

Motivated reasoning

Image credit: Xkcd. 2167: Motivated Reasoning Olympics

This paper smacks of motivated reasoning. The authors obviously have a commitment to the concept that fluoride causes problems with the pineal gland and drag up anything they can find in the literature to support this – without critically assessing the quality of the cited work or even mentioning the fact that the cited studies were made on non-human animals at much higher fluoride concentrations. In effect, they are attempting to convert very weak results, obtained by data dredging and p-hacking, into a fact. They are attempting to make a silk purse out of a sow's ear.

This research group is not new to this game. I commented on this in my critique of another sleep disorder paper from the group (see Statistical manipulation to get publishable results).

Many of the same researchers are listed as authors on both papers – yet Cunningham et al (2021) cite the previous paper as if it was an independent study. They say "As far as we are aware, this is only the second human study investigating the effects of fluoride exposure on sleep outcomes," which is simply disingenuous considering the involvement of the same researchers in both papers.

Both these papers were also published in the same journal – Environmental Health – a pay-to-publish journal that is known to be friendly to anti-fluoride researchers and uses very sympathetic peer reviewers (see Some fluoride-IQ researchers seem to be taking in each other's laundry). The chief editor, Philippe Grandjean, is well known for his opposition to fluoridation. I commented on his refusal to consider a paper of mine that critiqued an anti-fluoride paper published in his journal (see Fluoridation not associated with ADHD – a myth put to rest).

Conclusion

Yet another very weak study, published in an anti-fluoride-friendly pay-to-publish journal with poor peer review. Despite the weaknesses due to data dredging, p-hacking and motivated reasoning, anti-fluoride activists will cite the single "statistically significant" result as gospel and ignore the 7 relationships that are not significant. As for confounders and other risk-modifying factors, this study completely ignores the fact that city size and geographic factors have a strong effect on both sleep patterns and water fluoride concentrations (see Perrott 2018). Such inadequate consideration of confounders is another common problem in epidemiological studies.

Oh, well, we are not a rational species. More a rationalising one. And in such areas motivated rationalisation and confirmation bias are rife.

Statistical manipulation to get publishable results

I love data. It's amazing the sort of "discoveries" I can make given a data set and a computer statistical package. It's just so easy to search for relationships and test their statistical significance. These may be relationships which we feel are justified by our experience, or even new ones we hadn't thought of previously.

It's a lot of fun. Here's a tool readers can use to explore a data set involving information on US political leadership and the US economy: Hack Your Way To Scientific Glory. (The image above shows the tool, but it's only an image. WordPress won't allow me to embed the site, but you can access it by clicking on the image.)

Try searching for relationships between political leadership and the economy. If you can find a relationship with a p-value < 0.05 you might feel the urge to publish your findings. After all, p-values < 0.05 seem to be the gold standard for scientific journals these days.

Statistical manipulation a big problem in published science

Problem is, by playing with this data you could produce statistically significant relationships that "show" both Republicans and Democrats hurt the economy, or that both are good for the economy. It's simply a matter of choosing the appropriate factors to define political leadership and appropriate factors to measure the economic situation.

The process is called p-hacking or data dredging. Time spent playing with this tool should convince you that it is easy to confirm one’s own political biases about political leadership and political parties using statistical techniques. It should also convince you this is very bad science. But, unfortunately, it happens. Even respectable journals will publish papers reporting relationships obtained by p-hacking, provided a p-value of less than 0.05 can be shown.

The article Science Isn’t Broken: It’s just a hell of a lot harder than we give it credit for includes the p-hacking tool and discusses how widespread the problem is in the published scientific literature. It also describes the concern that statisticians and scientists have about this sort of publication.

The author says:

“The variables in the data sets you used to test your hypothesis had 1,800 possible combinations. Of these, 1,078 yielded a publishable p-value, but that doesn’t mean they showed that which party was in office had a strong effect on the economy. Most of them didn’t.

The p-value reveals almost nothing about the strength of the evidence, yet a p-value of 0.05 has become the ticket to get into many journals. “The dominant method used [to evaluate evidence] is the p-value,” said Michael Evans, a statistician at the University of Toronto, “and the p-value is well known not to work very well.”
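Evans' point that a p-value says little about the strength of an effect can be seen by holding the effect fixed and changing only the sample size. The Python sketch below uses the usual t statistic for a correlation, with a normal approximation to its distribution (an assumption that is fine for moderate to large n). The value r = 0.10, which explains just 1% of the variance, is an arbitrary illustrative choice.

```python
import math

def p_from_r(r, n):
    """Approximate two-sided p-value for a Pearson correlation r from n
    observations (t statistic with a normal approximation; fine for n >~ 30)."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))

r = 0.10  # a fixed, weak association: r**2 = 1% of variance explained
for n in (50, 400, 5000):
    print(f"n={n:>5}: r={r}, variance explained={r * r:.0%}, p={p_from_r(r, n):.3g}")
```

The strength of the relationship never changes, yet the p-value moves from unremarkable to vanishingly small purely because more data points were collected.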

Statistical manipulation and p-hacking in fluoride studies

In my articles on the way scientific papers relating to fluoridation are misrepresented, I have often referred to the misleading use of p-values to argue that a study is very strong or a relationship important. Paul Connett, head of the Fluoride Action Network (FAN), often uses that argument (see for example Connett fiddles the data on fluoride, Connett misrepresents the fluoride and IQ data yet again, and Anti-fluoridation campaigners often use statistical significance to confirm bias).

But I have noticed that p-hacking and data dredging are real problems with some of the more recent studies of fluoride and IQ. This matters partly because these papers are being published by some reputable journals, and also because some reviewers and scientific readers seem completely unaware of the problem and are therefore uncritically taking some of the claimed findings at face value.

I have gone through some recent papers on this issue and pulled out the factors used to represent child cognitive abilities and to represent F exposure or intake. These are listed below for 8 papers and a thesis.

Study	Cognitive factor	F exposure
Malin & Till (2015)	ADHD prevalence in US states	Fluoridation extent in US states
Thomas (2014)	WASI; Bayley Scales of Infant Development-II (BSID-II) MDI	MUF; blood plasma F; concurrent child urinary F
Bashash et al. (2017)	CGI; FSIQ; VIQ	MUF_Cr; concurrent child urinary F_SG
Bashash et al. (2018)	ADHD CRS scores; CPT scores	MUF_Cr
Thomas et al. (2018)	MDI	MUF_Cr
Green et al. (2019)	FSIQ* (boys); PIQ* (boys); VIQ (ns); sex	MUF_Cr; sex; fluoridation; estimated F intake by mother
Riddell et al. (2019)	SDQ hyperactive/inattentive score; ADHD (parent-reported or questionnaire)	MUF_SG; water F; age
Till et al. (2020)	FSIQ; PIQ; VIQ	MUF; water F; fluoridation
Santa-Marina et al. (2019)	Perceptual-manipulative scale; verbal function; perceptive-manipulative; general cognitive	MUF

Footnotes (see papers for full information):
MUF – Prenatal maternal urinary F
MUF_Cr – Prenatal maternal urinary F adjusted using creatinine concentration
MUF_SG – Prenatal maternal urinary F adjusted using specific gravity
Concurrent child urinary F_SG – child urinary F at the time of IQ assessment adjusted using specific gravity
CGI – general cognitive index
FSIQ – Full-Scale IQ
PIQ – Performance IQ
VIQ – Verbal IQ
MDI – Mental development index
WASI – Wechsler Abbreviated Scale of Intelligence

As you can see, just like the political leadership/economy example illustrated in the p-hacking tool above, there is a range of both cognitive measurements and fluoride exposure factors which can be cherry-picked to produce the "right" answer (or confirm one's bias). Most of these studies can also select from up to three cohorts. So it's not surprising that relationships can be found to support the argument that fluoride has a negative effect on child cognitive abilities. But we can also find statistically significant relationships to support the argument that fluoride has a positive effect on cognitive abilities. Or, alternatively, that fluoride has no effect at all on cognitive abilities.
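To see how quickly these choices multiply, the Python sketch below enumerates every combination of the cognitive and exposure measures listed for Till et al. (2020) in the table. The three sex subgroups are an added assumption, included only to show the effect of splitting by sex, and the independence assumption in the last calculation is a simplification.

```python
from itertools import product

# Measures listed for Till et al. (2020) in the table; the subgroup split
# is a HYPOTHETICAL addition to illustrate how options multiply.
cognitive = ["FSIQ", "PIQ", "VIQ"]
exposure = ["MUF", "Water F", "Fluoridation"]
subgroup = ["all children", "boys", "girls"]

analyses = list(product(cognitive, exposure, subgroup))
alpha = 0.05
expected_false_hits = len(analyses) * alpha
# Treats the analyses as independent, which they are not.
p_any = 1 - (1 - alpha) ** len(analyses)

print(f"{len(analyses)} possible analyses")
print(f"expected chance 'hits' if fluoride had no effect: {expected_false_hits:.2f}")
print(f"P(at least one p < 0.05), independence assumption: {p_any:.2f}")
```

Even this small menu gives 27 analyses, more than one expected chance "hit", and about a 3 in 4 chance of at least one. FSIQ overlaps with PIQ and VIQ, so the tests are positively correlated and the true figure would be lower, though likely still well above 1 in 20.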

Another warning sign is that the relationships that are cited (and which have p-values < 0.05) are all extremely weak and explain only a few per cent of the variance in the data. While the complete statistical analyses are not given in most of the papers (another big problem in published research) the figures show a very high scatter in the data and the quoted confidence intervals confirm this.

Even where p < 0.05, the data can be extremely scattered and the relationship so weak as to be meaningless (Figure 1 from Till et al., 2020).
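How weak can a "significant" relationship be? Solving the t statistic for a correlation for r gives the smallest |r| that reaches p < 0.05 at a given sample size. The sketch below uses a normal approximation to the t distribution (an assumption that makes it slightly lenient for small samples) and squares r to get the share of variance explained. The sample sizes are illustrative and not those of any particular study.

```python
import math

def critical_r(n, z_crit=1.959964):
    """Smallest |r| reaching two-sided p < 0.05 with n observations.
    Solves t = r * sqrt((n - 2) / (1 - r**2)) = z_crit for r
    (normal approximation to the t distribution)."""
    return z_crit / math.sqrt(n - 2 + z_crit ** 2)

for n in (100, 500, 1000):
    r = critical_r(n)
    print(f"n={n:>4}: |r| >= {r:.3f} is 'significant', explaining only {r * r:.1%} of the variance")
```

With a few hundred participants, a correlation explaining less than 1% of the variance already clears the p < 0.05 bar, which is consistent with the heavy scatter seen in such figures.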

Yet another warning sign is that, when relationships are reported, they hold only for particular cognitive factors or particular fluoride exposure factors, and these differ from study to study. And again, they may hold only for one sex or for a limited age group.

Conclusion

Geoff Cumming wrote in A Primer on p Hacking that:

“Statisticians have a saying: if you torture the data enough, they will confess.”

We should always remember this when reading papers which rely on low p-values to support a relationship. I think this is a big problem in a lot of published science, and it is certainly a problem with the fluoride-IQ research currently being published.

The real take-home message from this particular research is that all the reported relationships are extremely weak, the data has been “tortured,” and it is easy to select parameters to produce a relationship with a p-value < 0.05 to confirm a bias.

In fact, the results from these studies are contradictory, confusing and extremely weak. They may be useful to political activists who have biases to confirm or ideological agendas to promote, but they are not sufficient to influence public health policy.

Tag Archives: p-hacking

Data dredging, p-hacking and motivated discussion in anti-fluoride paper

Data dredging

p-hacking

Motivated reasoning

Conclusion

Statistical manipulation to get publishable results

Statistical manipulation a big problem in published science

Statistical manipulation and p-hacking in fluoride studies

Conclusion
