
Can we trust science?

Image credit: Museum collections as research data

Studies based simply on statistically significant relationships found by mining data from large databases are a big problem in the scientific literature. Problematic because data mining, or worse, data dredging, easily produces relationships that are statistically significant but meaningless. And problematic because authors wishing to confirm their biases and promote their hypotheses conveniently forget the warning that correlation is not evidence for causation and go on to promote their relationships as proof of effects. Often they seem to be successful in convincing regulators and policymakers that their spurious relationships should result in regulations. Then there are the activists who don’t need convincing but will willingly and tiresomely promote these studies if they confirm their agendas.

Even random data can provide statistically significant relationships

The graphs below show the fallacy of relying only on statistically significant relationships as proof of an effect. They show linear regression results for a number of data sets. One data set is taken from a published paper – the rest use random data provided by Jim Frost in his book Regression Analysis: An Intuitive Guide.

All these regressions look “respectable.” They have low p-values (below the conventional 0.05 limit) and their R-squared values indicate they “explain” a large fraction of the variance – up to 49%. But the regressions are completely meaningless for at least 7 of the 8 data sets because the data were randomly generated and have no relevance to real physical measurements.
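The point is easy to check for yourself. The Python sketch below is a toy simulation of my own (it has nothing to do with the data behind the graphs): it regresses pairs of purely random variables and counts how many come out “statistically significant” at the conventional 0.05 level.

```python
import numpy as np

rng = np.random.default_rng(42)
n_datasets, n_points = 1000, 20

# For n = 20 points (18 degrees of freedom), a two-sided p < 0.05
# corresponds to |r| > 0.444 in a simple linear regression
R_CRITICAL = 0.444

significant = 0
for _ in range(n_datasets):
    x = rng.normal(size=n_points)  # pure noise
    y = rng.normal(size=n_points)  # pure noise, unrelated to x
    if abs(np.corrcoef(x, y)[0, 1]) > R_CRITICAL:
        significant += 1

print(f"{significant} of {n_datasets} random data sets were 'significant'")
```

Around 5% of the purely random data sets clear the p < 0.05 bar, which is exactly what the threshold means: one “discovery” in twenty even when there is nothing to discover.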

This should be a warning that correlations reported in scientific papers may be quite meaningless.

Can you guess which of the graphs is based on real data? It is actually graph E – published by members of a North American group currently publishing data which they claim show community water fluoridation reduces child IQ. It comes from one of their first papers, in which they claimed childhood ADHD was linked to fluoridation (see Malin, A. J., & Till, C. 2015. Exposure to fluoridated water and attention deficit hyperactivity disorder prevalence among children and adolescents in the United States: an ecological association).

The group used this paper to obtain funding for subsequent research. They obviously promoted this paper as showing real effects – and so have the anti-fluoride activists around the world, including the Fluoride Action Network (FAN) and its director Paul Connett.

But the claims made for this paper, and its promotion, are scientifically flawed:

  1. Correlation does not mean causation. Such relationships in larger datasets often occur by chance – hell they even occur with random data as the figure above shows.
  2. Yes, the authors argue there is a biologically plausible mechanism to “explain” their association. But that is merely cherry-picking to confirm a bias; there are other biologically plausible mechanisms they did not consider which would suggest there should be no effect. The unfortunate problem with these sorts of arguments is that they are used to justify findings as “proof” of an effect, in violation of the warning that correlation is not causation.
  3. There is the problem of correcting for confounders or other risk-modifying factors. While acknowledging the need for future studies considering other confounders, the authors considered their choice of socio-economic factors sufficient, and their peer reviewers limited their suggestions of other confounders to lead. However, when geographic factors were included in a later analysis of the data the reported relationship disappeared.
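Point 3 can be illustrated with a small simulation (the variable names and numbers here are invented; this is not the Malin & Till data). A hidden confounder drives both the exposure and the outcome, so a naive regression finds a strong “effect” that collapses once the confounder is adjusted for:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Invented example: a hidden factor (think "geography") drives both
# the exposure and the outcome; the exposure has no direct effect at all
confounder = rng.normal(size=n)
exposure = confounder + rng.normal(scale=0.5, size=n)
outcome = confounder + rng.normal(scale=0.5, size=n)

# Naive regression of outcome on exposure alone
slope_naive = np.polyfit(exposure, outcome, 1)[0]

# Adjusted regression: include the confounder as a covariate
X = np.column_stack([exposure, confounder, np.ones(n)])
coefs, *_ = np.linalg.lstsq(X, outcome, rcond=None)
slope_adjusted = coefs[0]

print(f"naive slope:    {slope_naive:.2f}")     # spuriously large (~0.8)
print(f"adjusted slope: {slope_adjusted:.2f}")  # collapses toward zero
```

The naive regression would be highly “significant,” yet the true direct effect of the exposure here is exactly zero.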

Confounders often not properly considered

Smith & Ebrahim (2002) discuss this problem in an article – Data dredging, bias, or confounding: They can all get you into the BMJ and the Friday papers. The title itself indicates how the poor use of statistics and the unwarranted promotion of statistical analyses can be used to advance scientific careers and promote bad science in the public media.

These authors say:

“it is seldom recognised how poorly the standard statistical techniques “control” for confounding, given the limited range of confounders measured in many studies and the inevitable substantial degree of measurement error in assessing the potential confounders.”

This could be a problem even for studies where a range of confounders are included in the analyses. But Malin & Till (2015) considered the barest minimum of confounders and didn’t include ones which would be considered important to ADHD prevalence. In particular, they ignored geographic factors and these were shown to be important in another study using the same dataset. Huber et al (2015) reported a statistically significant relationship of ADHD prevalence with elevation. These relationships are shown in this figure

Of course, this is merely another statistically significant relationship – not proof of a real effect and no more justified than the one reported by Malin and Till (2015). But it does show an important confounder that Malin & Till should have included in their statistical analysis.

I did my own statistical analysis using the data sets of Malin & Till (2015) and Huber et al (2015) and showed (Perrott 2018) that inclusion of geographic factors removed the statistically significant relationship of ADHD prevalence with fluoridation suggested by Malin & Till (2015). Their study was flawed and should never have been used to justify funding for future research on the effect of fluoridation. Nor should it have been used by activists promoting an anti-fluoridation agenda.

But, then again, derivation of a statistically significant relationship by Malin & Till (2015) did get them published in the journal Environmental Health which, incidentally, has sympathetic reviewers (see Some fluoride-IQ researchers seem to be taking in each other’s laundry) and an anti-fluoridation Chief Editor – Philippe Grandjean (see Special pleading by Philippe Grandjean on fluoride). It also enabled the promotion of their research via institutional press releases, newspaper articles and the continual activity of anti-fluoridation activists. Perhaps some would argue this was a good career move!


OK, the faults of the Malin & Till (2015) study have been revealed – even though Perrott (2018) is studiously ignored by the anti-fluoride North American group, which has continued to publish similar statistically significant relationships between measures of fluoride uptake and measures of ADHD or IQ.

But there are many published papers – peer-reviewed papers – which suffer from the same faults and get similar levels of promotion. They are rarely subject to proper post-publication peer-review or scientific critique. But their authors get career advancement and scientific recognition out of their publication. And the relationships are promoted as evidence for real effects in the public media.

No wonder members of the public are so often confused by the contradictory reporting, the health scares of the week, they are exposed to.

No wonder many people feel they can’t trust science.

Similar articles

Statistical manipulation to get publishable results

I love data. It’s amazing the sort of “discoveries” I can make given a data set and computer statistical package. It’s just so easy to search for relationships and test their statistical significance. Maybe relationships which we feel are justified by our experience – or even new ones we hadn’t thought of previously.

It’s a lot of fun. Here’s a tool readers can use to explore a data set involving information on US political leadership and the US economy – Hack Your Way To Scientific Glory. (The image above shows the tool but it’s only an image. WordPress won’t allow me to embed the site but you can access it by clicking on the image.)

Try searching for relationships between political leadership and the economy. If you can find a relationship with a p-value < 0.05 you might feel the urge to publish your findings. After all, p-values < 0.05 seem to be the gold standard for scientific journals these days.

Statistical manipulation a big problem in published science

Problem is, by playing with this data you could produce statistically significant relationships that “show” both Republicans and Democrats hurt the economy, or that both are good for the economy. It’s simply a matter of choosing appropriate factors to define political leadership and appropriate factors to measure the economic situation.

The process is called p-hacking or data dredging. Time spent playing with this tool should convince you that it is easy to confirm one’s own political biases about political leadership and political parties using statistical techniques. It should also convince you this is very bad science. But, unfortunately, it happens. Even respectable journals will publish papers reporting relationships obtained by p-hacking, provided a p-value of less than 0.05 can be shown.
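Here is a crude sketch of what the tool is doing under the hood, assuming purely random data rather than the real economic series: with enough ways to define the “exposure” and the “outcome,” some combinations will clear the p < 0.05 bar by chance alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40  # e.g. 40 time points of invented economic data

# Invented null data: 10 ways to measure the "economy" and 10 ways
# to define "political leadership", all of them pure random noise
economy = rng.normal(size=(10, n))
leadership = rng.normal(size=(10, n))

R_CRITICAL = 0.312  # |r| giving two-sided p < 0.05 for n = 40

hits = []
for i in range(10):
    for j in range(10):
        r = np.corrcoef(leadership[i], economy[j])[0, 1]
        if abs(r) > R_CRITICAL:
            hits.append((i, j, round(float(r), 2)))

# Roughly 5 of the 100 combinations should be "publishable" by chance
print(f"{len(hits)} of 100 combinations give p < 0.05")
```

A p-hacker only ever reports the hits, never the 95-odd combinations that failed.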

The article Science Isn’t Broken: It’s just a hell of a lot harder than we give it credit for includes the p-hacking tool and discusses how widespread the problem is in the published scientific literature. It also describes the concern that statisticians and scientists have about this sort of publication.

The author says:

“The variables in the data sets you used to test your hypothesis had 1,800 possible combinations. Of these, 1,078 yielded a publishable p-value, but that doesn’t mean they showed that which party was in office had a strong effect on the economy. Most of them didn’t.

“The p-value reveals almost nothing about the strength of the evidence, yet a p-value of 0.05 has become the ticket to get into many journals. “The dominant method used [to evaluate evidence] is the p-value,” said Michael Evans, a statistician at the University of Toronto, “and the p-value is well known not to work very well.”

Statistical manipulation and p-hacking in fluoride studies

In my articles on the way scientific papers relating to fluoridation are misrepresented, I have often referred to the misleading use of p-values to argue that a study is very strong or a relationship important. Paul Connett, head of the Fluoride Action Network (FAN), often uses that argument (see, for example, Connett fiddles the data on fluoride, Connett misrepresents the fluoride and IQ data yet again, and Anti-fluoridation campaigners often use statistical significance to confirm bias).

But I have noticed that p-hacking and data dredging are real problems with some of the more recent studies of fluoride and IQ. Partly because these papers are being published by some reputable journals. Also because some reviewers and scientific readers seem completely unaware of the problem and therefore take some of the claimed findings at face value.

I have gone through some recent papers on this issue and pulled out the factors used to represent child cognitive abilities and F exposure or intake. These are listed below for 7 papers and a thesis.

| Study | Cognitive factor | F exposure |
| --- | --- | --- |
| Malin & Till (2015) | ADHD prevalence in US states | Fluoridation extent in US states |
| Thomas (2014) | WASI; Bayley Infant Scales of Development-II (BSID-II) | Blood plasma F; concurrent child urinary F |
| Bashash et al., (2017) | CGI | Concurrent child urinary FSG |
| Bashash et al., (2018) | ADHD; CRS scores; CPT scores | (see paper) |
| Thomas et al., (2018) | MDI | MUFCr |
| Green et al., (2019) | FSIQ* boys; PIQ* boys; VIQ ns | Estimated F intake by mother |
| Riddell et al., (2019) | SDQ hyperactive/inattentive score; ADHD (parent-reported or questionnaire) | Water F |
| Till et al., (2020) | FSIQ | Water F |
| Santa-Marina et al., (2019) | perceptual-manipulative scale; verbal function; general cognitive | (see paper) |

Footnotes (see papers for full information):
MUF – Prenatal maternal urinary F
MUFCr – Prenatal maternal urinary F adjusted using creatinine concentration
MUFSG – Prenatal maternal urinary F adjusted using specific gravity
Concurrent child urinary FSG – child urinary F at the time of IQ assessment adjusted using specific gravity
CGI – general cognitive index
FSIQ – Full-Scale IQ
PIQ – Performance IQ
VIQ – Verbal IQ
MDI – Mental development index
WASI – Wechsler Abbreviated Scale of Intelligence

As you can see, just like the political leadership/economy example illustrated in the p-hacking tool above, there is a range of both cognitive measures and fluoride exposure factors which can be cherry-picked to produce the “right” answer (or confirm one’s bias). Most of these studies could also select from up to three cohorts. So it’s not surprising that relationships can be found to support the argument that fluoride has a negative effect on child cognitive abilities. But we can also find statistically significant relationships to support the argument that fluoride has a positive effect on cognitive abilities. Or, alternatively, that fluoride has no effect at all on cognitive abilities.

Another warning sign is that the relationships that are cited (and which have p-values < 0.05) are all extremely weak, explaining only a few per cent of the variance in the data. While the complete statistical analyses are not given in most of the papers (another big problem in published research), the figures show a very high scatter in the data and the quoted confidence intervals confirm this.
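Statistical significance and strength are different things, and a small simulation makes the distinction concrete (the numbers are invented, tuned only to mimic an effect explaining about 3% of variance): with a few hundred subjects, a correlation that explains almost nothing is comfortably “significant.”

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500

# Invented effect tuned to explain ~3% of the variance, comparable in
# size to the weak fluoride-IQ correlations discussed above
x = rng.normal(size=n)
y = 0.18 * x + rng.normal(size=n)

r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2

# For n = 500, any |r| above about 0.088 is already "significant"
# at p < 0.05, however useless the relationship is for prediction
print(f"r = {r:.3f}, variance explained = {100 * r_squared:.1f}%")
```

A low p-value here tells you only that the slope is probably not exactly zero, not that the effect matters.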

Even where p < 0.05 the data can be extremely scattered and the relationship so weak as to be meaningless. Figure 1 from Till et al., (2020)

Yet another warning sign is that when relationships are reported, they hold only for particular cognitive factors or particular fluoride exposure factors, and these differ from study to study. Again, they may hold only for one sex or for a limited age group.


Geoff Cumming wrote in A Primer on p Hacking that:

“Statisticians have a saying: if you torture the data enough, they will confess.”

We should always remember this when reading papers which rely on low p-values to support a relationship. I think this is a big problem in a lot of published science but it is certainly a problem with the fluoride-IQ research currently being published.

The real take-home message from this particular research is that all the reported relationships are extremely weak, the data has been “tortured,” and it is easy to select parameters to produce a relationship with a p-value < 0.05 to confirm a bias.

In fact, the results from these studies are contradictory, confusing and extremely weak. They may be useful to political activists who have biases to confirm or ideological agendas to promote, but they are not sufficient to influence public health policy.

Similar articles


ADHD and fluoride – wishful thinking supported by statistical manipulation?

Finding reality needs more than wishful thinking. The problem is that statistical arguments often provide a jargon to confirm biases. Image credit: Accurate Thinking Versus Wishful Thinking in Gambling

I worry at the way some scientists use statistics to confirm their biases – often by retrieving marginal relationships from data that do not appear to provide evidence for their claims. This seems to be happening with the recent publication of a study reporting on maternal urinary fluoride-child IQ relationships in Canada (see If at first you don’t succeed . . . statistical manipulation might help).

Now we have a new paper from this group of researchers that seems to be repeating the pattern – this time with fluoride–attention deficit hyperactivity disorder (ADHD) relationships. The paper is:

Riddell, J. K., Malin, A., Flora, D., McCague, H., & Till, C. (2019). Association of water fluoride and urinary fluoride concentrations with Attention Deficit Hyperactivity Disorder in Canadian Youth. Environment International, 133, 105190.

At first sight, the data do not seem promising for the fluoride-ADHD story. Compare the values of some of the factors they considered for Canadian youth who have been diagnosed with ADHD with the values for youth not diagnosed with ADHD (from Table 2 in the paper).

It seems that being male and exposure to smoking in the home are two factors predisposing youth to ADHD (already known), but fluoride in tap water and fluoride intake (indicated by urinary F) have no effect. If anything, the data suggest that residence in sites where F is added to tap water may reduce the chances of an ADHD diagnosis.

But the authors actually conclude that fluoride does increase the chance of an ADHD diagnosis. So it seems, once again, statistics appear to have been used in an attempt to incriminate fluoride – to make a silk purse out of a sow’s ear.

In effect, the paper is reporting three separate studies:

  • They looked for a relationship of ADHD diagnosis with urinary fluoride;
  • They checked if there was a difference in ADHD prevalence for youths living in fluoridated or unfluoridated areas, and
  • They looked for a relationship of ADHD diagnosis with F in tap water.

No relationship of ADHD with urinary fluoride

SDQ hyperactive/inattentive subscale scores were obtained using the Strengths and Difficulties Questionnaire (SDQ). Information about ADHD diagnosis and SDQ ratings was provided by parents of children aged 6–11 years and from a questionnaire completed by youth aged 12–17 years.

The paper reports:

“UFSG [urinary fluoride] did not significantly predict an ADHD diagnosis (adjusted Odds Ratio [aOR]=0.96; 95% CI: 0.63, 1.46, p=.84) adjusting for covariates.”


“UFSG did not significantly predict SDQ hyperactive/inattentive subscale scores (B=0.31, 95% CI: −0.04, 0.66, p=.08).”

So no luck there (for authors who appear to wish to confirm a bias). The tone of the discussion indicates the authors were disappointed, as they considered urinary fluoride “has the advantage of examining all sources of fluoride exposure, not just from drinking water.” However, they did discuss some of the disadvantages of the spot samples for urinary fluoride they used:

“. . . urinary fluoride levels in spot samples are more likely to fluctuate due to the rapid elimination kinetics of fluoride. Additionally, urinary fluoride values may capture acute exposures due to behaviours that were not controlled in this study, such as professionally applied varnish, consumption of beverages with high fluoride content (e.g., tea), or swallowing toothpaste prior to urine sampling. Finally, the association between urinary fluoride and attention-related outcomes could be obscured due to reduced fluoride excretion (i.e., increased fluoride absorption) during a high growth spurt stage.”

We should note the WHO recommends against using urinary F as an indicator of F intake for individuals, and certainly against using spot samples (see Anti-fluoridation campaigner, Stan Litras, misrepresents WHO). It recommends 24-hour collections (see the WHO document Basic Methods for Assessment of Renal Fluoride Excretion in Community Prevention Programmes for Oral Health). I really cannot understand why these researchers chose spot sampling over 24-hour sample collection – although even that would not have overcome the problem that urinary F is not a good indicator of fluoride intake at the individual level.

While it is refreshing to see the disadvantages of spot samples for urinary fluoride discussed, this probably would not have happened if they had managed to find a relationship. Neither Green et al., (2019) nor Bashash et al., (2017) considered these problems – but then they managed to find relationships (although very weak ones) using spot samples.

Relationship of ADHD diagnoses with fluoridation

While this paper reports a significant (p<0.05) relationship of ADHD diagnosis and SDQ ratings with community water fluoridation (CWF), this really only applies to older youth (14 years). The relationship is not significant for younger youth (9 years).

However, the relationship is rather tenuous – this effect of age on ADHD diagnosis was seen only for “cycle 3” data (collected from 2012 to 2013) and was not seen for “cycle 2” data (collected from 2009 to 2011). The confidence intervals for the odds ratios are also quite large – indicating high variance in the data.

I think their conclusion of an effect due to fluoride and their lack of consideration of the poor quality of their relationships and alternative explanations for their results smacks a bit of straw clutching. The authors appear too eager to speculate on possible mechanisms involving fluoride rather than properly evaluating the quality of the relationships they found.

Relationship of ADHD diagnoses with tap water F

The paper reports a statistically significant (p<0.05) relationship of ADHD diagnosis with tap water fluoride. While the reported odds ratio appears very large (“a 1 mg/L increase in tap water fluoride was associated with a 6.1 times higher odds of ADHD diagnosis”), the 95% confidence interval is very wide (1.60 to 22.8), indicating a huge scatter in the data. Unfortunately, the authors did not provide any more information from their statistical analysis to clarify the strength of the relationship.
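For readers unfamiliar with odds ratios, the sketch below shows how a CI this wide arises. The 2×2 counts are hypothetical, chosen only to give an odds ratio of about 6 with small cells (they are not Riddell’s data); the standard Woolf (log-odds) method then yields an interval spanning more than an order of magnitude:

```python
import math

# Hypothetical 2x2 table chosen to give an odds ratio of about 6
# with small cell counts (invented numbers, not the paper's data)
a, b = 12, 8   # exposed:   ADHD diagnosis yes / no
c, d = 5, 20   # unexposed: ADHD diagnosis yes / no

odds_ratio = (a * d) / (b * c)

# Woolf method: 95% CI on the log-odds scale
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
log_or = math.log(odds_ratio)
ci_low = math.exp(log_or - 1.96 * se_log_or)
ci_high = math.exp(log_or + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.1f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
```

The width of the interval comes directly from the 1/a + 1/b + 1/c + 1/d term: a handful of cases in any cell makes the estimate extremely imprecise.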

Again, there was a significant relationship of SDQ score with tap water fluoride concentration, but in this case it was only significant for older youth and the CI was also relatively wide.

So again the relationships with tap water F are tenuous – influenced by age and with large confidence intervals indicating a wide scatter in the data.

Problems with the paper’s discussion

Of course, correlation by no means implies causation. But there is always the problem of confirmation bias and special pleading where a low p-value in a regression analysis gets construed as evidence for the preferred outcome.

There are problems with relying only on p-values – which is why I have referred to confidence intervals and would prefer to see the actual data and full reports of the statistical analyses. The confidence interval values indicate that the data are highly scattered and that the reported models from the regression analyses in this paper probably explain very little of the variance. In such cases, there is a temptation to dig deeper and search for significant relationships by separating the data by sex or age, but the resulting significant relationships may be meaningless.

And the “Elephant in the Room” – the relationships themselves say nothing about the reliability of the favoured model. Nothing at all. A truly objective researcher would recognise this and avoid the straw-clutching and rationalisation of evidence in the paper’s discussion. For example, the authors considered another Canadian study which did not find any relationship of ADHD to fluoride in drinking water and argued the difference was solely due to deficiencies in the other study, not theirs.

The authors also seem not to recognise that any relationship they found may have nothing to do with fluoride but could be the result of other related risk-modifying factors they did not include in their statistical analysis. Worse, they argue their results are consistent with those of Malin and Till (2015) without any acknowledgment that that specific study is flawed. Perrott (2018) showed that the relationship reported by Malin & Till (2015) disappeared completely when altitude was included in the statistical analysis. This is consistent with the study of Huber et al., (2015) which reported a statistically significant relationship of ADHD prevalence with altitude.


I think the Riddell et al., (2019) paper presents problems similar to those seen with a previous paper from this research group – Green et al., (2019). I have discussed some of these problems in previous articles:

Others in the scientific community have also expressed concern about the problems in that paper, and a recent in-depth critical evaluation (see CADTH RAPID RESPONSE REPORT: Community Water Fluoridation Exposure: A Review of Neurological and Cognitive Effects) pointed to multiple “limitations (e.g., non-homogeneous distribution of data, potential errors and biases in the estimation of maternal fluoride exposure and in IQ measurement, uncontrolled potential important confounding factors).” It urged that “the findings of this study should be interpreted carefully.”

More significantly, widespread scientific concern about weaknesses in the Green et al., (2019) paper has led 30 scientific and health experts to write to the funding body involved (US National Institute of Environmental Health Sciences – NIEHS) outlining their concern and appealing for the data to be made public for independent assessment (see Experts complain to funding body about quality of fluoride-IQ research and download their letter). Last I was aware, the authors were refusing to release their data – claiming not to own it!

We could well see similar responses to the Riddell et al., (2019) ADHD paper.

Similar articles



Anti-fluoride group coordinator responds to my article

Image credit: Debate. The science of communication.

My recent article Paul Connett’s misrepresentation of maternal F exposure study debunked got some online feedback and criticism from anti-fluoride activists. Mary Byrne, National coordinator of Fluoride Free New Zealand, wrote a response and requested that it be published on SciBlogs “in the interests of putting the record straight and providing balance.”

I welcome her response and have posted it here. Hopefully, this will satisfy her right of reply and help to develop some respectful, good faith, scientific exchange on the issue.

I will respond to Mary’s article within a few days.

Perrott wrong. New US Government study does find large, statistically significant, lowering of IQ in children prenatally exposed to fluoride

By Mary Byrne, National coordinator Fluoride Free New Zealand.

While the New Zealand Ministry of Health remains silent on a landmark, multi-million-dollar, US Government funded study (Bashash et al), and the Minister of Health continues to claim safety based on out-dated advice, fluoride promoter Ken Perrott has sought to discredit the study via his blog posts and tweets.

Perrott claims that the results were not statistically significant but his analysis is incorrect.

The conclusion by the authors of this study, which was published in the top environmental health journal, Environmental Health Perspectives, was:

In this study, higher prenatal fluoride exposure, in the general range of exposures reported for other general population samples of pregnant women and nonpregnant adults, was associated with lower scores on tests of cognitive function in the offspring at age 4 and 6–12 y.”

Perrott states the study has “a high degree of uncertainty”. But this contrasts with the statistical analysis and conclusion of the team of distinguished neurotoxicity researchers from Harvard, the University of Toronto, Michigan and McGill. These researchers have written over 50 papers on similar studies of other environmental toxics like lead and mercury.

RESULTS: In multivariate models we found that an increase in maternal urine fluoride of 0.5 mg/L (approximately the IQR) predicted 3.15 (95% CI: −5.42, −0.87) and 2.50 (95% CI −4.12, −0.59) lower offspring GCI and IQ scores, respectively.

The 95% CI is the 95% Confidence Interval which is a way of judging how likely the results of the study sample reflect the true value for the population. In this study, the 95% CIs show the results are highly statistically significant. They give a p-value of 0.01 which means if the study were repeated 100 times with different samples of women only once could such a large effect be due to chance.

Perrott comes to his wrong conclusion because he has confused Confidence Intervals with Prediction Intervals and improperly used Prediction Intervals to judge the confidence in the results. A Prediction Interval is used to judge the confidence one has in predicting an effect on a single person, while a Confidence Interval is the proper measure to judge an effect on a population. In epidemiological studies, it is the average effect on the population that is of interest, not how accurately you can predict what will happen to a single person.

Despite the authors controlling for numerous confounders, Perrott claimed they did not do a very good job and had inadequately investigated gestational age and birth weight.

Once again Perrott makes a fundamental mistake when he says that the “gestational period < 39 weeks or > 39 weeks was inadequate” and “The cutoff point for birth weight (3.5 kg) was also too high.”

Perrott apparently did not understand the Bashash paper and mistook what was reported in Table 2 with how these covariates were actually treated in the regression models. The text of the paper plainly states:

“All models were adjusted for gestational age at birth (in weeks), birthweight (kilograms)”

Thus, each of these two variables were treated as continuous variables, not dichotomized into just two levels. Perrott’s criticism is baseless and reveals his misunderstanding of the Bashash paper.

Perrott states that the results are not relevant to countries with artificial fluoridation because it was done in Mexico where there is endemic fluorosis. But Perrott is wrong. The study was in Mexico City where there is no endemic fluorosis. Furthermore, the women’s fluoride exposures during pregnancy were in the same range as found in countries with artificial fluoridation such as New Zealand.

The study reports that for every 0.5 mg/L increase of fluoride in the urine of the mothers there was a statistically significant decrease in average IQ of the children of about 3 IQ points. It is therefore correct to say that a fluoride level in urine of 1 mg/L could result in a loss of 5 – 6 IQ points. This is particularly relevant to the New Zealand situation where fluoridation is carried out at 0.7 mg/L to 1 mg/L and fluoride urine levels have been found to be in this range2.

There is no excuse for Health Minister, David Clark, to continue to bury his head in the sand. This level of science demands that the precautionary principle be invoked and fluoridation suspended immediately.

Similar articles




Anti-fluoride authors indulge in data manipulation and statistical porkies

Darrell Huff & Irving Geis wrote a classic book – How to Lie With Statistics. They outline various ways data can be presented to give the wrong story. However, there is an even more naive use of statistics to misrepresent data – just declare that a relationship is statistically significant, and don’t show any data or statistical analysis.

Unfortunately, many people are fooled by the use of those magical words – “statistically significant.”

I suppose the lay person could be excused – although it would pay even them to be a bit more sceptical about such claims. But it seems that even some “scientific” journals, or perhaps inadequate peer reviewers, can be fooled by those magical words. Here is an example in the paper by Hirzy et al., (2016) in the journal Fluoride. (Yes, I know, this journal is well known for its anti-fluoride stance and poor scientific quality, but I would have thought the editor, Bruce Spittle, would have picked this one up – even if the journal does not have an adequate peer review system. Perhaps the fact that Spittle is one of the authors of the paper is a factor.)

I critiqued the paper in my article Debunking a “classic” fluoride-IQ paper by leading anti-fluoride propagandists and have submitted a more formal critique to the journal (see – Critique of a risk analysis aimed at establishing a safe dose of fluoride for children.) But here I just want to deal with those magical words used in the paper – “statistically significant.”

Hirzy et al (2016) rely completely on data reported by Xiang et al., (2003), claiming they “found a statistically significant negative relationship between . . . drinking water fluoride levels and IQ.” Trouble is, you can search through the data presented by Xiang et al., (2003) and there is absolutely nothing to indicate a “statistically significant” relationship. Sure, that paper claims “This study found a significant inverse concentration-response relationship between the fluoride level in drinking water and the IQ of children.” But there is no table or graphic presenting the individual data points and no statistical analysis for drinking water F and IQ. This is rather surprising, because Xiang et al., (2003) did present the individual data points for urinary fluoride and did present the results of statistical analyses for other relationships.

The trick behind the misleading use of Xiang’s data

However, what Xiang et al (2003) did do was separate their drinking water fluoride and IQ data into different ranges. This is a table of their result.

While group F was data for one village (Xinhuai) and the data in the other groups were for a separate village (Wamiao), there was no explanation of the criteria used for the groups – and the numbers in each group vary tremendously. Over half the children (290 of the total 512) were in Group F, and the sizes of the other groups seem to vary arbitrarily between 8 and 111.

This manipulation produces data which can be used to imply a statistically significant relationship. Do the statistical analysis for water F and IQ in the above table and, sure enough, you get a lovely straight line, a correlation of 0.96, and a very statistically significant result (p=0.003). But because of the manipulation, this says exactly nothing about the original data.
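In fact, that p-value follows directly from a correlation of 0.96 over just six group means. Here is a quick check in Python, assuming the standard two-sided significance test for a Pearson correlation (the only inputs are the r and n quoted above):

```python
from math import sqrt
from scipy import stats

# Significance of a Pearson correlation: t = r * sqrt(n - 2) / sqrt(1 - r^2),
# compared against a t-distribution with n - 2 degrees of freedom.
r, n = 0.96, 6
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
p = 2 * stats.t.sf(t, df=n - 2)   # two-sided p-value

print(f"t = {t:.2f}, p = {p:.4f}")  # p comes out close to the quoted 0.003
```

So the reported p=0.003 is internally consistent – the problem is not the arithmetic, but that six manufactured group means are standing in for 512 children.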

I will illustrate this by taking some data which Xiang et al (2003) did provide – for urinary fluoride and IQ. The data are illustrated in the figure below from the paper.

A statistical analysis of that data did show the relationship was statistically significant – Xiang et al. (2003) cite a “Pearson correlation coefficient –0.174, p = 0.003.” But that correlation explains only about 3% of the variance in IQ, and I would have liked to see a similar analysis for water F, as other workers have usually found weaker relationships for water F than for urinary F.
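The 3% figure is simply the square of the correlation coefficient – the usual “variance explained” measure:

```python
# "Variance explained" is the square of the Pearson correlation coefficient.
# Using the value Xiang et al. (2003) cite for urinary F vs IQ:
r = -0.174
variance_explained = r ** 2

print(f"{variance_explained:.1%}")  # → 3.0%
```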

But let’s try using the manipulation of Xiang et al (2003) and Hirzy et al (2016) to make the relationship between urinary F and IQ look a lot better than it is. I used a software tool to extract data from the figure – it didn’t extract all the points (264 out of a total of 290) because of overlaps, but statistical analysis of my extracted data gave a Pearson correlation coefficient of –0.16, p=0.010. Very similar to that reported by Xiang et al., (2003).

The tricky manipulations

I have absolutely no idea why Xiang et al., (2003) used different group sizes – so, to be fair, I have divided my extracted data into 6 groups of 44 pairs each (after sorting them into order based on urinary F) to produce the following table.

Group   Urinary F       IQ
A       1.79        105.57
B       2.30         89.45
C       2.30         77.72
D       2.69         68.58
E       2.48         56.25
F       2.69         40.10

This produces a lovely graph:

But, just a minute, I can get a better graph if I sort the data according to IQ instead of urinary F:

But why stop there? If I choose different group sizes – remember Xiang et al., (2003) had groups ranging from 8 to 290 in size – I am sure I could get an even better presentation of the data.

But these graphs look far better than the one presented in Xiang et al (2003) for urinary F. We have taken data where urinary F explains only about 3% of the variance in IQ and produced graphics implying it “explains” up to about 75% of the variance. And we could “explain” more with a bit of extra manipulation.
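Anyone can reproduce this inflation with simulated data. The sketch below uses made-up numbers chosen to give a weak individual-level correlation of roughly –0.17 (similar in strength to the urinary F data), then applies the same trick – sort, split into six groups, and correlate the group means:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated individual-level data with a weak negative correlation
# (~ -0.17, i.e. about 3% of variance explained). Purely illustrative.
n = 264
x = rng.normal(2.3, 0.5, n)             # stand-in for "urinary F"
y = 100 - 5 * x + rng.normal(0, 14, n)  # stand-in for "IQ"
r_raw, p_raw = stats.pearsonr(x, y)

# The trick: sort by x, split into 6 groups, average each group,
# then correlate the 6 group means instead of the 264 children.
order = np.argsort(x)
groups_x = np.array_split(x[order], 6)
groups_y = np.array_split(y[order], 6)
mean_x = [g.mean() for g in groups_x]
mean_y = [g.mean() for g in groups_y]
r_grouped, _ = stats.pearsonr(mean_x, mean_y)

print(f"individual-level r = {r_raw:.2f}")   # weak
print(f"group-mean r       = {r_grouped:.2f}")  # looks far stronger
```

Sorting before grouping lines the group means up along the underlying trend while averaging away the scatter – which is exactly why a grouped correlation looks so much more impressive than the individual-level one it came from.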


Data manipulation like this doesn’t change the fact that while the relationship between urinary F and IQ is statistically significant it only explains about 3% of the variance in IQ. This means that other factors, or confounders, should be considered – and when they are it is likely the significant relationship of IQ to urinary F would disappear.

Although Xiang et al., (2003) did not provide any statistical analysis to support their claim of a significant relationship between water F and IQ, I am sure the relationship is similar to that for urinary F – maybe even weaker. Manipulating the data by using a range of groups of different sizes has certainly made the data look a lot better – but it is completely misleading.

I think it is shocking that the authors of the Hirzy et al., (2016) paper have used data manipulated in this way – first to claim that fluoride in drinking water has a major negative effect on IQ, and secondly to use such massaged data to work out a “safe dose.”

Worse, the journal Fluoride, and its peer reviewers, should never have accepted this paper without querying the claim of a significant relationship between drinking water F and IQ.

Similar articles


The chances of Royal Weddings arising randomly…

Royal Wedding - no chance - "not in a billion universes."

Now, I am not a Royalist. The whole subject bores me and I profess ignorance about the intricacies and origins of the institution. But I couldn’t help notice there was a Royal Wedding recently.  And having just read the blog post The chances of life arising randomly…., by New Zealand’s leading village creationist Ian Wishart, I thought I would apply his reasoning to help me understand the event.

You see, something really strange happened. Hundreds of people spontaneously appeared at Westminster Abbey. Not only that. Half of them were adorned with strange contraptions on their heads and the other half dressed like penguins. But the coincidences extended even further. At the same time hundreds of thousands of people congregated in surrounding streets in London. But that’s not all. The coincidences were even more amazing. It’s estimated 2 billion (2,000,000,000) people watched the events on TV.

Continue reading

Treating statistics sensibly

People love to quote statistical studies to support their claims. And often it’s just a matter of confirmation bias. The statistical studies may not provide the support required – or may suffer from all sorts of flaws.

We see a lot of this in discussions on health, diet and life style. But I have also noticed statistics being liberally thrown around when religion and religious attitudes are discussed. If there is any area ripe for confirmation bias this is certainly it.

Consider the little graphic below which appeared at the dating site OKCupid (see The REAL ‘Stuff White People Like’). Just imagine what negative conclusions one could draw about religion from that. To be fair, most references I have seen to it (all atheist – strangely, no religious sites are quoting it) do advise taking it with a grain of salt. (If you are interested, have a look at the source. It provides other statistics from the study which will help make sense of this graph.)

On the other hand, I have had statistical studies quoted at me which claim to “prove” that religious people are happier, more honest, more moral, etc. Typically, those quoting the studies have never bothered to check out the details and always ignore studies which might have provided different conclusions. In other words, the normal confirmation bias.

Continue reading

New Zealand’s climate change deniers’ distortions exposed.

I am pleased to see that the attempt to promote a New Zealand version of “climategate” has more or less foundered. Sure, the ACT party and some more extreme opponents of the findings of climate scientists are still campaigning (see for example Auckland Public Meeting: Climategate, NIWA and the ETS). And well-known local climate change denier Ian Wishart managed to get international reporting of his slanderous press release (BREAKING: NZ’s NIWA accused of CRU-style temperature faking) in several more conservative and extreme international blogs and papers (for example BREAKING: NZ’s NIWA Accused of CRU-Style Temperature Faking, Climategate Scandal Spreads to New Zealand as MSM Continues Ostrich Act, Oops! Now New Zealand NIWA Accused Of Faking Data, New Zealand’s NIWA Gets Busted “Tricking” Their Climate Data and New Zealand Climate Data Shows Clear Evidence Of Fraud). But the New Zealand media has, in general, been more balanced in its reporting. The information from climate scientists at NIWA has been getting through.

Continue reading

Ranking NZ blogs with sitemeter data

There’s a school of thought that the only reliable way to rank blogs is by using actual sitemeter data for visits. Unfortunately, most blogs don’t make this information public. Perhaps if more did, bloggers could compare their statistics with those of other sites or have a listed ranking. This would help their interpretation.

I have managed to identify 37 New Zealand blogs with public sitemeter stats, or who self report this information – mostly from Tumeke’s blog ranking reports (see nz blogosphere rankings : March 2009). So how do these sites rank amongst themselves? Have a look at the table below for the April ’09 data. And for those interested in comparing sitemeter data with the traditional blog ranking indices, and with the rankings of Tumeke, Halfdone and Open Parachute – look at my analysis below the table.

Update: Homepaddock has passed on (see Naked Stats) that her stats are also public so we now have 38 blogs on the list.

Continue reading

Ranking methods for NZ blogs

I had a go at ranking New Zealand blogs at the end of February (see Rating NZ blogs). A month later I think I will try again. But first – let’s check out the ranking method I am using.

This is similar to that used in the Atheist Blog Ranking, which ranks blogs on each available statistic and then aggregates these ranks to produce an overall rank. Hopefully the procedure evens out the quirks inevitable in ranking using individual statistical measures.
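The aggregation step can be sketched like this – rank the blogs on each statistic separately, then order them by their average rank. The blog names and numbers below are invented purely for illustration:

```python
# Rank-aggregation sketch: rank on each measure, then average the ranks.
# All names and figures are made up for the example.
stats_by_blog = {
    "BlogA": {"visits": 1200, "links": 35, "posts": 40},
    "BlogB": {"visits": 900,  "links": 60, "posts": 25},
    "BlogC": {"visits": 3000, "links": 20, "posts": 10},
}

measures = ["visits", "links", "posts"]
ranks = {blog: [] for blog in stats_by_blog}

for m in measures:
    # Rank 1 = best (highest value) on this measure.
    ordered = sorted(stats_by_blog, key=lambda b: stats_by_blog[b][m], reverse=True)
    for rank, blog in enumerate(ordered, start=1):
        ranks[blog].append(rank)

# Overall ranking: sort by average rank across all measures.
overall = sorted(ranks, key=lambda b: sum(ranks[b]) / len(ranks[b]))
for blog in overall:
    print(blog, ranks[blog], sum(ranks[blog]) / len(ranks[blog]))
```

A blog that is merely good on every measure can therefore out-rank one that tops a single measure but does poorly on the rest – which is the evening-out effect intended.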

Continue reading