Richard Nisbett is a professor of psychology and co-director of the Culture and Cognition Program at the University of Michigan.
Edge has published an interesting talk by Nisbett about the problems of research relying on regression analyses (see The Crusade Against Multiple Regression Analysis). Unfortunately, the talk has some important faults, not least his use of the term “multiple regression” when he was really complaining about simple regression analysis.
Professor Richard Nisbett quite rightly points out that many studies using regression analysis are worthless, even misleading, and even, as he suggests, “quite damaging.” They are damaging because these studies get reported in the popular media and their faulty conclusions are “taken as gospel” by many readers. Nisbett says:
“I hope that in the future, if I’m successful in communicating with people about this, there’ll be a kind of upfront warning in New York Times articles: These data are based on multiple regression analysis. This would be a sign that you probably shouldn’t read the article because you’re quite likely to get non-information or misinformation.
Knowing that the technique is terribly flawed and asking yourself—which you shouldn’t have to do because you ought to be told by the journalist what generated these data—if the study is subject to self-selection effects or confounded variable effects, and if it is, you should probably ignore them. What I most want to do is blow the whistle on this and stop scientists from doing this kind of thing. As I say, many of the very best social psychologists don’t understand this point.
I want to do an article that will describe, similar to the way I have done now, what the problem is. I’m going to work with a statistician who can do all the formal stuff, and hopefully we’ll be published in some outlet that will reach scientists in all fields and also act as a kind of “buyer beware” for the general reader, so they understand when a technique is deeply flawed and can be alert to the possibility that the study they’re reading has the self-selection or confounded-variable problems that are characteristic of multiple regression.”
I really hope he does work with a statistician who can explain to him the mistakes he is making. The fact that he raises the issue of “confounded-variable problems” shows he is really talking about simple regression analysis. This problem can be reduced by including more of the relevant explanatory variables, the potential confounders, in the analysis. That is exactly what multiple regression analysis does, the very thing he makes central to his attack!
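To make the point concrete, here is a minimal simulated sketch in Python with NumPy. Everything in it is an assumption for illustration, not data from any real study: a hypothetical confounder `z` (think socio-economic status) drives both an exposure `x` and an outcome `y`, while `x` itself has no effect on `y`. The helper `ols` and the coefficients are illustrative choices. In theory, a simple regression of `y` on `x` should find a spurious slope near 0.8/1.64 ≈ 0.49, while adding `z` as a second predictor should pull the slope on `x` back to roughly zero.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical data-generating process (assumed for illustration only):
# confounder z drives both the exposure x and the outcome y.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.0 * z + rng.normal(size=n)  # y does not depend on x at all


def ols(outcome, *predictors):
    """Ordinary least squares; returns [intercept, b1, b2, ...]."""
    design = np.column_stack([np.ones(len(outcome)), *predictors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


simple = ols(y, x)       # y ~ x      (confounder ignored)
multiple = ols(y, x, z)  # y ~ x + z  (confounder included)

print(f"simple regression slope on x:   {simple[1]:.3f}")
print(f"multiple regression slope on x: {multiple[1]:.3f}")
```

The adjustment works here only because `z` was measured and included; a confounder nobody recorded cannot be controlled this way.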
The self-selection problem
Nisbett gives a couple of examples of the self-selection problem:
“A while back, I read a government report in The New York Times on the safety of automobiles. The measure that they used was the deaths per million drivers of each of these autos. It turns out that, for example, there are enormously more deaths per million drivers who drive Ford F150 pickups than for people who drive Volvo station wagons. Most people’s reaction, and certainly my initial reaction to it was, “Well, it sort of figures—everybody knows that Volvos are safe.”
Let’s describe two people and you tell me who you think is more likely to be driving the Volvo and who is more likely to be driving the pickup: a suburban matron in the New York area and a twenty-five-year-old cowboy in Oklahoma. It’s obvious that people are not assigned their cars. We don’t say, “Billy, you’ll be driving a powder blue Volvo station wagon.” Because of this self-selection problem, you simply can’t interpret data like that. You know virtually nothing about the relative safety of cars based on that study.
I saw in The New York Times recently an article by a respected writer reporting that people who have elaborate weddings tend to have marriages that last longer. How would that be? Maybe it’s just all the darned expense and bother—you don’t want to get divorced. It’s a cognitive dissonance thing.
Let’s think about who makes elaborate plans for expensive weddings: people who are better off financially, which is by itself a good prognosis for marriage; people who are more educated, also a better prognosis; people who are richer; people who are older—the later you get married, the more likelihood that the marriage will last, and so on.”
You get the idea. But how many academic studies rely on regression analysis of data from a self-selected sample of people? The favourite groups for many studies are psychology undergraduates at universities!
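The Volvo/pickup example can be put into numbers. The sketch below uses purely hypothetical rates and driver mixes, not figures from the report Nisbett mentions: within each driver-risk group the vehicle makes no difference at all, yet the crude deaths-per-million figures differ several-fold because riskier drivers choose pickups. Comparing both vehicles on a common driver mix (direct standardization, a standard epidemiological technique swapped in here for illustration) removes the gap. That only works because the sketch assumes driver risk was measured, which a simple league table of deaths per vehicle does not do.

```python
# Hypothetical illustration of self-selection; none of these numbers are real data.

# Deaths per million drivers by (vehicle, driver-risk group).
# By construction the vehicle makes no difference within a risk group.
RATES = {
    ("station_wagon", "low_risk"): 10.0,
    ("station_wagon", "high_risk"): 100.0,
    ("pickup", "low_risk"): 10.0,
    ("pickup", "high_risk"): 100.0,
}

# Self-selection: share of each vehicle's drivers in each risk group.
DRIVER_MIX = {
    "station_wagon": {"low_risk": 0.9, "high_risk": 0.1},
    "pickup": {"low_risk": 0.2, "high_risk": 0.8},
}

# Arbitrary common reference mix used for standardization.
STANDARD_MIX = {"low_risk": 0.5, "high_risk": 0.5}


def crude_rate(vehicle: str) -> float:
    """Deaths per million for a vehicle, weighted by who actually drives it."""
    mix = DRIVER_MIX[vehicle]
    return sum(share * RATES[(vehicle, group)] for group, share in mix.items())


def standardized_rate(vehicle: str, standard_mix: dict[str, float]) -> float:
    """Deaths per million if every vehicle had the same driver mix."""
    return sum(share * RATES[(vehicle, group)] for group, share in standard_mix.items())


for vehicle in DRIVER_MIX:
    print(f"{vehicle:14s} crude: {crude_rate(vehicle):5.1f}  "
          f"standardized: {standardized_rate(vehicle, STANDARD_MIX):5.1f}")
```

This prints crude rates of 19 and 82 deaths per million, but an identical standardized rate of 55 for both vehicles.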
Confounded variable problem
I have, in past articles, discussed some examples of this related to fluoride and community water fluoridation.
- Peckham’s regression analysis relating drinking water fluoride to hypothyroidism, which ignored the confounder iodine, known to be directly related to hypothyroidism (see Paper claiming water fluoridation linked to hypothyroidism slammed by experts and Anti-fluoride hypothyroidism paper slammed yet again).
- Malin and Till, who related community water fluoridation to the prevalence of ADHD but ignored social and geographic factors that have more influence. In fact, a multiple regression including some of these factors showed no remaining significant influence of fluoridation (see ADHD linked to elevation not fluoridation, ADHD link to fluoridation claim undermined again and Poor peer review – and its consequences).
- Xiang’s negative correlation of IQ with urinary fluoride, although the relationship explained only 3% of the variation in IQ. A multiple regression including confounders like parental education, socio-economic status, and physical and dental deformities (the study was done in an area of endemic fluorosis) would probably have shown that urinary fluoride had no significant influence on IQ (see Connett fiddles the data on fluoride, Connett & Hirzy do a shonky risk assessment for fluoride and Connett misrepresents the fluoride and IQ data yet again).
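For a single predictor, the share of variance explained equals the square of Pearson's correlation coefficient (R² = r²). A quick check using the 3% figure quoted above shows how weak that relationship is; the function name below is just an illustrative helper.

```python
import math


def correlation_from_r_squared(r_squared: float) -> float:
    """Magnitude of Pearson's r implied by R^2 for a one-predictor regression."""
    return math.sqrt(r_squared)


r_squared = 0.03  # share of IQ variation explained, as described for Xiang's correlation
r = correlation_from_r_squared(r_squared)

print(f"|r| = {r:.2f}")                                     # |r| = 0.17
print(f"variation left unexplained: {1 - r_squared:.0%}")   # 97%
```

A correlation of about 0.17 leaves 97% of the variation in IQ to other factors, which is ample room for unmeasured confounders.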
Simple regression analyses are too prone to confirmation bias, and Nisbett should have chosen his words more carefully and wisely. Multiple regression is not a silver bullet, but it is far better than a simple correlation analysis. Replication and proper peer review at all stages of research and publication also help. And we should always be aware of these and other limitations of exploratory statistical analysis. Ideally, such analyses should be used only as a guide for future, better-controlled studies.
Unfortunately, simple correlation studies are widespread, and reporters seem to see them as easy material for mainstream media articles. This is dangerous because media coverage gives these studies more influence on readers, and on their actions, than such limited studies really warrant. And in the psychological and health fields there are ideologically motivated groups who will promote such poor-quality studies because they fit their own agenda.
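Why is multiple regression better but not a silver bullet? It can only adjust for a confounder as well as that confounder is measured. The sketch below is hypothetical throughout: the true confounder `z` is recorded only through a noisy proxy `w`, and the helper `slope_on_x` and all coefficients are illustrative assumptions. In theory the slope on `x` (which has no real effect) should be about 0.49 with no adjustment, about 0.30 when adjusting for the proxy `w`, and close to zero only when adjusting for `z` itself.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Hypothetical set-up (assumed, not from any real study):
# z is the true confounder, but the study only records w, a noisy proxy for it.
z = rng.normal(size=n)
w = z + rng.normal(size=n)        # proxy measured with error
x = 0.8 * z + rng.normal(size=n)  # exposure driven partly by the confounder
y = 1.0 * z + rng.normal(size=n)  # outcome: x has no real effect


def slope_on_x(outcome, exposure, *covariates):
    """OLS coefficient on the exposure after adjusting for any covariates given."""
    design = np.column_stack([np.ones(len(outcome)), exposure, *covariates])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1]


unadjusted = slope_on_x(y, x)         # simple regression
proxy_adjusted = slope_on_x(y, x, w)  # multiple regression with the noisy proxy
fully_adjusted = slope_on_x(y, x, z)  # only possible if z were measured perfectly

print(f"unadjusted:     {unadjusted:.3f}")
print(f"proxy-adjusted: {proxy_adjusted:.3f}")
print(f"fully adjusted: {fully_adjusted:.3f}")
```

Adjusting for the proxy shrinks the spurious association but leaves some of it behind (residual confounding), which is why replication and better-controlled follow-up studies still matter.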