haggholm: (Default)
[personal profile] haggholm

Autism is a pretty mysterious condition. No one really knows what causes it (all we really know for sure, after all this testing, is that whatever else it does, the MMR vaccine definitely doesn’t cause it…), but it’s thought to be part genetic, part environmental. A Swedish study on indoor air pollutants has now suggested that, although the data are very tentative, vinyl flooring may increase the risk of autism!

The researchers found four environmental factors associated with autism: vinyl flooring, the mother's smoking, family economic problems and condensation on windows, which indicates poor ventilation.

Infants or toddlers who lived in bedrooms with vinyl, or PVC, floors were twice as likely to have autism five years later, in 2005, than those with wood or linoleum flooring.

Whether the link is real is, as the researchers very frankly point out, as yet unknown, and only further studies can settle it. I find this interesting to consider, however, as a case study in how easy it is to get the wrong impression from results like these. There are a number of interesting traps to fall into.

  1. It’s fairly likely that someone will report on this, or already has, under a headline like “Research finds link between vinyl flooring and autism”, giving the impression that the result is clear-cut, whereas the single most clear-cut message of this study is that it ain’t so.

  2. Correlation does not imply causation, and even when there is causation, we have to make sure we get it the right way around. As one commenter on that article pointed out, autistic children tend to be extremely preoccupied with textures. Even if there’s a direct link between vinyl flooring and autism, that doesn’t mean that the former causes the latter. Maybe families with autistic children prefer vinyl flooring because it makes the children happier, and so in a sense, autism might cause vinyl flooring!

  3. Notice that they found not one but four environmental factors associated with autism: vinyl flooring, the mother's smoking, family economic problems and condensation on windows. However, these variables were not controlled for and may not be independent.

    What does this mean? Well, it may be that any or all of these variables are connected: Maybe poorer people are more likely to smoke, less likely to afford good ventilation, and less likely to afford nice hardwood floors. If any of these things really does increase the risk of autism, the others will be associated with autism too: If, say, the mother’s smoking causes autism, and more poor mothers than wealthy mothers smoke, then vinyl floors and everything else associated with poverty will show up as associated with autism in the statistics. The correlation is there and is real, but there is (in my example) no causal relationship between vinyl floors and autism at all.

    This sort of thing is a problem in any study, especially (I believe) when randomisation is poor or sample sizes are small. Here we have four known and named variables that may reasonably be correlated. What would we have thought of this article if it hadn’t mentioned smoking, wealth, or ventilation? It would have painted a very different picture. And it’s not necessarily dishonesty or editorial brevity that leaves variables out of the equation: Sometimes relevant data just aren’t measured. What if the study hadn’t asked about wealth or smoking?

    I’m reminded of a very poorly thought-out article I read a little while back, which claimed that light pollution at night, from all the street lights and so forth, led to some health problem or other (I don’t recall which). However, light pollution goes with industrialisation, and the number of variables you introduce when you compare a more industrialised country to a more agricultural one is ridiculously large. The article made no mention of those at all, but spoke as though there had to be a direct causal link from light pollution to the health issue at hand (which is why I consider it such a poor article).

  4. The study was not designed to look for these associations, which means that we must suspect data mining. Data mining refers to digging through a set of data looking for any relationships at all, not just the ones the study set out to examine. The problem is that if you check enough relationships, some will always turn up, by chance alone if nothing else.

    Suppose, for instance, that a study is in some global sense 99% reliable. What does this mean? It means that we set out to discover whether X causes Y, and if the study says yes, we can be 99% certain that we’re right. On the flip side, there’s a 1% chance that we’re wrong. Now suppose that, since we have all these statistics anyway, we decide to check whether X causes Z, or A causes B…and so on. For every single one of these, we may (very generously) be 99% certain that it’s correct, but if we look for 100 different relationships, we know that we’ve probably got at least one wrong!

    In fact, if the 100 checks are independent, the chance that all of them are right is 0.99^100 ≈ 0.37, so we’re about 63% likely to have got at least one wrong. And that’s with a 99% confidence level and the very generous assumption that the data are just as reliable for questions the study wasn’t designed to answer. In reality, I expect that will often not be the case: Even if I design my study to control for a lot of variables surrounding the hypothesis I set out to explore, I can’t possibly do the same for a bunch of hypotheses someone constructs from my data after the fact.

    This is why data mining is frowned upon in scientific studies. We can look at data like that, find correlations that intrigue us, and use those correlations to inspire new studies, just as this Swedish study suggests that it might not be a bad idea to look at possible connections between vinyl flooring (and phthalates, the plasticisers used to soften PVC) and autism. But we shouldn’t be fooled into thinking that such correlations necessarily mean anything, because we know that if we look hard enough at any set of statistics, we will be able to find some spurious connections.
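To make the confounding scenario from point 3 concrete, here is a toy calculation in Python. All the numbers are invented for illustration (they are not from the Swedish study), and the model is deliberately rigged: wealth influences both smoking and flooring, smoking is the only thing that affects autism risk, and vinyl has no effect at all. Working through the exact probabilities, vinyl floors still come out looking risky in the crude comparison, while within each wealth group the risk ratio is exactly 1. Note that the stratified comparison only works because wealth was measured; had it not been, the spurious association is all we would see.

```python
from fractions import Fraction as F

# Hypothetical numbers, invented purely for illustration (not from the study).
P_POOR = F(3, 10)                              # share of poor families
P_SMOKE = {"poor": F(3, 5), "rich": F(1, 10)}  # P(mother smokes | wealth)
P_VINYL = {"poor": F(4, 5), "rich": F(1, 10)}  # P(vinyl floor | wealth)
P_AUTISM = {True: F(4, 100), False: F(1, 100)} # P(autism | smoking); vinyl plays no role


def joint():
    """Yield (wealth, smokes, vinyl, probability) for every combination.

    Smoking and vinyl are independent of each other given wealth,
    so vinyl has no causal path to autism in this model.
    """
    for wealth, p_w in (("poor", P_POOR), ("rich", 1 - P_POOR)):
        for smokes in (True, False):
            p_s = P_SMOKE[wealth] if smokes else 1 - P_SMOKE[wealth]
            for vinyl in (True, False):
                p_v = P_VINYL[wealth] if vinyl else 1 - P_VINYL[wealth]
                yield wealth, smokes, vinyl, p_w * p_s * p_v


def risk(vinyl, wealth=None):
    """P(autism | vinyl status), optionally restricted to one wealth group."""
    num = den = F(0)
    for w, s, v, p in joint():
        if v == vinyl and (wealth is None or w == wealth):
            num += p * P_AUTISM[s]  # probability mass of autistic children here
            den += p                # total probability mass of this group
    return num / den


# Crude comparison: ignores wealth entirely.
crude = risk(True) / risk(False)
print(f"crude risk ratio for vinyl: {float(crude):.2f}")

# Stratified comparison: compare like with like within each wealth group.
for w in ("poor", "rich"):
    print(f"risk ratio among {w}: {float(risk(True, w) / risk(False, w)):.2f}")
```

With these made-up numbers the crude risk ratio comes out at about 1.7, purely because vinyl floors are more common in poor households, where more mothers smoke.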

Petter Häggholm
