
Causation vs. correlation

A while back, I heard a report on the radio. Scientists had found four factors that were associated with breast cancer. One of them was a high level of education. So does that mean that if I skip college and grad school, I am less likely to get breast cancer?

The correlation between education and breast cancer is just that: correlation, not causation. We find them together, but that doesn’t mean that one causes the other. In fact, at least one variable (and perhaps more) is lurking in the background and is really causing the cancer. For example, highly educated women may be less likely to have children when they are very young, and it may be the timing of childbearing that affects cancer.
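A small simulation can make the lurking-variable idea concrete. The sketch below is only an illustration: the names `lurking`, `exposure`, and `outcome` and every probability in it are invented, not taken from the radio report or any study. The outcome depends only on the lurking variable, never on the exposure, yet the two still appear linked until the data is split by the lurking variable.

```python
import random

def simulate(n=100_000, seed=0):
    """Generate rows of (lurking, exposure, outcome).

    All probabilities are made-up illustration values, not real data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        lurking = rng.random() < 0.5
        # Exposure depends only on the lurking variable.
        exposure = rng.random() < (0.7 if lurking else 0.2)
        # Outcome depends only on the lurking variable, never on exposure.
        outcome = rng.random() < (0.30 if lurking else 0.10)
        rows.append((lurking, exposure, outcome))
    return rows

def outcome_rate(rows, exposure_value, lurking_value=None):
    """Outcome rate among rows with the given exposure (and lurking value, if set)."""
    picked = [
        outcome
        for lurking, exposure, outcome in rows
        if exposure == exposure_value
        and (lurking_value is None or lurking == lurking_value)
    ]
    return sum(picked) / len(picked)

rows = simulate()

# Ignoring the lurking variable: exposed rows show a clearly higher outcome rate.
overall_gap = outcome_rate(rows, True) - outcome_rate(rows, False)

# Holding the lurking variable fixed: the gap shrinks to roughly zero.
gap_lurking_true = outcome_rate(rows, True, True) - outcome_rate(rows, False, True)
gap_lurking_false = outcome_rate(rows, True, False) - outcome_rate(rows, False, False)

print(f"gap ignoring lurking variable: {overall_gap:.3f}")
print(f"gap when lurking is True:      {gap_lurking_true:.3f}")
print(f"gap when lurking is False:     {gap_lurking_false:.3f}")
```

Splitting by the lurking variable only works when we know what it is and have measured it, which is rarely guaranteed in practice.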

I use breast cancer to illustrate lurking variables because it is so clear — we know that we can’t skip college and avoid the problem. Online, the issues of causation and correlation (and lurking variables) are just as important, but often not as clear.

For example, I usually see that visitors who spend a long time online (over half an hour) are much more likely to convert than those who spend only 15 minutes. My very first question is about correlation vs. causation. Does the length of time on the site actually cause the conversion (“Well, I’ve wasted the last half hour on this dumb site, let me just buy what I need and move on”)? If that were true, we would work as hard as we could to keep visitors on our site before they convert, because it would increase the chances of them converting. Or does the conversion cause the long time on the site? Perhaps it’s hard to make a purchase, or the individual needs to learn a lot before he can push the “submit” button, so the interested visitor ends up spending a long time.
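To see why the report alone cannot settle the direction, here is a rough sketch with two imaginary sites. In one, staying longer raises the chance of converting; in the other, the decision to buy comes first and the buying process adds minutes. The function names, the 30-minute cutoff, and all of the numbers are assumptions chosen for illustration, not figures from any real analytics data.

```python
import random

def time_causes_conversion(rng):
    """Hypothetical world 1: each extra minute raises the chance of converting."""
    minutes = rng.uniform(5, 45)
    converted = rng.random() < (0.02 + 0.004 * minutes)
    return minutes, converted

def conversion_causes_time(rng):
    """Hypothetical world 2: buyers decide first, then research and checkout add time."""
    converted = rng.random() < 0.10
    minutes = rng.uniform(5, 40)
    if converted:
        minutes += rng.uniform(10, 20)
    return minutes, converted

def rates_by_session_length(world, n=50_000, seed=1, cutoff=30):
    """Return (conversion rate of long sessions, conversion rate of short sessions)."""
    rng = random.Random(seed)
    long_sessions, short_sessions = [], []
    for _ in range(n):
        minutes, converted = world(rng)
        bucket = long_sessions if minutes > cutoff else short_sessions
        bucket.append(converted)
    return (
        sum(long_sessions) / len(long_sessions),
        sum(short_sessions) / len(short_sessions),
    )

a_long, a_short = rates_by_session_length(time_causes_conversion)
b_long, b_short = rates_by_session_length(conversion_causes_time)

# In both worlds the report says the same thing: long sessions convert more.
print(f"time causes conversion: long={a_long:.3f} short={a_short:.3f}")
print(f"conversion causes time: long={b_long:.3f} short={b_short:.3f}")
```

Both worlds produce the same pattern in a time-on-site report, so the report by itself cannot tell us which story is true.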

Web analytics show us the “what,” not the “why.” However, about a year ago, I posed this question to Dr. Alan Montgomery, who is a professor of clickstream analysis at CMU. His answer? “It’s a little bit of both.”

Robbin Steif
LunaMetrics

2 Responses to “Causation vs. correlation”

  1. SiteSpect A/B Split and Multivariate Testing Says:

    This is a fruitful observation, and it’s true: traditional web analytics report on “what happened” historically. They enable us to segment traffic so we can measure correlations such as PPC traffic for keyword A converted at X%, vs. keyword B converted at Y%, etc.

    However, causality (vs. correlation) is not unattainable. In fact, A/B and multivariate testing do give us the ability to establish causality [note: you knew I was going down this road, didn't you? ;) ]

    Through A/B and multivariate testing, marketers/analysts gain control over their visitors’ experiences and can learn how site changes (e.g. a different layout, promotion or copy element) impact behavior. The capability to intentionally alter user experience and measure response lets us establish if certain content caused a change in behavior. This is the essence of why the things we learn through testing are so highly actionable.

    -Dave from SiteSpect

  2. Mr. R. Says:

    The problem is that in the scientific world you can have a controlled variable to truly test whether A influences B. In the real world you can’t control the variable, so you have to make an “educated guess.”
