
Archive for the ‘Statistics’ Category

Testing: How does the Website Optimizer calculator work?

Sunday, September 16th, 2007

Don’t you ever wonder about the computations of that little calculator that Google gives you to figure out the length of a multivariate test?

I don’t have any insider knowledge. But I have studied it enough to understand certain issues (and many thanks to Dylan Lewis of the Web Analytics Wiki for confirming my suspicions about how it should work). Specifically, when you are looking for the same absolute lift, you should need more data to “prove the same thing” if your control has a higher conversion rate, up to a conversion rate of 50%.

So let’s start: why does the GWO calculator ask you to input the conversion rate of your current page? Well, here’s why they care. If you hold everything else the same and tell the calculator that your current conversion rate is 4% instead of 3%, it will want a larger sample (translation: more pageviews, or more time to collect those pageviews) in order to get the statistical significance it needs.

So look at these two examples. All the variables are the same (sort of; I promise I will explain). However, in the examples below, one conversion rate is 3% and the other is 4%. Notice, also (here is the explanation just promised), that I changed the expected increase in conversion rate. With the 4% test, I have it expected to increase by 25%, so that I will get a one point lift in my conversion rate (after all, 25% of 4% is one point). And with the 3% test, it’s expected to increase by 33.33% (because 33.33% of 3% is also a one point lift):

[Screenshots: gwo-calculator-3.jpg, gwo-calculator-4.jpg]

So when the current conversion rate is higher, and you are looking for the same absolute expected improvement, the test takes longer, so that you can get more pageviews - i.e. get a greater sample size.
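To put rough numbers on that, here is a minimal sketch using the textbook sample-size formula for comparing two proportions. This is an assumption for illustration only: it is not necessarily the math inside the GWO calculator, and the function name, the 95% confidence, and the 80% power are hypothetical defaults that do not come from the calculator.

```python
import math
from statistics import NormalDist


def views_per_variation(control_rate, test_rate, confidence=0.95, power=0.80):
    """Textbook sample size per variation for a two-sided test of two proportions.

    Assumption: a generic formula, not the GWO calculator's own math.
    """
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Each rate contributes conversion * non-conversion to the spread.
    variance = control_rate * (1 - control_rate) + test_rate * (1 - test_rate)
    lift = test_rate - control_rate  # absolute lift, in proportion units
    return math.ceil((z_alpha + z_beta) ** 2 * variance / lift ** 2)


# Same one point absolute lift, two different starting points.
low = views_per_variation(0.03, 0.04)
high = views_per_variation(0.04, 0.05)
print(low, high)
```

Under these assumptions the 4% control needs more views per variation than the 3% control, which is the same direction the calculator moves in.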

Why?

Why do we need more data to prove improvement with a highly-converting page than with a poorly converting page? Here, I will use a more extreme example: an absolute increase of one point is pretty low when you are looking at a page that converts at 25%. So we need lots of data to prove that a test will do better than the 25%. But a one point increase is a whopping increase if your control page converts at 4% right now. So we can prove that our new test is better than our old control with just a little bit of data in that situation.

Here’s the really interesting part: when your control has a conversion rate of 50%, you need the most pageviews, i.e. time to get those pageviews. As you keep going beyond 50%, the time to run your tests starts to decrease. When you get to a conversion rate of 75% for your control, the time it takes for the test should mirror the time it takes at a control conversion rate of 25%. (It’s not perfectly exact for mathematical reasons that are too boring to go into here.) But check it out:

[Screenshots: gwo-calculator-25.jpg, gwo-calculator-75.jpg]

(Notice that 10% of 25% is a 2.5 point lift, and 3.33333% of 75% is also a 2.5 point lift in conversion rate.)
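One possible reason the 25% and 75% cases are not an exact mirror can be sketched in a few lines. This is a guess under stated assumptions, not a description of the calculator: it assumes the sample size scales with the sum of conversion times non-conversion for the control and the variation, and `variance_term` is a hypothetical helper name.

```python
def variance_term(control_rate, test_rate):
    """Sum of conversion * non-conversion for the control and the variation.

    Assumption: the required sample size grows in proportion to this sum.
    """
    return control_rate * (1 - control_rate) + test_rate * (1 - test_rate)


at_25 = variance_term(0.25, 0.275)       # 25% control, lifted 2.5 points
at_75_up = variance_term(0.75, 0.775)    # 75% control, lifted 2.5 points
at_75_down = variance_term(0.75, 0.725)  # 75% control, lowered 2.5 points

print(at_25, at_75_up, at_75_down)
```

In this sketch, lifting a 75% control moves the variation further from 50%, so its term comes out a little smaller than the 25% case. The exact mirror image of 25% going up to 27.5% is 75% going down to 72.5%.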

Why?

Why does it all turn around at 50%? And I want to try to explain this without using ps and qs and little hats, since I’m not a statistician. So I won’t use fancy equations. Just simple ones.

All these equations that are behind all these kinds of calculators, they include two events: heads or tails. Conversions or non-conversions. They never say (to the extent that they talk), “Conversion is good.” Only people think that conversion is good and non-conversion is bad. (Those equations also include other stuff, but we don’t have to go there.) In fact, you have to have five conversions and five non-conversions for a combination to show up in the graphical area of the website optimizer (the area where the bars are green and grey and red.)

So when you start playing with conversion times non-conversion, you find out that they multiply out to the largest amount when they are both 50%. Right? .5*.5= .25 but if you now use a little 2% conversion rate instead, you have .02*.98 = .0196. That’s way lower than .25 (and remember — this is not sample size, but is one of the important parts of the sample size equation.)

My fourth grade teacher, Mrs. Petrowski, insisted that I learn all those math laws, and one of them was about “commutativity” — it doesn’t matter what the order is in multiplication, you still get the same answer, she lectured. So we can swap those numbers and say that the conversion rate is 98%, leaving the non-conversion rate to be 2%, and the product is still .0196.
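The conversion times non-conversion arithmetic above is easy to check with a short sketch. The function name `spread` is just an illustrative label, and this product is only one piece of a sample size equation, not the whole thing.

```python
def spread(conversion_rate):
    """Conversion rate times non-conversion rate."""
    return conversion_rate * (1 - conversion_rate)


# The product peaks at 50% and is the same on either side of it.
for rate in (0.02, 0.25, 0.50, 0.75, 0.98):
    print(f"{rate:.0%} converts -> product {spread(rate):.4f}")
```

The 2% and 98% rows print the same product, and so do the 25% and 75% rows, with 50% giving the largest value.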

So whether you have a 98% conversion rate or a 2% conversion rate — your sample size is going to be the same. (Remember that there is a lot of other junk that goes into the equations, but this basic principle should hold, even though I don’t have access to the innards of the calculator.) And from all this gobbledygook we learn:

  • To prove that a test is one percentage point better than the control, you need more pageviews if the control has a high conversion rate than you would if the control had a low conversion rate.
  • However, once the control has a conversion rate over 50%, you start needing fewer pageviews.
  • This is a hard topic. If you didn’t understand, please comment and I will do my best.

Whew. This post took me at least two months to write. Many thanks to Dylan, again; to Wendi Malley; to Tom Leung (whom I have driven crazy on this topic); and to EV, the GWO engineer who must be sorry he ever gave me his email address.

Everyone who thinks that change in conversion rate should be viewed as a PERCENT and not as an absolute lift in conversion is welcome to flame in the comments.

Robbin

Wendi Malley vs GWO: who is correct?

Tuesday, July 10th, 2007

How long do you have to run a test to consider it a tie?

You could consider this to be part II of a series, where part I is a post by statistician Wendi Malley. She writes about how many pageviews I need in my sample size before I can call my Google Website Optimizer (GWO) test a tie. You should bear with me even if statistics aren’t your thing, because by the end of this post, I put it in plain English. (OK, en-us.)

If you didn’t read her post, she looked at my GWO tests, which were all running neck and neck for two+ weeks, with a conversion rate of greater than 4% for the control (and the other ones, too). From there, she figured out that I can call it a day (i.e. they are a tie), when I have 1728 pageviews. I only had 783 views of the test page when I sent her the data.

Her answer assumes that I am looking for 95% confidence in my answer and a margin of error of plus or minus 1%. Since it only took 2+ weeks to get 783 views, I figured I only needed another 2+ weeks to go.

But at the same time that I wrote Wendi, I also wrote GWO. On the surface, their answer seemed to be very different from hers:

Given enough time, every test (assuming there are perceptible differences in the variations) will generate a winner in the report. This is because with enough data, even the smallest differences will be discernible. The question is, are those differences worth waiting for? At this point, there aren’t many conversions in your experiment. Because of the low traffic and low conversion rate, you may have to wait for months to get something more definitive.

Hmm, those two things didn’t seem to go together. So I pushed a little harder, and as usual, the GWO people were very responsive, and they came back with this answer:

What Wendi is describing in her blog is a power calculation. This says: if I want to be able to measure a difference of a given size (delta), if I wait so long (n), I will be 95% (alpha) certain that I can see the difference…

My original statement is also correct: If you wait long enough, a difference of any magnitude will be measurable. What Wendi shows is that you qualify that statement with an amount of difference one is interested in, you can calculate the number of impressions required to detect that difference with a given degree of certainty.

So I pushed through the Greek letters (and wrote Wendi) in an effort to really understand her equation. Here is what it means in English - no Greek letters or subscripts (and Wendi, you correct me if I am wrong):

Given a conversion rate we already know (the control) and a confidence that we want (95%), how many views of the test page do we need to have in order to feel that the conversion rate of the other tests will be no more than plus or minus 1%? In the case of my test, how many pageviews do we need to see to feel 95% confident that the conversion rates of the other tests will be between 3.72 and 5.72%? (after all the control has 4.72%, so that’s plus or minus one percentage point, right?)
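The question above matches the standard margin-of-error formula for a single proportion, sketched below. It is an assumption that this is the formula behind Wendi's calculation, but with a 4.72% control and a margin of plus or minus one point it does reproduce the 1728 figure. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def views_for_margin(conversion_rate, margin, confidence=0.95):
    """Views needed so the conversion rate is pinned down to +/- margin.

    Assumption: the standard single-proportion formula
    n = z^2 * p * (1 - p) / margin^2, with rates given as proportions.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * conversion_rate * (1 - conversion_rate) / margin ** 2)


print(views_for_margin(0.0472, 0.01))  # → 1728
```

Because the margin is squared in the denominator, shrinking it by a factor of 100 multiplies the views needed by roughly 10,000.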

And in fact, GWO is right also - they *are* both right. We can decrease that “margin of error” (I wish we could call it “conversion rate difference”) to .01% (that is, .0001 as a proportion) and we will need over 17 million page views to have 95% confidence that there is a tie. Of course, I owe that calculation to Wendi’s spreadsheet.

And finally — look! I am starting to see a little spread in the data:

[Screenshot: gwo-blog-shot.JPG]

Statistics for Analysts - the course you keep meaning to take

Friday, October 13th, 2006

For all you analysts: should you want to brush up on your knowledge of statistics (or even start from scratch), here is a link to the Statistics Open Learning Initiative at Carnegie Mellon University. It is completely on-line, self-paced, and free. I road tested most of the course about a year ago and really loved it. The professor who wrote it does a wonderful job of using real world examples. For example, he teaches box plots and distributions using temperature: he shows that the average high temperature here in Pittsburgh is the same as the average temperature in San Francisco, but oh! does the distribution vary.

I see now that you can do this course with either Excel or Minitab.

Robbin Steif
LunaMetrics

Web analytics and Regression: put a line through that data

Tuesday, January 10th, 2006

I do lots of key performance indicator and dashboard work. Much of it is longitudinal (this is stat geek speak for “over time”), and customers say, “I see the little dots, what is the trend?” If you have a big gorilla package like SiteCatalyst, you can ask to see the graphs smoothed. But we are not all so lucky, so smart or so rich as to have Omniture on our side - so I finally figured out how to do it in Excel.

Although Microsoft keeps the ability to do linear regression, even with basic Office XP, pretty well-hidden, it is trivial once you figure it out.

First, create the data, or use mine:

57
68
48
52
78
79
58
80
83
98
67
64
93

Then, highlight the data in Excel and open the chart wizard, either with the little chart wizard icon at the top of your screen or by choosing Insert>Chart. The graph you want is Scatter, like the picture on the left, and you want the subtype that has the little dots only (it will probably already be highlighted, as it is in this screenshot).

Click Next, Next, Finish (although you can add any options you like along the way, such as a title, legend, etc.)

Now comes the moment of truth. While your graph is highlighted, look at your Excel menu bar, and you should see a new menu, Chart, probably to the right of Tools. Choose Chart>Add Trendline>Linear>OK.

Excel will do the linear regression and add a line showing the trend of your datapoints. It should look something like the screenshot on the left.

This is somewhat of a statistics hack. After all, we don’t know how well the data fits the line, which a package like Minitab would tell us. But it’s free if you have Excel. You can do it tonight on your computer.
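For anyone without Excel handy, the same straight-line fit can be sketched in a few lines of Python using ordinary least squares. The assumptions here: the thirteen values are evenly spaced observations numbered 1 through 13 (which is how a scatter chart built from a single column places them), and `fit_line` is a hypothetical helper, not anything Excel exposes. It also returns R squared, the goodness-of-fit figure that the chart line alone does not show.

```python
values = [57, 68, 48, 52, 78, 79, 58, 80, 83, 98, 67, 64, 93]
periods = list(range(1, len(values) + 1))  # assumption: evenly spaced, 1..13


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: share of the variation in y that the line accounts for.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_resid / ss_total
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(periods, values)
print(f"slope={slope:.2f} intercept={intercept:.2f} r_squared={r_squared:.2f}")
```

A positive slope means the dots trend upward over time; an R squared well below 1 means the points scatter widely around the line.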

Robbin
LunaMetrics