Wendi Malley vs GWO: who is correct?


How long do you have to run a test to consider it a tie?

You could consider this to be part II of a series, where part I is a post by statistician Wendi Malley. She writes about how many pageviews I need in my sample before I can call my Google Website Optimizer (GWO) test a tie. Bear with me even if statistics aren’t your thing, because by the end of this post, I put it in plain English. (OK, en-us.)

If you didn’t read her post, she looked at my GWO tests, which were all running neck and neck for two+ weeks, with a conversion rate of greater than 4% for the control (and the other ones, too). From there, she figured out that I can call it a day (i.e., declare them a tie) when I have 1728 pageviews. I only had 783 views of the test page when I sent her the data.

Her answer assumes that I am looking for 95% confidence in my answer and a margin of error of plus or minus 1%. Since it only took 2+ weeks to get 783 views, I figured I only needed another 2+ weeks to go.

But at the same time that I wrote Wendi, I also wrote GWO. On the surface, their answer seemed to be very different from hers:

Given enough time, every test (assuming there are perceptible differences in the variations) will generate a winner in the report. This is because with enough data, even the smallest differences will be discernible. The question is, are those differences worth waiting for? At this point, there aren’t many conversions in your experiment. Because of the low traffic and low conversion rate, you may have to wait for months to get something more definitive.

Hmm, those two things didn’t seem to go together. So I pushed a little harder, and as usual, the GWO people were very responsive, and they came back with this answer:

What Wendi is describing in her blog is a power calculation. This
says: if I want to be able to measure a difference of a given size
(delta), if I wait so long (n), I will be 95% (alpha) certain that I
can see the difference…

My original statement is also correct: If you wait long enough, a
difference of any magnitude will be measurable. What Wendi shows is
that you qualify that statement with an amount of difference one is
interested in, you can calculate the number of impressions required to
detect that difference with a given degree of certainty.

So I pushed through the Greek letters (and wrote Wendi) in an effort to really understand her equation. Here is what it means in English – no Greek letters or subscripts (and Wendi, you correct me if I am wrong):

Given a conversion rate we already know (the control) and a confidence that we want (95%), how many views of the test page do we need to have in order to feel that the conversion rate of the other tests will be no more than plus or minus 1%? In the case of my test, how many pageviews do we need to see to feel 95% confident that the conversion rates of the other tests will be between 3.72 and 5.72%? (after all the control has 4.72%, so that’s plus or minus one percentage point, right?)
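For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes Wendi's spreadsheet uses the standard normal-approximation formula for the margin of error of a proportion, n = z² × p × (1 − p) / E². The post doesn't show her actual equation, but this formula does reproduce her 1728 figure. The function name `required_pageviews` is made up for illustration.

```python
import math
from statistics import NormalDist


def required_pageviews(conversion_rate, margin, confidence=0.95):
    """Pageviews needed so the margin of error on a conversion rate is `margin`.

    Assumption: normal-approximation formula n = z^2 * p * (1 - p) / E^2.
    Both conversion_rate and margin are proportions (0.0472 means 4.72%).
    """
    # Two-sided z-score for the chosen confidence (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * conversion_rate * (1 - conversion_rate) / margin ** 2
    # Round up: you can't collect a fraction of a pageview.
    return math.ceil(n)


# Control converts at 4.72%; we want plus or minus 1 percentage point.
print(required_pageviews(0.0472, 0.01))  # 1728
```

With 783 views already collected, that leaves 945 more to go.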

And in fact, GWO is right also – they *are* both right. We can decrease that “margin of error” (I wish we could call it “conversion rate difference”) to 0.01% (that is, 0.0001 as a proportion) and we will need over 17 million page views to have 95% confidence that there is a tie. Of course, I owe that calculation to Wendi’s spreadsheet.
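Here is a rough sketch of why both answers hold, again assuming the standard margin-of-error formula for a proportion (the post doesn't show the spreadsheet itself, and `required_pageviews` is a hypothetical name). The required pageviews grow with the inverse square of the margin, so a margin ten times tighter costs about a hundred times the traffic. Under this formula, the 17 million figure comes out when the margin is 0.0001 expressed as a proportion.

```python
import math
from statistics import NormalDist


def required_pageviews(conversion_rate, margin, confidence=0.95):
    """Assumed formula: n = z^2 * p * (1 - p) / E^2, with p and E as proportions."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * conversion_rate * (1 - conversion_rate) / margin ** 2)


control_rate = 0.0472  # the 4.72% control conversion rate from the post

# Tighter margins (smaller differences we care about) need far more pageviews.
for margin in (0.01, 0.001, 0.0001):
    views = required_pageviews(control_rate, margin)
    print(f"margin +/- {margin:.4f} -> {views:,} pageviews")
```

The first row is the 1728 answer; the last row lands a little over 17 million. Any difference is detectable eventually, and the formula tells you how long "eventually" is for the difference you actually care about.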

And finally — look! I am starting to see a little spread in the data:


Our founder, Robbin Steif, started LunaMetrics in 2004. She is a graduate of Harvard College and the Harvard Business School, and has served on the Board of Directors for the Digital Analytics Association. Robbin is a winner of a BusinessWomen First award, as well as a Diamond Award for business leadership. In 2017, Robbin sold her company to HS2 Solutions and has since retired from LunaMetrics.

  • Hey Robbin! You’ve got it! I can see GWO’s point but I think in your case statistics is on your side. Knowing how long you need to wait makes your life so much easier. Of course if you were dealing with a site that was able to generate 10K page views in a day then you don’t really need statistics anymore… you’ve got volume on your side.

    The best applied statistical strategies help a researcher reduce costs or help those with limited testing resources to make informed decisions quickly without waiting for that 10,000th page view to come 3 months later. Statistics is not for every situation but I do believe that it works well in the case that you are discussing today.

