# Testing: How does the Website Optimizer calculator work?


Don’t you ever wonder about the computations of that little calculator that Google gives you to figure out the length of a multivariate test?

I don’t have any insider knowledge. But I have studied it enough to understand certain issues (and many thanks to Dylan Lewis of the Web Analytics Wiki for confirming my suspicions about how it should work). Specifically, you should need more data to “prove the same thing” if your control has a higher conversion rate, up to a conversion rate of 50%.

So let’s start: why does the GWO calculator ask you to input the conversion rate of your current page? Well, here’s why they care. If you hold everything else the same and tell the calculator that your current conversion rate is 4% instead of 3%, it will want a larger sample (translation: more pageviews, or more time to collect those extra pageviews) in order to get the statistical significance it needs.

So look at these two examples. All the variables are the same (sort of; I promise I will explain). However, in the examples below, one conversion rate is 3% and the other is 4%. Notice, also (here is the explanation just promised), that I changed the expected increase in conversion rate. With the 4% test, I have it expected to increase by 25% (so that I will get a one point lift in my conversion rate; after all, .25*4=1). And with the 3% test, it’s expected to increase by 33.33% (because 3% times .33333 is also a one point lift):

So when the current conversion rate is higher, and you are looking for the same absolute expected improvement, the test takes longer, so that you can get more pageviews – i.e. get a greater sample size.
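To see the same pattern outside the calculator, here is a minimal sketch using the textbook normal-approximation sample size for comparing two proportions. This is an assumption on my part, not the GWO calculator's actual algorithm; the function name, the 5% significance level, and the 80% power are all illustrative choices.

```python
from statistics import NormalDist


def sample_size_per_variation(p_control, p_variant, alpha=0.05, power=0.80):
    """Textbook two-proportion sample size (normal approximation).

    Assumption: this is NOT the GWO calculator's formula, only a
    standard stand-in that shows the same qualitative behavior.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance_sum = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    return (z_alpha + z_beta) ** 2 * variance_sum / (p_variant - p_control) ** 2


# Same one-point absolute lift, different starting conversion rates.
for control in (0.03, 0.04, 0.25):
    variant = control + 0.01
    n = sample_size_per_variation(control, variant)
    print(f"{control:.0%} -> {variant:.0%}: about {n:,.0f} visitors per variation")
```

Under these assumptions the required sample grows as the control rate rises from 3% to 4% to 25%, even though the absolute lift stays at one point.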

Why?

Why do we need more data to prove improvement with a highly-converting page than with a poorly converting page? Here, I will use a more extreme example: an absolute increase of one point is pretty small when you are looking at a page that converts at 25%. So we need lots of data to prove that a test will do better than the 25%. But a one point increase? That’s a whopping increase if your control page converts at 4% right now. So we can prove that our new test is better than our old control with just a little bit of data in that situation.

Here’s the really interesting part: when your control has a conversion rate of 50%, you need the most pageviews, i.e. time to get those pageviews. As you keep going beyond 50%, the time to run your tests starts to decrease. When you get to a conversion rate of 75% for your control, the time it takes for the test should mirror the time it takes at a control conversion rate of 25%. (It’s not perfectly exact for mathematical reasons that are too boring to go into here.) But check it out:

(Notice that 10% of 25 is a 2.5 point lift, and 3.33333% of 75 is also a 2.5 point lift in conversion rate.)
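The relative-versus-absolute arithmetic used in these examples can be checked with a short sketch. The helper name `absolute_lift` is made up for illustration.

```python
def absolute_lift(rate_percent, relative_increase_percent):
    """Convert a relative increase (in %) into an absolute lift in points."""
    return rate_percent * relative_increase_percent / 100


# (current conversion rate in %, expected relative increase in %)
examples = [(3, 33.3333), (4, 25), (25, 10), (75, 3.33333)]

for rate, relative in examples:
    lift = absolute_lift(rate, relative)
    print(f"{rate}% control, {relative}% increase: {lift:.2f} point lift")
```

The first two pairs both work out to roughly a one point lift, and the last two to roughly a 2.5 point lift, which is what makes each pair comparable.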

Why?

Why does it all turn around at 50%? And I want to try to explain this without using ps and qs and little hats, since I’m not a statistician. So I won’t use fancy equations. Just simple ones.

All these equations that are behind all these kinds of calculators, they include two events: heads or tails. Conversions or non-conversions. They never say (to the extent that they talk), “Conversion is good.” Only people think that conversion is good and non-conversion is bad. (Those equations also include other stuff, but we don’t have to go there.) In fact, you have to have five conversions and five non-conversions for a combination to show up in the graphical area of the website optimizer (the area where the bars are green and grey and red.)

So when you start playing with conversion times non-conversion, you find out that they multiply out to the largest amount when they are both 50%. Right? .5*.5= .25 but if you now use a little 2% conversion rate instead, you have .02*.98 = .0196. That’s way lower than .25 (and remember — this is not sample size, but is one of the important parts of the sample size equation.)

My fourth grade teacher, Mrs. Petrowski, insisted that I learn all those math laws, and one of them was about “commutativity” — it doesn’t matter what the order is in multiplication, you still get the same answer, she lectured. So we can swap those numbers and say that the conversion rate is 98%, leaving the non-conversion rate to be 2%, and the product is still .0196.
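Here is a small sketch of that multiplication for a handful of conversion rates. The function name `conversion_product` is just a label for "conversion rate times non-conversion rate"; it is one ingredient of a sample size formula, not the whole thing.

```python
def conversion_product(rate):
    """Conversion rate times non-conversion rate."""
    return rate * (1 - rate)


for rate in (0.02, 0.25, 0.50, 0.75, 0.98):
    product = conversion_product(rate)
    print(f"{rate:.2f} * {1 - rate:.2f} = {product:.4f}")
```

The product peaks at a 50% conversion rate, and any rate gives the same product as its mirror image (2% and 98%, 25% and 75%).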

So whether you have a 98% conversion rate or a 2% conversion rate — your sample size is going to be the same. (Remember that there is a lot of other junk that goes into the equations, but this basic principle should hold, even though I don’t have access to the innards of the calculator.) And from all this gobbledygook we learn:

• To prove that a test is 1% better than the control, you need more pageviews if the control has a high conversion rate than you would if the control had a low conversion rate.
• However, once the control has a conversion rate over 50%, you start needing fewer pageviews.
• This is a hard topic. If you didn’t understand, please comment and I will do my best.

Whew. This post took me at least two months to write. Many thanks to Dylan, again; to Wendi Malley; to Tom Leung (whom I have driven crazy on this topic); and to EV, the GWO engineer who must be sorry he ever gave me his email address.

Everyone who thinks that change in conversion rate should be viewed as a PERCENT and not as an absolute lift in conversion is welcome to flame in the comments.

Robbin

Our founder, Robbin Steif, started LunaMetrics in 2004. She is a graduate of Harvard College and the Harvard Business School, and has served on the Board of Directors for the Digital Analytics Association. Robbin is a winner of a BusinessWomen First award, as well as a Diamond Award for business leadership. In 2017, Robbin sold her company to HS2 Solutions and has since retired from LunaMetrics.

• Robbin,

Thanks for pointing this calc out, it’s a neat
little gizmo.

Upon ripping it apart it seems like it is based
upon the formula:

Duration (in days) = i * (test combinations) / [(% visitors in experiment) * (pageviews per day)]

In the examples you showed, test combinations, page views per day
and % of visitors in experiment were all fixed.

That leaves the parameter (i) which they are deriving from
a function called impressions_to_divergence(p1, impr1, p2, impr2, conf).

It seems like an iterative algorithm which has as some of
its inputs the current conversion rate and the expected improvement
percentage (along with a parameter “conf”).

What I’m questioning is the parameter “conf” in the
impressions_to_divergence function. It seems that they’ve hard-coded
it at .28?

Any thoughts what it might be?

Using your example of 12 test combinations, 100 page views per day,
100 % Visitor participation, 25% current conversion, and 10% expected
improvement; I modified the code for the value of “conf” and obtained
these results for the duration (in days).

conf = .1 -> days = 1.44
conf = .2 -> days = 5.88
conf = .28 -> days = 11.64
conf = .3 -> days = 13.32
conf = .4 -> days = 23.76

Interesting…I’m wondering why they’ve chosen to fix “conf”
at .28? I see how it is passed down to the function that computes
the variance and is used to close in on the value…

Guess I’m off to dig out a stat book. Or maybe I should take it for
granted that the car runs, and not worry about the inner-workings
of the internal combustion engine?

Fun stuff…

Thanks,
Jim
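The duration formula quoted in Jim's comment can be sketched as follows. The function and parameter names are illustrative, and the value of `i` is back-solved from his table rather than taken from the calculator's source, so treat it as an inference.

```python
def duration_days(impressions, combinations, visitor_fraction, pageviews_per_day):
    """Duration = i * combinations / (fraction of visitors * pageviews per day)."""
    return impressions * combinations / (visitor_fraction * pageviews_per_day)


def impressions_from_days(days, combinations, visitor_fraction, pageviews_per_day):
    """Invert the formula to recover i from a reported duration."""
    return days * visitor_fraction * pageviews_per_day / combinations


# Jim's setup: 12 combinations, 100% participation, 100 pageviews per day.
reported = {0.1: 1.44, 0.2: 5.88, 0.28: 11.64, 0.3: 13.32, 0.4: 23.76}
for conf, days in reported.items():
    i = impressions_from_days(days, 12, 1.0, 100)
    print(f"conf = {conf}: days = {days}, implied i is about {i:.0f}")
```

With everything else fixed, the duration is simply 0.12 times `i`, so all of the variation in the table comes from whatever `impressions_to_divergence` returns.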

• I messed up someone’s comment by accident. His name is Emerson Hartley, and he told me that he is not from Memetrics, but likes them a lot. Here is what he wrote:

“Having used Google as well as xOs from Memetrics, I was very satisfied with the xOs Sample Size calculator. It has easy to use terms where a number of days to run the test is outputted. It’s five easy questions that you plug in and press calculate.

“The biggest difference is that Google optimizes one landing page for everyone while with a premium solution like xOs, you can optimize for different visitor segments, observed or pre-defined.”

• Paul

Robbin,

The calculator is taking the new conversion rate into account as well. I can’t tell the specific formula they’re using, but it probably includes old variance + new variance (variance = conversion rate * non-conversion rate). So for the 25% example this would be .25 * .75 + .275 * .725 = .387 and for 75% it would be .75 * .25 + .775 * .225 = .362.

That’s why the sample size estimation is close but different for these opposites.
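Paul's "old variance + new variance" guess can be sketched like this. It is only his hypothesis about what the calculator includes, and the function names are made up for illustration.

```python
def variance_term(rate):
    """Conversion rate times non-conversion rate."""
    return rate * (1 - rate)


def combined_variance(old_rate, new_rate):
    """Paul's guess: variance of the control plus variance of the new rate."""
    return variance_term(old_rate) + variance_term(new_rate)


low = combined_variance(0.25, 0.275)   # 25% control, 10% expected increase
high = combined_variance(0.75, 0.775)  # 75% control, 3.3333% expected increase
print(f"25% example: {low:.3f}")
print(f"75% example: {high:.3f}")
```

The two sums are close but not equal, because raising 25% moves toward 50% while raising 75% moves away from it.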

• The reason they are different (and I am having a hard time following yours, Paul, maybe I just don’t get enough sleep) is that they are not perfect mirrors. If I could compute a 10% expected IMPROVEMENT on a 25% currently observed conversion rate and a 3.3333% DECREASE on a 75% currently observed conversion rate, they should be the same thing. Because when I do it that way, they are perfect mirrors. However, the GWO calculator does not allow for a decrease in conversion rate.

But we can hack it. So look: 3.3333% of 75% is 2.5 points. So instead of increasing the 75%, let’s take it down by 2.5 points to 72.5%. To get it back up to 75%, we need an increase of 2.5 divided by 72.5 = 3.4483%.

Now, let’s go back to the calculator. Put in 72.5% as the current conversion rate. Put in 3.4483% as the intended increase. It spits back 11.64 days. The same as a 25% control rate and a 10% expected increase.

Perfect mirrors.
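The hack can be checked with a short sketch. The helper names are illustrative, and `variance_sum` reuses Paul's guessed "old variance + new variance" term, which is not a confirmed part of the calculator.

```python
def relative_increase_percent(start_percent, end_percent):
    """Relative increase (in %) needed to go from start to end."""
    return (end_percent - start_percent) / start_percent * 100


def variance_sum(rate_a, rate_b):
    """Sum of rate * (1 - rate) for the two conversion rates being compared."""
    return rate_a * (1 - rate_a) + rate_b * (1 - rate_b)


lift_points = 2.5
lowered_control = 75 - lift_points  # 72.5
needed = relative_increase_percent(lowered_control, 75)
print(f"Increase needed from {lowered_control}% to 75%: {needed:.4f}%")

# 25% -> 27.5% and 72.5% -> 75% are the same pair with conversions
# and non-conversions swapped, so the variance sums match.
print(variance_sum(0.25, 0.275), variance_sum(0.725, 0.75))
```

Because each rate in one pair is exactly one minus a rate in the other pair, any formula built only from these products and the size of the gap would treat the two tests identically.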

• Michael

FYI, the Google Website Optimizer calculator seems to have changed its duration algorithm. For example, the first calculation in this post cites a duration of 12.36 days, when the new calculator result is 684.36 days. Quite a difference.

• You are right!! I knew about the change, and forgot to deal with the post. Your comment is very helpful, thanks so much!

• Very good article you have here. I’ve been learning this stuff for weeks. It helps me a lot.
Thanks.


