# Statistical Significance Chrome Extension for Google Analytics

**Note:**

*This article contains updates to the previous article “Statistical Significance Script for Google Analytics”, which has been redirected to this article. See the changelog for details.*

In March I wrote a script for the statistical evaluation of time-frame comparisons in Google Analytics. The idea seemed well received, but who wants to hit F12, open the developer console, and then come back to my blog post for the code… every time they want to run the script?

So, I converted the script into a **Chrome Extension** (click below)!

This Chrome Extension also extends the original functionality of the script by using Student’s t-test for comparisons with more than 40 data points. Additionally, there were bugs in the first edition that limited the script’s application to certain metrics (e-commerce revenue, bounce rate, etc.). This new version should work with all graphable metrics.

## Usage

As I suggested in my original post, it’s important to incorporate statistical significance into our interpretation of Google Analytics data. Without at least some consideration of the statistics behind our data analyses, we risk identifying false positives as meaningful factors and incorrectly allocating resources. If you need a quick brush-up on the basics and purpose of statistical testing, I recommend taking a look at Statistical Significance in a Testing World by Adam Sugano (check out his original article as well).

While my Chrome extension aims to extend the strengths of Google Analytics (namely, responsive ad-hoc queries and visual comparison), analysts wishing to conduct a more detailed statistical analysis should turn to the R Google Analytics project. If you have never used R but paid a bit of attention in your college stats class and are familiar with a scripting language, you should be able to jump right in.

Matt Clarke at TechPad provides a great tutorial on connecting your Google Analytics account with R.

**Example use-cases**

Getting back to this Chrome extension, there are several situations in which we would make temporal (time-frame) comparisons of metrics in Google Analytics. In each of the following examples, we should compare engagement metrics before and after the event, and we should ensure that any percentage changes reported in Google Analytics are statistically significant.

- Website updates have been pushed
- Website has experienced a change in exposure (a mention by a large news source, etc.)
- New marketing campaigns
- Concerns that a search algorithm change is affecting traffic
- Concerns that a change in traffic demographics is affecting conversions

## Extensions / Feedback

Now for the most important part of this post: what do you think about the approach of incorporating more statistical consideration into our Google Analytics analysis? Worthwhile? Waste of time?

And what do you think about the current state of the Chrome extension? What types of improvements would you most like to see? Some potential improvements might include:

- Support for comparison of advanced segments (pairwise / aggregate)
- More-detailed output (average, median, average difference, per-day)
- Support for exporting graph data points/calculated statistics as a CSV

## Changelog

Note: *n* is the number of data points (the number of hours, days, weeks, or months in your timeline). If you are looking at a full year’s worth of data at monthly points, *n* = 12.

- For 6 < *n* <= 40, the Wilcoxon Signed-Rank test is used. The exact p-value is not returned; significance is only tested at the 0.1, 0.05, and 0.01 levels.
- For 40 < *n* < 58, Student’s Paired t-test is used, and the exact p-value is returned.
- Updated to work with all graphable metrics
- Warns if start date is not the same weekday for the two time-frames to be compared
- Will not run if the time-frames are different lengths
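The test-selection rules above can be sketched in a few lines of Python. This is a rough illustration of the thresholds described in the changelog, not the extension’s actual source; all function and variable names here are hypothetical.

```python
import math
from statistics import mean, stdev

def choose_test(before, after):
    """Pick a significance test per the changelog's n thresholds.

    `before` and `after` are equal-length lists of paired data points
    (one value per hour, day, week, or month in the timeline).
    """
    if len(before) != len(after):
        raise ValueError("time-frames must be the same length")
    n = len(before)
    if 6 < n <= 40:
        # exact p-value not computed; tested at 0.1 / 0.05 / 0.01
        return "wilcoxon-signed-rank"
    if 40 < n < 58:
        # exact p-value returned
        return "paired-t"
    return None  # outside the supported range

def paired_t_statistic(before, after):
    """t statistic for a paired t-test: mean of the paired
    differences divided by its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

For example, two 14-day windows would be routed to the Wilcoxon test, while two 45-day windows would get the paired t-test.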

## Limitations

- To use the Wilcoxon Signed-Rank test and Student’s Paired t-test, we make several assumptions about the data’s distribution. We do not expect these assumptions to be unreasonable for most Google Analytics data.

### About Noah Haibach

Noah Haibach is a data analyst at LunaMetrics. Working with Google Analytics provides Noah with more data than his wildest dreams (or even nightmares sometimes), enabling him to pursue his passion of data mining and predictive analytics. Noah wasn't always working in the tech field. His background includes math and biology research. He also has an affinity for languages, racquet sports, and brewing.

Thanks for this Noah, looks great. I was wondering, though: what’s the rationale for using the t-test instead of the z-test? (According to my rather limited stats knowledge, the t distribution converges to the normal distribution when n = 30 or greater.) Also, I thought the t-test was normally used for small samples, but you’re using the Wilcoxon Signed-Rank test instead (I have to admit I had to look that one up).

Thanks

Hey Carmen, these are good questions. And I’m glad you find this helpful! My general approach is to provide tests that err on the side of caution. The Wilcoxon Signed Rank test does not require the assumption of normality, and also does not lose much power compared to the t-test. So it is a safer choice at a smaller sample size.

You are correct regarding the t-test and z-test; the t-test approaches the z-test as sample size increases. How quickly it converges depends on the normality of the sampled population. We could replace the t-test with the z-test and probably be just fine; the t-test is just slightly more conservative.

Regards,

Noah

Hello Noah and Carmen!

Great work! More awareness of significance is really needed in marketing environments ;-)

If you’d like to apply a z-test in Analytics you can try out our chrome extension “Table Booster” which lets you select two rows and a metric to test…

regards,

Siegfried

Hi,

Thank you for the extension. I think I am missing something.

In Google Analytics I get N/A for every report.

How can I sort it out? What am I missing?

Thank you!

Hi.

Thanks for this extension.

I have a little problem with using it in Russian interface of Google Analytics.

I choose the same dates with the correct day-of-the-week matching, but I get the error “time periods begin on different days”.

http://img.netpeak.ua/mrbubu/140170663338_kiss_164kb.jpg

With the English interface everything works correctly.

http://img.netpeak.ua/mrbubu/140170674627_kiss_151kb.jpg

I think the problem may be the difference in the beginning of the week: in the Russian interface it is Monday, in the English interface it is Sunday.

Hi Siegfried,

Thanks for the comment! I did find your Table Booster extension while researching prior to designing this extension. I really like the conditional formatting it provides. Regarding the z-test, however, I was unable to get it to work. And I wasn’t completely sure what it would be comparing. A difference between metrics for the same segment? Between different segments for the same metric?

Hey Alex,

You’re spot-on about why the extension is not working in the Russian interface. I can’t tell you how many times the American MM-DD-YYYY date stamp has screwed something up in my code.

When I have time, I’ll take a look at the different date stamp formats in the GA interface and address the issue. I’m guessing the format is the same for all non-American country preferences. I’ll let you know when it’s been corrected.

Noah
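One way to make the date parsing locale-tolerant is to try each known format in turn. Here is a minimal sketch of that idea; the candidate format list is illustrative, not a verified inventory of Google Analytics locale formats, and `parse_ga_date` is a hypothetical name rather than anything from the extension.

```python
from datetime import datetime

# Candidate formats for dates shown in the GA interface.
# Illustrative only -- the real per-locale set would need confirming.
CANDIDATE_FORMATS = ["%m/%d/%Y", "%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d"]

def parse_ga_date(text):
    """Try each candidate format until one parses successfully."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")
```

One caveat with this approach: a date like 05/06/2014 is ambiguous between the US and day-first conventions, so in practice the interface language would have to decide which format to try first.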

Erez,

Could you post a screenshot?

Thanks,

Noah

Hi Noah,

Got it working.

Thank you for your kindness and for the tool!

Hey Erez, I’m glad to hear it! I think the extension sometimes requires a page refresh. Maybe that was it? Anyway, glad you’ve found the tool useful. Let me know if you have any suggestions for updates.

Noah

Hey Noah,

I can’t seem to get the Wilcoxon test to work correctly.

I’m comparing two 14-day windows, one from February and one from June. A visual inspection of the graph shows February with a larger number of ‘sessions’ at every point of comparison, but the test is coming back as NOT statistically different. I’ve also tried different metrics, with the same results.

Any thoughts on why this would be?

Ryan

P.S. The paired t-test is (seemingly) working fine.
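As a sanity check on Ryan’s scenario: when one window is higher at every paired point, the Wilcoxon W statistic is 0, its most extreme possible value, which standard critical-value tables put well inside significance for n = 14. The sketch below computes W under the simplifying assumption of no tied absolute differences; it is an illustration of the statistic, not the extension’s code.

```python
def wilcoxon_w(before, after):
    """W = min(W+, W-): the smaller of the rank-sums of the positive
    and negative paired differences (zero differences are dropped).
    Ties in absolute difference are not average-ranked in this sketch.
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    # rank indices by absolute difference, smallest first (rank 1)
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    w_plus = sum(r + 1 for r, i in enumerate(order) if diffs[i] > 0)
    w_minus = sum(r + 1 for r, i in enumerate(order) if diffs[i] < 0)
    return min(w_plus, w_minus)
```

If February exceeds June at all 14 points, every difference has the same sign and W comes out 0, so a non-significant result in that case would suggest a bug, or that the data being tested is not the same as what the graph shows.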