Google Analytics Data Mining with BigQuery and R


Big Query and Big Query Export for Google Analytics give us the power to visualize and explore virtually any trend in our GA data. It’s really quite powerful stuff. Because this tool is still very new, I want to get the conversation started on how advanced reporting can augment our digital analytics.

In this post I discuss data mining and the advanced reporting of Google Analytics data. I provide an R script for generating an E-commerce report with visualizations that are not possible within Google Analytics.

R script for Big Query E-commerce

My colleague Jonathan Weber has a nice article on Big Query and Google Analytics Data Export. Check this out for an overall refresher on Big Query, including billing information.

What are the Use Cases for Big Query in GA?

There are two big reasons to use Big Query to process and report on Google Analytics data:

  1. Sampling. Even for Premium customers, usage of advanced segments and non-standard secondary dimensions can lead to sampling. This can greatly affect the accuracy of quarterly, and even monthly, reporting.
  2. Data mining and advanced reporting. Google Analytics is, in our opinion at LunaMetrics, the best overall technology for digital analytics. But, like anything, it has its shortcomings. One of these is the customizability of reporting. GA’s ease of use and shallow learning curve also results in the need for additional tools for data mining and more-complex analysis.Cue Big Query. Our ability to generate complex metrics and mine trends is limited only by our knowledge of SQL. And the flexibility in visualizations by the limits of R, ShufflePoint, Tableau…a wide range of solutions.

Note: As a general cloud service, Big Query is not exclusive to GA Premium customers. However, Google Analytics Premium IS required in order to export complete GA data to Big Query. So, if you are not Premium, you can still upload data to Big Query and use it as a cloud solution for SQL-like data analysis. But you need Premium for working with GA data.

Questions about premium? See here for more information.

Overview of the Report

The R script should function out-of-the-box for any GA Premium customers with E-commerce data (and with the Big Query export enabled).

If you don’t have Google Analytics Premium, but are still interested in data mining and advanced reporting in for Google Analytics, you can still interface with R! Check out the R for Google Analytics library. By combining their starter template with the code for the graphs from my script, you should be able to achieve similar functionality. You will also need to conduct additional processing of this data in R, since you are unable to take advantage of the SQL-like querying of Big Query. Additionally, those with a high traffic volume in GA will likely experience sampling.

One last note. I owe a shout-out to Hadley Wickham for writing the Big Query R library.

Here is the full report generated from the R script (scroll down for a walk-through):

Big Query E-commerce report using R

Figure 1: Average pageviews with transaction versus without transaction

You could also get this information by using advanced segments in Google Analytics, but it’s nice to get it unsampled and to be able to generate box-and-whisker plots.

r-bigquery page 1

Figure 2: Average Revenue/Session by Medium

This is something you could calculate from Google Analytics, but that is not readily available as a metric. In this chart we graph two metrics simultaneously; average revenue is given by height, and the weight of each channel (sessions) is given by color.

r-bigquery page 2

Figure 3: Top Campaigns by Revenue/Session

This chart is similar to the prior, except that we compare campaigns rather than medium. Again, the height shows the average revenue/session, and the color displays the weight of each campaign.

r-bigquery page 3

Figure 4: Product Category of Purchase by Medium (a)

Here, we group products purchased by their category. Then we break down the sales data for each category by medium.

r-bigquery page 4

Figure 5: Product Category of Purchase by Medium (b)

This displays the same information as the prior chart, but visualized differently. The darkness of the tile indicates the number of products sold for a given product category and medium.

r-bigquery page 5

Figure 6: Likelihood of product to indicate and/or lead to additional transactions

This is my favorite chart. It is something we would never be able to calculate in Google Analytics. Here, we examine the products associated with a user’s initial transaction. We then calculate the percentage of users who made at least one additional transaction. And we segment this data by the initial product purchased. Thus, (pending a statistical test), we see that certain products may be indicators and/or causes of additional future purchases. Maybe they’re really great products and win the customer’s loyalty.

img id=”img6″ class=”alignnone size-full wp-image-12651″ src=”” alt=”r-bigquery page 6″ width=”588″ height=”767″ />

R Script for Big Query E-commerce Report

Noah is a former LunaMetrician and contributor to our blog.

  • Mariana Alves

    Awosome post!!! Tks for share it!

  • Niko

    Hello. Unfortunately, your code need to be changed because R dont understand “<-" replaced by " <- "

  • Stefano 6884

    Hi Noah, thanks for the interesting post! Just be careful that, in your code, “& lt ; -” appears instead of “<-" and
    the html code for image 6 has a missing "<" at the beginning

Contact Us.

Follow Us



We'll get back to you
in ONE business day.
Our Locations
THE FOUNDRY [map] LunaMetrics

24 S. 18th Street
Suite 100

Pittsburgh, PA 15203


4115 N. Ravenswood
Suite 101
Chicago, IL 60613


2100 Manchester Rd.
Building C, Suite 1750
Wheaton, IL 60187