4 Ways to Export Your Google Analytics Data with R

4 Ways to Export Your Google Analytics Data with R

/

blog-export-data-r-tinypng
Are you tired of spending half your day copying data out of the Google Analytics interface to update that same old report? Luckily, there are a lot of tools out there to help with this: Google Spreadsheets, Shufflepoint, and Tableau, just to name a few. One of my favorite free tools is R.

Rlogo

Wait, why are we talking about the alphabet?

R is a very powerful program for visualizing and analyzing data. It can also easily access the Google Analytics API. With just a few lines of code, you will have all of your Google Analytics data at your fingertips, ready for your stunning graphics, cutting edge analysis, or just to be dropped in a csv file.
So how do I do this?

Enable the Google Analytics API

First, you need to make sure the Google Analytics API settings are configured correctly. To do this:

  1. Go to the Google Developers Console.
  2. Create a new project.
    Developer Console New Project
  3. Give your project a name, agree to the Terms of Service (after reading them of course), and create your project.
    Developer Console New Project 2
  4. Now select “APIs” in the left rail.
  5. Search for “Analytics API” and click on the result.
  6. If you see a blue button with the words, “Enable API,” click on it. If the button says “Disable API,” the API is already enabled.

Developer Console Analytics API

Download R and RStudio

Next, you need to download R and RStudio. (RStudio is essentially the user interface for R, so that is what I will refer to from here on out). When installing, choose all of the default options unless you have a reason not to. Once those are finished downloading, open RStudio.

Download a Package for Accessing the API

In addition to R, you need to download an extension for accessing the Google Analytics API. In R lingo, these are called packages – they are basically the R equivalent of an Excel add-in. There are several options, but the package I am going to use is called RGA. To download the package, run the commands:

Now, we need to tell RStudio that we want to use this package by using the command:

Note that you only have to download RGA once, but every time you open RStudio, you will need to run the library(rga) command to tell RStudio that we want to use that package.

Authenticate

Now you need to authenticate, which is basically telling Google Analytics that you have a right to access this data. Run the command:

A web browser should pop up asking you to log into Google Analytics (if you aren’t already) and confirm that the application can access your data. Click “Allow”.

RGA Authenticate

Now you will see a long list of numbers and letters:

RGA authenticate 2

Copy this code and paste it into the console in RStudio:

RGA Authenticate 3

Choose Your View

The Google Analytics API allows you to export data from one View at a time. You need to decide which View you wish to retrieve data from and communicate that to RStudio by passing in the View ID. You can find this information in the Admin section of Google Analytics under View Settings.

View ID

Store this information in a variable called id:

Get that Data!

Finally, it’s time to retrieve the data. Try running:

You will see the number of users, sessions, and pageviews to your site during the past week. You can customize this data export by adding additional information to the ga$getData command using the following parameters:

batch: This is automatically set to TRUE. The batch parameter allows you to get around Google’s limit of pulling 10,000 observations per pull.
walk: Setting this to TRUE will allow you to avoid sampling by pulling you Google Analytics data by increments of 1 day. Only use this functionality if it is need though, as using the walk attribute will query the API many times, increasing the time it takes to pull the data as well as using up your API quota. Additionally, if you data is still sampled on the one day level, this functionality will not completely eliminate sampling.
start.date: Start date for your data export. This should be formatted as as.Date(“YYYY-MM-DD”), for example, as.Date(“2015-07-01”).
end.date: End date for your data export. This should be formatted as as.Date(“YYYY-MM-DD”), for example, as.Date(“2015-07-01”).
metrics: A list of the metrics that you wish to pull. You can have up to 10 metrics.
dimensions: A list of the dimensions that you wish to pull. You can have up to 7 dimensions.
sort: (optional) Sort the data by a specific dimension or metric
filters: (optional) Add filters to your data.
segment: (optional) Pull data from a specific segment. This can be either a segment already created in Google Analytics, or you can create it here.
start: (optional) Pull data starting from this row of data
max: (optional) maximum number of results to return

For a more complicated example, pull the total pageviews and entrances for each page, by date, from Sept 1 – Sept 30. Also, filter this data by only organic traffic in the United States and Canada. Finally, sort this data by number of pageviews, descending, and store the result in a variable called gaData.

Our Four Options

Now that we have the data, what should we do with it? Here are four quick options for getting the data out of R and into something we can use.

Export to CSV

This is your easiest option. You only need one line of code:

This will write the csv file to your working directory. If you do not know what this is, run the command:

You can also explicitly select a file location like this:

Export to Excel

In general, I prefer writing data to a csv rather than to an .xlsx file. However, given the prominence of Excel in the business world, knowing how to read and write directly to .xlsx files is a huge time saver. Fortunately, there a number of great R packages to help us do this. I generally use xlsx.

Before you download the xlsx package, make sure that you have the latest version of Java Development Kit (jdk). Then run this code to install and load xlsx:

Now you can write your data to a workbook called My Data Export.xlsx and a sheet called Data.

Export via a graphic

R is well known and loved in the statistical community for its ability to create powerful visualizations. There are a number of packages out there to help with this but ggplot is possibly the most well-known and loved. Here is some code to create a time series plot for organic sessions from the United States and Canada.

This will create a graph in RStudio that you can export as an image or a PDF.

ggplot

Export to a database via an ODBC connection

R has a lot of support for connecting to databases. In particular, RODBC is a powerful, easy to use package for connecting to a database via an ODBC connection. This can help you integrate with Access databases on your local machine, company-wide data warehouses, and everything in between.

The following code will write your GA data into an Access Database. (You will first need to check that you have an ODBC connection set up to your database.

There are also additional options for appending the data into an existing table or for changing the variable types.

And so much more!

One of R’s great strengths is that it has a huge following of brilliant developers and statisticians. As a result, there are packages for doing almost everything, even the most cutting edge machine learning techniques. If you have a specific reporting need, R probably has a solution. Here are a few more examples to get you started:
Dropbox
Google Docs
LaTex
HTML
Email

Becky is an Analytics Engineer at LunaMetrics with a passion for data science. She started deriving equations and building calculators in high school and ended up with a Masters in math from Georgia Tech. Her experience in data analysis and reporting has given her a great appreciation for data driven decision making. Becky enjoys swimming, working on puzzles, and spending time with her husband Jonathan.

  • RussellShepherd

    RGA + ShinyDashboards is a great way to leverage GA data. We use this solution to help teams automate monitoring their traffic data, add some original analysis (e.g. forecasting, clustering), and mix in other data sources.

    • Shaun

      I can’t even get passed ga$getData(id). Haha

    • Becky West

      Thanks for commenting, Russell. I am also a fan of Shiny Dashboards and am excited to hear that other people are using it for web analytics. There is so much potential with RGA and Shiny to combine analytics data, machine learning algorithms, and visualizations in a user friendly way.

  • Sarra Ben Brahim

    This was so helpful for me!!
    Thank you Becky West

  • Shree

    Install_github is not working in R..please suggest

    • http://lunametrics.com/ James Keener

      Did the install.packages("devtools") and library(devtools) commands run without error? The first time I ran through I forgot to run the library(devtools) line and got the same error.

    • Becky West

      Hi Shree! Can you tell me the error that you are getting? Did the library(devtools) command run? Also, what version of R are you using?

  • Anthony

    I am stuck here install.packages(“devtools”). Lots of error messages follow. Any ideas?

    Error messages are:

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    unable to access index for repository https://cran.cnr.Berkeley.edu/src/contrib:

    cannot open URL ‘https://cran.cnr.Berkeley.edu/src/contrib/PACKAGES’

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/src/contrib:

    cannot open URL ‘http://www.stats.ox.ac.uk/pub/RWin/src/contrib/PACKAGES’

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    unable to access index for repository https://cran.cnr.Berkeley.edu/src/contrib:

    cannot open URL ‘https://cran.cnr.Berkeley.edu/src/contrib/PACKAGES’

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/src/contrib:

    cannot open URL ‘http://www.stats.ox.ac.uk/pub/RWin/src/contrib/PACKAGES’

    Warning in install.packages :

    package ‘devtools’ is not available (for R version 3.2.5)

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    unable to access index for repository https://cran.cnr.Berkeley.edu/bin/windows/contrib/3.2:

    cannot open URL ‘https://cran.cnr.Berkeley.edu/bin/windows/contrib/3.2/PACKAGES’

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    InternetOpenUrl failed: ‘A connection with the server could not be established’

    Warning in install.packages :

    unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/3.2:

    cannot open URL ‘http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/3.2/PACKAGES’

    • Anthony

      Wanted to post back and let you and others know that it was a corrupt Windows Firewall that was causing this problem. This and a slew of other problems cropped up right about the same time and all were related to Windows Firewall.

      • Becky West

        Thanks for adding in your solution. I’m glad you got it straightened out!

  • Julian Aylward

    Thanks for your help. This is amazing! I am trying to replicate my query for another of our sites, but I’m getting the error, “error in fetching data: User does not have sufficient permissions for this profile”. I think this must be because I need to authenticate access to this table as well, but I have no idea how to get the dialog box up as above, since I have already completed this step for the first site?? Any help would be much appreciated 🙂

    • Becky West

      Hi Julian,

      I’m so glad you found this post helpful. I love using R and have found it so helpful for doing more advanced analysis.

      Yes, if you access this other site through a different login, you will need to authenticate for this login as well. There are two ways to do this. You can delete your earlier authentication by using the following code:
      rm(ga)
      Then authenticate again with you other login:
      rga.open(instance=”ga”)

      Your other option is to have two instances open at the same time. This way you can easily query both sites without having to re-authenticate each time. You can do this by changing the name of the instance for the other site. For, example, try running this code:
      rga.open(instance=”ga1″)
      Then go through the authentication steps for the other site.
      For this site, you will need to use ga1 everytime you would have used ga. For example, your code for grabbing data from the second site should look like:
      gaData <- ga1$getData(id, start.date….

      • Julian Aylward

        Hi Becky, thanks so much for the reply. Opening the new instance worked a treat!
        Julian 🙂

        • Becky West

          Awesome! I’m so glad that worked for you!

  • Savas Özdemir

    Is it possible to export a calculated metric?

    • Becky West

      Yes, you can add the custom metric to your list of metrics. You should reference it using the External Name you designated when creating the calculated metric. For example, if you call your metric Test, you can add metrics = “ga:calcMetric_Test” to your API call.

      • Mayuri Awasare

        Hi Becky. I am getting the following error after i execute ga$getData(id):

        Error in if (.self$tokenExpiresIn() <= 0) { : argument is of length zero. Can you help me with this?

Contact Us.

LunaMetrics

24 S. 18th Street, Suite 100,
Pittsburgh, PA 15203

Follow Us

1.877.220.LUNA

1.412.381.5500

getinfo@lunametrics.com

Questions?
We'll get back to you
in ONE business day.