Subset Your Google Analytics Data with R


R is a relatively easy language to use when performing statistical and graphical analyses on data. However, after choosing your dataset, you may still need to subset it. Luckily, R’s built-in functionality can be especially helpful in getting the sections of data you need.

Since Becky gave us a great introduction to querying Google Analytics data in R, I’d like to go a step further and show you how to subset your data once you have it loaded in R. Below you’ll find some quick tips and tricks:

Step 1: Download Some Packages (Optional)

There are a lot of ways to get your data out of Google Analytics; however, one of my favorite ways is to use the rga package.

If you prefer a different method, feel free to skip Steps 1 and 2. If you need a refresher, follow the steps below:

Step 2: Pull Data Out of Google Analytics (Optional)

Once you’ve downloaded these packages, grab the ID of the view that you would like to query. This can be found under Admin > View > View Settings > Basic Settings > View ID. We’re going to store this in a variable called “viewID”. Then we’ll pull out an example data set.

Keep in mind that you can use any metrics or dimensions available in the Google Analytics interface (explore your options here). For the purposes of this example, gaData should look like the following:
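Your own query results will differ, so here is a minimal, hand-built stand-in for gaData that you can use to follow along without querying Google Analytics. The “browser” and “operatingSystem” column names come from the examples later in this post; the “sessions” column and every value in the table are made up purely for illustration.

```r
# Hypothetical stand-in for the result of a Google Analytics query.
# All values below are invented for illustration only.
gaData <- data.frame(
  browser = c("Chrome", "Chrome", "Safari", "Firefox",
              "Safari", "Internet Explorer", "Chrome"),
  operatingSystem = c("Windows", "Android", "iOS", "Windows",
                      "Macintosh", "Windows", "Macintosh"),
  sessions = c(120, 80, 95, 40, 60, 30, 55),
  stringsAsFactors = FALSE
)

# Inspect the structure and the first few rows
str(gaData)
head(gaData)
```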

Step 3: Subset Your Data

Let’s start by understanding some of the basic syntax of subsetting data in R by reviewing how to select the entire gaData data frame.

When you view workingData, it should now be identical to gaData.
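As a rough sketch of that basic syntax: a data frame is indexed as [rows, columns], and leaving both sides of the comma empty selects everything. The small gaData built here is a hypothetical stand-in with made-up values, not real Google Analytics output.

```r
# Hypothetical stand-in for gaData (values invented for illustration)
gaData <- data.frame(
  browser = c("Chrome", "Chrome", "Safari", "Firefox"),
  operatingSystem = c("Windows", "Android", "iOS", "Windows"),
  sessions = c(120, 80, 95, 40),
  stringsAsFactors = FALSE
)

# Empty row index and empty column index: keep all rows and all columns
workingData <- gaData[ , ]

# Compare the copy with the original
all.equal(workingData, gaData)
```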

Subsetting Your Data with Inclusion Methods (Keeping Data)

Sometimes it’s more efficient to name the parts of the data that you want to keep for analysis than to list everything you want to throw away.

In this example, we will show you how to include only certain rows. Here, we will include only rows 1-5 of gaData in workingData. After each of these code blocks, workingData should look similar to this:
[Figure: workingData Include Rows]

Remember: you can look at your workingData data frame at any time by typing its name at the R console.
Just keep in mind that every time you assign a new value to the variable, the old value is lost.
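One way to keep only rows 1-5 is to put a row index before the comma and leave the column side empty. This is a sketch against a hypothetical stand-in for gaData (made-up values), not necessarily the exact code originally shown here.

```r
# Hypothetical stand-in for gaData (values invented for illustration)
gaData <- data.frame(
  browser = c("Chrome", "Chrome", "Safari", "Firefox",
              "Safari", "Internet Explorer", "Chrome"),
  operatingSystem = c("Windows", "Android", "iOS", "Windows",
                      "Macintosh", "Windows", "Macintosh"),
  sessions = c(120, 80, 95, 40, 60, 30, 55),
  stringsAsFactors = FALSE
)

# Rows go before the comma; an empty column index keeps every column
workingData <- gaData[1:5, ]

# Typing the variable name (or calling print) shows the current value
print(workingData)
```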

Now, we can include only certain columns. This time, we will only include the “browser” and “operatingSystem” columns. Afterwards, workingData should look like this:
[Figure: workingData Include Columns]
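Column inclusion works the same way, except the selection goes after the comma. A minimal sketch, again using a hypothetical stand-in for gaData with made-up values:

```r
# Hypothetical stand-in for gaData (values invented for illustration)
gaData <- data.frame(
  browser = c("Chrome", "Chrome", "Safari", "Firefox"),
  operatingSystem = c("Windows", "Android", "iOS", "Windows"),
  sessions = c(120, 80, 95, 40),
  stringsAsFactors = FALSE
)

# Empty row index keeps every row; the character vector names the columns to keep
workingData <- gaData[ , c("browser", "operatingSystem")]

print(workingData)
```

Selecting columns by name rather than by position keeps the code working if the column order of your query changes.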

Subsetting Your Data with Exclusion Methods (Omitting Data)

Other times, it’s more efficient to name the portions of your data set that you want to drop than to list everything you want to keep.

Here, we will exclude rows where “browser” = “Chrome” and “operatingSystem” = “Android”. Your workingData should then look like the following:
[Figure: Exclude Rows]
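A sketch of row exclusion with a logical condition follows. It assumes the intent is to drop rows where both conditions hold at once (browser is “Chrome” and operatingSystem is “Android”); if you want to drop rows matching either condition, swap & for |. The gaData here is a hypothetical stand-in with made-up values.

```r
# Hypothetical stand-in for gaData (values invented for illustration)
gaData <- data.frame(
  browser = c("Chrome", "Chrome", "Safari", "Firefox",
              "Safari", "Internet Explorer", "Chrome"),
  operatingSystem = c("Windows", "Android", "iOS", "Windows",
                      "Macintosh", "Windows", "Macintosh"),
  sessions = c(120, 80, 95, 40, 60, 30, 55),
  stringsAsFactors = FALSE
)

# TRUE for every row we want to drop
dropRow <- gaData$browser == "Chrome" & gaData$operatingSystem == "Android"

# Negate the condition before the comma to keep everything else
workingData <- gaData[!dropRow, ]

print(workingData)
```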

In this next example, we will exclude the “browser” and “operatingSystem” columns. Therefore, workingData should look like this:
[Figure: Exclude Columns]
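One possible way to exclude columns by name is to build a logical vector over names(gaData) and place it after the comma. The helper variable dropCols is just an illustrative name, and gaData is a hypothetical stand-in with made-up values.

```r
# Hypothetical stand-in for gaData (values invented for illustration)
gaData <- data.frame(
  browser = c("Chrome", "Chrome", "Safari", "Firefox"),
  operatingSystem = c("Windows", "Android", "iOS", "Windows"),
  sessions = c(120, 80, 95, 40),
  stringsAsFactors = FALSE
)

# Columns to leave out
dropCols <- c("browser", "operatingSystem")

# The selection goes AFTER the comma because we are choosing columns.
# drop = FALSE keeps the result a data frame even if one column remains.
workingData <- gaData[ , !(names(gaData) %in% dropCols), drop = FALSE]

print(workingData)
```

Without drop = FALSE, R would simplify a single remaining column to a plain vector, which is easy to trip over in later steps.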

By no means is this a comprehensive list of methods for including or excluding rows and columns in your Google Analytics dataset. However, it should help you get your feet wet and open the door to further exploration.

If you have a favorite method of subsetting your data in R, please feel free to share with others by commenting below.

Kaelin Harmon is a former contributor to the LunaMetrics blog.

  • Maybe in the last example the commas should come before the vectors in [] to exclude columns? It seems like you are excluding rows in both examples.

    • Kaelin Harmon

      Hi Alex,
      Great catch – I’ve updated the post. Thank you!

  • Ivan Kutas

    I use ‘subset’ function for that.
