Making URLs Better Through Content Grouping in Google Analytics

URLs are often one of the most problematic labels for data in web analytics: they’re messy, full of inconsistencies, and gunked up with a bunch of query parameters that may or may not be useful to you. All of this tends to make analyzing your content a mess.

Here, sort this stack of needles.

There are a number of suggestions for cleaning up those URLs (here’s a blog post that’s an oldie but a goodie on cleaning up URLs, written in 2010 but still useful). Or, if you can alter the code on your site, you can send any data you want, which is sometimes used to rewrite or clean up the URL mess. That’s gotten much easier with Google Tag Manager, but there are still lots of situations in which you can’t change the code (lack of expertise in your organization, political fights with the IT department, etc.).

Content Groupings are a new feature in Google Analytics that allow you to classify your pages into groups or categories. My colleague Alex recently wrote a great article about how to classify posts on a blog with groupings like day of the week, number of images, and so on using Content Grouping with Google Tag Manager. Let’s take a look at what else Content Grouping can be good for, with no code or GTM required.

Finding the Hidden Information in URLs

A lot of times there’s information contained in your URLs that is important, but not necessarily easy to sort by in analytics. If you have a nice, clear directory structure on your site, the Content Drilldown report can be great:

/services/google-analytics/
/services/seo/
/services/pay-per-click/
/blog/
/about-us/

The Content Drilldown report lets you roll those up into folders. Great!

But, wait! you say. My URLs have information in them, but not in nice little directory structures like that. Maybe they look like this:

/product_widget.php?type=neutrino&flavor=tau
/product_widget.php?type=quark&flavor=strange
/product_transmogrifier.php?model=cardboard

So, notice that these URLs do indeed have useful information in their structure, but now the Content Drilldown report is useless, because it just looks for folders (separated by a slash).
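
To see why a folder-based rollup falls flat here, consider a rough sketch in Python. This is not how Google Analytics implements Content Drilldown; `top_level` is a hypothetical helper that just takes the first slash-separated segment of the path, which is about all a folder rollup has to work with.

```python
from urllib.parse import urlsplit

def top_level(url: str) -> str:
    """Return the first slash-separated path segment (hypothetical helper)."""
    path = urlsplit(url).path  # the query string is dropped here
    return path.strip("/").split("/")[0]

urls = [
    "/services/google-analytics/",
    "/services/seo/",
    "/product_widget.php?type=neutrino&flavor=tau",
    "/product_widget.php?type=quark&flavor=strange",
    "/product_transmogrifier.php?model=cardboard",
]

for url in urls:
    print(f"{top_level(url):28} <- {url}")
```

The two /services/ pages land in the same bucket, but each product page is its own "folder", and the type, flavor, and model values never show up at all.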

So what if I want to answer easy questions like…

  • How many pageviews were there to any widget?
  • What about just neutrino widgets?
  • Which model of transmogrifier was more popular?

You can bend over backwards sorting and filtering these URLs in the All Pages report to answer those questions, but you don’t have to. The new Content Grouping features, in addition to letting you group pages by code as mentioned above, also allow you to group pages by patterns extracted from the URLs (using regular expressions).

Content Grouping by Extraction

You can create up to five different Content Grouping dimensions in each view. Each grouping can have any number of groups within it. You want to map this out and figure out what you want to know before you get started. For our example URLs above, we might want the following groupings:

  • Product Category: widget, transmogrifier, etc.
  • Product Type: neutrino, quark, electron, etc.
  • Product Flavor: tau, strange, charm, top, etc.
  • Product Model: cardboard, gun, etc.

These pieces of information are all present in the URLs, either as part of the path or a query parameter. We can pull out this information with Content Groupings.

To create a Content Grouping, go to the Admin section of Google Analytics. (You’ll need to be an administrator to make these changes.) You’ll notice there’s an option called Content Grouping under the settings for your view (pictured at right). Click on the Create New Content Grouping button to get started.

First, you have to give each grouping a name. Let’s start with the first example above and call this one Product Category.

Next, you’ll notice that there are three ways to assign groups: by tracking code (detailed in the previously referenced article), by extraction (what we’ll talk about here), or by rules (basically manually building a set of criteria, like a filter, and giving each set of rules a name).

Extraction Regular Expressions

We’re going to use regular expressions to extract the piece of the URL we want to use as the names of the groups. (If you’re not already familiar with regular expressions, now is a great time to familiarize yourself with them.) You extract part of the URL by using a regular expression enclosed in parentheses, like this:

/product_(.*?)\.php

Our URLs, remember, are /product_widget.php or /product_transmogrifier.php. The regular expression .*? matches any character (the dot) any number of times (the star), and the question mark makes it “lazy”, meaning that it will end at the first possible location. (The “lazy” bit isn’t vital in this example, but it often is, and it’s a good idea in general in defining Content Groupings. Like anything else in GA, test it out and make sure you get it right in a test view first.) It’s OK that other stuff comes at the end of these URL strings; we only have to be as specific as we need to get the part we want to extract.
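
Here is a small sketch of the difference the “lazy” question mark makes. It uses Python’s `re` module as a stand-in (GA’s own regex engine may differ in details), and the URL is a hypothetical one I made up so that a second .php appears later in the string.

```python
import re

# Hypothetical URL with a second ".php" later in the string (not from the example site).
url = "/product_widget.php?ref=/old_page.php"

# Greedy: .* grabs as much as it can, so the match runs to the LAST ".php".
greedy = re.search(r"/product_(.*)\.php", url).group(1)

# Lazy: .*? stops at the FIRST place the rest of the pattern can match.
lazy = re.search(r"/product_(.*?)\.php", url).group(1)

print(greedy)  # widget.php?ref=/old_page
print(lazy)    # widget
```

With the greedy version, the extracted group name would include part of the query string, which is almost never what you want as a group label.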

So, what this regular expression matches is the part of the URL between the underscore and the .php: the word widget or transmogrifier or whatever else happens to be there. For this one, that’s it, and we can save it.

Likewise, to round out all the examples (create an additional grouping for each — remember you get up to five groupings total in each view):

  • Product Category: /product_(.*?)\.php
  • Product Type: type=([^&]+)
  • Product Flavor: flavor=([^&]+)
  • Product Model: model=([^&]+)

You can really extract any piece of a URL that has a salient piece of information for grouping your pages.
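
If you want to sanity-check patterns like these before putting them in a test view, a quick script can help. The sketch below runs the four expressions above against the three example URLs using Python’s `re` module, which is only an approximation of what GA does. The `classify` function is a hypothetical helper, and the "(not set)" label for pages that match no group is my assumption about how unmatched pages are reported.

```python
import re

# The four extraction patterns from the list above, keyed by grouping name.
GROUPINGS = {
    "Product Category": r"/product_(.*?)\.php",
    "Product Type": r"type=([^&]+)",
    "Product Flavor": r"flavor=([^&]+)",
    "Product Model": r"model=([^&]+)",
}

def classify(url: str) -> dict:
    """Return the extracted group for each grouping (hypothetical helper)."""
    result = {}
    for name, pattern in GROUPINGS.items():
        match = re.search(pattern, url)
        # Assumption: pages that match nothing are labeled "(not set)".
        result[name] = match.group(1) if match else "(not set)"
    return result

urls = [
    "/product_widget.php?type=neutrino&flavor=tau",
    "/product_widget.php?type=quark&flavor=strange",
    "/product_transmogrifier.php?model=cardboard",
]

for url in urls:
    print(url)
    for name, group in classify(url).items():
        print(f"  {name}: {group}")
```

Note that the transmogrifier URL has no type or flavor parameter, so it falls outside those two groupings entirely.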

Using the Content Groupings

Now, the unfortunate part of this story, like pretty much all of the things you change under the hood in Google Analytics: it’s not retroactive. These groups are only applied to new data that comes in for your site, not to historic data.

Once you have that data, though, it’s easy to apply these groups to your URLs. You’ll find the Content Grouping drop-down in the All Pages report (or add a grouping to custom reports as a dimension).

Now I can easily see metrics reported by (in this case) Flavor. You can always drill down within a group to see the individual URLs, but the groups make it easy to roll them up in the way you’ve defined.
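
Conceptually, the rollup is just a sum of pageviews per extracted group. Here is a minimal sketch of that idea in Python. The pageview numbers are made up purely for illustration, `roll_up` is a hypothetical helper, and the "(not set)" label for unmatched pages is an assumption; GA does all of this for you in the report.

```python
import re
from collections import Counter

# Made-up pageview counts, for illustration only.
PAGEVIEWS = {
    "/product_widget.php?type=neutrino&flavor=tau": 120,
    "/product_widget.php?type=quark&flavor=strange": 80,
    "/product_transmogrifier.php?model=cardboard": 45,
}

def roll_up(pageviews: dict, pattern: str) -> Counter:
    """Sum pageviews by the group extracted from each URL (hypothetical helper)."""
    totals = Counter()
    for url, views in pageviews.items():
        match = re.search(pattern, url)
        group = match.group(1) if match else "(not set)"
        totals[group] += views
    return totals

by_category = roll_up(PAGEVIEWS, r"/product_(.*?)\.php")
by_flavor = roll_up(PAGEVIEWS, r"flavor=([^&]+)")

print(dict(by_category))  # {'widget': 200, 'transmogrifier': 45}
print(dict(by_flavor))    # {'tau': 120, 'strange': 80, '(not set)': 45}
```

That first line is the answer to “how many pageviews were there to any widget?”, without any sorting or filtering of individual URLs.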

So, Content Groupings are one more element in your arsenal in the war on bad URLs in your data. I hope they help!

Jonathan Weber

About Jonathan Weber

Jonathan Weber is the Data Evangelist at LunaMetrics. He spreads the principles of analytics through our training seminars all over the East Coast. The next seminar he'll be leading will be a Google Analytics training in Boston. Before he caught the analytics bug, he worked in information architecture. He holds a Master’s degree from the University of Pittsburgh School of Information Sciences. Jonathan’s breadth of knowledge – from statistics to analysis to library science – is somewhat overwhelming.

http://www.lunametrics.com/blog/2014/02/27/urls-content-grouping-google-analytics/

6 Responses to “Making URLs Better Through Content Grouping in Google Analytics”

Sam says:

I also hate these long and confusing URLs with all those tags and numbers etc. For instance, as I read this article, the URL showing in my browser is this: http://www.lunametrics.com/blog/2014/02/27/urls-content-grouping-google-analytics/#sr=g&m=o&cp=or&ct=-tmc&st=%28opu%20qspwjefe%29&ts=1394636833 That's not a very attractive URL, is it? If I wanted to get backlinks to this article within my blog, won’t I have a difficult job doing so?

Danielle says:

How were you able to create different content groups, that were fed off the same site page, without creating duplicate pageviews?

I would like to create content groups for product category and a key product attribute, but all I’ve been able to do is create two pageviews for each one page hit. I’m using GTM to pass the category and attribute information to the datalayer, which is picked up by the pageview tag that has both content groups in it, firing both off the same datalayer event rule.

The information I need to group on isn’t, and cannot be, contained in the page title or URL.

Dan Wilkerson says:

I’ve been using VPVs and filters to accomplish this, but I did consider going the content grouping route. I was more interested in saving those for semantically similar pages, though :D

Thanks, Jonathan!

Jonathan Weber says:

Danielle — this article is focused on non-code approaches to content grouping (where the info is in the title or URL). For a case like yours, where you are using code (or GTM), see this earlier article: http://www.lunametrics.com/blog/2014/01/24/classify-blog-posts-analytics-content-groupings/

In summary: in GTM, you don’t want to create an additional page tracking tag (which will duplicate pageviews, as you point out). Instead, use the Content Groups under “More Settings” to pass along the content groups from a macro or data layer variable along with the original pageview.

Dan — you’re right; you only get 5 content groupings, so you have to use them judiciously!

This approach has the advantage of not having the situation where the URLs you record in GA mismatch the URLs that actually appear when you use the website (which could happen with virtual pageviews, depending on how you are rewriting the URLs, and can be hard to match up with the real pages for users who aren’t familiar with the virtual URL scheme).

Danielle says:

Jonathan, thank you for your response! I had structured it that way, but your confirmation that it was correct helped me figure out that the error was actually because I had structured my dataLayer variables incorrectly, causing the tag to fire twice! Being new to something is dangerous lol
