Intro to GA Regular Expressions: Part XIV of XIV
January 28, 2007
This is the last of fourteen posts I have written on Regular Expressions for Google Analytics. Now that I have learned them (and, I hope, explained them), it’s time for the introductory post. (I always do like to work backwards.)
Here’s the reason: people skip the introductions to books and don’t read manuals; they just want to figure out how to make “it” work, whatever “it” is.
Only once it works are we ready to say, OK, so what? Why do I care, what’s it good for, what’s it bad for?
What are Regular Expressions (RegEx)? They use characters on your keyboard (like * and ^) to build an expression that may or may not match a target expression. They have a strict set of rules, just like a programming language, and it’s easy to make mistakes with them. So you will always have at least two expressions: the Regular Expression with the funny characters, and the target expression you are matching against, such as a page on your site, someone’s address, or a keyword. Here’s a quick example of the Regular Expression vs. target expression issue: I can create a regular expression like luna|robb?in and then match it against the keywords people used to come to my site, to filter out all the times people used my company name or my own name, whether they spelled it right or not. In this example, the keywords were my target expressions. (Need to understand that pipe symbol or the question mark in the RegEx? Both are covered in earlier posts in this series.)
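To see the two expressions side by side, here is a minimal Python sketch. It assumes a search that can match anywhere in the keyword (Python’s `re.search`); the function name and the sample keywords are hypothetical, and Google Analytics itself is not involved.

```python
import re

# The pattern from the post: "luna" OR ("rob" + an optional second "b" + "in").
# The ? applies only to the b right before it, so "robin" and "robbin" both match.
BRAND_PATTERN = re.compile(r"luna|robb?in")


def is_brand_keyword(keyword: str) -> bool:
    """Return True if the pattern matches anywhere in the keyword.

    re.search scans the whole target expression, so "luna" is found
    even in the middle of a longer keyword. Matching is case-sensitive
    unless you compile with re.IGNORECASE.
    """
    return BRAND_PATTERN.search(keyword) is not None


# Hypothetical target expressions (search keywords), for illustration only.
for keyword in ["luna", "robin", "robbin", "regex tutorial"]:
    print(keyword, "->", is_brand_keyword(keyword))
```

The first three keywords match and the last one does not, which is exactly the split you want when filtering out branded searches.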
So why use them? The first reason RegEx are worth caring about, if you use Google Analytics, is that Google cares. There are certain tasks Google just won’t let you do correctly without RegEx. Examples that come quickly to mind: excluding yourself from the data by IP address, creating a custom filter, and creating a filter that lets you see both your subdomain and your domain in the same profile. (The last is really just a special case of the second, a custom filter, but I mention it because you can read about it in the GA help section.)
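The post leaves the subdomain filter to the GA help section. A commonly described version of it is an advanced filter that extracts the hostname with (.*) into Field A, extracts the request URI with (.*) into Field B, and writes $A1$B1 back to the request URI. Treat that recipe as an assumption here: the Python below only mimics its string logic, and the function name and hostnames are hypothetical.

```python
import re

# Hypothetical sketch of the "domain + subdomain in one profile" filter:
# Field A extracts the hostname with (.*), Field B extracts the request
# URI with (.*), and the output constructor $A1$B1 glues the two
# captured groups together into a new request URI.
FIELD_A = re.compile(r"(.*)")  # applied to the hostname
FIELD_B = re.compile(r"(.*)")  # applied to the request URI


def combine_host_and_uri(hostname: str, request_uri: str) -> str:
    """Mimic the $A1$B1 output: group 1 of Field A, then group 1 of Field B."""
    a = FIELD_A.match(hostname)
    b = FIELD_B.match(request_uri)
    # (.*) always finds a match, even on an empty string, so neither is None.
    return a.group(1) + b.group(1)


# Two hypothetical hits on the same path, one per host.
print(combine_host_and_uri("www.mysite.com", "/index.html"))
print(combine_host_and_uri("blog.mysite.com", "/index.html"))
```

Without the filter, both hits would show up as the same /index.html; with it, they become www.mysite.com/index.html and blog.mysite.com/index.html.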
Other great examples of custom filters: create a filter that shows the words people actually type into Google before they click on your AdWords ad, instead of just learning which keyword you bought gets credit. (I use this one all the time. The only hack I like better is this one.) Or force all your reports to show pages by title instead of URL.
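The search-terms hack rests on pulling the words people typed out of the referring URL. As a minimal sketch, assuming the terms appear in a `q` parameter of the referrer (the GA filter setup itself is not shown, and the pattern, function name and URL are hypothetical), the regex part looks like this in Python:

```python
import re
from typing import Optional
from urllib.parse import unquote_plus

# Hypothetical pattern: "q=" must come right after "?" or "&" (so a
# parameter like "aq=" is not mistaken for it), and the captured terms
# run up to the next "&" or the end of the URL.
QUERY_PATTERN = re.compile(r"[?&]q=([^&]*)")


def search_terms(referrer: str) -> Optional[str]:
    """Return the decoded search terms from a referrer URL, or None."""
    match = QUERY_PATTERN.search(referrer)
    if match is None:
        return None
    # Turn "+" back into spaces and decode %xx escapes.
    return unquote_plus(match.group(1))


# A hypothetical Google referrer, for illustration only.
print(search_terms("http://www.google.com/search?hl=en&q=regular+expressions&btnG=Search"))
```

The [?&] in front of q= is the important detail: without it, the pattern could grab the value of some other parameter whose name merely ends in q.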
OK, so we understand that they are needed for filters. But what about goals? After all, you can do a head match or an exact match; why go to the trouble of using a RegEx match?
One reason would be if you have two pages that are essentially the same goal, or the same step in the funnel. For example, let’s say that when the visitor reaches either of these two pages, www.mysite.com/folder3/thanks.html or www.mysite.com/thanksalot.html, he has really achieved the same goal. By using RegEx, you can set your goal page in Google Analytics to /thanks, and whenever someone reaches either of those pages, the same goal (G1, or G2, or whichever one you choose) is incremented. Then if you want to know which of the two pages recorded more of those goal completions, go to Content Optimization > Goals and Funnel Verification > Goal Verification.
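Here is a small Python sketch of the three goal match types applied to the two thank-you pages. It is a simplified model, not GA’s own code: the functions are hypothetical, and it assumes the goal is compared against the page path without the domain.

```python
import re

GOAL = "/thanks"

# Assumption: the goal is compared against the page path, without the domain.
PAGES = ["/folder3/thanks.html", "/thanksalot.html"]


def head_match(path: str, goal: str) -> bool:
    # Head match: the path must begin with the goal text.
    return path.startswith(goal)


def exact_match(path: str, goal: str) -> bool:
    # Exact match: the path must be identical to the goal text.
    return path == goal


def regex_match(path: str, goal: str) -> bool:
    # RegEx match: the goal pattern may match anywhere in the path.
    return re.search(goal, path) is not None


for path in PAGES:
    print(path, head_match(path, GOAL), exact_match(path, GOAL), regex_match(path, GOAL))
```

Only the RegEx match catches both pages. The head match finds /thanksalot.html but misses /folder3/thanks.html, because that path begins with /folder3, and the exact match finds neither.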
Of course, if you have other pages on your site that match /thanks, you have to get more specific with your RegEx. However, I never forget the lesson that a friend taught me: keep your RegEx as simple as possible.
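Suppose a hypothetical page such as /thanksgiving-sale.html existed: the /thanks pattern would count it as a goal too. Here is a minimal Python sketch of one way to tighten the pattern while keeping it short, again assuming the goal is compared against the page path:

```python
import re

# The simple goal pattern: "/thanks" anywhere in the path.
LOOSE = re.compile(r"/thanks")

# A tighter, hypothetical alternative: exactly the two thank-you pages.
# ^ and $ pin the start and end; \. is a literal dot.
TIGHT = re.compile(r"^/(folder3/thanks|thanksalot)\.html$")

# The third path is a hypothetical page that should NOT count as a goal.
PATHS = ["/folder3/thanks.html", "/thanksalot.html", "/thanksgiving-sale.html"]

for path in PATHS:
    print(path, bool(LOOSE.search(path)), bool(TIGHT.search(path)))
```

Both patterns accept the two real thank-you pages; only the tighter one rejects the sale page. If no such page exists on your site, the simple /thanks is the better choice.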
Another reason to use RegEx is laziness. Even when there are a ton of variations, who wants to set up each one separately? The example above shows how to combine two into one, but what if you had 15 variations? Example: you want to be sure your own company’s traffic is not in the analytics. Your company owns all the IP addresses from 72.77.12.26 through 72.77.12.40, inclusive. You sure don’t want to create 15 filters. Instead, you can use a regular expression like this: ^72\.77\.12\.(2[6-9]|3[0-9]|40), and it will capture all fifteen IP addresses. (The little caret ^ at the beginning says, don’t match if there is something before the 72. The backslashes turn the special dots, which would otherwise match any character, into plain dots. 2[6-9] means match 26, 27, 28 and 29. 3[0-9] says match 30, 31, and so on through 39. 40 says match 40. The pipes are OR signs. So at the end of the expression, you’ll be matching 26-29, OR 30-39, OR 40.)
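A quick way to check a pattern like this is to run it against every address in the block. In this Python sketch (the function name is hypothetical), the pattern is compiled exactly as written above; the loop confirms it matches 72.77.12.26 through 72.77.12.40 and nothing else in 72.77.12.0 through 72.77.12.255.

```python
import re

# The IP pattern from the post, with its dots escaped.
# ^ anchors the match to the start; \. matches a literal dot.
OFFICE_IPS = re.compile(r"^72\.77\.12\.(2[6-9]|3[0-9]|40)")


def is_office_ip(ip: str) -> bool:
    """Return True if the address falls in the company's range."""
    return OFFICE_IPS.search(ip) is not None


# Try every last octet from 0 to 255 and keep the ones that match.
matched = [n for n in range(256) if is_office_ip(f"72.77.12.{n}")]
print(matched)       # prints the numbers 26 through 40
print(len(matched))  # → 15

# The ^ keeps a longer address from sneaking in...
print(is_office_ip("172.77.12.30"))  # → False
# ...and the escaped dots refuse characters other than a real dot.
print(is_office_ip("72x77x12x30"))   # → False
```

The pattern has no $ at the end, and it does not need one here: the only longer endings it could accept, such as 260 or 400, are not valid IP addresses.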
Well, that’s it on RegEx for now. Here are all the prior posts in the series:
Dollar signs $
Question marks ?
Square brackets and dashes [ ] -
Plus signs +
Regular Expressions for Google Analytics: Now let’s Practice
RegEx and Good Greed
Intro to RegEx
Now that I am done with this series, I’ll go back and make sure that all the posts link to all the other posts consistently. Done! Done! All done!