
Intro to GA Regular Expressions: Part XIV of XIV





This is the last of fourteen posts I have done on Regular Expressions for Google Analytics. Now that I have learned them (and hopefully explained them), it’s time to have that introductory post. (I always do like to work backwards.)

Here’s the reason. People skip the introductions to books, they don’t read manuals, they just want to figure out how to make “it” work, whatever it is.


Only once it works are we ready to say, OK, so what? Why do I care, what’s it good for, what’s it bad for?

What are Regular Expressions (RegEx)? They use characters on your keyboard (like * and ^), enabling you to create an expression that may or may not match a target expression. They have a strict set of rules, just like a programming language would, and it’s easy to make mistakes with them. (This is why I am a big user of this RegEx checking tool.) So you will always have at least two expressions: the Regular Expression with the funny characters, and the expression you are matching on your site, or in someone’s address, or keyword.

Here’s a quick example of the Regular Expression vs. target expression issue: I can create a regular expression like this luna|robb?in and then match it against the keywords people used to come to my site to filter out all the times people used my company name or my own name, whether they spelled it right or not. In this example, the keywords were my target expressions. (Need to understand that pipe symbol in the RegEx? Need to understand the question mark in the RegEx?)
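If you want to try the keyword example outside of Google Analytics, here is a minimal sketch in Python. Two assumptions to keep in mind: Python's `re` module is standing in for GA's own RegEx engine (which may differ in details), and the keyword list is invented purely for illustration.

```python
import re

# The pattern from the example: "luna" OR "robin"/"robbin"
# (the question mark makes the second "b" optional).
BRAND_PATTERN = re.compile(r"luna|robb?in")

# Hypothetical keywords, made up for this sketch.
keywords = [
    "lunametrics blog",
    "robbin steif",
    "robin steif analytics",
    "google analytics regex",
]


def is_branded(keyword: str) -> bool:
    """Return True if the pattern matches anywhere in the keyword."""
    return BRAND_PATTERN.search(keyword) is not None


# Split the target expressions into branded and unbranded keywords.
branded = [k for k in keywords if is_branded(k)]
unbranded = [k for k in keywords if not is_branded(k)]

print("branded:", branded)
print("unbranded:", unbranded)
```

The `search` call matches anywhere inside the keyword, which is why "lunametrics blog" counts as a match even though the pattern only says luna.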

So why use them? The first reason that RegEx are worth caring about, if you use Google Analytics, is that Google cares. There are certain tasks Google just won’t let you do correctly without using RegEx. Examples that come quickly to mind are: taking yourself out of the data using an IP address, creating a custom filter, and creating a filter that enables you to see both your subdomain and your domain in the same profile. (The last one is just an instance of the second example, a custom filter, but I mention it because you can read about it in the GA help section.)

Other great examples of custom filters: Create a filter to learn what words people actually type in to Google before they click on your AdWord, instead of just learning which AdWord gets credit. (I use this one all the time. The only hack I like better is this one.) Force all your reports to give you pages by title instead of URL.

OK, so we understand that they are needed for filters. But how about goals? After all, you can do a head match or an exact match, why go to the trouble of using a RegEx match?

One reason would be if you have two pages that are essentially the same goal or the same place in the funnel. So, for example, let’s say that when the visitor reaches either of these two pages, www.mysite.com/folder3/thanks.html and www.mysite.com/thanksalot.html, he has really achieved the same goal. By using RegEx, you can make your goal page in Google Analytics /thanks and whenever someone reaches either of those pages, the same goal (G1, or G2, or whichever one you choose) is incremented. Then if you happen to care about which of the two pages was reached more often, you can easily go to Content Optimization > Goals and Funnel Verification > Goal verification to find out.

Of course, if you have other pages on your site that match /thanks, you have to get more specific with your RegEx. However, I never forget the lesson that a friend taught me: keep your RegEx as simple as possible.
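Here is a rough sketch of that trade-off in Python. The assumptions: Python's `re` module stands in for GA's matching, the goal is matched against the path part of the URL, the page /thanksgiving-recipes.html is a hypothetical extra page invented for this example, and the "more specific" pattern is just one possible way to tighten things up.

```python
import re

# Paths of the two goal pages from the example (hostname dropped).
goal_pages = ["/folder3/thanks.html", "/thanksalot.html"]

# Hypothetical page that is NOT a goal but also contains "/thanks".
other_page = "/thanksgiving-recipes.html"

# The simple goal RegEx: matches any path containing "/thanks".
simple = re.compile(r"/thanks")

# One possible more specific version: exactly one of the two goal pages.
specific = re.compile(r"^/(folder3/thanks|thanksalot)\.html$")

# The simple pattern catches both goal pages, and the extra page too.
simple_hits = [p for p in goal_pages + [other_page] if simple.search(p)]

# The specific pattern catches only the two goal pages.
specific_hits = [p for p in goal_pages + [other_page] if specific.search(p)]

print("simple:", simple_hits)
print("specific:", specific_hits)
```

If no other page on the site contains /thanks, the simple version does the same job and is much easier to read, which is the whole point of keeping it simple.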

Another reason to use RegEx is when you are lazy. After all, just because there are a ton of variations, who wants to create a ton of iterations? The example above shows how to combine two into one, but what if you had 15 variations? Example: you have to be sure that your company is not in the analytics. Your company owns all the IP addresses from 72.77.12.26 through 72.77.12.40, inclusive. You sure don’t want to create 15 filters. Instead, you can use a regular expression like this: ^72\.77\.12\.(2[6-9]|3[0-9]|40) and it will capture all fifteen IP addresses. (The little caret ^ at the beginning says, don’t match if there is something before the 72. The backslashes \ turn the special dots into plain dots. This 2[6-9] means, match 26, 27, 28 and 29. This 3[0-9] says, match 30, 31, etc. through 39. 40 says, match 40. The pipes are OR signs. So at the end of the expression, you’ll be matching 26-29 OR 30-39 OR 40.)
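A quick way to convince yourself that the expression covers exactly those fifteen addresses is to test it against every possible last number from 0 to 255. This is only a sketch, assuming Python's `re` module behaves like GA's engine for this pattern; the helper name `matched_last_octets` is made up for the example.

```python
import re

# The IP filter pattern from the example above.
IP_PATTERN = re.compile(r"^72\.77\.12\.(2[6-9]|3[0-9]|40)")


def matched_last_octets() -> list[int]:
    """Try 72.77.12.0 through 72.77.12.255 and return the matching last numbers."""
    return [n for n in range(256) if IP_PATTERN.search(f"72.77.12.{n}")]


matched = matched_last_octets()
print(matched)       # the last numbers the filter would catch
print(len(matched))  # how many addresses that is

# The caret stops a match when something comes before the 72.
print(bool(IP_PATTERN.search("172.77.12.30")))

# The backslashes mean only a real dot matches, not any character.
print(bool(IP_PATTERN.search("72x77.12.30")))
```

Because the pattern has no dollar sign at the end, a string like 72.77.12.400 would also match, but that is not a valid IP address, so it does not matter for this filter.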

Well, that’s it on RegEx for now. Here are all the prior posts in the series:

Backslashes \
Dots .
Carets ^
Dollar signs $
Question marks ?
Pipes |
Parentheses ()
Square brackets [] and dashes -
Plus signs +
Stars *
Regular Expressions for Google Analytics: Now let’s Practice
Bad Greed
RegEx and Good Greed
Intro to RegEx
{Braces}
Minimal Matching

Now that I am done with this series, I’ll go back and make sure that all the posts link to all the other posts consistently. Done! Done! All done!

Robbin

Robbin Steif

About Robbin Steif

Our owner and CEO, Robbin Steif, started LunaMetrics ten years ago. She is a graduate of Harvard College and the Harvard Business School, and has served on the Board of Directors for the Digital Analytics Association. Robbin is a recent winner of a BusinessWomen First award, as well as a Diamond Award for business leadership.

http://www.lunametrics.com/blog/2007/01/28/intro-to-ga-regular-expressions-part-xiv-of-xiv/

14 Responses to “Intro to GA Regular Expressions: Part XIV of XIV”

Justin Cutroni says:

Robbin,

I’d like to congratulate you on this series. It’s one of the best, if not THE BEST, tutorials on regular expressions I have seen. I just can’t explain how fantastic it is.

Great Job!

I must have learned 60% of this from you. We definitely have to start a mutual admiration club…

Robbin

Anonymous says:

Please do make sure you go back and add easy links from each of the parts of this tutorial to the others. It was soooo good that I worked hard to find them, but I would love to be able to point someone to the first page and know that they’ll be able to get to the whole tutorial. This is a fabulous resource.

Joe says:

Fantastic tutorial. Human-readable. Just what I needed.

[...] you can do with Reg ex. If you really want to learn about reg ex check out my friend Robbin’s series on the subject. [...]

Jon says:

Hi,

So, would the RegEx be “/thanks” (minus quotes) for your example about tracking multiple pages with virtually the same actual goal?

Thanks,

Jon

Robbin says:

Yes, absolutely. And then you can use the Goals > Goal Verification report to see which of those two pages were the ones that were reached the most.

[...] Intro to GA Regular Expressions [...]

[...] How regular expressions work in Google Analytics by Robin Steif [...]

I am so happy i found your site. You are such an motivation!

Mark C says:

I am floored, and delighted, to finally find a series of articles explaining regex in detail to the non-programmer!

Also, it rocks that you wrote the introduction last :- )

Mark C says:

One query by the way… I asked a Java programmer friend for some help with a regex. Trying to be helpful, I said “I believe GA uses the Posix ERE kind of regular expressions”.

He disagreed, saying that it doesn’t support negative forward lookup.

What is your view?

Robbin Steif says:

Hi Mark. GA claims that they use Posix (or maybe their old help center did). They *used* to support negative lookahead, but disabled it. The analyst who wrote a comment at the very top of this thread, Justin Cutroni, always posited that the GA Regex are PCRE. So, go figure.