Regular Expressions Part XIII: Good Greed





This is my next-to-last post in this Regular Expression (RegEx) series. I have been thinking about this post for a long time, and yesterday someone asked me a question that finally got me to write it. She had two pages that she wanted to roll into one Google Analytics goal. She created the Regular Expression for it and ran it through Epikone’s RegEx Coach, where it worked, but it wasn’t working in GA. (More on the Coach below.)

The two pages were:

subdomain.mysite.com/folder/subfolder/GoalThree.php
subdomain.mysite.com/folder/subfolder/GoalThreesome.php

She sent me a long, complicated expression which wasn’t working for her and asked my opinion.

This is absolutely a case of putting Good Greed to work for you, as we will see in a minute. As I wrote in my last post, Regular Expressions are very greedy: they match everything they can unless you tell them not to. This is a very hard concept to wrap your head around. Among other things, it means that your expression only has to appear somewhere in the URL; whatever comes before it and whatever comes after it is accepted as part of the match too, unless you tell it otherwise (for example, with a caret ^ or a dollar sign $) or there is nothing there to match.

Anyway, I wrote her back and said, why don’t you just write an expression like this:

/folder/subfolder/GoalThree

This assumes that she doesn’t have other GoalThree variations that would be incorrectly mixed in here. If, for example, she had another page, /folder/subfolder/GoalThreeCornered, that would qualify as a match too, because the RegEx matches everything it can, even characters that aren’t in the Regular Expression. Coming back to how simple her RegEx might be, she might even be able to get away with a goal like this, depending on her site:

/GoalThree

This matches every URL that includes /GoalThree.
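To see that behavior outside of GA, here is a minimal sketch in Python. It assumes that Python's `re.search` (which looks for the pattern anywhere in the text) is a fair stand-in for the partial matching described above; it is not GA's own engine. The `.php` on the GoalThreeCornered page, the helper name `matching_paths`, and the tighter third pattern are illustrative assumptions, not something from her site.

```python
import re

# Request paths from the post, plus the hypothetical extra page it warns about.
paths = [
    "/folder/subfolder/GoalThree.php",
    "/folder/subfolder/GoalThreesome.php",
    "/folder/subfolder/GoalThreeCornered.php",  # hypothetical unwanted page
]

def matching_paths(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    return [p for p in candidates if re.search(pattern, p)]

# The two simple expressions suggested in the post.
longer = matching_paths(r"/folder/subfolder/GoalThree", paths)
short = matching_paths(r"/GoalThree", paths)

# An illustrative tighter pattern that allows only the two goal pages.
strict = matching_paths(r"/GoalThree(some)?\.php$", paths)

print(longer)
print(short)
print(strict)
```

Both simple expressions pick up all three paths, including the unwanted one, while the tighter pattern keeps only the two goal pages. Whether you need the tighter version depends entirely on what other pages exist on your site.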

Finally, a word about the Epikone RegEx Coach. I haven’t talked to Justin about this, but I am fairly sure that the Coach is configured to check whether the phrase you type matches the RegEx you type, using the way GA interprets RegEx. That doesn’t necessarily mean that you have come up with a valid goal, or an IP address filter that will actually filter anything. For example, you might use it to see whether colou?r is a valid RegEx for both color and colour (it should be), but that doesn’t mean colou?r will necessarily work in your Google Analytics profile filters or goals. You really have to understand the context in which you are using the expression and what GA demands of you, in addition to getting the expression and the text to match each other.
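The kind of check described here, "does this text match this expression?", can be sketched in a few lines of Python. This uses Python's `re` module as a stand-in, which is an assumption; it is not the Coach and not GA. The two extra misspellings are made up for contrast.

```python
import re

# The question mark makes the "u" optional.
pattern = re.compile(r"colou?r")

# "colr" and "colouur" are made-up misspellings added for contrast.
words = ["color", "colour", "colr", "colouur"]
results = {word: bool(pattern.search(word)) for word in words}

for word, matched in results.items():
    print(word, matched)
```

A check like this only tells you that the expression and the text match each other. It says nothing about whether the expression is in the right field, or in the right form, for a GA goal or filter.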

Posts in this series:

Backslashes \
Dots .
Carets ^
Dollar signs $
Question marks ?
Pipes |
Parentheses ()
Square brackets [] and dashes -
Plus signs +
Stars *
Regular Expressions for Google Analytics: Now Let’s Practice
Bad Greed
RegEx and Good Greed
Intro to RegEx
{Braces}
Minimal Matching

Robbin

About Robbin Steif

Our owner and CEO, Robbin Steif, is an analyst herself. She is a graduate of Harvard College and the Harvard Business School, and has served on the Board of Directors for the Digital Analytics Association (formerly the Web Analytics Association.)

http://www.lunametrics.com/blog/2007/01/11/regular-expressions-part-xiii-good-greed/

3 Responses to “Regular Expressions Part XIII: Good Greed”

Ralph Hibbs says:

Robbin,
This is an awesome series of postings. They have greatly helped my understanding.

One trick I learned for debugging RegEx for Google Analytics goal configurations is that the search box on the Content/Top Content report takes RegEx. When I’m trying to configure a goal that captures several confirmation page URLs, I test it in the search box to see if I’m pulling only the desired pages.

Ralph

Robbin Steif says:

Yes! This is a great idea, I use it a lot. Thanks for the addition – Robbin

Nic says:

What about

subdomain\.mysite\.com/folder/subfolder/(GoalThree|GoalThreesome)\.php