Archive for the ‘Regular Expressions’ Category
Posted on October 28, 2006 by Robbin Steif
I’ve been writing about Regular Expressions for Google Analytics for some time now. The more I learned, the more I wanted to rewrite my very earliest posts, because In The Beginning, I took easy topics and made them hard. Or, I combined too many expressions together without just starting with basic ideas.
Anyway, I rewrote Part I (the backslash), and I also rewrote Part II. Originally, Part II addressed multiple wildcards, but I simplified it to cover just the dot. I will deal with the plus sign + and the asterisk * (which RegEx types like to call a star) in future posts.
Over time, I will get them all cross-indexed. (Done, done, done! At last.) When I change a post title, I break the link, so I’ll fix that too. (Sorry!)
If you got lost learning about wildcards in Part II, this would be a good time, to the extent that you actually have time, to go back and just learn about dots.
Backslashes \
Dots .
Carets ^
Dollar signs $
Question marks ?
Pipes |
Parentheses ()
Square brackets [] and dashes -
Plus signs +
Stars *
Regular Expressions for Google Analytics: Now let’s Practice
Bad Greed
RegEx and Good Greed
{Braces}
Minimal Matching
Lookahead
Robbin
LunaMetrics
View Comments (No Responses) | Categories: Google Analytics, Regular Expressions
Posted on October 22, 2006 by Robbin Steif
Come learn Regular Expressions for Google Analytics with me. I am learning them myself and teaching as I go, one lesson at a time. This is why I roll them out slowly: each expression requires a lot of research. I have been awed by this process, because the explanations are so opaque before I understand them, and once I learn them, they make perfect sense. Tonight, let’s talk about square brackets, and I hope you’ll see what I mean.

Google Analytics defines square brackets like this:
[] Match one item in this list
This is exactly what they mean; it just sounds hard because they don’t tell you how to create the list or how to define an item. Simple explanation: when you use square brackets, each character within the brackets is an item. Look at this sample list with five items in it, each of which happens to be a vowel: [aeiou]. The hard part is understanding that you don’t need anything to separate the characters, and that each item in the list is only one character.
Here’s how someone might use square brackets with Google Analytics. Let’s say you were selling items with part numbers formatted like this: PART1, PART2, etc. You want to know how often someone lands on your site by typing the actual part number into a search engine, but you only care about PART3, PART5 and PART7. So, you could enter PART[357] into the filter box at the top of your Overall Keyword Conversion report (for example). That will match each of those part numbers. (Technically, it matches one of these three and more, but I will hold that problem/opportunity for a different post.)
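If you want to try this outside of GA, here is a minimal sketch using Python’s re module. It assumes that re.search (which looks for the pattern anywhere in the text) behaves like the GA filter box for a simple pattern like this one; the keyword list is made up for illustration. The last keyword shows the “and more” problem mentioned above.

```python
import re

# Assumption: re.search ("contains" matching) stands in for the GA filter box.
pattern = re.compile(r"PART[357]")  # "PART" plus exactly one of 3, 5 or 7

# Hypothetical search keywords, made up for this example.
keywords = ["PART1", "PART3", "PART5", "PART6", "PART7", "PART35"]

matches = [k for k in keywords if pattern.search(k)]
print(matches)  # ['PART3', 'PART5', 'PART7', 'PART35']
```

PART35 sneaks in because the pattern only needs to find PART3 somewhere inside it.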
It’s helpful to understand dashes so that you can use square brackets easily. Google Analytics defines dashes like this:
- Create a range in a list
That means, instead of creating a list like this [abcdefghijkl], you can create it like this: [a-l], and it means the same thing — only one letter out of the list gets matched. You can also combine the range method and the brute force, type-them-all-in method and create a list like this: [a-lqtz], which matches any one letter between a and l, or q, or t, or z.
Special case: Sometimes — perhaps often — we really want the dash to be one of the characters we are searching for. Maybe we want to see searches of luna-metrics and lunarmetrics and lunammetrics. In that case, we put the dash at the beginning or end of the list, like this [-rm]. That means that the full RegEx which would match the three lunametrics keywords above would be luna[-rm]metrics. This is because the phrase will start with luna, end with metrics, and in between will have a dash, an r, or an m. Those are the only choices in the little list I created, the one that looked like this: [-rm].
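Here is a small sketch of both ideas, ranges and the literal dash, again using Python’s re module as a stand-in for GA (an assumption; the test strings are made up). re.fullmatch checks that the whole string matches, which keeps the single-letter test honest.

```python
import re

# [a-lqtz]: one letter from a through l, or q, or t, or z.
letters = [c for c in "aklmqtz" if re.fullmatch(r"[a-lqtz]", c)]
print(letters)  # ['a', 'k', 'l', 'q', 't', 'z']

# A dash at the start of the list is a plain dash, not a range.
# Assumption: re.search stands in for GA's "contains" matching.
luna = re.compile(r"luna[-rm]metrics")
searches = ["luna-metrics", "lunarmetrics", "lunammetrics", "lunametrics", "luna metrics"]
found = [s for s in searches if luna.search(s)]
print(found)  # ['luna-metrics', 'lunarmetrics', 'lunammetrics']
```

Note that plain lunametrics is not matched: the list insists on exactly one character (a dash, an r or an m) between luna and metrics.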
There are other interesting things that you can do with square brackets, but I am leaving them out for now, either because they don’t all work with Google Analytics, or because I think this is enough for today. (Correct me if I’m wrong!)
Robbin
Posted on October 12, 2006 by Robbin Steif
As promised, here is installment VII of my Regular Expressions (RegEx) tutorial: parentheses. I am learning and sharing at the same time, and I am only learning them to use with Google Analytics.
I wanted to get this one out soon after my last RegEx post, because the last one was on the use of pipes, which stand for OR in Regular Expressions. Pipes (OR symbols) and parentheses often go together.

My tutor, Steve in Australia, does a really good job of explaining parentheses. In the same way that this mathematical statement
6*(2+3)
is equivalent to 6*2 plus 6*3, parentheses in Regular Expressions make sure that the stuff outside the parentheses gets applied equally to each thing inside them.
For example — and remembering that the pipe symbol | stands for OR — we can have a regular expression like this:
grand(mother|father)
That will match either grandmother or grandfather.
Or, here is another, similar but not identical example:
Ste(ph|v)en
That will match either Stephen or Steven.
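Here is a quick way to see the grouping at work, sketched with Python’s re module (an assumption: Python stands in for GA, and these simple patterns behave the same way in both). re.fullmatch checks whether the whole word matches. The ungrouped pattern is added only for contrast: without parentheses, the pipe splits the entire expression, so “grand” no longer applies to “father.”

```python
import re

grouped = re.compile(r"grand(mother|father)")  # "grand" applies to both choices
ungrouped = re.compile(r"grandmother|father")  # "grandmother" OR "father"
stephen = re.compile(r"Ste(ph|v)en")

def full_matches(pattern, words):
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if pattern.fullmatch(w)]

words = ["grandmother", "grandfather", "father", "grandson"]
print(full_matches(grouped, words))    # ['grandmother', 'grandfather']
print(full_matches(ungrouped, words))  # ['grandmother', 'father']
print(full_matches(stephen, ["Stephen", "Steven", "Stefen"]))  # ['Stephen', 'Steven']
```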
What if the two terms are really different and there isn’t much in the way of grouping to do? For example, what if we want to filter out Robbin or Luna (which I do all the time in my GA)? Then we can go back to the last lesson on OR and just use a simple pipe:
Robbin|Luna
(Often, even people who know me well misspell my name, so I could use what I learned in lesson V, question marks, to make the second “b” optional, like this: Robb?in|Luna)
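A minimal sketch of that exclude filter, assuming Python’s re.search behaves like GA’s “contains” matching for a pattern this simple. The labels are hypothetical, made up for the example.

```python
import re

# Assumption: re.search mimics GA "contains" matching.
# b? makes the second "b" optional, so Robin and Robbin are both caught.
exclude = re.compile(r"Robb?in|Luna")

# Hypothetical labels, made up for this example.
labels = ["Robbin Steif", "Robin Steif", "LunaMetrics", "web analytics"]
kept = [label for label in labels if not exclude.search(label)]
print(kept)  # ['web analytics']
```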
In Google Analytics (I won’t speak to other languages) we don’t need to use any parentheses if there isn’t any grouping; the pipe can stand on its own. Or as Justin always tells me, keep it simple.
[Incredibly techie addition: My last comment about never needing parentheses when there is nothing outside them is not always true. At the eMetrics Summit, Nick from Google and Justin from Epikone taught me a lot about creating custom filters, and during that process they explained how parentheses define a variable. I will revisit this topic later.]
Robbin
Posted on October 4, 2006 by Robbin Steif
This is the sixth installment of my Regular Expression lessons. I am actually learning more than teaching and just sharing as I go along. These are Regular Expressions for Regular People (c), so all the tech-talk is removed. My motivation for learning RegEx, as they are called, is Google Analytics.
OR is symbolized by the pipe symbol |. On my US keyboard, the pipe is just above the Enter key, but for some reason it looks like two short vertical dashes on the key itself. It’s incredibly simple, and even Google Analytics does a fabulous job with its description:

This was a hard one to screw up, although they have done a good job of screwing up other easy Regular Expressions.
Here’s an example. One of the sites that I work with is an engineering education site, and they teach both Statics and Finite Element Analysis (FEA). Some people also refer to the latter as FEM (Finite Element Method). If I were only interested in the statics keyword searches, I would probably want to get references to FEA and Finite Element Analysis and FEM out of the search reports. I could do that easily in GA by going to the little filter box that can be found on each report, making it into a red minus sign (so that I am filtering out) and typing in FEA|Finite|FEM . This has the effect of saying, “Get rid of references to anything that includes FEA OR Finite OR FEM.”
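Here is a minimal sketch of that “red minus sign” filter, assuming Python’s re.search behaves like the GA filter box for this pattern. The keywords are made up for illustration. Python’s matching here is case-sensitive; whether the GA filter box ignores case is not something this sketch models.

```python
import re

# The pipe means OR: a keyword is excluded if it contains FEA, Finite, or FEM.
exclude = re.compile(r"FEA|Finite|FEM")

# Hypothetical search keywords, made up for this example.
keywords = [
    "statics tutorial",
    "FEA course",
    "Finite Element Analysis basics",
    "intro to FEM",
    "statics problems",
]

# Keep only the keywords that do NOT match, like the red minus sign filter.
statics_only = [k for k in keywords if not exclude.search(k)]
print(statics_only)  # ['statics tutorial', 'statics problems']
```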
Robbin
Posted on September 25, 2006 by Robbin Steif
So I continue here with my Regular Expressions (“RegEx”) lessons. I am learning RegEx only because so many customers use Google Analytics, which throws the code at the customer with very little explanation.
This next lesson is about the question mark. This time, Google does a pretty good job of meaning what they say:
? Match zero or one of the previous expression
When they say “the previous expression,” they mean the character that comes right before the question mark. Since that is still pretty opaque, let me shine some light here.
Let’s say that you have an economics website and you only want to look at the referrers that have the word “labor” in their title. But some of those referrers come from non-US countries where they spell it “labour.” You could create a filter like this: labou?r
That way, it will match “labour” (which does have a “u,” which is the previous expression) and labor (which has zero of the previous expression, i.e. no “u” is included.)
You cannot use it like this: labo?r, or at least, not for the same purpose. It’s not a wildcard that you stick between the o and the r to match any letter. The only matches would be “labor” (one of the previous expression, the “o”) and “labr” (zero of it). (Thanks, Serge, for catching my error.)
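To see the difference between the two patterns, here is a minimal sketch in Python’s re module (an assumption: Python stands in for GA here). re.fullmatch requires the whole word to match, and the word list is made up.

```python
import re

def full_matches(pattern, words):
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if re.fullmatch(pattern, w)]

words = ["labor", "labour", "labr", "labiur"]

print(full_matches(r"labou?r", words))  # ['labor', 'labour']  (the "u" is optional)
print(full_matches(r"labo?r", words))   # ['labor', 'labr']    (the "o" is optional)
```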
Robbin
Posted on September 14, 2006 by Robbin Steif
As regular readers know, I am learning about and sharing my lessons on Regular Expressions. In my last post on this topic, Part III, I wrote about the opening anchor, aka the caret symbol. It’s used at the beginning of some regular expressions. Today I am writing about its sister, the anchor $, which is sometimes used at the end of Regular Expressions (RegExen? I wish I had one of those “cool” smileys to easily insert here.)
In Google Analytics’ very unhelpful Help section on Regular Expressions, it says this about the dollar sign:
$: match to the end of the field
What they really mean is: don’t match if the string from my website has any characters beyond the point where I have placed the dollar sign in my Regular Expression. The dollar sign marks the end of what I want to match. (This is the hard part about the explanation; it’s always hard to explain what the target string is and what the RegEx is.)
So let’s say that you have some pages that end in htm and others in html. You want to write a Google Analytics Step 1 (part of a goal) for your email sign-up form, but you only want the new .htm version. Your RegEx might look like this:
/email-form\.htm$
The dollar sign tells Google Analytics that if the page on your site has anything after the final “m” in “htm,” it doesn’t count as a match to this expression. Notice that I also used a backslash before the dot so that Google Analytics interprets it just as a dot, not as anything special.
I understood what it does pretty quickly, but I had to put this post off for at least a week in an effort to understand why anyone would ever use a dollar sign. Here’s the problem: if you have any campaign code whatsoever that gets attached to this string when someone lands on your site, there is no match. For example, let’s say someone lands on your site and when he looks up at the address bar, it says this:
/email-form.htm?cid=123
As soon as there is campaign code, the dollar sign anchor will tell GA that there is no match. And it can be hard to remember everywhere that you would have campaign code and everywhere that you wouldn’t.
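Here is a minimal sketch of that campaign-code problem, using Python’s re module and assuming re.search mimics how GA tests a page against a goal step. The loose pattern without a dollar sign is added only for comparison.

```python
import re

# Assumption: re.search mimics how GA tests a page against a goal step.
strict = re.compile(r"/email-form\.htm$")  # nothing allowed after "htm"
loose = re.compile(r"/email-form\.htm")    # no dollar sign, for comparison

pages = ["/email-form.htm", "/email-form.html", "/email-form.htm?cid=123"]

strict_hits = [p for p in pages if strict.search(p)]
loose_hits = [p for p in pages if loose.search(p)]
print(strict_hits)  # ['/email-form.htm']
print(loose_hits)   # all three pages
```

The strict pattern correctly skips the .html page, but it also loses the visitor who arrived with ?cid=123 attached.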
So forget the fanciful examples – why would anyone ever use it?
One place you might use it would be with an IP address (for a filter on someone that you are trying to filter in or filter out). You might have an IP address like this that you are screening out: 12\.34\.56\.78, which matches 12.34.56.78, but you want to be sure that it doesn’t match 12.34.56.789, so you set up your expression to be 12\.34\.56\.78$ . And if you want to be sure that it doesn’t match 512.34.56.78 as well, you should also use the beginning anchor ^ (how’s that for a reminder and a link back to the last part of this series): ^12\.34\.56\.78$
You should also be able to use it, Justin says, with the “on the fly” filter that you can do with every report (the little box near the top of every screen). For example, if I wanted to know how many different variations of search terms end with my company’s name, I should be able to type LunaMetrics$ into that little box on one of the search term reports (like Marketing Optimization > Search Engine Marketing > Overall Keyword Conversion) and only see search terms like LunaMetrics, Call LunaMetrics or who the heck is LunaMetrics. I have never gotten that one to work. I do see potential there, though, because I can use a caret ^ at the beginning of an in-report filter, like this: ^LunaMetrics, and see all the search terms that start with my company name.
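Here is a small sketch comparing the three versions of the IP pattern, using Python’s re module and assuming re.search behaves like the GA filter for these patterns. The IP addresses are the ones from the paragraph above.

```python
import re

patterns = {
    "no anchors": r"12\.34\.56\.78",
    "dollar sign": r"12\.34\.56\.78$",
    "caret and dollar sign": r"^12\.34\.56\.78$",
}
ips = ["12.34.56.78", "12.34.56.789", "512.34.56.78"]

# Assumption: re.search stands in for the GA filter's matching.
results = {name: [ip for ip in ips if re.search(p, ip)] for name, p in patterns.items()}
for name, hits in results.items():
    print(f"{name}: {hits}")
```

Without anchors all three addresses match; the dollar sign drops 12.34.56.789; adding the caret also drops 512.34.56.78.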
And you can use it in the profile filters.
I am hoping one of the GA gurus will chime in to say why the $ anchor doesn’t work in the in-report filters. After all, I am only taking lessons here, and sharing them with the world.
Robbin
Posted on September 10, 2006 by Robbin Steif
This is the third in a series of lessons I am taking (and sharing) on Regular Expressions. This one is on the use of the anchor, symbolized by a caret, like this: ^. My tutor, Steve, writes about the dollar sign as well; I will handle that in a future post.
(Useful factoid: the people who work with Regular Expressions all the time call them RegEx. I have no idea how they make that plural.)
Here is what Google Analytics’ incredibly opaque Help says about the caret anchor:
^ — Match to the beginning of the field
I really understood every individual word in that sentence; I just couldn’t understand what they meant all strung together. (So I have a personal tutor.)
Here is what it means:
^ — If anything comes before this character, the string is not a match to this Regular Expression
For example, let’s say that I have two pages on my website, http://www.mysite.com/secondpage/contact/, and http://www.mysite.com/contact/.
Usually, Google Analytics, which is where I use RegEx (RegExes? REs?), perceives those two pages to be called /secondpage/contact/ and /contact/. That’s because GA already knows about the domain, www.mysite.com, and usually only cares about it if I have a subdomain (and have added the code, a technicality we won’t deal with.)
If I want to find all the strings that start with /contact/ (the second option) but just use /contact/ as my Regular Expression, I will get everything that can possibly match it, which will include the one I don’t want, /secondpage/contact/. This is something that has taken me a while to understand with Regular Expressions: they match everything that they possibly can, so you have to use the special characters to keep them from getting out of control.
If I only want to match http://www.mysite.com/contact/, I can use Regular Expressions like this:
^/contact/
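Here is a minimal sketch of what the caret changes, using Python’s re module. It assumes GA hands the regex the relative page path (as described above) and that re.search mimics GA’s “contains” matching; /contact/thanks.html is a hypothetical page added for illustration.

```python
import re

# Hypothetical relative page paths, as GA would report them.
pages = ["/contact/", "/secondpage/contact/", "/contact/thanks.html"]

without_caret = [p for p in pages if re.search(r"/contact/", p)]
with_caret = [p for p in pages if re.search(r"^/contact/", p)]
print(without_caret)  # ['/contact/', '/secondpage/contact/', '/contact/thanks.html']
print(with_caret)     # ['/contact/', '/contact/thanks.html']
```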
That’s it. That’s how you use the anchor. And now, you are done. Everything after this is a clarification of one big nagging question that I had: why would anybody use a caret anchor anywhere except here:
^http://www.mysite.com/etcetera/andso-on.php
Answers:
1) GA already thinks in terms of relative URLs. It assumes the http://www.mysite.com, so when you ask for ^/contact/, it will come back and correctly show you strings that say /contact/, and you will usually find yourself explaining to your boss, “They mean www.mysite.com/contact/.”
2) Caret anchors are useful in other places besides URLs. Let’s say you want to create a filter for the entire range of IP addresses in your company. Your IP addresses all start with a two-digit number, like 64.xx.xx.xxx, so you wouldn’t want to filter out something that looked like this: 164.xx.xx.xx. To solve that problem, you can use a caret: ^64 etc.
Not sure if your regular expression will match the string you need it to? Use this handy tool from Epikone. You put in the string you want to match and then the regular expression you wrote, hit Enter, and see how well you did. Many thanks to Justin (from Epikone) for help with this post.
Robbin
Posted on August 24, 2006 by Robbin Steif
Web analytics can be a lonely field. It’s awesome to go to the eMetrics summit or other conferences and meet the people you correspond with all the time, but that only happens a couple of times a year. And how often do you get to sit down with another analyst and say, “Look at my stuff. What mistakes am I making?”
That was one of the reasons that I took ROI’s webinar: it includes a free GA audit. (The other reason was that I wanted to learn advanced GA, now that I have so many customers using it.) Today I had my audit with Michael Harrison, and it was so wonderful to actually go through the goals and filters and get advice. Once he even said, “It looks great. You’ve got it all set up right.”
The best part was when we talked about my regular expressions, which look like this:
/secondpage/register/
He asked, “You realize that this includes everything before the first slash and everything after the last slash?” I only answered, “Yes,” because his wife has been in labor for days now, and he was doing me a favor and calling me from his home. If he had had more time, I would have told him how I learned this from my Regular Expressions Tutorial.
Robbin
LunaMetrics
Posted on August 23, 2006 by Robbin Steif
This is part II of my journey into Regular Expressions for Google Analytics, whereby I am learning them (they are abbreviated as RegEx, or maybe in the plural, RegExen) and teaching them at the same time. I have rewritten this old post to include only the dot, like the one at the end of this sentence. This is to make the post easier and create building blocks for future posts.
Google Analytics says this about dots:
. matches any one character
This is exactly what they mean, but it is so out of context, I couldn’t wrap my head around it. (Match any one character that comes from where? I asked myself…)
They mean that you can create a RegEx like this
.ate
and it will match hate, fate, sate, or any other string with one character followed by “ate.” For that matter, it will match 8ate (there are no rules saying that the character has to be a letter). It won’t match just ate, because it wants one character to stand in for the dot.
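Here is a small sketch of the dot as a wildcard, using Python’s re module as a stand-in for GA (an assumption). re.fullmatch checks the whole word; re.search, which is assumed to behave like GA’s “contains” matching, also finds .ate inside a longer word. The word list is made up.

```python
import re

words = ["hate", "fate", "sate", "8ate", "ate", "create"]

# fullmatch: the whole word must be one character followed by "ate".
whole = [w for w in words if re.fullmatch(r".ate", w)]
# search: ".ate" anywhere inside the word.
anywhere = [w for w in words if re.search(r".ate", w)]
print(whole)     # ['hate', 'fate', 'sate', '8ate']
print(anywhere)  # ['hate', 'fate', 'sate', '8ate', 'create']
```

In both cases plain ate is left out, because there is no character in front of it to stand in for the dot.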
This is why we don’t (usually) ask Google Analytics to match a regular expression that looks like this:
homepage.com
because the dot is a wildcard that stands for any one character (right?), so this will also match homepagescom and homepage4com and homepagedcom. Instead, we need to use a backslash to turn the Regular Expression dot into a Plain Old dot. (This is a good time to read Regular Expressions Part I, backslashes, if you haven’t already.) Anyway, we would express it like this: homepage\.com.
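A minimal sketch of the difference the backslash makes, using Python’s re module (an assumption: re.search stands in for GA’s matching). The candidate strings are the ones from the paragraph above.

```python
import re

candidates = ["homepage.com", "homepagescom", "homepage4com", "homepagedcom"]

# Unescaped dot: a wildcard that matches any one character.
wildcard_dot = [c for c in candidates if re.search(r"homepage.com", c)]
# Escaped dot: only a real dot will do.
plain_dot = [c for c in candidates if re.search(r"homepage\.com", c)]
print(wildcard_dot)  # all four
print(plain_dot)     # ['homepage.com']
```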
And that’s why you see backslashes and dots together so often.
Robbin
Posted on August 13, 2006 by Robbin Steif
What are Google Analytics’ “Regular” Expressions?
This was the question that I asked in August 2006. Although this post is ostensibly from August, I am actually rewriting it in October. Now that I understand Regular Expressions for Google Analytics, I want to explain them in the easiest language possible (so I had to go back and rewrite.)

The most basic expression is \ the backslash. Google Analytics ascribes this meaning to it:
\ escape any of the above
What they mean is, you can use a backslash to turn any special character into a not-so-special character. Google (and everyone else who talks about Regular Expressions) makes this hard by using the word “escape,” when they merely mean, use a backslash to take the magic out of a special character and make it an everyday character.
Although the backslash can be used with any special character, I see it used most often with a dot. This is because a dot is both a special character (see Part II) and one that is used on the Internet all the time (Example: www.myspace.com; we see it there twice). On the Internet (and so, with Google Analytics) we are almost always using dots as regular dots, and so we need a backslash to keep each one a mere dot. Here’s an example: mysite\.com and here’s another one (this time, an IP address): 64\.68\.82\.164
Many thanks to my tutor in Australia, Steve. With his help, help from Justin Cutroni, and many hours of reading and rereading the Wikipedia page on Regular Expressions (I won’t even link there, it is so difficult), I learned Regular Expressions. Very late comment: This started as a mere question for me. This ended with a seventeen-part series.
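Here is a minimal sketch of escaping that IP address, using Python’s re module as a stand-in for GA (an assumption). The last line uses re.escape, a Python helper that adds the backslashes for you; it is not part of Google Analytics.

```python
import re

ip_pattern = r"64\.68\.82\.164"  # each backslash turns a magic dot into a plain dot

print(bool(re.search(ip_pattern, "64.68.82.164")))      # True
print(bool(re.search(ip_pattern, "64x68y82z164")))      # False
print(bool(re.search("64.68.82.164", "64x68y82z164")))  # True: unescaped dots are wildcards

# Python's re.escape builds the escaped pattern automatically.
print(re.escape("64.68.82.164"))  # 64\.68\.82\.164
```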
Robbin Steif