Regular Expressions Part VIII: [Square Brackets] and Dashes -





Come learn Regular Expressions for Google Analytics with me. I am learning Regular Expressions for Google Analytics and teaching with each lesson. This is why I roll them out slowly – each expression requires a lot of research. I have been awed at this process because the explanations are so opaque before I understand them, and once I learn them, they make perfect sense. Tonight, let’s talk about square brackets, and I hope you’ll see what I mean.

Google Analytics defines square brackets like this:

[] Match one item in this list

This is exactly what they mean. It just sounds hard because they don’t tell you how to create the list or how to define an item. Simple explanation: when you use square brackets, each character within the brackets is an item. Look at this sample list with five items in it, each of which happens to be a vowel: [aeiou]. The hard part is understanding that you don’t need anything to separate the characters, and that each item in the list is only one character.

Here’s how someone might use square brackets with Google Analytics. Let’s say you were selling items with part numbers formatted like this: PART1, PART2, etc. You want to know how often someone lands on your site by typing the actual part number into a search engine, but you only care about PART3, PART5 and PART7. So, you could enter PART[357] into the filter box at the top of your Overall Keyword Conversion report (for example). That will match each of those part numbers. (Technically, it matches any of these three and more, but I will hold that problem/opportunity for a different post.)
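If you want to try the PART[357] idea outside of Google Analytics, here is a minimal sketch in Python. Python's re module is only a stand-in here and may differ from the GA filter box in details, and the keyword list is made up for illustration.

```python
import re

# Hypothetical keywords, invented for this example.
keywords = ["PART3", "PART5", "PART7", "PART4", "buy PART7 online", "PART35"]

# One literal "PART" followed by exactly one character from the list 3, 5, 7.
pattern = re.compile(r"PART[357]")

# search() looks for the pattern anywhere in the keyword, which is how an
# unanchored filter is assumed to behave.
matched = [kw for kw in keywords if pattern.search(kw)]
print(matched)
```

PART4 is left out, as hoped. But "buy PART7 online" and "PART35" also get through, because the pattern only has to appear somewhere inside the keyword. That is the "and more" mentioned above.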

It’s helpful to understand dashes so that you can use square brackets easily. Google Analytics defines dashes like this:

- Create a range in a list

That means, instead of creating a list like this [abcdefghijkl], you can create it like this: [a-l], and it means the same thing — only one letter out of the list gets matched. You can also combine the range method and the brute force, type-them-all-in method and create a list like this: [a-lqtz], which matches any one letter between a and l, or q, or t, or z.
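Here is a small sketch that checks this in Python (again only a stand-in for the GA engine, so treat it as an illustration and not as proof of how GA behaves). It compares the typed-out list with the range version for every lowercase letter, then lists the letters that the mixed list accepts.

```python
import re

spelled_out = re.compile(r"[abcdefghijkl]")
as_range = re.compile(r"[a-l]")
mixed = re.compile(r"[a-lqtz]")

letters = "abcdefghijklmnopqrstuvwxyz"

# The two spellings should agree on every single letter.
same = all(
    bool(spelled_out.fullmatch(c)) == bool(as_range.fullmatch(c))
    for c in letters
)

# Letters accepted by the combined range-plus-extras list.
mixed_hits = "".join(c for c in letters if mixed.fullmatch(c))

# A list stands for exactly one character, so two letters are too many.
two_letters = as_range.fullmatch("ab")

print(same, mixed_hits, two_letters)
```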

Special case: Sometimes — perhaps often — we really want the dash to be one of the characters we are searching for. Maybe we want to see searches of luna-metrics and lunarmetrics and lunammetrics. In that case, we put the dash at the beginning or end of the list, like this [-rm]. That means that the full RegEx which would match the three lunametrics keywords above would be luna[-rm]metrics. This is because the phrase will start with luna, end with metrics, and in between will have a dash, an r, or an m. Those are the only choices in the little list I created, the one that looked like this: [-rm].
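A quick way to convince yourself is the sketch below, written in Python as a stand-in for the GA engine. The two extra keywords at the end of the list are invented to show what does not match.

```python
import re

pattern = re.compile(r"luna[-rm]metrics")

# The first three come from the example above; the last two are made up.
keywords = ["luna-metrics", "lunarmetrics", "lunammetrics",
            "lunametrics", "lunaxmetrics"]

results = {kw: bool(pattern.search(kw)) for kw in keywords}
print(results)

# Position decides what the dash means inside the brackets.
dash_first = bool(re.fullmatch(r"[-ac]", "-"))     # dash is a literal item
dash_last = bool(re.fullmatch(r"[ac-]", "-"))      # dash is a literal item
dash_middle = bool(re.fullmatch(r"[a-c]", "-"))    # dash defines the range a to c
print(dash_first, dash_last, dash_middle)
```

Notice that plain "lunametrics" is not matched: the list always consumes one character, so something has to sit between luna and metrics.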

There are other interesting things that you can do with square brackets, but I am leaving them out for now, either because they don’t all work with Google Analytics, or because I think this is enough for today. (Correct me if I’m wrong!)

Backslashes \
Dots .
Carets ^
Dollar signs $
Question marks ?
Pipes |
Parentheses ()
Square brackets [] and dashes -
Plus signs +
Stars *
Regular Expressions for Google Analytics: Now let’s Practice
Bad Greed
RegEx and Good Greed
{Braces}
Minimal Matching
Lookahead

Robbin

About Robbin Steif

Our owner and CEO, Robbin Steif, started LunaMetrics ten years ago. She is a graduate of Harvard College and the Harvard Business School, and has served on the Board of Directors for the Digital Analytics Association. Robbin is a recent winner of a BusinessWomen First award, as well as a Diamond Award for business leadership.

http://www.lunametrics.com/blog/2006/10/22/regular-expressions-part-viii-square-brackets-and-dashes/

6 Responses to “Regular Expressions Part VIII: [Square Brackets] and Dashes -”

Travholt says:

Well, I think that you should mention the use of a caret within square brackets, because it’s extremely useful for collecting characters within known delimiters. For example, to get just the search term in this url: http://www.mysite.com/search?term=foo&order=desc you’d match against term=([^&]*)& meaning “get every character following ‘term=’, but stop when an ampersand is encountered.”

(Unless this doesn’t work in GA, that is … Which I’m currently awaiting data for in my test profile …)

Robbin says:

I think you need to do it like this:

term=([^&]*)

(No ampersand at the end, else it matches to foo& and then you have to go clean up all those & marks. But, it looks like you are a RegExpert, and I just learn them so that I can do my job, so definitely – show me that I’m wrong.)
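To see the difference between the two patterns, here is a small sketch in Python, which is only a stand-in for the GA engine. The first URL is the one from the comment above; the second one, with term as the last parameter, is an assumption added for illustration.

```python
import re

url = "http://www.mysite.com/search?term=foo&order=desc"

with_amp = re.search(r"term=([^&]*)&", url)
without_amp = re.search(r"term=([^&]*)", url)

# group(0) is the whole match, group(1) is what the parentheses captured.
whole_with = with_amp.group(0)        # includes the trailing ampersand
whole_without = without_amp.group(0)  # stops just before the ampersand
captured_with = with_amp.group(1)
captured_without = without_amp.group(1)

# Hypothetical URL where the search term is the last parameter.
last_param = "http://www.mysite.com/search?order=desc&term=foo"
with_amp_last = re.search(r"term=([^&]*)&", last_param)
without_amp_last = re.search(r"term=([^&]*)", last_param)

print(whole_with, whole_without, captured_with, captured_without)
print(with_amp_last, without_amp_last.group(1))
```

In this sketch the parentheses capture the same thing either way; the trailing ampersand only shows up in the whole match. The version without the ampersand also still works when nothing follows the search term.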

A few important thoughts:
1) You don’t have to wait for data in your test profile, you can use the RegEx coach, or much easier, the Epikone tool.
http://www.epikone.com/tools/regular-expression-filter-tester

2) I always learn things from readers like you!! I wish you would write out the whole logic here, so that everyone can learn this better. You can do it in a comment, and then I will go into the post and encourage people to read your comment. Or you can send me email, to my last name at my company name, and I will still edit the post and give you credit if you want.

Travholt says:

Gah. I forgot the spam protection and my post disappeared into the great void upon pressing the back button. And it was such a good one! Well, here goes again, although I might never be able to match my previous attempt’s level of wits …

In RegExes, when you’ve made a match, you can access different parts of your matched data in different ways. For example, you can make a RegEx like this:

kung((foo) and (bar))barians

This would match the string “I like kungfoo and barbarians too!”. The whole RE would match “kungfoo and barbarians”, BUT the parentheses help you get at the interesting bits (like picking choc chips out of cookies!) easily, or, more correctly, make sure you’re getting the choc chips from inside cookies instead of inside the trash bin.

In languages like Perl, which excels at REs, you do this through variables called $1, $2 and so on. The numbers correspond to the order in which the opening parentheses are found in the RE. GA has to distinguish between two different REs, so it calls these variables $A1, $A2 etc. for the first RE, and $B1, $B2 etc. for the second.

For the above example, $A1 would equal “foo and bar”, while $A2 would contain “foo” and $A3 would be “bar”.
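The same numbering can be tried out in Python, used here only as a stand-in: it calls the pieces group(1), group(2) and so on, where GA would say $A1, $A2. A minimal sketch:

```python
import re

text = "I like kungfoo and barbarians too!"
m = re.search(r"kung((foo) and (bar))barians", text)

whole = m.group(0)   # everything the RE matched
first = m.group(1)   # outer parentheses, opened first
second = m.group(2)  # inner parentheses around foo
third = m.group(3)   # inner parentheses around bar

print(whole, first, second, third, sep=" | ")
```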

I thought I’d keep the discussion here in the comments, as I am by no means an expert and others might be able to contribute too. But I love REs, and have been using them with great success and much saving of time or accomplishment of the otherwise practically impossible!

Champ says:

Does the hyphen not have to be escaped in your luna-metrics example?

Webmaster says:

Champ — the dash has to be escaped inside square brackets unless it is at the beginning or the end. Here it was at the beginning, so it didn’t define a range. You can read our whole regex book at http://www.lunametrics.com/regex-book/index.html
