RegEx ebook Answers – for Google Analytics

A couple of months ago, I published a RegEx for Google Analytics ebook. You can download the ebook or just “page through it” online. The last page was a quiz, and I promised the answers. Here they are:

  • Question 1: Write a Regular Expression that matches both dialog and dialogue.
  • Answer: Since you should always keep your RegEx simple, my best answer would be dialog|dialogue . A slightly more elegant but more complicated option is dialog(ue)? and I invite others here (and for every other answer) to submit alternative ideas.
  • Question 2: Write a RegEx that matches these two request URLs: /secondfolder/?pid=123 and /secondfolder/?pid=567 (and cannot match other URLs)
  • Answer: ^/secondfolder/\?pid=(123|567)$ Note that I chose to put a dollar sign at the end so that the regex stops matching if anything comes after the three digits at the end of the target strings.
  • Question 3: Write a single Regular Expression that matches all your subdomains (and doesn’t match anything else). Make it as short as possible, since Google Analytics sometimes limits the number of characters you can use in a filter. Your subdomains are subdomain1.mysite.com, subdomain2.mysite.com, subdomain3.mysite.com, subdomain4.mysite.com, www.mysite.com, store.mysite.com and blog.mysite.com
  • Answer: There are lots of ways to do this. My choice would be: (subdomain[1-4]|www|store|blog)\.mysite\.com$
  • Question 4: Write a funnel and goal that includes three steps for the funnel and the final goal step (four steps in all), using Regular Expressions. Notice that there are two ways to achieve Step 2.
  1. /store/
  2. /store/creditcard or /store/moneyorder
  3. /store/shipping
  4. /store/thankyou

Answer: This one requires a picture:

The RegEx for Step 2 is hard to read in the picture above; it is /store/(creditcard|moneyorder)$ . Notice that I added a dollar sign at the end of all the expressions above. Without context, it is hard to know which ones are mandatory, but we know for sure that the dollar sign in Step 1 is mandatory. That dollar sign is required, not because of regular expressions, but because of the strange way funnels sometimes work. For more information on that one, you might want to read this excellent post on goals and funnels that Jonathan Weber of LunaMetrics wrote.
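
The quiz answers above can be sanity-checked outside Google Analytics. The sketch below uses Python's re module as a stand-in, which is an assumption: GA's regex engine is not Python's, although these patterns only use features that most engines share. The helper name matches and the sample strings that are not in the quiz are made up for illustration.

```python
import re

# Patterns copied from the quiz answers above.
Q1 = r"dialog(ue)?"
Q2 = r"^/secondfolder/\?pid=(123|567)$"
Q3 = r"(subdomain[1-4]|www|store|blog)\.mysite\.com$"
STEP2 = r"/store/(creditcard|moneyorder)$"


def matches(pattern, text):
    """Return True if the pattern is found anywhere in text (unanchored search)."""
    return re.search(pattern, text) is not None


# Each entry: (pattern, sample string). Samples beyond the quiz are hypothetical.
checks = [
    (Q1, "dialog"),
    (Q1, "dialogue"),
    (Q2, "/secondfolder/?pid=123"),
    (Q2, "/secondfolder/?pid=1234"),   # extra digit, stopped by the dollar sign
    (Q3, "subdomain3.mysite.com"),
    (Q3, "subdomain5.mysite.com"),     # not one of the listed subdomains
    (STEP2, "/store/moneyorder"),
    (STEP2, "/store/shipping"),
]

for pattern, text in checks:
    print(f"{pattern!r} against {text!r}: {matches(pattern, text)}")
```

Running a handful of "should match" and "should not match" strings like this before pasting a pattern into a goal or filter is a cheap way to catch a missing anchor or an unescaped dot.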

Finally, let me end with a great quote!  (I didn’t ask permission to use her name, so I will just use an initial.) This reader wrote me yesterday to ask, where were the answers? Here is part of what she wrote:

Seriously, your ebook was so full of win. I’ve been trying to wrap my mind around RegEx and have been using the basics in GA for a while. But I really expanded my understanding b/c your explanations were so plain English and applicable to GA. A lot of tutorials I saw online weren’t written for GA and were written by propellerheads, I suspect. :)

Having worked for a graphics publishing company, I also REALLY appreciated the layout, font treatments, and graphics. A world-class job all the way around. — A

– Robbin

About Robbin Steif

Our owner and CEO, Robbin Steif, started LunaMetrics ten years ago. She is a graduate of Harvard College and the Harvard Business School, and has served on the Board of Directors for the Digital Analytics Association. Robbin is a recent winner of a BusinessWomen First award, as well as a Diamond Award for business leadership.

http://www.lunametrics.com/blog/2010/07/27/regex-ebook-answers-write/

16 Responses to “RegEx ebook Answers – for Google Analytics”

Jonathan Weber says:

I’m going to suggest that for Question #1, an even simpler regular expression to match both “dialog” and “dialogue” is simply… “dialog”. :) There’s relatively little danger of matching any other word we don’t want, and in situations like this, usually we’re looking for keywords or something like that, so “dialogs” and other variations are OK (desirable, in fact).

Johann says:

Thanks for the RegExp book. Being a programmer, I already knew RegExps but I’ll forward this to a colleague who doesn’t (yet). :-)

Steve says:

So I’ll be a touch pedantic. :-)
Q3. I actually had this down as a trick question, as it asks for *all* subdomains, and then goes on to list them. Where ‘them’ equals the current set of subdomains/names.
I’d have gone for the straight: “\.mysite\.com$”
The plus side being that it will automatically include new domains when they arise without any effort to add them.
The downside is the exact same reason. eg testing-developers.mysite.com probably shouldn’t be included, but will with my proposed.
I guess this is one of those classic ‘it depends’ cases. Where knowing the org and its history around web properties is very useful. :-)

Have I managed to sit astride the fence nicely?
Cheers!

Robbin Steif says:

It’s a good answer. And I have nothing but thanks for your teacherly-capabilities!

Your posts are awesome, I hope you are considering writing a book on this stuff. I’m adding you to my RSS list. Bravo.

Searchengineman

Robbin Steif says:

I thought this was a book! Seriously — I did want to write a book, but didn’t look hard enough for a publisher. thanks for the vote of confidence — Robbin

Don F. says:

Robbin, your ebook on RegEx is so darn helpful and easy to read. Thank you so very much for putting this together and offering it to the public. I wish you great success and happiness!

-Don

Robbin Steif says:

Don, comments like yours are absolutely smile-worthy. thanks – R

Bani says:

Great book!
I’m having some trouble making a regex to add to my funnel though, maybe you could help me?
I’d like to include all pages that contain the word checkout, but not success or cart. For instance, /checkout/step1 should match, but /checkout/step1/cart shouldn’t.
I’ve tried .*/checkout[^(success|cart)]* on the filter but I still see the cart and success pages on the results.

Robbin Steif says:

Bani, you are looking for something that has been disabled in Google Analytics; it is called “negative lookahead.” Your funnel won’t work until you actually change the URLs (makes me think I should change the Google Analytics Friendly web design post to handle your issue, too).

I don’t know that much about your site, but why don’t you try Exact Match instead of regular expression match? That should do it, but might be too specific for your needs.
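
To see why the pattern in Bani's comment still lets the cart and success pages through, here is a small sketch in Python. Python's re module is an assumption used only for illustration; it supports negative lookahead, which, per the reply above, Google Analytics does not. The names CHARACTER_CLASS_ATTEMPT, LOOKAHEAD and include_then_exclude are hypothetical, as is the /checkout/success URL.

```python
import re

# Pattern from the comment above. [^(success|cart)] is a character class:
# it means "one character that is not (, s, u, c, e, |, a, r, t or )",
# not "anything except the words success or cart". Because it is followed
# by *, it may also match zero characters, so any URL containing
# /checkout matches.
CHARACTER_CLASS_ATTEMPT = r".*/checkout[^(success|cart)]*"

# Negative lookahead: works in Python, shown only for comparison.
LOOKAHEAD = r"^(?!.*(success|cart)).*checkout"


def include_then_exclude(url):
    """Two plain tests: contains 'checkout', and contains neither word."""
    return (re.search(r"checkout", url) is not None
            and re.search(r"success|cart", url) is None)


for url in ("/checkout/step1", "/checkout/step1/cart", "/checkout/success"):
    print(url,
          bool(re.search(CHARACTER_CLASS_ATTEMPT, url)),
          bool(re.search(LOOKAHEAD, url)),
          include_then_exclude(url))
```

The two-test helper shows the same result as the lookahead without needing it, which is the general idea behind pairing an include pattern with a separate exclude pattern when an engine has no lookahead. Whether that pairing is available for a given funnel step depends on the tool.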

GroovyKarma says:

Your ebook is fantastic! I struggle with regular expressions and your ebook presented in a clear, concise and fun fashion. RegEx just became clearer for me. I am glad you put such Art into the Universe. Bravo :-)

Pashmina says:

Question about the first comment, where Jonathan suggests using “dialog”. Does this work? This would suggest that URLs written in the funnel default to “Head Match” if there is no “$” to prevent it.

If the URLs in the funnel aren’t automatically head matches, how do I write a Regex so that it IS a head match? eg. If I have these URLS:
/orderform.html
/orderform_1.html
/orderform_2.html
/orderform_3.html
(continue all numbers)
What is the correct expression to capture all pages in the funnel step?

Robbin says:

Hi Pashmina. Sorry that this took a couple days to get back to you. So, you really asked a couple of questions rolled into one:

1. Do URLS in the funnel default to Head Match if there is no $?
2. Can “dialog” match both “dialog” and “dialogue” ?

The answer to #1 is no, there is no default behavior the way you suggest. When you have a URL goal, you are required to tell GA whether you want head match, RegEx match or exact match. So GA always knows what kind of match you want.

The answer to #2 is that Jonathan’s way is perfectly valid, but not as specific as the answer that I originally gave. Notice that he writes, “There’s relatively little danger of matching anything else.” Regular Expressions are *greedy* and they will match as much as they can. I recommend that you read this article on RegEx and greed: http://www.lunametrics.com/blog/2007/01/11/regular-expressions-part-xiii-good-greed/
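
As an illustration of both points in this thread, the sketch below uses Python's re module (an assumption; it is not GA's engine) to show that an unanchored pattern such as dialog is found inside longer strings, and to try one possible pattern for the order form pages Pashmina lists. ORDERFORM and the helper found are hypothetical names, and the pattern is a suggestion that does not come from the original reply.

```python
import re

# No anchors, so a search finds "dialog" inside longer strings as well.
KEYWORD = r"dialog"

# Hypothetical pattern for the pages listed in the question:
# /orderform.html plus /orderform_<digits>.html
# ^ pins the start of the URL, $ pins the end, \. is a literal dot.
ORDERFORM = r"^/orderform(_[0-9]+)?\.html$"


def found(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None


for word in ("dialog", "dialogue", "dialogs"):
    print(word, found(KEYWORD, word))

for url in ("/orderform.html", "/orderform_1.html", "/orderform_27.html",
            "/orderform_.html", "/old/orderform_1.html"):
    print(url, found(ORDERFORM, url))
```

Starting a regular expression with ^ and leaving off the $ gives behavior roughly like a head match, for example ^/orderform, at the cost of also matching any other page whose URL starts the same way.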

Koen says:

Almost three years after writing your ebook and you are still making people happy with it. Even on the other side of the pond!

/thanks(alot)?

Robbin Steif says:

Glad you enjoyed – R