Regular Expressions Part III: Anchors ^/
September 10, 2006
This is the third in a series of lessons I am taking (and sharing) on Regular Expressions. This one is on the use of the anchor, symbolized by a caret, like this: ^. My tutor, Steve, writes about the dollar sign as well; I will handle that in a future post.
(Useful factoid: the people who work with Regular Expressions all the time call them RegEx. I have no idea how they make that plural.)
Here is what Google Analytics' incredibly opaque Help says about the caret anchor:
^ — Match to the beginning of the field
I understood every individual word in that sentence; I just couldn’t understand what they meant all strung together. (So I have a personal tutor.)
Here is what it means:
^ — If anything comes before this character, the string is not a match to this Regular Expression
For example, let’s say that I have two pages on my website, http://www.mysite.com/secondpage/contact/, and http://www.mysite.com/contact/.
Usually, Google Analytics, which is where I use RegEx (RegExes? REs?), perceives those two pages to be called /secondpage/contact/ and /contact/. That’s because GA already knows about the domain, www.mysite.com, and usually only cares about it if I have a subdomain (and have added the code, a technicality we won’t deal with).
If I want to find all the strings that start with /contact/ (the second page) and just enter that same text, /contact/, as my Regular Expression, I will get everything that contains that string, including the one I don’t want, /secondpage/contact/. This took me a while to understand about Regular Expressions: they match everything that they possibly can, so you have to use the special characters to keep them from getting out of control.
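To see that "match everything they possibly can" behavior for yourself, here is a small sketch in Python. It is only an approximation: I am assuming Python's re.search behaves like GA's matching for this simple case, and the list of paths is just the two example pages from above.

```python
import re

# The two example pages, as GA would see them (domain stripped off).
paths = ["/contact/", "/secondpage/contact/"]

# No anchor: the pattern may match anywhere inside the string.
unanchored = re.compile(r"/contact/")

# re.search scans the whole string for the pattern.
matches = [p for p in paths if unanchored.search(p)]
print(matches)  # ['/contact/', '/secondpage/contact/']
```

Both pages come back, including the one I don’t want.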
If I only want to match http://www.mysite.com/contact/, I can use Regular Expressions like this:
^/contact/
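Here is a rough sketch of the anchored pattern, ^/contact/, tried out in Python. Same caveat as any stand-in: I am assuming Python's re module treats the caret the way GA does, and the third path, /contact/thanks/, is a made-up page I added to show that "starts with" is all the anchor checks.

```python
import re

# The two example pages plus one hypothetical page under /contact/.
paths = ["/contact/", "/secondpage/contact/", "/contact/thanks/"]

# The caret pins the match to the very beginning of the string.
anchored = re.compile(r"^/contact/")

anchored_matches = [p for p in paths if anchored.search(p)]
print(anchored_matches)  # ['/contact/', '/contact/thanks/']
```

/secondpage/contact/ drops out because something comes before /contact/ in that string.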
That’s it. That’s how you use the anchor. And now, you are done. Everything after this is a clarification of one big nagging question that I had: Why would anybody use an anchor caret anywhere except here:
1) GA already thinks in terms of relative URLs. It assumes the http://www.mysite.com, so when you ask for ^/contact/, it comes back and correctly shows you strings that say /contact/, and you are usually the one explaining to your boss, “They mean www.mysite.com/contact/.”
2) Anchor carets are useful in other places besides just URLs. Let’s say you want to create a filter for the entire range of IP addresses in your company. However, your IP addresses all start with a two-digit number, like 64.xx.xx.xxx, so you wouldn’t want to filter out something that looked like this: 164.xx.xx.xx. To solve that problem, you can use a caret: ^64 etc
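The IP address case can be sketched the same way. The addresses below are invented for illustration, and I have written the pattern as ^64\. with an escaped dot (my own assumption about what the "etc" would start with), since an unescaped dot stands for any character. Again, this is Python's re module standing in for a GA filter, not GA itself.

```python
import re

# Made-up addresses: one inside the company range, two outside it.
ips = ["64.12.34.156", "164.12.34.56", "10.64.0.1"]

# Anchored: the address must begin with "64."
company_range = re.compile(r"^64\.")
company_ips = [ip for ip in ips if company_range.search(ip)]
print(company_ips)  # ['64.12.34.156']

# Without the caret, "64." is found inside the other addresses too.
loose = re.compile(r"64\.")
loose_ips = [ip for ip in ips if loose.search(ip)]
print(loose_ips)  # ['64.12.34.156', '164.12.34.56', '10.64.0.1']
```

Only the anchored version keeps 164.xx.xx.xx out of the filter.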
Not sure if your regular expression will match the string you need it to? Use this handy tool from Epikone. You put in the string you want to match and then the regular expression you wrote, hit enter, and see how well you did. Many thanks to Justin (from Epikone) for help with this post.
Dollars signs $
Question marks ?
Square brackets and dashes -
Plus signs +
Regular Expressions for Google Analytics: Now let’s Practice
RegEx and Good Greed