
Regular Expressions Part III: Anchors ^





This is the third in a series of lessons I am taking (and sharing) on Regular Expressions. This one is on the use of the anchor, symbolized by a caret, like this: ^. My tutor, Steve, writes about the dollar sign as well; I will handle that in a future post.

(Useful factoid: the people who work with Regular Expressions all the time call them RegEx. I have no idea how they make that plural.)


Here is what Google Analytics’ incredibly opaque Help says about the caret anchor:

^ — Match to the beginning of the field

I understood every individual word in that sentence; I just couldn’t understand what they meant all strung together. (So I have a personal tutor.)

Here is what it means:

^ — If anything comes before this character, the string is not a match to this Regular Expression

For example, let’s say that I have two pages on my website, http://www.mysite.com/secondpage/contact/, and http://www.mysite.com/contact/.

Usually, Google Analytics, which is where I use RegEx (RegExes? REs?), perceives those two pages to be called /secondpage/contact/ and /contact/. That’s because GA already knows about the domain, www.mysite.com, and usually only cares about it if I have a subdomain (and have added the code, a technicality we won’t deal with).
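To make the "GA only sees the part after the domain" idea concrete, here is a minimal Python sketch. It does not talk to Google Analytics at all; it only assumes that the page name GA reports looks like the path portion of the URL, and the helper name `request_path` is made up for illustration.

```python
from urllib.parse import urlsplit


def request_path(url: str) -> str:
    """Return only the path part of a full URL (hypothetical helper)."""
    return urlsplit(url).path


pages = [
    "http://www.mysite.com/secondpage/contact/",
    "http://www.mysite.com/contact/",
]

for url in pages:
    # Prints /secondpage/contact/ and then /contact/
    print(request_path(url))
```

These shortened strings, not the full URLs, are what the Regular Expressions in the rest of this post are compared against.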

If I want to find all the strings that start with /contact/ (the second page) but just enter that same text, /contact/, as my Regular Expression, I will get everything that contains that string, including the one I don’t want, /secondpage/contact/. This took me a while to understand about Regular Expressions: they match everything they possibly can, so you have to use the special characters to keep them from getting out of control.
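The "matches everything it possibly can" behavior can be tried out in a few lines of Python. This is only a sketch: it assumes that Python's `re.search`, which looks for the pattern anywhere in the string, is a fair stand-in for how the unanchored expression behaves in GA.

```python
import re

pages = ["/secondpage/contact/", "/contact/"]

# No anchor: the pattern may match anywhere inside the string.
unanchored = re.compile(r"/contact/")

unanchored_matches = [p for p in pages if unanchored.search(p)]
print(unanchored_matches)  # ['/secondpage/contact/', '/contact/']
```

Both pages come back, including the one that was not wanted.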

If I only want to match http://www.mysite.com/contact/, I can use a Regular Expression like this:

^/contact/
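A quick way to check this is the following Python sketch, again using `re.search` as an assumed stand-in for GA's matching. The third page, /contact/me/, is a made-up extra to show that the caret only pins down the start of the string, not the end.

```python
import re

pages = ["/secondpage/contact/", "/contact/", "/contact/me/"]

# The caret says: nothing may come before /contact/ in the string.
anchored = re.compile(r"^/contact/")

anchored_matches = [p for p in pages if anchored.search(p)]
print(anchored_matches)  # ['/contact/', '/contact/me/']
```

/secondpage/contact/ drops out because something comes before /contact/, while /contact/me/ still matches because the caret says nothing about what comes after.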

That’s it. That’s how you use the anchor. And now, you are done. Everything after this is a clarification of one big nagging question that I had: Why would anybody use an anchor caret anywhere except here:
^http://www.mysite.com/etcetera/andso-on.php

Answers:
1) GA already thinks in terms of relative URLs. It assumes the http://www.mysite.com, so when you ask for ^/contact/, it will come back and correctly show you strings that say /contact/, and you are usually saying to your boss, “They mean www.mysite.com/contact/.”

2) Anchor carets are useful in other places besides URLs. Let’s say you want to create a filter for the entire range of IP addresses in your company. Your IP addresses all start with a two-digit number, like 64.xx.xx.xxx, so you wouldn’t want to filter out something that looks like this: 164.xx.xx.xxx. To solve that problem, you can use a caret: ^64 etc
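The IP address case in answer 2 can be sketched in Python as well. The addresses below are invented, and the pattern `^64\.` is only one possible way to fill in the "etc" above (the backslash makes the dot a literal dot); it is an assumption for this example, not the filter the post had in mind.

```python
import re

# Invented sample addresses; only the first starts with 64.
addresses = ["64.10.20.30", "164.10.20.30", "10.64.20.30"]

# Hypothetical stand-in for "^64 etc": starts with 64 followed by a literal dot.
company_ip = re.compile(r"^64\.")

flagged = [a for a in addresses if company_ip.search(a)]
print(flagged)  # ['64.10.20.30']

# Without the caret, plain "64" is also found inside 164.10.20.30.
print(re.search(r"64", "164.10.20.30") is not None)  # True
```

Only the address that really begins with 64 is picked up; 164.10.20.30 and 10.64.20.30 are left alone because something comes before the 64.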

Not sure if your regular expression will match the string you need it to? Use this handy tool from Epikone. You put in the string you want to match and then the regular expression you wrote, hit enter and see how well you did. Many thanks to Justin (from Epikone) for help with this post.

Backslashes \
Dots .
Carats ^
Dollars signs $
Question marks ?
Pipes |
Parentheses ()
Square brackets [] and dashes -
Plus signs +
Stars *
Regular Expressions for Google Analytics: Now let’s Practice
Bad Greed
RegEx and Good Greed
{Braces}
Minimal Matching
Lookahead

Robbin

Robbin Steif

About Robbin Steif

Our owner and CEO, Robbin Steif, started LunaMetrics ten years ago. She is a graduate of Harvard College and the Harvard Business School, and has served on the Board of Directors for the Digital Analytics Association. Robbin is a recent winner of a BusinessWomen First award, as well as a Diamond Award for business leadership.

http://www.lunametrics.com/blog/2006/09/10/regular-expressions-part-iii-anchors/

7 Responses to “Regular Expressions Part III: Anchors ^”

Anonymous says:

Plural?
For me it depends on context. :-)

With one good friend (now a linux kernel hacker by day), I use “RE’s”.
At work I tend to use “RegEx’s”. When communicating with those who would not necessarily appreciate or understand the shortening, I spell it out in full: “Regular Expressions”.

Naturally the prior use of the apostrophe is not to indicate ownership or annoy English majors, rather to forcibly separate the ‘s’ from the previous “word” and highlight the use of the plural form.

Additionally, as a single Regular Expression is frequently made up of multiple Regular Expressions, such a separation can be more descriptive of the intent.

Bad writing style, good email clarity.


Wikipedia offers:
regexps, regexes, or regexen

Tho I think the latter is purely to help the speaker sound educated. :-)


Confused?

:-)

- Steve

Oh, RegExen, I really like that one. Definitely makes me sound like I understand this weird field.

Robbin

Justin Cutroni says:

Hey Robbin,

Great post! I haven’t read the full series but plan to. I’m not sure if you’ve mentioned this, but there is a great RegEx testing tool. The RegEx Coach:

http://weitz.de/regex-coach/

Of course I prefer the tool on the EpikOne site, but I’m partial ;)

Justin

Anonymous says:

so then, http://www.mysite.com\contact\me\ would also show when using ^\contact\ …. do I get it?

Enjoy these posts. Thanks for doing this.

Laura

steve says:

Laura: If you meant:

^\contact\

???? Then yes. Congrats! :-) Just want to ensure you didn’t mean to include the … as part of the regular expression.

Just be aware that if you have “\contact\you\” and didn’t want that to match, then your expression won’t work. Regular Expressions are not as simple as just what will match, but also ensuring you don’t get too much. Or don’t match too much.

Cheers!
- Steve
I do so enjoy having access to a comments rss feed. :-)


George Black says:

Hi Robbin

Great series on regular expressions.

Just thought you would like to know that the two links to your tutor’s blog go to a 404 page :(

Cheers
George