Regular Expressions Part X: Stars *




This is Part X of the long long series I have been doing on Regular Expressions (RegEx) for Google Analytics. It is the last one I will do that explains what Google says vs. what they mean.

When it comes to stars (or call them asterisks if you like), Google Analytics says this:

* Match zero or more of the previous items

Perfectly reasonable, if you know how to create a list of previous items. If you already read Post IX, use of the plus sign in RegEx, this will be easy, and if not, I’ll try to make it easy.

If the only special character you are using is the star *, then the previous item is defined as the previous character. For example, let’s say that my company has five-digit part numbers, and I want to know how many people are searching for part number 34. The problem is all those leading zeros: technically, the part number is PN00034. So I could use the little Google Analytics filter box in my search report with a RegEx like this: PN0*34. That will bring me back all the searches for PN034 and PN0034 and PN00034 and PN00000034 and, for that matter, PN34, since using the star means that the previous item doesn’t need to be in the search at all (zero or more of the previous items, it says).
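Here is a minimal sketch of that filter in Python. It assumes the filter box does an unanchored, partial match (approximated here with `re.search`), and the search terms are made up for illustration.

```python
import re

# Pattern from the post: "PN", then zero or more "0" characters, then "34".
PART_34 = re.compile(r"PN0*34")

# Hypothetical search terms, invented for this example.
searches = ["PN34", "PN034", "PN00034", "PN00000034", "PN000345", "PN00035", "PN1034"]

# re.search looks for the pattern anywhere in the term (a partial match).
matched = [term for term in searches if PART_34.search(term)]
print(matched)
```

Note that PN000345 is also picked up, because a partial match only needs the pattern to appear somewhere in the term. PN00035 and PN1034 are left out.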

Alternatively, we could build a list of previous items using square brackets. As in my post on plus signs, I had a hard time finding a reason someone would want to use this, but again, I used the example that Steve gave me. His example was square brackets with a space. So, I could do a search for my company name in the same filter box on the keywords report, like so:
Luna[ ]*metrics. That will come back with LunaMetrics (no use of the space), or Luna Metrics (one space), or Luna  Metrics (two spaces), etc.
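The same idea can be sketched in Python. Two assumptions here: matching is case-insensitive (so "metrics" also matches "Metrics"), and the keywords are invented examples.

```python
import re

# "Luna", then zero or more spaces, then "metrics".
# Assumption: case-insensitive matching, expressed here with re.IGNORECASE.
COMPANY = re.compile(r"Luna[ ]*metrics", re.IGNORECASE)

# Hypothetical keywords, invented for this example.
keywords = ["LunaMetrics", "Luna Metrics", "Luna   Metrics", "lunametrics blog", "Luna-Metrics"]

matched = [kw for kw in keywords if COMPANY.search(kw)]
print(matched)
```

The hyphenated spelling is not matched, because a hyphen is not in the square brackets.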

For the sake of completeness, I should point out that you can put real characters in the square brackets like this: b[aeiou]*d, and it matches bad and bed and bid and bod and bud. But for that matter, it matches baaaad and boud and bd, so I don’t think it is particularly useful. If I really just wanted to see those five examples (bad, bed, bid, bod and bud), I would be smarter to use the OR pipe | and do it like this: b(a|e|i|o|u)d.
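A short Python sketch makes the difference between the two expressions visible. It uses `re.fullmatch` so that each word is tested as a whole; that choice is mine, made only to keep the comparison clean.

```python
import re

STAR = re.compile(r"b[aeiou]*d")      # "b", zero or more vowels, "d"
PIPE = re.compile(r"b(a|e|i|o|u)d")   # "b", exactly one vowel, "d"

words = ["bad", "bed", "bid", "bod", "bud", "baaaad", "boud", "bd"]

# fullmatch requires the whole word to fit the pattern.
star_hits = [w for w in words if STAR.fullmatch(w)]
pipe_hits = [w for w in words if PIPE.fullmatch(w)]

print(star_hits)
print(pipe_hits)
```

The star version accepts all eight words, while the pipe version accepts only the five with a single vowel.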

Anyone who has a great example of using a star with square brackets is strongly encouraged to comment.

Backslashes \
Dots .
Carats ^
Dollars signs $
Question marks ?
Pipes |
Parentheses ()
Square brackets [] and dashes -
Plus signs +
Stars *
Regular Expressions for Google Analytics: Now let’s Practice
Bad Greed
RegEx and Good Greed
{Braces}
Minimal Matching
Lookahead

Robbin

About Robbin Steif

Our owner and CEO, Robbin Steif, started LunaMetrics ten years ago. She is a graduate of Harvard College and the Harvard Business School, and has served on the Board of Directors for the Digital Analytics Association. Robbin is a recent winner of a BusinessWomen First award, as well as a Diamond Award for business leadership.

http://www.lunametrics.com/blog/2006/11/15/regular-expressions-part-x-stars/

7 Responses to “Regular Expressions Part X: Stars *”

Christopher says:

I appreciate your writing about RegEx! However, could you tell me something? Would this:
100\.100\.100\..*
be the same as this IP range:
100.100.100.0-255
?

Hi Christopher. It works but is overkill and might be slow. Since each set of numbers between the dots in an IP address is a number between 0 and 255 (and this we know from reading Steve’s blog), you don’t need to have such a wild expression at the end; in fact, you don’t need anything. These regular expressions match any part of the string unless you anchor them, so they will match whatever follows unless you tell them not to. You can do it like this: 100\.100\.100\.

Anything after the last dot will get matched, but that is not a problem in your example.

Robbin

ps thanks for the vote of confidence, I sometimes think that besides Steve and Justin, I am the only person who is fascinated with this topic.
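To check that the two expressions from this exchange select the same addresses, here is a small Python sketch. It assumes partial (unanchored) matching, approximated with `re.search`, and the sample addresses are invented.

```python
import re

WITH_STAR = re.compile(r"100\.100\.100\..*")   # Christopher's version
WITHOUT_STAR = re.compile(r"100\.100\.100\.")  # the shorter version

# Hypothetical visitor IP addresses, invented for this example.
addresses = ["100.100.100.0", "100.100.100.255", "100.100.101.7", "10.100.100.100"]

with_star_hits = [ip for ip in addresses if WITH_STAR.search(ip)]
without_star_hits = [ip for ip in addresses if WITHOUT_STAR.search(ip)]

print(with_star_hits)
print(without_star_hits)
```

Both lists come out the same for these samples; the trailing .* adds nothing when a partial match is enough.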

example of [] and *

/article/[a-zA-Z0-9\-_]*/

= any a-z, A-Z, 0-9, - and _ (zero or more characters)

Would match any article name in common search safe urls
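A quick Python sketch of that pattern, with made-up paths. Because the star means zero or more, an empty article name between the slashes also matches.

```python
import re

# "/article/", then zero or more letters, digits, hyphens or underscores, then "/".
ARTICLE = re.compile(r"/article/[a-zA-Z0-9\-_]*/")

# Hypothetical request paths, invented for this example.
paths = [
    "/article/my-first_post/",
    "/article//",
    "/article/hello world/",
    "/article/no-trailing-slash",
]

matched = [p for p in paths if ARTICLE.search(p)]
print(matched)
```

The path with a space fails because a space is not in the brackets, and the last one fails because the closing slash is missing.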

Steve says:

Hi

I think I have set this up right….
Basically I have a bunch of differently titled video pages, e.g.:

aero_q109_air_management_systems_video.html

where the ‘aero’ could be 4-5 variations but the ‘video’ at the end stays constant. So I came up with this to register a goal success in Analytics no matter which page the visitor lands on:

^/webinars/(oilgas*video|gen*video|auto*video|aero*video|power*video|marine*video)\.html$

Correct? comments?

cheers

Robbin says:

Steve, how about just video\.html? Or, if you are worried that that will pick up other stuff that you don’t want, how about ^/webinars/.*video\.html? Remember that a star by itself doesn’t match everything. It just matches the previous item zero or more times. When combined with the dot, though, it becomes the wildest card.
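The difference shows up quickly in a Python sketch. The page path below combines Steve's /webinars/ prefix with his example file name, and the dot-star pattern is written with a leading slash on the assumption that request paths start with one.

```python
import re

# Steve's pattern: in "aero*video" the star applies only to the "o".
STEVE = re.compile(
    r"^/webinars/(oilgas*video|gen*video|auto*video|aero*video|power*video|marine*video)\.html$"
)
# Dot-star version; the leading slash is an assumption about the request path.
DOT_STAR = re.compile(r"^/webinars/.*video\.html")

page = "/webinars/aero_q109_air_management_systems_video.html"

steve_matches_page = STEVE.search(page) is not None
dot_star_matches_page = DOT_STAR.search(page) is not None
# What "aero*video" really accepts: "aer", any number of "o", then "video".
steve_matches_aerovideo = STEVE.search("/webinars/aerovideo.html") is not None

print(steve_matches_page, dot_star_matches_page, steve_matches_aerovideo)
```

Steve's pattern misses the real page but would accept /webinars/aerovideo.html, while the dot-star version matches the real page.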

Taryn East says:

There are huge numbers of useful things that you can match with star and square brackets.

eg to make something up at random
[0-9]+[a-zA-Z]*
is a part-number that must start with at least one digit, but can also have any number of letters after it.

these are the kinds of regexes that you can use to validate whether a particular string is a valid part-number.
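As a sketch of that kind of validation in Python: `re.fullmatch` is used so the whole string has to fit the pattern, and the candidate strings are invented for illustration.

```python
import re

# One or more digits, followed by zero or more letters.
PART_NUMBER = re.compile(r"[0-9]+[a-zA-Z]*")

def is_valid_part_number(text: str) -> bool:
    """Return True if the entire string fits the part-number pattern."""
    return PART_NUMBER.fullmatch(text) is not None

# Hypothetical candidates, invented for this example.
candidates = ["123", "123abc", "abc", "12ab3", ""]
valid = [c for c in candidates if is_valid_part_number(c)]
print(valid)
```

Validation needs the whole string to match; a partial-match filter would behave differently, since it only needs the pattern to appear somewhere in the text.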

You can do some basic validation of email addresses with:

[\w\d]*@[\w\d]+.[\w]{2}

(note – there are much better email-validating regexes out there, but this is a simple example)
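A small Python sketch of this pattern, using partial matching via `re.search`. It tries the expression exactly as written above, plus a variant with the dot escaped; the escaped variant is an assumption about the intent, since an unescaped dot matches any character. The addresses are invented.

```python
import re

AS_WRITTEN = re.compile(r"[\w\d]*@[\w\d]+.[\w]{2}")   # dot matches any character
ESCAPED = re.compile(r"[\w\d]*@[\w\d]+\.[\w]{2}")     # assumed intent: a literal dot

# Hypothetical inputs, invented for this example.
samples = ["jane@example.com", "@example.com", "jane@example"]

as_written_hits = [s for s in samples if AS_WRITTEN.search(s)]
escaped_hits = [s for s in samples if ESCAPED.search(s)]

print(as_written_hits)
print(escaped_hits)
```

Because of the star, both versions accept an empty name before the @. Only the escaped version insists on a real dot in the domain.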

The square-bracket match really comes into its own when you can start to construct “not” sets.

eg /articles/[^/]*
means it matches “/articles/” or anything that is strictly a sub-directory of “articles”… but nothing that is a sub-sub-directory (or lower).
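Here is a Python sketch of the "not" set, with made-up paths. The strict behaviour described above depends on matching the whole path, so the sketch uses `re.fullmatch` for that and `re.search` to show the partial-match case.

```python
import re

# "/articles/", then zero or more characters that are not a slash.
ARTICLES = re.compile(r"/articles/[^/]*")

# Hypothetical request paths, invented for this example.
paths = ["/articles/", "/articles/regex-stars", "/articles/2006/regex-stars"]

# Whole-path match: nothing below the first level gets through.
whole_path_hits = [p for p in paths if ARTICLES.fullmatch(p)]
# Partial match: the deeper path still matches on its "/articles/2006" prefix.
partial_hits = [p for p in paths if ARTICLES.search(p)]

print(whole_path_hits)
print(partial_hits)
```

With a partial match the deeper path still gets through, so an end anchor such as $ would be needed to exclude it.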

Robbin says:

Taryn, thank you for your comment. The first example does not require a plus sign in it. [0-9] already requires that one digit be included. Your thoughts about email validation are (no pun intended) extremely valid, thanks so much. I didn’t have the time to parse through your regex there, but am sure that others will…… Robbin