
Archive for August, 2007

It’s here! Google Analytics Shortcuts

Friday, August 31st, 2007

After I wrote about Justin Cutroni’s upcoming GA Shortcuts book, people wrote and said, “So? When is it going to come out?”

I just (we’re talking, 60 seconds ago) found out that it is now available. You don’t have to salivate anymore; go to the O’Reilly site and buy his book for $9.99. And it even looks like you can buy it in hard copy for $29.99. And you can write a review before I get to, because it is 2:36 pm and I have four deadlines by 5 pm.

Plus, if I wrote a review, I would point out that the slashes go in the wrong direction in the figure that shows you how to create a filter to combine your hostname and request URI. And who needs me always criticizing their RegEx?

Justin — congratulations. I know how hard you worked, how many nights you stayed up, how many weekends were workdays for you, how you had to make your family go on vacation without you so that you could finish it. The web analytics world will be a better place because of you. Certainly, those of us who care about GA documentation are in heaven.

Speaking of which: to the two winners of the Google Analytics Documentation Contest - I will get your copies to you before this long weekend is over.

Robbin

Avinash answers my conversion questions: Part 4 of 4

Wednesday, August 29th, 2007

When I was in Hawaii (I bet you didn’t expect me to start that way), I cut my foot on the coral reef, my first day there. This was because Justin Cutroni insisted that I go visit the North Shore of Oahu. So instead of running around, I spent a week sitting under the palm trees and read Web Analytics: An Hour a Day. But I had all these questions, and the author, Avinash Kaushik, answered them for me! Here is the fourth part of this incredibly detailed set of answers to my questions.  There are only three more questions here, but instead of just providing clarification, there are some big new thoughts and resources. You can read my thoughts in boldface and the author’s in quiet type.

Read Part 1
Read Part 2
Read Part 3

Why do you care so much about the customer experience and discount conversion rate so much? (We can say, p. 340, but you address this elsewhere too) The way that I look at it, there are either other conversions (like applying for a job, or getting help on the website), and the analyst is just forgetting to include those conversions. Or, it’s important that the customer have a good experience so that when he is ready to buy, he will (and it is a long term problem, but it is still about conversion rate.) Or, he will tell other people or write about what a good experience he had, and *they* will come and convert, eventually. So it is still a conversion rate problem. Ultimately, it is always about conversion rate. (Go ahead. Tell me that I’m wrong.)

I had written the above answer before I read this question! :-)

Let me first say that I think you and I are defining conversion differently.

No matter what kind of site you have it is extremely likely that people come to your website for a very diverse set of reasons. Even on an ecommerce website people are there to buy, research, read the company’s bio, check order status / inventory, submit a review, bitch about something, look for support, find your address, whatever else is possible on earth.

If you accept that fundamental premise (and if you don’t just do a one question survey on your website and ask the visitors: “why are you visiting our website today”), then you’ll agree that if you want to make everyone on your website happy then focusing just on improving conversion (“sell”) is solving for just a minority of the site traffic. It also means that perhaps you are telling all other visitors to take a hike.

You do want to make money on your ecommerce website, you do want to figure out how to improve the conversion rate (orders/unique visitors). But that can’t possibly be your life’s mission, not even on an ecommerce website.

You need to figure out how to carry all other types of visitors with you and help them complete their tasks.

Ok here is the controversial part: You are a consultant and LunaMetrics is a very good consulting company. You have conversion rate as your middle name. If you get hired then you are probably supposed to simply improve the conversion rate. If you want to get paid, and rehired, then you have to solve that problem, and not care about any other type of visitor. I suspect even if you were of a very generous heart you can’t afford to care about any other type of visitor, you are being paid to sell more. That’s ok for you.

But I hope that companies realize that sell, sell and figure out how to sell more is not a long term strategic choice. They need to identify all the reasons visitors come to their site (“Primary Purpose”) and help them all (improve “Task Completion Rate”).
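For readers who want the arithmetic spelled out, here is a minimal sketch of the conversion rate as defined above (orders divided by unique visitors). The function name and the sample numbers are made up for illustration; they do not come from the book or from any tool.

```python
def conversion_rate(orders: int, unique_visitors: int) -> float:
    """Return orders / unique visitors as a fraction (0.0 when there were no visitors)."""
    if unique_visitors == 0:
        return 0.0
    return orders / unique_visitors


# Hypothetical month: 352 orders from 10,000 unique visitors.
rate = conversion_rate(352, 10_000)
print(f"{rate:.2%}")  # prints 3.52%
```

Note that the remaining visitors in this made-up month did not order anything, which is the point of the answer above: the ratio says nothing about whether they completed the task they came for.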

p. 312. IMO, there is no way to get competitive conversion data outside of panel data. Am I wrong? (Go ahead, you don’t have to be nice.)

You can get it from ComScore (in case you did not mean that by panel data).

As I mention above you can also use the FireClick Index, they even break it out for new and returning visitors! And for the last 12 months!! And for six different industry verticals!!! :-) Compare trends over time with the index and it will give you a great feel for how things are going for you.

You can also sign up for the delightful shop.org ecommerce / conversion report, many people think of that as the bible.

Finally, yesterday I got an email from Stephane Lagrange and I noticed on his blog, http://blog.webtarget.ca, he has referenced the Top 500 Guide published by InternetRetailer.com which also publishes conversion rates for top ecommerce websites. Here are some of the numbers, directly copied from Stephane’s blog:

#1 Amazon.com: 3.52%
#2 Staples.com: 9.62%
#3 Office Depot: 7.10%
#498 Broadspan Commerce LLC: 0.35%
#499 Musicnotes Inc: 3.25%
#500 KneeDraggers.com: 0.99%

Thank you for helping, and of course, for writing your great book. I loved every piece of it, except for the Six Sigma part. (That was way too dry for me, which is a shame, since that’s an area I know almost nothing about.) On page 286, you wrote, “It is amazing what people won’t tell you even in the most open and honest company environments, because they are just trying to be nice.” You clearly didn’t have me in mind when you wrote that line….

You are underestimating the value of what you bring to the table. Under any circumstance I know exactly where you stand and what your opinion is. Sometimes it might hurt to hear the truth, but it is always better to hear the truth. You are honest, direct and willing. It makes for a refreshing change in a world where one is always trying to parse nuances and syllables to understand where the other person stands. I am glad that you have the courage to share your real opinion and you don’t have a hidden agenda (and if you have one then you are doing very well hiding it! ).

Thanks for the opportunity to do this interview, you had great questions and it was fun to answer them.

Avinash answers my Hour a Day questions: Part 3 of 4

Sunday, August 26th, 2007

After reading Web Analytics: An Hour a Day, I had a lot of questions, and the author was kind enough to answer them all. In this third installment, we talk about testing and just begin to talk about conversion rate. My questions are in bold and Avinash’s answers are indented.

Read Part 1
Read Part 2
Read Part 4

When you wrote about usability (p. 53), you commented, “Usability tests are best for optimizing UI designs and work flows, understanding the voice of the customer, and understanding what customers really do.” However, I do usability testing all the time. During testing, I learn about the offer and the price, I learn how much the customers trust the site, I learn if the customer understands the site. A whole lot more than usability. So what things is user testing *not* good for (besides statistical significance, and some would disagree with even that)??

My comment you quote stresses what Usability is really optimal for. It can, as you aptly point out, be used for a number of wonderful things and can be a rich source of learning.

With the advent of various technologies (including live recruiting and remote testing, experimentation and testing) you have such a wonderful set of tools that you can deploy. For example I prefer to do offer experimentation using a multivariate or testing tool rather than usability. Offer is cleanly tied to an outcome (say conversion), so why should I ask eight people who might not be really representative of my customers what they think? I can just as easily throw an experiment on the site and ask a million people on my site what they think.

Lab usability testing is valuable. It is perhaps the only way to see a customer and observe them intimately. Look for non verbal cues and reactions. Applied for the right purposes it can be a rich source of learning.

It can also be extremely deceptive to ask 50 people what they think of your site / experience / offers and assume that you have it nailed. If that were true site redesigns based on extensive usability tests would not bomb with the frequency that they do.

Why does experience testing get you any closer to a global maximum (p. 248)? At the end of the day, you still need to know what to test.

Let me say this first, in any scenario you need to have a very intimate understanding of your customer experience.

Customers overall are very good at telling you their problems, they are terrible at telling you the solutions (and that is quite ok, never ask a customer for a solution).

To solve complex problems on a higher magnitude where your solutions will “slash and burn” what exists today you have a great friend in experience testing. Rather than just optimizing a page, you can optimize huge chunks of the customer experience, if not the whole site, by trying radical solutions and seeing which works. The nice thing is you set participation rates which means that you can easily control for risk.

Experience testing helps you jump the curve (to get on the global maxima curve, potentially) because your canvas is so much bigger; you can take bigger, more radical risks and win big.

With most testing you optimize a page. When was the last time that you or I ever had a website experience where one page was so golden that it had a disproportionate impact on the outcome? Probably not a lot.

How do people set conversion rate (or other) goals? It’s great if the CEO says, “We have to increase our sales from our web channel by 50%” — then you can just run the numbers. But absent direction from someone else, do people just say, “Hmm, wouldn’t it be great if we could increase our conversions by 12.45%?” Do they pull out their HP 12C calculators and do an internal rate of return based on the cost of testing and the cost of money? (p. 256)

 

Here is my recommendation…..

1) A: Sign up for the shop.org annual study and look at what your competitors are doing. Use that as an initial discussion starter of what your conversion rate should be.

1) B: Type “fireclick index” into google and look at last year’s worth of data for conversion rate for the web or for one of the six vertical industries that they provide. It is free. Use that as a starting point for discussion of what your goal should be.

2) Plot out your conversion rates (segmented by your core acquisition strategies - DM, Email, PPC, Display, whatever) for the last year and see where things are trending. Bring this to your fireclick/shop.org discussion.

3) Finally see where in your acquisition strategy or site optimization you are making increased investments. If you just hired a SEM Goddess pump up the goal by 50% for that stream of traffic (Goddess will deliver). If you are implementing MVT then see what that will do.

1 + 2 + 3 = An intelligent discussion.

You’ll come up with a goal for the next three months. It might be wrong but persist and repeat the process three months later, you’ll do better this time. In six months when you do it you’ll nail it.

Give yourself permission to be wrong, trust me you’ll get better so fast.
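Step 2 above (plotting conversion rates segmented by acquisition strategy) can be sketched roughly like this. The rows, channel names, and numbers are invented for illustration, and in practice they would come out of your analytics tool's export rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical monthly rows: (month, channel, orders, unique_visitors)
rows = [
    ("2007-06", "Email", 40, 1000),
    ("2007-06", "PPC", 30, 2000),
    ("2007-07", "Email", 45, 1000),
    ("2007-07", "PPC", 50, 2000),
]


def rates_by_channel(rows):
    """Group rows by channel and return each channel's monthly conversion rates in date order."""
    trend = defaultdict(list)
    for month, channel, orders, visitors in sorted(rows):
        rate = orders / visitors if visitors else 0.0
        trend[channel].append((month, rate))
    return dict(trend)


trend = rates_by_channel(rows)
for channel, series in trend.items():
    print(channel, [(month, f"{rate:.1%}") for month, rate in series])
```

Seeing each channel as its own series makes it easier to decide which stream of traffic deserves a more aggressive goal for the next three months.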

Coming next: Part 4, where Avinash continues to talk about my favorite topic, conversion rate.

Robbin

Princess Angela tells all: What took so long?

Thursday, August 23rd, 2007

We interrupt our regularly scheduled programming (which was Part 3 of the Avinash Kaushik book addendum) to hear from Princess Angela Brown. The princess (who points out that she is of relatively minor lineage) is also co-chair of the Standards Committee for the Web Analytics Association. Today, her Committee announced their 26 Standards (finally), at Search Engine Strategies - San Jose. We bring you here the exclusive inside story, also known as, “What took so long to write 26 definitions?”

Today the WAA Standards Committee released its second definitions document, eight months after our “Big Three” definitions were released. Eight months seems a ridiculously long time to define 23 new terms (the “big 3” were carried over from our previous doc). After all, I could have written this document myself in under eight hours. Jason, our co-chair, could easily have done the same. In fact, nearly any of our committee members (more than twenty very competent people working as web analytics consultants, practitioners, and vendors) could have written this document in a day, blindfolded, with one hand tied behind their back and balancing on one foot. (Yes, I have a lot of confidence in our committee members!) What took so long?

Web analytics is not rocket science. Rocket science uses far more Greek letters and squiggly things than even the most complicated web analytics problem. The questions web analytics sets out to answer are really quite simple: WHO came to our site? WHEN did they visit? WHERE did they come from? WHAT did they do? HOW did they do it? Coupled with good marketing research and/or usability tests, you can even get a good idea of WHY your visitors do what they do.

As I see it, there are two issues that make web analytics harder than it looks. First, even though the concepts are simple, proper execution can be complex. To get a lot of value from web analytics, you really need to segment the WHAT by the WHO, and the HOW by the WHY, and the WHERE by the WHEN by the WHAT by the WHO. Second, a lot of our existing terminology comes from the tools we use. That’s not awful in and of itself: there are a lot of very good web analytics tools out there, and all of them deal with essentially the same concepts. But going from one tool to another is like learning another language, and no matter how well you know your stuff you are bound to misinterpret something because similar terms are used to describe different concepts. To use a cliché, the devil is in the details, and it’s that devil that took up so much of our last eight months.

We are fortunate to have a wide variety of Standards Committee members who have experience using different tools and analyzing different types of websites. This has led to some lively discussion about the meaning of the terms that so many of us use every day, and has underscored our industry’s need for precise terminology. For example, do you know the difference between a repeat visitor and a return visitor? How about a landing page versus an entry page? Single page visit versus bounce? Visit versus session? (The last one’s a trick question.)

For answers to these questions and more, download our document from the WAA site. We welcome your feedback.

In addition to her work as a guest blogger and her royal responsibilities, Angela Brown is the Web Analytics Manager for the MD Consult site at Elsevier. She has also been known to work for a large web analytics software vendor as a professional services consultant.

 

 

Avinash answers my questions about his book: Part 2 of 4

Thursday, August 23rd, 2007

As I explained in Part I of this series, after I read Web Analytics: An Hour a Day, I had a lot of questions (and even some things I didn’t agree with.) So I wrote the author, and he sent me back nine pages of thoughts. That’s why I’m chunking my interview with Avinash into sections. Unlike the first part of this series, this one is very down in the weeds; I asked about some very specific best practices. You can see my questions below in boldface and his answers in quiet type, perfectly matching our personalities.

Moving from the very general to the very specific: On page 33, I scribbled, “First party cookies don’t talk to each other, and third party cookies get deleted.” Do you have any recommendations on choice of (vendor? technology) that deals with both of these issues? A first party cookie solution where the various sites in the enterprise talk to each other without lots of manual coding? A solution that you love?

In my prior role we had implemented first party cookies and “first party third party” cookies to overcome this challenge somewhat.

As an example let’s say my company was ZQ Insights and it had two sites www.zqinsights.com and www.webanalyticshour.com. I want to track each by itself and also the two pulled together.

I set a first party cookie on each (www.zqinsights.com and www.webanalyticshour.com). I am happy so far.

Now I also set a “first party third party” cookie on both, let’s call it tracking.zqinsights.com. The latter cookie I can use if I was looking at both sites as one monolith (for example, to get true unique visitors).

It is less likely that this cookie will get blown away by spyware (because it is not being set from known domains of web analytics vendors), though high security settings will still be an issue.

One last point, you have to have a web analytics tool that allows you to create “local” (site specific) and “global” (all sites) datasets with ease and mix and merge sites. ClickTracks is one such tool.
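To make the “local” versus “global” idea concrete, here is a small sketch of how the two kinds of cookie IDs might be counted. The hit records and ID values are hypothetical, and this is not how ClickTracks or any other tool actually stores its data; it only illustrates why the shared cookie gives a truer unique visitor count.

```python
# Hypothetical hits: (site, first-party cookie ID, shared tracking-domain cookie ID)
hits = [
    ("www.zqinsights.com", "a1", "T1"),
    ("www.zqinsights.com", "a2", "T2"),
    ("www.webanalyticshour.com", "b1", "T1"),  # same person as a1, seen on the other site
    ("www.webanalyticshour.com", "b2", "T3"),
]


def unique_visitors(hits):
    """Return (per-site unique counts from first-party IDs, overall unique count from shared IDs)."""
    per_site = {}
    shared_ids = set()
    for site, first_party_id, shared_id in hits:
        per_site.setdefault(site, set()).add(first_party_id)
        shared_ids.add(shared_id)
    local = {site: len(ids) for site, ids in per_site.items()}
    return local, len(shared_ids)


local, global_count = unique_visitors(hits)
print(local)         # two unique visitors on each site
print(global_count)  # three people overall, not four
```

Adding up the per-site counts would give four, because the first-party cookies on the two sites cannot recognize that a1 and b1 are the same person.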

I hope I have answered your question (and the answer is not clear as mud).

Question: On page 37, you stress the importance of having the analytics code at the bottom of the page (“Customers first.”) But what’s an analyst to do when the other fancy things on your page don’t work unless the tracking code loads before them?

Let me share some context.

The reason for the tracking code to go last is simple: Nothing should interfere with the customer experience.

The page that the customer has requested has to go back as fast as possible so that they can get on with their life (and convert for example). Just in case you have something “funny” going on in the code, just in case your analytics provider’s servers are under heavy load, or just in case….. we want the customer to get the page first and us to get the data second.

There are always exceptions to any rule. I would set the bar really high to ensure that decisions to load the tag first pass rigorous scrutiny.

 

I think you do novice analysts a disservice by focusing so strongly on bounce rate. To you, bounce rate is about time on page. But most bounce rates are calculated as (visits entering and exiting on the same page without looking at another page) divided by (visits starting on that page.) Someone can bounce after spending 15 minutes reading the home page of this blog. So when you associate bounce with ways your website is failing (p. 145), the new analyst will be confused. I am not sure I have a question there, but you are welcome to respond.

For blogs my recommendation is that analysts should not measure either bounce rate or time on site. Both metrics will paint the wrong picture, precisely for the reasons you have so correctly identified.

Regardless of how it is computed, for most types of websites Bounce Rate is an excellent metric that helps identify opportunities for improvement in acquisition strategies or website entry points.
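The bounce rate calculation described in the question (visits that enter and exit on the same page without viewing another, divided by visits that start on that page) can be sketched as follows. The visit data and the function name are invented for illustration, and real tools may define a bounce slightly differently.

```python
# Each visit is the ordered list of pages viewed (hypothetical data).
visits = [
    ["/home"],
    ["/home", "/pricing"],
    ["/blog/post"],
    ["/home"],
    ["/pricing", "/home"],
]


def bounce_rate(visits, page):
    """Single-page visits that entered on `page`, divided by all visits that entered on `page`."""
    entries = [v for v in visits if v and v[0] == page]
    if not entries:
        return 0.0
    bounces = [v for v in entries if len(v) == 1]
    return len(bounces) / len(entries)


print(f"{bounce_rate(visits, '/home'):.0%}")       # 2 of the 3 visits entering on /home bounced
print(f"{bounce_rate(visits, '/blog/post'):.0%}")  # the only visit entering here bounced
```

Time on page never enters the calculation, which is why someone who reads a blog post for 15 minutes and leaves still counts as a bounce.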

On P. 274, you wrote that one of the questions you should be asking of your clickstream tool is, “What is the most influential content on the site? How do we know what convinced people to buy?” In general, how do most analysts figure that out, and specifically, how do you like to get that answer with Google Analytics?

I refer to a specific example of using the ClickTracks “funnel” report to identify influential content on your website. I am not aware of any other web analytics tool that can do that (or as easily as ClickTracks does), even if they all have “funnel” reports. It is something unique, and built into, ClickTracks.

If you have access to Discover2 or MarketingLab or your own data warehouse environment I suppose you can construct a complex query to replicate the ClickTracks logic. If you want to understand content influence you should do that, it is amazing what you’ll learn.

You can also use page level surveys (described in detail in the book) to understand value and influence of individual pieces of content on your page (and do it at scale).

 

 

Coming next - Avinash answers my questions about testing.

Part 1  Part 3  Part 4

Robbin

Google Analytics: Everything you always wanted to know

Tuesday, August 21st, 2007

Justin Cutroni will soon publish the Google Analytics Shortcuts book. And it will be the best ten bucks you will ever spend, if you care about advanced GA implementation. It’s hard to believe that the book is almost here - I remember standing in the lobby of a hotel somewhere in California just a few months ago, and Justin whispered, “I’m going to write a book about GA.”

I am not exactly impartial here. This is a book that I have read over and over and over again. I appointed myself Editor in Chief and rewrote parts of it. “Didn’t we fix this utmSetVar typo once already?” I wrote the author last week. When I read the penultimate (I hope) copy last week, I found out that this blog is in it. And did I mention that Justin is one of my best WA friends? Like I said, not impartial.

So I am a little like Bridget Jones. She loves Mark Darcy, even though his mother buys him awful gifts and she seriously believes he should rethink the length of his sideburns. I love the book, despite its imperfections.

Since this is a real review, let me discuss the imperfections. First, I think you need to be a pretty advanced GA user for Shortcuts. If you are already reading Justin’s blog, religiously, you have definitely taken a step in the right direction.

I think Justin goes to great pains to tell you why GA works the way it does, information that is badly needed. But I think he would be smarter to have put some of that in the appendix — it is just way too boring until you absolutely need it. (And then, of course, you are desperate for it.)

Periodically, Justin lapses into GA-speak. For example, he writes this about the Item line in the e-commerce hidden form: “There will be one item line for each distinct product purchased by the visitor. This usually means one item line per SKU or unique product ID.” When I read this I feel like I need to create the I: line 50,000 times if I have 50,000 SKUs. (And you don’t have to do that.) In a similar vein he says, if you have e-commerce tracking, you can just leave the goal value blank. But this drives users crazy, because there is no way to leave it blank - GA insists on zero.

In a couple of (very rare) instances, I think he is wrong. But remember, I got a chance to point out problems all along the way, and he didn’t correct them - so maybe I am wrong. I am mostly thinking about applying AdWords cost data — you really don’t have to apply it to all the Analytics profiles that are linked to that AdWords account, you can choose, even though he says you have to link to all, and GA says so too. (Or maybe I am the only person in the universe who is always able to make this choice when I set up AdWords and Analytics.) And I am thinking about his Count Me Out! hack, which works fabulously to take yourself out of the data - but he also has a workaround in the book that doesn’t work. He says, use Firefox, go to the website where you want to be counted out, type this into the address bar, javascript:__utmSetVar(‘foo’), and you will create a utmv (a user defined cookie) called foo for that site. But it never works. Maybe it’s just this blogger who doesn’t know how to do it? (OK, I figured this out. When Justin wrote the book, he did it in MS Word, and Word assigns “Smart Quotes.” That’s how it knows when to turn the quotes to the right and the left, even though you only have one key on your keyboard. Anyway those special characters were gunking up the works.)
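If you ever paste code out of a word processor and suspect the same Smart Quotes problem, one way to clean it up is to swap the curly characters for straight ones before using it. This is only a sketch; the function name is made up, and it handles just the four common curly quote characters.

```python
# Map the four common "smart" quote characters to their plain ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def straighten_quotes(text: str) -> str:
    """Return `text` with curly quotes replaced by straight quotes."""
    return text.translate(SMART_QUOTES)


# The snippet as it would look after a word processor curled the quotes.
pasted = "javascript:__utmSetVar(\u2018foo\u2019)"
fixed = straighten_quotes(pasted)
print(fixed)  # javascript:__utmSetVar('foo')
```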

And wouldn’t it be great if the .pdf used the power of html? So that when he says, “I’ll be covering that later in my section on…. ” you could just click to it? (Maybe that will be in the final version.)

So when I write that you should drop everything and then keep dropping, i.e. drop ten bucks on this e-book as soon as it is available, it is not because I am starry eyed. I do see little imperfections, but still…. It is an incredible resource, and no GA analyst should be without it. I sure wouldn’t want to work without it anymore. That’s one of the reasons that I wanted to give away two copies to winners of the GA contest - I knew it was the perfect gift. The one you don’t have but absolutely need.

So salivate. It will be here soon.

Robbin

Avinash answers my questions about his book: Part I

Monday, August 20th, 2007

Did you have questions when you finished Web Analytics: An Hour a Day? I did.

The book was truly amazing. But when I was done, I had written all over it. Sometimes my notes read, “This is awesome, we have to try it.” But sometimes they read, “I don’t understand.” And other times, they read, “I really disagree.”

So I got an interview with the author. It came to nine pages (count ‘em, 123456789), so I am going to reprint it in parts. Avinash, you are truly wonderful for devoting this much time.

So let’s get started. And in my usual “in your face” fashion, I’ll start with a question that most people wouldn’t ask the Guru of WA:

Please allow me a quick interruption. The word Guru is of Hindu (Indian) origin and having grown up in India I have to say that I do not consider myself a Guru. One has to meet an astoundingly high benchmark to get that title and I am very very far away from even the starting point of meeting that benchmark.

For more context on that word here’s the wonderful wikipedia [definition]

In the introduction - why do you write that your book is for everyone? Is it for my mother, who is retired and spends lots of time taking care of my father? Is it for my daughter — the one who can drill down in her Quicken, but refuses to do anything academic? No, of course not. But that’s what customers do. We ask them, “Who is your site for?” and they answer, “Everyone!” So – who is your book for?

You got me.

Perhaps that was overuse of the word everyone.

Here are the specific people / roles that are mentioned in the introduction of the book:

· Mr./Ms. Web Interested

· CEO

· C-level or VP-level or just No-level person

· Marketer

· Sales Person

· Web-Designer

· User Researcher

· Analyst

The introduction describes how the book will be helpful specifically for each role.

As an example, here’s the one for Web Designer: If you are a Web “Designer” then this book will share with you how you don’t have to compromise on the number of ideas you can put on the website to improve the site, that you can have all of your ideas (even the radical ones) go live on the site and measure which one solves the customer (or your company’s) problems most effectively.

 

Question: I love the idea of surveying, continuously. However, it has not worked out well for me or my customers. We figured out how to delete the pop-up blockers, only to find out that customers hated it. And shouldn’t it be “customers first?” Any advice about the best way to *administer* exit surveys?

In my experience surveys that are shown at the right time with the intent of allowing the customer to express their opinion go ok. Typically we cram so much into a survey that only we care about that a customer looks at it, pukes and exits.

If you ask customers “nicely” they want to tell you about their experience.

My advice:

1) Experiment with different invitation types (pop up, pop under, on exit etc) and see what your customers prefer. And you only have to do this a few days each to get a feel for it.

2) Start with small number of questions (remember the “golden questions” post?) and then expand.

3) Put a really large close button. Make it apparent, clearly, that the survey can be closed. In a very subliminal way it works very effectively and actually gets closed less (if you have done #1 and #2 above first!).

From a mindset perspective you want them to share with you what they think rather than do a quick little interrogation with a battery of questions. Fine balance. :)

While we are on surveys - how do you feel about surveys that force the visitor to answer certain or all of the questions? It is infuriating to me when I answer a BizRate survey for the chance to get a “free” magazine, and they force me to answer questions (so instead, I just lie.) Thoughts?

I skip it.

I also rarely do any other surveys. I often read them to study them from a knowledge / awareness perspective, but I don’t fill surveys.

Here is the thing. I am not the customer and it is irrelevant what I think about surveys.

The first time we did a survey it was 20 questions and I was positive it would bomb, after all who in God’s name has that much time. Turns out that it had a consistent 18% response rate (compared to an internet standard of 1% response rate for surveys).

My lesson was that I should try not to impose my views and opinions and check them in at the door. Because I am not the customer, no matter how much I think I am. It is a tough pill to swallow because we tend to think we are “experts” because we have so much knowledge and data.

Experiment, it is cheap, see what works and what does not, refine and try again.

Why do you have a Trinity? I really see a duality, clickstream data and qualitative data.

It is qualitative (Experience), clickstream (Behavior) and the third prong is Outcomes.

Some people mix clickstream with outcomes. I choose to break it out for two reasons:

1) I want people to outrageously focus on outcomes. It is easy to be hypnotized by all the clickstream data and reports and forget to set goals or measure in a very hard core way outcomes. Yet the thing that drives action is not all your clickstream analysis, it is the tie to outcomes.

2) I want people to think of outcomes more than conversion. I am not big on obsessing about conversion, which will almost always lead to solving for a minority of your site traffic. Outcomes are improved customer satisfaction numbers, increased task completion rates, increased depth of visit over time, problem resolution rates on support websites, better recency trends on non-ecommerce websites.

By putting outcomes as a separate part of the Trinity I am trying to emphasize the importance of understanding outcomes, and different types of outcomes beyond just revenue/conversion.

Do you think that makes sense?

 

Notice that I didn’t answer his last question (“Does that make sense?”). I hope some of you will.

Coming next: Part 2, where I ask Avinash about first and third party cookies, where to put the code on the page when you are torn between conflicting needs, and other really “down in the weeds” Q&A.

Part 2   Part 3   Part 4

Robbin

 

Regular Expressions for GA, Bonus III: Lookahead

Wednesday, August 8th, 2007

I hope this will be the last installment, for some time to come, of my Regular Expressions (RegEx) for Google Analytics series. At the end of the post, I have finally threaded all the RegEx posts.

Tonight you can see some very cool (if you care) regular expressions for GA that look ahead and decide whether the match is allowed. Sort of a conditional match. There are two kinds of lookaheads: negative lookahead (don’t match if …) and positive lookahead (only match if …). Like braces, this is a RegEx that works in Google Analytics, but there is no documentation on it.

Here’s an example that I had today. I was working with a GA site where they have membership, not customers. The site owners use the word members in some of their URIs, and they also use membermail and memberthis and memberthat and membertheother. And a whole lot of other memberexamples.

So let’s say that I needed to create a filter with a regular expression to include all those member URIs in GA, but I want to make sure that I don’t include membermail, since mail is in a special category in my marketing mix. Now we’re in a position to formulate the question really well: how do we include all the Request URIs that have the string member in them where that string is not followed by the string mail? In other words — don’t match member if it is followed by mail.

Steve gets the credit for this one. He suggested that GA might work with negative lookahead - the ability to combine regular expressions to say, “don’t match if it is followed by…” In our membermail example, the expression would be member(?!mail) .

The opening of the parenthesis, followed by a question mark, tells the RegEx engine, “Watch out, lookahead coming.” The exclamation point says, “And it’s a negative.” Combined, they mean, it’s a negative lookahead - don’t match the first part of the string if the second part is there. Don’t match member if it is followed by mail .
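If you want to try the negative lookahead outside of GA, here is a small sketch using Python's re module as a stand-in. That GA's engine behaves the same way in every detail is an assumption, and the URIs are made up.

```python
import re

# Match "member" only when it is NOT immediately followed by "mail".
pattern = re.compile(r"member(?!mail)")

# Hypothetical Request URIs.
uris = ["/members/list", "/membermail/inbox", "/memberthis", "/about"]

matched = [uri for uri in uris if pattern.search(uri)]
print(matched)  # ['/members/list', '/memberthis']
```

The membermail URI is left out, and so is /about, which never contained member in the first place.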

GA also handles positive lookahead. So if we want to match only membermail and not memberthis or memberthat and all the other URIs with member, we can write our RegEx like this: member(?=mail) – in this case the open paren and the question mark do the same thing (“Watch out, lookahead coming,”) but the equal sign says, “And it’s a positive match.”
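The positive version can be tried the same way. Again, Python's re module is only a stand-in for GA here, and the URIs are invented.

```python
import re

# Match "member" only when it IS immediately followed by "mail".
pattern = re.compile(r"member(?=mail)")

# Hypothetical Request URIs.
uris = ["/members/list", "/membermail/inbox", "/memberthis", "/about"]

matched = [uri for uri in uris if pattern.search(uri)]
print(matched)  # ['/membermail/inbox']
```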

There is one last little fine point to wrap your head around, assuming you are not dizzy already. The lookahead string, or whatever you want to call the string mail in my example, is not part of the match. I know this sounds like gibberish, so let me give a last example. This one’s for all you RegEx fans. And for everyone on the Paris metro:

Example: Let’s say that I am doing a positive lookahead like this: member(?=s)hip. This means, match to member only if it is followed by an s, and then please match to the hip, too. However, the string membership would not be a match. That seems a little ridiculous. After all, it is member, followed by an s, followed by hip, right?

Well, it doesn’t work that way. That’s because the s is only a condition. In the eyes of the RegEx engine, you sort of have a conditional regex that looks like this:

memberhip (notice, no s)

And, you are trying to match to

membership

It’s not a match, because the s isn’t part of the RegEx. It was just being used as a condition.
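Here is the same point as a quick experiment, sketched with Python's re module (assumed here to treat lookahead the way GA does). The lookahead checks for the s but does not use it up, so whatever comes next in the RegEx still has to deal with it.

```python
import re

# The lookahead checks for "s" but does not consume it, so "hip" has to match
# starting at the "s" and fails.
no_match = re.search(r"member(?=s)hip", "membership")
print(no_match)  # None

# Spelling out the "s" after the lookahead makes the whole word match.
full = re.search(r"member(?=s)ship", "membership")
print(full.group())  # membership

# On its own, the lookahead pattern matches only "member"; the "s" is not part of the match.
only_member = re.search(r"member(?=s)", "membership")
print(only_member.group())  # member
```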

OK, we are done for tonight.

Late note: A reader here, Alan, wrote a wonderful comment, whereby he shows other ways to implement and use the power of positive lookahead. You should read his comment, but the short version is, you can use positive lookahead to match if the “condition” is somewhere down the line. The “lookahead string” (the condition) doesn’t have to *immediately* follow the (?= if you use the syntax that he figured out. (See, now you have to read his stuff….)
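One common way to put the condition “somewhere down the line” is to start the lookahead with .* so that it can skip over other characters first. This is an assumption for illustration only and not necessarily the syntax Alan used; the URIs are made up, and Python's re module stands in for GA.

```python
import re

# Match "member" only if "mail" appears anywhere later in the string.
pattern = re.compile(r"member(?=.*mail)")

# Hypothetical Request URIs.
uris = ["/member/section/mail", "/member/section/news", "/membermail"]

matched = [uri for uri in uris if pattern.search(uri)]
print(matched)  # ['/member/section/mail', '/membermail']
```

Because .* can also match nothing at all, this pattern still catches the case where mail follows member directly.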

Here is the thread with all the RegEx posts.

Backslashes \
Dots .
Carets ^
Dollar signs $
Question marks ?
Pipes |
Parentheses ()
Square brackets [] and dashes -
Plus signs +
Stars *
Regular Expressions for Google Analytics: Now we will Practice
Bad Greed
RegEx and Good Greed
Intro to RegEx
{Braces}
Minimal Matching

– Robbin