<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Regular Expressions for GA, Bonus II: Minimal Matching</title>
	<atom:link href="http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/</link>
	<description>Traffic, Analysis, Action</description>
	<lastBuildDate>Sun, 12 Feb 2012 00:37:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Jennifer</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/comment-page-1/#comment-1825</link>
		<dc:creator>Jennifer</dc:creator>
		<pubDate>Fri, 07 May 2010 01:46:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-1825</guid>
		<description>Ha, I came here expecting this to have something to do with baby bath. Turns out it was just the comment above by Robbin that must have triggered it.</description>
		<content:encoded><![CDATA[<p>Ha, I came here expecting this to have something to do with baby bath. Turns out it was just the comment above by Robbin that must have triggered it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robbin</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/comment-page-1/#comment-734</link>
		<dc:creator>Robbin</dc:creator>
		<pubDate>Wed, 03 Feb 2010 19:50:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-734</guid>
		<description>Alexander, ba will match ba and baby and bath and all sorts of other things. There are other ways to stop &quot;RegEx greed,&quot; like ba$, but I was just using that to illustrate a concept of minimal matching.
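
A minimal sketch of that difference, assuming Python's re module as a stand-in (GA's own engine may differ in details, and the sample words are only illustrations):

```python
import re

# Hypothetical sample values, chosen only to illustrate the point.
samples = ["ba", "baby", "bath", "abba"]

# Unanchored: matches anything that contains "ba" anywhere.
unanchored = [s for s in samples if re.search(r"ba", s)]

# Anchored with $: matches only values that end in "ba".
anchored = [s for s in samples if re.search(r"ba$", s)]

# Greedy vs. minimal: a+ takes every "a" it can, a+? stops after the first.
greedy = re.search(r"ba+", "baaad").group()
minimal = re.search(r"ba+?", "baaad").group()

print(unanchored, anchored, greedy, minimal)
```

The anchored pattern narrows what counts as a match, while the minimal quantifier only changes how much of the string the match consumes.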

Robbin</description>
		<content:encoded><![CDATA[<p>Alexander, ba will match ba and baby and bath and all sorts of other things. There are other ways to stop &#8220;RegEx greed,&#8221; like ba$, but I was just using that to illustrate a concept of minimal matching.</p>
<p>Robbin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alexander</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/comment-page-1/#comment-733</link>
		<dc:creator>Alexander</dc:creator>
		<pubDate>Mon, 01 Feb 2010 05:44:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-733</guid>
		<description>Hello Allen, great article.

However, ba+? could be written much simpler to match this pattern.

why not use ba only that will match only one character of a after b. ba+? looks like it might work but confuses.

Is ba+? formulation needed because GA RegEx is different or is this your way of matching a pattern of ba ?

Thank you in advance for your reply.

Alexander</description>
		<content:encoded><![CDATA[<p>Hello Allen, great article.</p>
<p>However, ba+? could be written much simpler to match this pattern.</p>
<p>why not use ba only that will match only one character of a after b. ba+? looks like it might work but confuses.</p>
<p>Is ba+? formulation needed because GA RegEx is different or is this your way of matching a pattern of ba ?</p>
<p>Thank you in advance for your reply.</p>
<p>Alexander</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robbin</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/comment-page-1/#comment-732</link>
		<dc:creator>Robbin</dc:creator>
		<pubDate>Tue, 09 Dec 2008 16:33:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-732</guid>
		<description>Nuttakorn - you can exclude this page:

/Category/default\.aspx

That way, Category/Product1 (etc) will still get included. The exclude I did above is very specific and doesn&#039;t encompass the other pages that you want included.</description>
		<content:encoded><![CDATA[<p>Nuttakorn &#8211; you can exclude this page:</p>
<p>/Category/default\.aspx</p>
<p>That way, Category/Product1 (etc) will still get included. The exclude I did above is very specific and doesn&#8217;t encompass the other pages that you want included.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nuttakorn</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/comment-page-1/#comment-731</link>
		<dc:creator>Nuttakorn</dc:creator>
		<pubDate>Tue, 09 Dec 2008 10:54:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-731</guid>
		<description>What do you think if I would filter out some page like this case :

http://www.domainname.com/Category/default.aspx ----- Filter out this page

But keep track for all page under http://www.domainname.com/Category/Product1, http://www.domainname.com/Category/Product1 , etc.</description>
		<content:encoded><![CDATA[<p>What do you think if I would filter out some page like this case :</p>
<p><a href="http://www.domainname.com/Category/default.aspx" rel="nofollow">http://www.domainname.com/Category/default.aspx</a> &#8212;&#8211; Filter out this page</p>
<p>But keep track for all page under <a href="http://www.domainname.com/Category/Product1" rel="nofollow">http://www.domainname.com/Category/Product1</a>, <a href="http://www.domainname.com/Category/Product1" rel="nofollow">http://www.domainname.com/Category/Product1</a> , etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robbin</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/comment-page-1/#comment-730</link>
		<dc:creator>Robbin</dc:creator>
		<pubDate>Mon, 13 Oct 2008 16:29:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-730</guid>
		<description>Hi Edward. I am glad you like this one, it is my favorite. I go back to this article over and over again. Robbin</description>
		<content:encoded><![CDATA[<p>Hi Edward. I am glad you like this one, it is my favorite. I go back to this article over and over again. Robbin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Edward Beckett</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/comment-page-1/#comment-729</link>
		<dc:creator>Edward Beckett</dc:creator>
		<pubDate>Sun, 12 Oct 2008 16:47:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-729</guid>
		<description>This is  a great article ... I am a RegEx addict ... Every time I hear about someone doing something powerful with RegEx ... I get excited ... yes ... I am a RegEx geek ...

Nice post ...

Edward</description>
		<content:encoded><![CDATA[<p>This is  a great article &#8230; I am a RegEx addict &#8230; Every time I hear about someone doing something powerful with RegEx &#8230; I get excited &#8230; yes &#8230; I am a RegEx geek &#8230;</p>
<p>Nice post &#8230;</p>
<p>Edward</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: steve</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/comment-page-1/#comment-728</link>
		<dc:creator>steve</dc:creator>
		<pubDate>Thu, 02 Aug 2007 21:54:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-728</guid>
		<description>&quot;Steve, you couldn&#039;t know this but Alan is quite the RegEx expert&quot;
Ha! Translation: You&#039;ve been &quot;Trying to teach grandma how to suck eggs&quot;. Oops. :-)

My apologies Alan. Please take my prior in the positive light of good intent, and not as the ravings of a pompous windbag. No matter how accurate the latter may be. ;-)

Cheers!
- Steve</description>
		<content:encoded><![CDATA[<p>&#8220;Steve, you couldn&#8217;t know this but Alan is quite the RegEx expert&#8221;<br />
Ha! Translation: You&#8217;ve been &#8220;Trying to teach grandma how to suck eggs&#8221;. Oops. <img src='http://www.lunametrics.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>My apologies Alan. Please take my prior in the positive light of good intent, and not as the ravings of a pompous windbag. No matter how accurate the latter may be. <img src='http://www.lunametrics.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>Cheers!<br />
- Steve</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robbin</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/comment-page-1/#comment-727</link>
		<dc:creator>Robbin</dc:creator>
		<pubDate>Thu, 02 Aug 2007 11:33:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-727</guid>
		<description>Steve, you couldn&#039;t know this but  &lt;b&gt;Alan&lt;/b&gt; is quite the RegEx expert (and as you point out, we always find new things about RegEx.) Also -- Steve, we wouldn&#039;t want to try to match to the greedy expression and then to the less greedy expression. That would usually defeat our project. It might be okay if we were matching for keywords, but if you were rewriting uris, it would be pretty important that you do it right.

So let Google worry that these are slightly slow. People used to say, don&#039;t use a star * because it slows down the processing. And does anyone listen? No. And their processing is just fine (well, if it isn&#039;t fine, it&#039;s not because of their choice of RegEx...)</description>
		<content:encoded><![CDATA[<p>Steve, you couldn&#8217;t know this but  <b>Alan</b> is quite the RegEx expert (and as you point out, we always find new things about RegEx.) Also &#8212; Steve, we wouldn&#8217;t want to try to match to the greedy expression and then to the less greedy expression. That would usually defeat our project. It might be okay if we were matching for keywords, but if you were rewriting uris, it would be pretty important that you do it right.</p>
<p>So let Google worry that these are slightly slow. People used to say, don&#8217;t use a star * because it slows down the processing. And does anyone listen? No. And their processing is just fine (well, if it isn&#8217;t fine, it&#8217;s not because of their choice of RegEx&#8230;)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: steve</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/comment-page-1/#comment-726</link>
		<dc:creator>steve</dc:creator>
		<pubDate>Thu, 02 Aug 2007 10:07:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-726</guid>
		<description>To prove the adage: &quot;No matter how much you know about a topic, there&#039;s always something new to learn&quot;: As I replied privately to Robbin this morning (&quot;Subtle. Really Subtle,&quot; was the lead in. ;-) ), I only found out about this style of RegEx myself late last year or early this. I&#039;ve been doing RegEx&#039;s since &#039;88!
A friend in Austria (ie. next to Germany) used one to solve a funky problem we had. I was like &quot;How on earth does that work? And what&#039;s with the funky &#039;.*?&#039; construct!?!!?!?&quot;

Chapter 4 of the O&#039;Reilly &quot;Mastering Regular Expressions&quot; (MRE) has a ... deep explanation of these &quot;lazy&quot; expressions as Jeff refers to them. &quot;Greedy&quot; or &quot;Lazy&quot;. Sigh. IT people and the art of *bad* punning. ;-)


Alan, lazy expressions are, in my experience, generally slower than greedy ones. I have a program that does a lot of matching using the actual PCRE library[1]. Lots of &#039;[^ ]+ &#039; (left_square not space right_square plus space) style of thing. Replacing with &#039;.+? &#039; (dot plus question space) to get the equivalent lazy?
Go from ~ 80,000 lines/sec to ~65,000 lines/sec. If you read on to Chapter 6 in MRE, Jeff explains the whys and hows of this observed slowdown.
Without going into the detail, lazy expressions can cause the underlying engine to do more work. Of course, by &quot;tomorrow&quot; there may be new and improved optimisations that render the previous statements incorrect. ;-) [2]
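
A rough way to try this comparison yourself, sketched with Python's re module rather than the PCRE library (an assumption; the sample line is hypothetical and the timings will vary by engine, version and input, so no particular result is claimed):

```python
import re
import timeit

# Hypothetical log-style line used only for illustration.
line = "GET /index.html HTTP/1.1"

negated = re.compile(r"[^ ]+ ")   # greedy, negated character class
lazy = re.compile(r".+? ")        # lazy dot-plus

# On this input both patterns pick out the same first token.
first_negated = negated.match(line).group()
first_lazy = lazy.match(line).group()

# Time repeated matching; compare the two numbers on your own data.
t_negated = timeit.timeit(lambda: negated.findall(line), number=20000)
t_lazy = timeit.timeit(lambda: lazy.findall(line), number=20000)

print(first_negated == first_lazy, t_negated, t_lazy)
```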


Now as far as GA is concerned? Super speedy regex&#039;s aren&#039;t something we really need to worry about. Google do, we don&#039;t. Yet. ;-) In that situation, I recommend going for what you find easiest to *understand*.

Do be aware that: &quot;.*?/&quot; vs &#039;[^/]*&#039; will give different answers! They are not identical as written. What you probably want to compare with is: &#039;[^/]*/&#039;.
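
A minimal sketch of that difference, assuming Python's re module and a made-up path (neither is from GA itself):

```python
import re

# Hypothetical request path, for illustration only.
path = "blog/2007/07/post"

# Lazy: expands one character at a time until a slash follows, slash included.
lazy = re.match(r".*?/", path).group()

# Negated class alone: stops just before the first slash.
negated = re.match(r"[^/]*", path).group()

# Negated class plus the slash: the form to compare against the lazy one.
negated_slash = re.match(r"[^/]*/", path).group()

print(repr(lazy), repr(negated), repr(negated_slash))
```

Only the last pattern consumes the same text as the lazy version; the bare negated class leaves the slash unmatched.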

Cheers!
- Steve

[1] www.pcre.org. Has links to the Perl RegEx docco. Which could be a bit ... hairy for many readers. And very perl specific.
[2] The optimisation we use is to have two regex&#039;s. One, using greedy, is for most of the work. If that fails, we switch to the slightly more complex, plus lazy usage to try and match a 2nd time. Thus we get the best of both worlds. Speed and Correctness. It&#039;s possible that Justin (who knows way more about GA ins and outs than me) may know of a way to expose and use such a double hit filter????</description>
		<content:encoded><![CDATA[<p>To prove the adage: &#8220;No matter how much you know about a topic, there&#8217;s always something new to learn&#8221;: As I replied privately to Robbin this morning (&#8220;Subtle. Really Subtle,&#8221; was the lead in. <img src='http://www.lunametrics.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  ), I only found out about this style of RegEx myself late last year or early this. I&#8217;ve been doing RegEx&#8217;s since &#8217;88!<br />
A friend in Austria (ie. next to Germany) used one to solve a funky problem we had. I was like &#8220;How on earth does that work? And what&#8217;s with the funky &#8216;.*?&#8217; construct!?!!?!?&#8221;</p>
<p>Chapter 4 of the O&#8217;Reilly &#8220;Mastering Regular Expressions&#8221; (MRE) has a &#8230; deep explanation of these &#8220;lazy&#8221; expressions as Jeff refers to them. &#8220;Greedy&#8221; or &#8220;Lazy&#8221;. Sigh. IT people and the art of *bad* punning. <img src='http://www.lunametrics.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>Alan, lazy expressions are, in my experience, generally slower than greedy ones. I have a program that does a lot of matching using the actual PCRE library[1]. Lots of &#8216;[^ ]+ &#8216; (left_square not space right_square plus space) style of thing. Replacing with &#8216;.+? &#8216; (dot plus question space) to get the equivalent lazy?<br />
Go from ~ 80,000 lines/sec to ~65,000 lines/sec. If you read on to Chapter 6 in MRE, Jeff explains the whys and hows of this observed slowdown.<br />
Without going into the detail, lazy expressions can cause the underlying engine to do more work. Of course, by &#8220;tomorrow&#8221; there may be new and improved optimisations that render the previous statements incorrect. <img src='http://www.lunametrics.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  [2]</p>
<p>Now as far as GA is concerned? Super speedy regex&#8217;s aren&#8217;t something we really need to worry about. Google do, we don&#8217;t. Yet. <img src='http://www.lunametrics.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  In that situation, I recommend going for what you find easiest to *understand*.</p>
<p>Do be aware that: &#8220;.*?/&#8221; vs &#8216;[^/]*&#8217; will give different answers! They are not identical as written. What you probably want to compare with is: &#8216;[^/]*/&#8217;.</p>
<p>Cheers!<br />
- Steve</p>
<p>[1] <a href="http://www.pcre.org" rel="nofollow">http://www.pcre.org</a>. Has links to the Perl RegEx docco. Which could be a bit &#8230; hairy for many readers. And very perl specific.<br />
[2] The optimisation we use is to have two regex&#8217;s. One, using greedy, is for most of the work. If that fails, we switch to the slightly more complex, plus lazy usage to try and match a 2nd time. Thus we get the best of both worlds. Speed and Correctness. It&#8217;s possible that Justin (who knows way more about GA ins and outs than me) may know of a way to expose and use such a double hit filter????</p>
]]></content:encoded>
	</item>
</channel>
</rss>

