<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Regular Expressions for GA, Bonus II: Minimal Matching</title>
	<atom:link href="http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/</link>
	<description>LunaMetrics' blog on conversion rate and web analytics</description>
	<pubDate>Tue, 07 Oct 2008 06:21:34 +0000</pubDate>
	<generator>http://wordpress.org/</generator>
		<item>
		<title>By: steve</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-712</link>
		<dc:creator>steve</dc:creator>
		<pubDate>Thu, 02 Aug 2007 21:54:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-712</guid>
		<description>"Steve, you couldn't know this but Alan is quite the RegEx expert"
Ha! Translation: You've been "Trying to teach grandma how to suck eggs". Oops. :-)

My apologies, Alan. Please take my prior comment in the positive light of good intent, and not as the ravings of a pompous windbag. No matter how accurate the latter may be. ;-)

Cheers!
- Steve</description>
		<content:encoded><![CDATA[<p>&#8220;Steve, you couldn&#8217;t know this but Alan is quite the RegEx expert&#8221;<br />
Ha! Translation: You&#8217;ve been &#8220;Trying to teach grandma how to suck eggs&#8221;. Oops. <img src='http://www.lunametrics.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>My apologies, Alan. Please take my prior comment in the positive light of good intent, and not as the ravings of a pompous windbag. No matter how accurate the latter may be. <img src='http://www.lunametrics.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>Cheers!<br />
- Steve</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robbin</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-710</link>
		<dc:creator>Robbin</dc:creator>
		<pubDate>Thu, 02 Aug 2007 11:33:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-710</guid>
		<description>Steve, you couldn't know this, but Alan is quite the RegEx expert (and as you point out, we're always finding new things in RegEx). Also, Steve, we wouldn't want to match with the greedy expression first and then with the less greedy expression. That would usually defeat our purpose. It might be okay if we were matching keywords, but if you were rewriting URIs, it would be pretty important to get it right.
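
To make the URI-rewriting point concrete, here is a minimal sketch in Python, using its re module as a stand-in for PCRE (an assumption; GA filters are set up in the GA interface, not in code). The URI and the keep-only-the-leading-directory rewrite are hypothetical:

```python
import re

# Hypothetical request URI, used only for illustration.
uri = "/shop/shoes/red.html"

# Greedy: (.*) runs to the end of the URI, then backs off to the LAST slash.
greedy = re.sub(r"^/(.*)/.*$", r"/\1/", uri)

# Lazy: (.*?) grows one character at a time and stops at the FIRST slash.
lazy = re.sub(r"^/(.*?)/.*$", r"/\1/", uri)

print(greedy)  # /shop/shoes/
print(lazy)    # /shop/
```

The only difference between the two patterns is the '?', yet the rewritten URIs differ. A wrong guess here changes the data, not just the speed.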

So let Google worry about whether these are slightly slow. People used to say, don't use a star * because it slows down the processing. And does anyone listen? No. And their processing is just fine (well, if it isn't fine, it's not because of their choice of RegEx...)</description>
		<content:encoded><![CDATA[<p>Steve, you couldn&#8217;t know this, but <b>Alan</b> is quite the RegEx expert (and as you point out, we&#8217;re always finding new things in RegEx). Also, Steve, we wouldn&#8217;t want to match with the greedy expression first and then with the less greedy expression. That would usually defeat our purpose. It might be okay if we were matching keywords, but if you were rewriting URIs, it would be pretty important to get it right.</p>
<p>So let Google worry about whether these are slightly slow. People used to say, don&#8217;t use a star * because it slows down the processing. And does anyone listen? No. And their processing is just fine (well, if it isn&#8217;t fine, it&#8217;s not because of their choice of RegEx&#8230;)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: steve</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-709</link>
		<dc:creator>steve</dc:creator>
		<pubDate>Thu, 02 Aug 2007 10:07:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-709</guid>
		<description>To prove the adage "No matter how much you know about a topic, there's always something new to learn": as I replied privately to Robbin this morning ("Subtle. Really Subtle," was the lead-in. ;-) ), I only found out about this style of RegEx myself late last year or early this year. I've been doing RegExes since '88!
A friend in Austria (i.e. next to Germany) used one to solve a funky problem we had. I was like "How on earth does that work? And what's with the funky '.*?' construct!?!!?!?"

Chapter 4 of the O'Reilly "Mastering Regular Expressions" (MRE) has a ... deep explanation of these "lazy" expressions as Jeff refers to them. "Greedy" or "Lazy". Sigh. IT people and the art of *bad* punning. ;-)


Alan, lazy expressions are, in my experience, generally slower than greedy ones. I have a program that does a lot of matching using the actual PCRE library[1], with lots of '[^ ]+ ' (left square bracket, caret, space, right square bracket, plus, space) style patterns. What happens if you replace those with '.+? ' (dot, plus, question mark, space) to get the lazy equivalent?
Throughput drops from ~80,000 lines/sec to ~65,000 lines/sec. If you read on to Chapter 6 in MRE, Jeff explains the whys and hows of this observed slowdown.
Without going into detail, lazy expressions can cause the underlying engine to do more work. Of course, by "tomorrow" there may be new and improved optimisations that render the previous statements incorrect. ;-) [2]
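
The greedy-first, lazy-fallback idea from footnote [2] might look something like this minimal sketch. It again uses Python's re module as a stand-in for PCRE. The log-line format, both patterns, and the parse() helper are hypothetical illustrations, not the program described above:

```python
import re

# Hypothetical fast pattern: negated character classes, no lazy quantifiers.
# It handles the common case where the quoted field has no embedded quotes.
FAST = re.compile(r'^([^ ]+) "([^"]*)"$')

# Hypothetical fallback: slightly more complex, with a lazy first field.
# (.+?) stops at the first space-quote; (.*) runs to the final quote.
FALLBACK = re.compile(r'^(.+?) "(.*)"$')


def parse(line):
    """Try the fast pattern first; only fall back to the lazy one if it fails."""
    m = FAST.match(line)
    if m is None:
        m = FALLBACK.match(line)
    return m.groups() if m else None


print(parse('host "GET /a"'))          # ('host', 'GET /a')
print(parse('host "GET /say "hi""'))   # ('host', 'GET /say "hi"')
print(parse('no quotes here'))         # None
```

Most lines are handled by the fast negated-class pattern. Only the lines it rejects pay for the lazy one.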


Now, as far as GA is concerned? Super-speedy regexes aren't something we really need to worry about. Google do, we don't. Yet. ;-) In that situation, I recommend going for whatever you find easiest to *understand*.

Do be aware that '.*?/' and '[^/]*' will give different answers! They are not identical as written: the first consumes the slash, while the second stops just before it. What you probably want to compare with is '[^/]*/'.
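
A quick way to see the difference is to run the patterns side by side. This minimal sketch uses Python's re module as a stand-in for PCRE (the two behave the same for simple patterns like these), and the path is made up:

```python
import re

# Hypothetical GA-style path with the leading slash removed.
path = "products/shoes/red.html"

lazy = re.match(r".*?/", path).group(0)             # stops at, and includes, the first slash
negated = re.match(r"[^/]*", path).group(0)         # stops just before the first slash
negated_slash = re.match(r"[^/]*/", path).group(0)  # same match as the lazy pattern
greedy = re.match(r".*/", path).group(0)            # runs on to the last slash

print(lazy)           # products/
print(negated)        # products
print(negated_slash)  # products/
print(greedy)         # products/shoes/
```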

Cheers!
- Steve

[1] www.pcre.org. Has links to the Perl RegEx docco, which could be a bit ... hairy for many readers. And very Perl-specific.
[2] The optimisation we use is to have two regexes. One, using greedy matching, does most of the work. If that fails, we switch to a slightly more complex pattern that uses lazy matching and try to match a second time. Thus we get the best of both worlds: speed and correctness. It's possible that Justin (who knows way more about GA's ins and outs than me) may know of a way to expose and use such a double-hit filter????</description>
		<content:encoded><![CDATA[<p>To prove the adage &#8220;No matter how much you know about a topic, there&#8217;s always something new to learn&#8221;: as I replied privately to Robbin this morning (&#8220;Subtle. Really Subtle,&#8221; was the lead-in. <img src='http://www.lunametrics.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> ), I only found out about this style of RegEx myself late last year or early this year. I&#8217;ve been doing RegExes since &#8217;88!<br />
A friend in Austria (i.e. next to Germany) used one to solve a funky problem we had. I was like &#8220;How on earth does that work? And what&#8217;s with the funky &#8216;.*?&#8217; construct!?!!?!?&#8221;</p>
<p>Chapter 4 of the O&#8217;Reilly &#8220;Mastering Regular Expressions&#8221; (MRE) has a &#8230; deep explanation of these &#8220;lazy&#8221; expressions as Jeff refers to them. &#8220;Greedy&#8221; or &#8220;Lazy&#8221;. Sigh. IT people and the art of *bad* punning. <img src='http://www.lunametrics.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>Alan, lazy expressions are, in my experience, generally slower than greedy ones. I have a program that does a lot of matching using the actual PCRE library[1], with lots of &#8216;[^ ]+ &#8217; (left square bracket, caret, space, right square bracket, plus, space) style patterns. What happens if you replace those with &#8216;.+? &#8217; (dot, plus, question mark, space) to get the lazy equivalent?<br />
Throughput drops from ~80,000 lines/sec to ~65,000 lines/sec. If you read on to Chapter 6 in MRE, Jeff explains the whys and hows of this observed slowdown.<br />
Without going into detail, lazy expressions can cause the underlying engine to do more work. Of course, by &#8220;tomorrow&#8221; there may be new and improved optimisations that render the previous statements incorrect. <img src='http://www.lunametrics.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> [2]</p>
<p>Now, as far as GA is concerned? Super-speedy regexes aren&#8217;t something we really need to worry about. Google do, we don&#8217;t. Yet. <img src='http://www.lunametrics.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> In that situation, I recommend going for whatever you find easiest to *understand*.</p>
<p>Do be aware that &#8216;.*?/&#8217; and &#8216;[^/]*&#8217; will give different answers! They are not identical as written: the first consumes the slash, while the second stops just before it. What you probably want to compare with is &#8216;[^/]*/&#8217;.</p>
<p>Cheers!<br />
- Steve</p>
<p>[1] <a href="http://www.pcre.org" rel="nofollow">http://www.pcre.org</a>. Has links to the Perl RegEx docco, which could be a bit &#8230; hairy for many readers. And very Perl-specific.<br />
[2] The optimisation we use is to have two regexes. One, using greedy matching, does most of the work. If that fails, we switch to a slightly more complex pattern that uses lazy matching and try to match a second time. Thus we get the best of both worlds: speed and correctness. It&#8217;s possible that Justin (who knows way more about GA&#8217;s ins and outs than me) may know of a way to expose and use such a double-hit filter????</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robbin</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-708</link>
		<dc:creator>Robbin</dc:creator>
		<pubDate>Wed, 01 Aug 2007 11:33:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-708</guid>
		<description>Hi Alan! It's good to know you are still there, riding the Paris Metro.

This is because Google uses Perl Compatible Regular Expressions (PCRE). PCRE actually has other capabilities, but I haven't taken the time to explore them, just this one. I look to a reader in Australia, Steve, to make those comments and keep me in line when it comes to RegEx. So maybe we'll hear from him.</description>
		<content:encoded><![CDATA[<p>Hi Alan! It&#8217;s good to know you are still there, riding the Paris Metro.</p>
<p>This is because Google uses Perl Compatible Regular Expressions (PCRE). PCRE actually has other capabilities, but I haven&#8217;t taken the time to explore them, just this one. I look to a reader in Australia, Steve, to make those comments and keep me in line when it comes to RegEx. So maybe we&#8217;ll hear from him.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alan</title>
		<link>http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-707</link>
		<dc:creator>Alan</dc:creator>
		<pubDate>Wed, 01 Aug 2007 04:31:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.lunametrics.com/blog/2007/07/29/regular-expressions-for-ga-bonus-ii-minimal-matching/#comment-707</guid>
		<description>Hey Robbin,

Hope you're keeping well. Sorry I've been really bad at keeping in touch. When I read this post in the Paris Metro this morning, I thought now might be as good a time as any to show a sign of life again :)

This is a really cool RegEx. I had no idea this worked. I would normally have matched up to the first slash by using a character class that excludes the slash, like so: [^/]*
However, it's definitely easier your way!

Thanks and take care!
Alan</description>
		<content:encoded><![CDATA[<p>Hey Robbin,</p>
<p>Hope you&#8217;re keeping well. Sorry I&#8217;ve been really bad at keeping in touch. When I read this post in the Paris Metro this morning, I thought now might be as good a time as any to show a sign of life again <img src='http://www.lunametrics.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>This is a really cool RegEx. I had no idea this worked. I would normally have matched up to the first slash by using a character class that excludes the slash, like so: [^/]*<br />
However, it&#8217;s definitely easier your way!</p>
<p>Thanks and take care!<br />
Alan</p>
]]></content:encoded>
	</item>
</channel>
</rss>
