Filters for GA, Part 5: Now let’s practice

I have been working on this series, Filters for Google Analytics, for almost six months now. This is the last part of the series I am going to write (at least for the foreseeable future). You can find the full series at the bottom of this post.

So once again, I will start with the question a lady asked me on Friday morning: how does one create exclude filters that function as if they were AND filters? How, she asked, can I create a profile and put on it two different exclude filters that work together: Exclude the visit if the visitor is a new visitor AND her language is German? We can’t do that with two separate exclude filters, because as soon as the first filter works, the visit gets thrown out. Exclude filters are, by nature, “or” filters.

(If you are a little lost, and don’t remember how multiple exclude filters work like OR filters and multiple include filters work like AND filters, and how you have to work to make them do the opposite, see this post that I wrote recently.)
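As a rough illustration of why two separate exclude filters behave like OR, here is a small Python sketch. It only models the logic described above. The dictionary keys (`language`, `visitor_type`), the two patterns, and the helper `passes_excludes` are made up for the example and are not anything GA exposes.

```python
import re

# Hypothetical visits; the keys are illustrative, not GA field names.
visits = [
    {"language": "de", "visitor_type": "New Visitor"},
    {"language": "de", "visitor_type": "Returning Visitor"},
    {"language": "en-us", "visitor_type": "New Visitor"},
    {"language": "en-us", "visitor_type": "Returning Visitor"},
]

# Two separate exclude filters, each a (field, pattern) pair.
exclude_filters = [
    ("language", r"^de"),
    ("visitor_type", r"^New"),
]

def passes_excludes(visit, filters):
    """Return True if the visit survives every exclude filter in order."""
    for field, pattern in filters:
        if re.search(pattern, visit[field]):
            return False  # the first exclude that matches throws the visit out
    return True

kept = [v for v in visits if passes_excludes(v, exclude_filters)]
```

Only the returning English-language visit survives in this model. The returning German visit and the new English visit are thrown out as well, which is the OR behavior, not the AND behavior the question asked for.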

Well, I had to think about this one, and then realized that it was great practice for this series on filters for GA. The reason: it is both a rewrite (a custom advanced filter) and an exclude (a custom filter). Here is the overview: you rewrite the two fields into one field, and then you do an exclude on that one field, making sure that it matches both pieces of data.

If that was clear as mud, or you are more of a step-by-step “please show me” person (like I am) — then this is a good place to keep reading.

First, you use a custom advanced filter to rewrite the data so both pieces of info — the visitor language and visitor type, which is GA-Speak for “new” vs “returning” visitor — are in the same field. Like this:
language-type-filter.jpg

Now we have both pieces of data in the same field. If we could look at them, they would look like this, concatenated together on the same line.

1. en-us/New Visitor
2. de/Returning Visitor
3. da/New Visitor

and so on. They would be the language code followed by a slash and the visitor type. (Right? That’s the way I set it up: $A1 captures everything in the visitor language settings, then a slash, then $B1 captures everything in Visitor Type.) Above, I chose three languages for my examples: English-US (en-us), German (that’s de) and Danish (da).

Next, we create an exclude filter. The hypothetical example was to exclude a new visitor whose language is German: two conditions that must work together. This is the heart of our problem, excluding on two different kinds of variables at once. So here is our exclude:
exclude-language-type-filter.jpg

So what does this say? It says, go to Custom Field 1, where we now have a list of concatenated languages and visitor types (en/Returning Visitor, and so forth). If one of those lines in the list matches de (which stands for German, as in Deutsch, right?) and has a slash and then the word New, it’s a match, so please exclude it.

And that’s how you can exclude two different fields in Google Analytics at the same time.
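For readers who like to see the logic spelled out, here is a minimal Python sketch of the two steps. It is only a model of the idea, not GA itself. The exclude pattern `^de/New` is my assumption about what the screenshot shows, and the helper names `combine` and `is_excluded` are invented for the example.

```python
import re

def combine(language, visitor_type):
    """Mimic the advanced filter: $A1, a slash, then $B1, written to one field."""
    return f"{language}/{visitor_type}"

# Assumed exclude pattern: the language code "de", a slash, then "New".
EXCLUDE_PATTERN = re.compile(r"^de/New")

def is_excluded(language, visitor_type):
    """Return True if the combined field matches the exclude pattern."""
    return EXCLUDE_PATTERN.search(combine(language, visitor_type)) is not None

examples = [
    ("en-us", "New Visitor"),
    ("de", "Returning Visitor"),
    ("da", "New Visitor"),
    ("de", "New Visitor"),
]

# Map each combined field value to whether the visit would be excluded.
results = {combine(lang, vtype): is_excluded(lang, vtype) for lang, vtype in examples}
```

In this model only `de/New Visitor` is excluded. New visitors in other languages and returning German visitors are kept, because the single pattern has to match both pieces of data at once.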

If you would like to read all the other posts in this series:

Robbin Steif

About Robbin Steif

Our owner and CEO, Robbin Steif, started LunaMetrics ten years ago. She is a graduate of Harvard College and the Harvard Business School, and has served on the Board of Directors for the Digital Analytics Association. Robbin is a recent winner of a BusinessWomen First award, as well as a Diamond Award for business leadership.

http://www.lunametrics.com/blog/2007/10/21/filters-for-ga-part-5-now-lets-practice/

24 Responses to “Filters for GA, Part 5: Now let’s practice”

steve says:

To throw my 2c in. :-)
Do be careful of the full language, or language tag:
http://en.wikipedia.org/wiki/Language_localization#Language_tags_and_codes

The full list of codes is from ISO 639-1:
http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

The main one I frequently come across is Portugal Portuguese vs Brazilian Portuguese.
ie: pt_PT vs pt_BR

Heh. And obviously en_AU. :-)

I do get a few German speaking visitors, from both Germany and Austria. And GA reports seeing both:
de AND de-de
Can’t quickly find any de_AT style….

So in this case the filter pattern may be better as:
de([-_]..)?/New

Which will (hopefully…) cover:
de ; de_XX ; de-XX
Where XX can be any regional/country variation.

Just for correctness. :-)

HTH & Cheers!
– Steve
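Steve's pattern can be checked outside GA before it goes into a filter. The sketch below uses Python's `re` module as a stand-in, on the assumption that it treats this particular pattern the same way GA's engine does. The sample values are invented for the test.

```python
import re

# Steve's suggested pattern: "de", an optional region suffix, then "/New".
pattern = re.compile(r"de([-_]..)?/New")

samples = [
    "de/New Visitor",
    "de-de/New Visitor",
    "de_AT/New Visitor",
    "de/Returning Visitor",
    "da/New Visitor",
    "en-us/New Visitor",
]

# Keep only the combined field values the pattern would match (and so exclude).
matched = [s for s in samples if pattern.search(s)]
```

The three German new-visitor variants match and the rest do not. The pattern is unanchored, so adding `^` at the front would be slightly stricter, assuming the combined field always starts with the language code.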

[…] Filters for GA, Part 5: Now let’s practice (Lunametrics) […]

colorado_gumi says:

I followed a link here from a question about how to exclude a non-unique City name… Lexington in the USA…

This technique was suggested in the response to the question and seemingly would work EXCEPT when creating a “Custom filter” with the “Advanced” option, the drop down list does not include the visitor’s geographic city and region as options to select.

The necessary combined field for the filter apparently cannot be created with the selection of fields provided?

What I want to do is exclude my city from the tracking results with a filter and the “IP” exclusion solution is not viable for me.

Robbin says:

Why don’t you just do Custom (and not Advanced) and choose Exclude and choose visitor city and type in Lexington? (Since this is so easy, I am sure that I am not understanding the question. Sorry.)

colorado_gumi says:

As with lots of towns and cities in the US, the name Lexington is not unique. Columbus and Auburn are other examples; even Boston. It must be qualified further by a state name.

Robbin says:

Yes, I was definitely not listening hard enough. It is a great point. Let me see what I can find out for you.

steve says:

Is it possible to do a two step filter:
Part 1 excludes everything NOT from Lexington
Part 2 excludes everything NOT from STATE_XYZ

Exploit the natural … AND’ness of the exclude???
Could be very wrong.

Cheers!

Robbin says:

You mean include. Include filters are naturally AND filters. And I bet your idea would work, especially because I think he wants to screen OUT a city. (Colorado Gumi, you should try it. I think the Regular Expression is (!Lexington), but you should use a testing tool or just get Steve’s wisdom.)

Now, maybe if someone wanted to only include Lexington, KY, they could include everything that is NOT NOT from Lexington? You’re the RegEx King, Steve, can we support nested NOT statements?

colorado_gumi says:

I really don’t know how two separate include filters would work together, but from what I read (and using Lexington, KY as an example) I suspect one would get a first filter solution of all visitors that are Lexington (including those in KY) that is further screened by the second filter to all visitors that are Lexington AND KY. That’s the opposite of what I want, which is everything that is NOT (Lexington AND KY). I haven’t seen a way to then process that second step solution of Lexington AND KY to get everything that is NOT (Lexington AND KY) — From what I’ve seen, I don’t think GA’s regular expressions are sophisticated enough to do that.

BTW, just to experiment I tried making a two step filter using my city filter and my state filter. But the filters didn’t screen anything in my existing data which makes me wonder… a new filter will work on one’s current existing data — it doesn’t just apply to new data from that point on, right ?? . . .and if so, one’s existing universe of data is not permanently affected by a new filter one might add — If one deletes that filter, all the old data is still there ??

steve says:

Just to restate the problem:
Want ONLY traffic from Lexington, KY; but not from any other State. Silly eg Lexington, ZZ.
The interface allows us to choose one or the other in a single filter, but not both.
Thusly, we need to construct multiple filters that achieve the same result.
BUT!!! The filters must process on through – we can’t match and exit, we must filter and keep processing.

This latter point is why we MUST use EXCLUDE filters – vs include. Include will match and exit – ie Won’t work.

Still with me? :-)
Hence the horrible quintuple negative in my earlier suggestion. ;-)

Order doesn’t really matter, but I’d suggest going for most exclusive filter first – that tends to be “better practice” / “more efficient”.
Filter 1: Exclude all Traffic NOT from Lexington
Filter 2: Exclude all Traffic NOT from KY

So this ends up as a full logical expression like so:
(Exclude NOT Lexington) AND (Exclude NOT KY)

How? Still chasing. But from here it’s just implementation details. ;-)

colorado_gumi:
1. GA’s RegEx’s are very sophisticated. They implement a full POSIX suite near as I can tell. Their docco certainly states as such. One of Robbin’s other regulars, Alan, discovered that they implement Look Aheads, both Positive and Negative. Which is a pretty darn sophisticated RegEx to have! In 20 something years of doing RegEx’s I’ve only ever had to use lookaheads a handful of times, and only in the past year.
Incidentally, using a Neg LookAhead may be the way to solve this one, as I think some more…

2. Yes. My understanding is that GA only filters from that point on. Will not affect old data.

HTH?
Cheers!
– Steve

steve says:

20+ years? I exaggerate. Try ~18 years.
’88 or ’89 was when we were first taught them @ uni. More likely early ’89.

Cheers!
– Steve

Robbin says:

Ah yes. You are right as usual, on both point one and, of course, point two. Now are you going to write the RegExen for this poor guy or not?

steve says:

“Now are you going to write the RegExen for this poor guy or not?”

BWHAHAHAhahahahahahahaha! :-)

In between:
* assisting with the re-delegating of ~ 80 domains for our various web sites
* fixing a user’s laptop KVM switching woes
* Applying and then later reverting a Prod change that failed part of the backend site. Including full change control and Source Code Control.
** Including documenting how to do same for the Dev’s to know for future reference, as well as pushing into our wiki.
* Discussing our various futures post the weekend’s Federal Election and Change of Government.
* Backup Tape changes
* Meeting with our Director, re status
* Planning Primary backend server upgrade, staging trial-run upgrade via purpose built VM,

But apart from all that? Yeah just been sitting around all morning doing zippo.
Apologies for the delay. ;-)

Seriously tho. The Exclude filters required should be something like this:
Filter 1: Exclude all Traffic NOT from Lexington:
(?!Lexington)

Filter 2: Exclude all Traffic NOT from KY
(?!KY)
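One caution worth testing before relying on these two patterns: in engines that support lookaheads, an unanchored negative lookahead only needs to find one position where the word does not start, so it tends to match every string. The sketch below shows this with Python's `re` module as a stand-in. Whether GA's engine behaves the same way is an assumption, and the anchored alternative `^(?!.*Lexington)` is my suggestion, not something from the thread.

```python
import re

cities = ["Lexington", "Concord", "Louisville"]

# Unanchored: succeeds at any position where "Lexington" does not begin,
# including position 1 inside the word "Lexington" itself.
unanchored = re.compile(r"(?!Lexington)")

# Anchored: succeeds only if "Lexington" appears nowhere in the string.
anchored = re.compile(r"^(?!.*Lexington)")

unanchored_hits = [c for c in cities if unanchored.search(c)]
anchored_hits = [c for c in cities if anchored.search(c)]
```

Here the unanchored pattern matches all three cities, Lexington included, while the anchored one matches only Concord and Louisville. The same reasoning would apply to the KY pattern.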

I’d advise creating a test profile and testing in that first. I’ve not tested the above in GA itself.
Robbin, I’m pretty sure you’ve got some advice here somewhere on how to do that?

And I do sincerely apologise: Robbin you did the original article on lookaheads. Alan expanded further. I quite clearly badly mis-recalled.
http://www.lunametrics.com/blog/2007/08/08/regular-expressions-for-ga-bonus-iii-lookahead/

Correction#2. Re RegEx engine for GA: It’s not POSIX. It’s PCRE isn’t it? Similar, but PCRE is more than.

I did chuckle. Here I sit with a worn copy of Mastering Regular Expressions beside me, and I go hunting for an older blog posting of yours instead. :-D

Cheers!
– Steve

colorado_gumi says:

“steve Says:
November 25th, 2007 at 4:56 pm

Just to restate the problem:
Want ONLY traffic from Lexington, KY; but not from any other State. Silly eg Lexington, ZZ.”

The report I want to create includes every visitor that is NOT (Lexington AND KY), to use the example. It would ONLY have traffic NOT from Lexington, KY — Lexington, KY visitors would be excluded.

steve says:

Sorry for the delay, I have needed to do some checking of the logic and that include filters mean what I thought they did.
We can also see why restating the problem is a Good Idea(tm)! I had the problem set totally reversed. :-(

In essence, I *strongly believe* (ie test!!!) you’d need to use “negative” INCLUDE filters. As per Robbin’s initial correction to me.

http://www.google.com/support/googleanalytics/bin/answer.py?answer=55559&topic=11094
“When an Include Filter is applied, the hit is thrown away if the pattern does not match the data. If multiple Include Filters are applied, the hit must match every applied Include Filter in order for the hit to be saved.”

So, in techno-babble English, we want two Include filters:
1. Match everything BUT ‘Lexington’
2. Match everything BUT ‘KY’

Which, if I’ve got the logic right, means simply using the RegEx’s from my previous comment in include filters.
I’d suggest this is more logic puzzle, than GA/RegEx puzzle. :-)

Robbin, do you concur? I want a cross check on this one. :-)

Do let us know how it goes!

Cheers!
– Steve

colorado_gumi says:

If a hit must match both include filters for it to be saved…

Concord, MA passes 1 and 2, saved. . .Good
Lexington, KY fails 1 and 2, rejected. . .Good
Lexington, MA fails 1 and passes 2, but not both, so it is rejected ???. . .Bad
Louisville, KY fails 2 and passes 1, but not both, so it is rejected ???. . .Bad

Wouldn’t this Include filter pair incorrectly reject everything that is Lexington and incorrectly reject everything that is KY as well as correctly reject all that are Lexington, KY ?
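The table above can be reproduced with a few lines of Python. This is only a model of the logic: plain equality checks stand in for the regex filters, and the function names `include_pair` and `wanted` are invented for the example.

```python
places = [
    ("Concord", "MA"),
    ("Lexington", "KY"),
    ("Lexington", "MA"),
    ("Louisville", "KY"),
]

def include_pair(city, state):
    """Two include filters, NOT Lexington and NOT KY; a hit must pass both."""
    return city != "Lexington" and state != "KY"

def wanted(city, state):
    """The report actually wanted: NOT (Lexington AND KY)."""
    return not (city == "Lexington" and state == "KY")

# Places the report should keep but the include pair throws away.
wrongly_rejected = [p for p in places if wanted(*p) and not include_pair(*p)]
```

The list comes out as Lexington, MA and Louisville, KY, matching the two rows marked Bad. The include pair computes (NOT Lexington) AND (NOT KY), which is not the same thing as NOT (Lexington AND KY).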

steve says:

Argh. Yes you’re spot on.

Hmm. And my alternate can’t work either – was thinking of using the advanced tab to create a new custom/user defined field that you could exclude on. Two part it and add “Lexington KY” type of thing. Ugly but….

But you can only get Geo, you can’t get the City and/or State information. So can’t work. :-(

http://www.google.com/support/googleanalytics/bin/answer.py?answer=55588
(found via Justin’s book!)

I’m all out of ideas. It looks like the ability is there, but the interface won’t let you construct the rules needed.

Cheers!
– Steve

colorado_gumi says:

The technique on this page/post would work, but the variables for visitor city and region aren’t among those offered in the drop down menu, there doesn’t seem to be any way to finagle them into the list, and I don’t know the variable names anyways.

Robbin says:

Well Colorado, don’t despair. Here is how you do it:

1) Go back to the original post and do it the way I suggested, but use Internet Explorer
2) If you love FF, you can do it there too, but don’t use the dropdown. Instead, put your cursor into the field and use your arrow buttons to page through the fields, and you will find city and region and country.

It is just a little bug related to the dropdown in FF and will probably get fixed soon. (But we learned a lot about Regular Expressions and filters, no?) And of course, thank you, Steve, for all your help!

colorado_gumi says:

Gee whiz, there the missing variables are in IE. They were there all along. What a swell adventure. Thanks you guys.

steve says:

Ha! Works in IE, not in Firefox. Sigh.
I think I want to go home and cry now… :-D

Cheers! And Thanks Robbin for not “Steve you dill, do this…” :-)

- Dill

colorado_gumi says:

Just a Sec…

I’m sorry to say the adventure continues in this “Missing Variables” saga. The first Episode was ‘A New Hope.’ This episode is ‘The Empire Strikes Back.’

In the Combined Field Filter I need, I can select the “missing” visitor city and visitor region variables I want either in Internet Explorer (drop-down menu) or Firefox (up/down arrows) and apparently create and save the needed filter.

But the selected variables don’t “stick.”

If I go back and edit the filter after saving it, the boxes where I selected the variables are blank (actually each has a “dash” rather than the desired variable name). I discovered this because the filters I thought I’d created and activated weren’t working.

If I go in Firefox and select two of the variables that DO SHOW in its drop down menu, those variables STICK. . .As they do with Internet Explorer.

Robbin says:

This is probably just part of the bug. I will submit it. Thanks – Robbin