Eliminating Bot Traffic from Google Analytics Once and For All

For more up-to-date information, check out our recent post on Bot and Spam filtering.

If you’ve used Google Analytics, you’ve probably wanted to know: how much of our traffic actually comes from real human beings? In Google Analytics, it’s not always clear. According to an Incapsula study, bot traffic now accounts for 56% of all traffic on a typical website. We would hope that most of this traffic is eliminated from Google Analytics. Is it?

It used to be simple: Bots would not process JavaScript. Google Analytics uses JavaScript. Therefore, bots would not show up in Google Analytics.

However, with the proliferation of jQuery, single-page web applications and dynamic Ajax calls, smart bots have taken over. Now a bot can process JavaScript (and potentially Google Analytics), in a similar way to a real human’s web browser. And it’s a good thing too: if today’s search engine crawlers could not process JavaScript, much of the human-readable web would be hidden from search engines.

But there are also evil smart bots: bots that go crawling for content they can scrape and use for their own nefarious gain. Some bots crawl web pages just to wreak havoc on web servers and increase costs for site owners.

Good search engine bots, generally speaking, will be excluded from Google Analytics automatically. They also follow directives that are outlined in a website’s robots.txt file, or in its meta tags, and crawl only the pages they’re supposed to. Good bots intentionally prevent requests from being sent to Google Analytics’ servers.

It’s the evil ones, the ones that break all the rules, and the ones that process JavaScript that we’re most concerned about. These “bad bots” account for a staggering 27% of all web traffic, at least, according to the same Incapsula study.

We need a solution to filter out bad bots from Google Analytics, so that we can confidently report on our data and know that the traffic totals, behavior statistics and conversion percentages we see are really from human beings.

What Can Be Done About It?

The team here at LunaMetrics has developed the ultimate step-by-step process to totally eliminate bots from Google Analytics, once and for all. Despite all the articles over the years surrounding this topic, we believe that no method has ever so thoroughly and systematically extinguished bot traffic.

In fact, we guarantee that not a single bot will be recorded inside your Google Analytics reports, if you follow each of these steps.

Step 1 – Check the Box in the Admin View Settings

There is an option (since July 2014) inside the Admin, under View Settings, to remove known bots from Google Analytics. Sayf Sharif wrote about this in the past, and he recommends creating a Test View first, so that you can see what impact this option will have on your data collection moving forward. DO NOT check this box on your main reporting View before testing it.

Google Analytics matches each visitor’s User Agent string to the list of known bots and spiders on the IAB/ABC International Spiders & Bots List. This is a paid list, so we’re not entirely sure which bots are included. (However, a Google search reveals this IAB list from 2013.)
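
To make the idea concrete, here is a minimal sketch of matching a User Agent string against a list of known bots. The patterns below are hypothetical stand-ins chosen for illustration; they are not the contents of the IAB/ABC list, and this is not how Google Analytics implements the check internally.

```javascript
// Hypothetical stand-in patterns. The real IAB/ABC list is paid and is
// not reproduced here.
const KNOWN_BOT_PATTERNS = [/googlebot/i, /bingbot/i, /crawler/i, /spider/i];

// Returns true when the User Agent string matches any pattern on the list.
function isKnownBot(userAgent, patterns = KNOWN_BOT_PATTERNS) {
  if (typeof userAgent !== "string" || userAgent === "") {
    return false;
  }
  return patterns.some((pattern) => pattern.test(userAgent));
}
```

A bot that does not announce itself in its User Agent will pass straight through a check like this, which is why the later steps exist.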

Step 2 – Eliminate Bots by IP Address

While you can’t see IP addresses in Google Analytics reports, you can block IPs using Google Analytics View Filters. If you’re feeling adventurous, or if there is a pesky IP that is eating up your monthly quota of Google Analytics hits (10 million in the free version, 1 billion+ for GA Premium), you can follow Jon Meck’s instructions and block by IP address in Google Tag Manager, so those users aren’t even served Google Analytics code.
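
If you have several IPs to exclude, a single regular expression in one View Filter can be easier to maintain than many separate filters. The sketch below builds such a pattern; the function names are our own, and the addresses in the usage note are reserved documentation IPs, not real offenders.

```javascript
// Escape regex metacharacters so the dots in an IP address match literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored alternation of escaped IPs, suitable for pasting into
// a filter pattern field that accepts regular expressions.
function buildIpFilterPattern(ipAddresses) {
  return "^(" + ipAddresses.map(escapeRegex).join("|") + ")$";
}
```

For example, `buildIpFilterPattern(["203.0.113.7", "198.51.100.24"])` produces a pattern that matches exactly those two addresses. Anchoring with `^` and `$` keeps `203.0.113.7` from also matching `203.0.113.70`.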

As a side-note, you may want to totally block bots from even visiting your website… especially if a particularly nasty bot keeps spamming your website or is engaging in a DDoS. To do so in an Apache environment, you can edit your website’s .htaccess file to block IP addresses from loading web resources, using this tool and perhaps a little regular expression trickery.
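
As a rough illustration of the .htaccess approach, the helper below generates the directives for a list of IPs. It assumes Apache 2.4 with mod_authz_core (older Apache versions use different directives), and the function name is hypothetical. Test any generated rules on a staging server before relying on them.

```javascript
// Generate an Apache 2.4 access block that allows everyone except the
// listed IP addresses. Assumes mod_authz_core is available.
function buildHtaccessBlock(ipAddresses) {
  const lines = ["<RequireAll>", "  Require all granted"];
  for (const ip of ipAddresses) {
    lines.push("  Require not ip " + ip);
  }
  lines.push("</RequireAll>");
  return lines.join("\n");
}
```

Calling `buildHtaccessBlock(["203.0.113.7"])` returns a four-line block that you could paste into .htaccess.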

Step 3 – Eliminate Bots by User Agent

Of course, bots can switch between several IP addresses, making identifying them especially difficult. Enter Custom Dimensions. It is possible to pass each of your visitors’ User Agent strings into Google Analytics as a custom dimension, using Google Tag Manager. (You are using Google Tag Manager, aren’t you?) Then you can exclude all sessions based on User Agents that you know to be bots.

Begin by creating a Custom Dimension in Google Analytics, in the Admin. The Custom Dimension will be called “User Agent”, and we’ll set it at the Session scope. (Take note of the “Index” so we can refer to it in a moment.)

Next, in Google Tag Manager, we’ll create a new Variable. Dan Wilkerson explains how easy it is to retrieve the User Agent in Google Tag Manager. Simply create a JavaScript Variable with the value navigator.userAgent.
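
If you would rather use a Custom JavaScript Variable, the logic might look like the sketch below. In Google Tag Manager that variable type is written as an anonymous function; it is given a name and an optional parameter here (both our own additions) only so that it can run and be tested outside the browser.

```javascript
// Return the browser's User Agent string, or a placeholder when none is
// available. The "nav" parameter is a testing convenience, not part of GTM.
function getUserAgent(nav) {
  const source = nav || (typeof navigator !== "undefined" ? navigator : null);
  return source && source.userAgent ? source.userAgent : "(not set)";
}
```

The "(not set)" fallback is an arbitrary choice; it simply keeps the dimension from being left empty.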

The final step is to populate your Google Analytics Pageview tag with the custom variable slot, using the “Index” from earlier. Enter the Variable {{User Agent}} for the Value.
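
For sites that call analytics.js directly rather than through Google Tag Manager, the equivalent is to send a `dimensionN` field with the pageview, where N is the Index. The helper name below is hypothetical; the `dimension` plus index field naming is the analytics.js convention for custom dimensions.

```javascript
// Build the field object for a pageview hit that carries the User Agent
// in the custom dimension slot identified by dimensionIndex.
function buildPageviewFields(dimensionIndex, userAgent) {
  if (!Number.isInteger(dimensionIndex) || dimensionIndex < 1) {
    throw new RangeError("dimensionIndex must be a positive integer");
  }
  return { ["dimension" + dimensionIndex]: userAgent };
}

// With analytics.js loaded on the page, the fields could be sent as:
// ga("send", "pageview", buildPageviewFields(1, navigator.userAgent));
```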

Now you’ll want to wait a day or two for your User Agents to begin to enter Google Analytics, so that you can identify which User Agents in particular are behaving like bots. Take note of users with consistently high bounce rates, or users with hundreds of repeat visits in a single day, for example. Over time you can begin to write Google Analytics Filters to eliminate those User Agents, using the settings under Admin -> View -> Filters.
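
One way to shortlist candidates is to export a report by User Agent and flag rows that look bot-like. The sketch below assumes a hypothetical row shape of `{ userAgent, sessions, bounceRate }`, with `bounceRate` between 0 and 1, and the thresholds are arbitrary starting points rather than recommendations.

```javascript
// Return the User Agents whose daily sessions and bounce rate both meet
// the thresholds. Row shape and default thresholds are assumptions.
function findSuspectUserAgents(rows, minSessions = 100, minBounceRate = 0.95) {
  return rows
    .filter((row) => row.sessions >= minSessions && row.bounceRate >= minBounceRate)
    .map((row) => row.userAgent);
}
```

Requiring both conditions is deliberate: a high bounce rate on a handful of sessions is normal human behavior, so volume and bounce rate together make a stronger signal. Review the shortlist by hand before writing any exclude filter.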

Step 4 – Require Completion of a CAPTCHA

Sometimes filtering by User Agent just isn’t specific enough, because some bots mask their true identity, or spoof themselves as regular users. We’ll need another solution.

In December, Google introduced a new version of their popular reCAPTCHA service, nicknamed “No CAPTCHA”. (reCAPTCHA was actually developed here in Pittsburgh!) This new version of reCAPTCHA is able to detect subtle cues based on typical human behavior — mouse usage, in particular — and in most cases eliminates the need for your human users to have to type in a CAPTCHA phrase at all!

When your visitors come to your website for the first time, you will display this reCAPTCHA — just follow the simple instructions in the documentation. In this case, you will NOT fire Google Analytics until the reCAPTCHA is completed successfully.

You’ll require each new user to complete the reCAPTCHA, and then your application will set a session cookie upon a successful completion. We are confident that the vast majority of your bot traffic will be eliminated from Google Analytics using this method. This is a good start, but for full protection, continue reading.
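
The gating logic could be sketched roughly as follows. The cookie name `human_verified` and the helper names are hypothetical, and the reCAPTCHA response itself should still be verified on your server before the cookie is set; this sketch only covers the decision of whether to fire Google Analytics.

```javascript
const VERIFIED_COOKIE = "human_verified"; // hypothetical cookie name

// Parse a document.cookie style string into a name -> value map.
function parseCookies(cookieString) {
  const cookies = {};
  for (const part of (cookieString || "").split(";")) {
    const index = part.indexOf("=");
    if (index === -1) continue;
    cookies[part.slice(0, index).trim()] = part.slice(index + 1).trim();
  }
  return cookies;
}

// Fire Google Analytics only when the verification cookie is present.
function shouldFireAnalytics(cookieString) {
  return parseCookies(cookieString)[VERIFIED_COOKIE] === "1";
}

// Cookie string to set after a successful reCAPTCHA. With no Expires or
// Max-Age attribute, it lasts only for the browser session.
function verifiedCookieString() {
  return VERIFIED_COOKIE + "=1; Path=/; SameSite=Lax";
}
```

In the browser you would call `shouldFireAnalytics(document.cookie)` before loading your tracking code, or use a first-party cookie variable in Google Tag Manager as a trigger condition.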

Step 5 – Confirm Your Users’ Email

After the successful CAPTCHA completion, your website must now present to your users a form, asking them to fill in their email address. The email must be valid. Upon entering an email, your website should display a message that reads, Thank you for entering your email. Please check your inbox for our confirmation email, and follow the instructions in that message. (You may change the messaging as you see fit.)
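
A first-pass validity check on the form might look like this sketch. The function name is our own, and the pattern checks only the general shape of an address; the confirmation email remains the real proof that the address exists.

```javascript
// Loose syntax check: something@something.something, with no spaces.
function looksLikeEmail(value) {
  return typeof value === "string" && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value.trim());
}
```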

Your users will then need to check their email, and click on the confirmation link to access the website. This may take up to 24 hours, and some users’ spam filters will inevitably block the activation email, but this is the necessary cost of bot-free Google Analytics data.

Warning: Some bots can auto-verify emails sent as part of a confirmation process. The most advanced bots have even begun to register their own email addresses, which then become impossible to blacklist! That’s why you’ll need to require users to demonstrate their authenticity by answering a few more questions.

Step 6A – Answer a Math Problem

After the user verifies their email address, present them with a short mathematical CAPTCHA.
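
A short mathematical CAPTCHA could be as simple as the sketch below. The function names and the single-digit addition format are assumptions; the random source is passed in as a parameter so that the behavior can be tested.

```javascript
// Create an addition problem with operands from 1 to 10.
function createMathChallenge(random = Math.random) {
  const a = Math.floor(random() * 10) + 1;
  const b = Math.floor(random() * 10) + 1;
  return { question: "What is " + a + " + " + b + "?", answer: a + b };
}

// Compare the visitor's typed response with the expected answer.
function checkMathAnswer(challenge, response) {
  return Number.parseInt(String(response).trim(), 10) === challenge.answer;
}
```

Keep the expected answer on the server, not in the page, or even a very unsophisticated bot will pass.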

Step 6B – Find the Cat

We call this step the reCATCHA. You will now present to your users a dialog box asking them to identify the cats in each picture. While some smart bots would be able to answer basic arithmetic, only the most sophisticated bots can identify the subtle differences between, say, cats and guinea pigs (bottom-center).

Step 7 – Magic Eye – Identify the 18th-Century Warship

One of the great challenges of eliminating bot traffic in Google Analytics is that, with each advancement in CAPTCHA technology, bot developers are expanding into once-inconceivable realms of human imitation. In order to ensure that your Google Analytics setup is built for the future, we strongly recommend that you present to your visitors a Magic Eye®, requiring them to type a description of the hidden image they see in the picture.

The answer here, of course, is “Man-of-war”. Sailboat, barque, ship of the line, galley, trireme, and schooner are incorrect responses. Only a robot would make such a mistake.

Step 8 – Enter Your Phone Number

Any thorough Google Analytics bot elimination strategy requires that your visitors enter their phone number, to which you’ll automatically send a confirmation code for two-step authentication. (Mobile carrier rates may apply.) In addition, we recommend a random audit of these phone numbers, with rolling spot-checks where you and your team actually call as many of the numbers as you can.

In practice, we’ve learned that even the most advanced robots make computerized, emotionally devoid and soulless responses on the phone. The success of this tactic for bot elimination depends on your visitor demographic.

Step 9 – Enter Your Mailing Address

Bots that can process JavaScript, have a valid email address, can answer a series of CAPTCHAs, can interpret three-dimensional images, have a broad knowledge of 18th-century British nautical history, and possess a valid phone number would normally appear inside Google Analytics. Luckily, you will now require that your visitors enter a valid mailing address to receive a two-step authentication hardware token. (No PO boxes allowed.)

The token will update every 30 seconds, synced to a series of satellites in orbit. Your users will have to enter the code in order to proceed to the website. You can also fire a measurement protocol hit with each update from the satellite. Please allow 2-3 weeks for delivery. (You may also wish to invoice your visitors.)

Step 10 – Complete a Turing Test

As it is theoretically possible that a bot would be able to correctly guess any two-step authentication code, a trained psychologist must now deliver a Turing Test to all remaining visitors, in order to determine if they are human. The test may be completed via Google Hangout or SurveyMonkey. Questions should include, “What did you come to this website to do?” and, “How disappointed would you be if you could not use this website?”

Step 11 – Find a Local Doctor

Your remaining visitors must now schedule a visit to a local physician’s office. The doctor must be a participating Google Analytics partner, in a Premium healthcare network only. Each visitor will be required to undertake a broad physical examination, to ensure that the visitor is actually a human being. A voluntary blood sample should be extracted and sent for additional laboratory testing.

Another blood sample will be sent to your company office to be recorded as a Custom Dimension, called “Blood Type”. We recommend setting this dimension at the “User” scope.

Step 12 – “Why I Am a Human” Essay

Very few bots will remain at this stage. However, in order to ensure that Google Analytics is completely, undeniably free of all bot traffic, you must require your remaining users to write a 10,000-word essay, entitled, “Why I Am a Human: A Retrospective.” This essay will be reviewed by a panel of dignitaries: academics, scientists, theologians, philosophers and poets, across a broad spectrum of humanity.

After weeks, or perhaps months, of careful deliberation, heated debate, and thoughtful reflection, the panel will deliver its ultimate judgement: is the visitor to your website truly human?

If so, then you should enable Google Analytics.

PLEASE NOTE: This may affect your conversion rate.

Start the Process of Verifying Your Humanity

Begin by filling out our CAPTCHA below:

recaptcha fools

Don’t forget to check out our (real) post on Bot and Spam filtering.

Alex Moore is our Analytics Department Manager. He started building websites in the mid-90s, and has spent ten years in the agency world, focused on front and back-end development, SEO/SEM and web analytics. He also leads trainings in Google Analytics and Tag Manager around the country. Alex received his master's degree in Dublin, where he explored the Irish coast with a furry dog and lots of pints.

  • Tina Smith

    We are running an SEO company and we always have to deal with Google as well. This article provides quite helpful information in this regards. I would like to share it with my fellows at umrah and hajj to read and learn from it. Thanks for sharing with us.

  • https://www.godoctr.com/ Jignesh Parmar

    Brilliant material! Much obliged for imparting to us.

  • http://www.christinablust.com Christina Blust

    The best part of this post is that it shows up pretty high in Google searches about this topic. So you see LunaMetrics, click this one, don’t pay any attention to the date, and get much further down the post before you start to go WAIT a minute…

    • Alex Moore

      😉

    • tobiasdiomead

      The first thing I look at when I’m reading an article is the date it was published, mostly because when it comes to tech, anything over a year old is potentially worthless. I saw April 1 and immediately scrolled down to see if there were comments. Glad I did.

  • Scritty

    I have a small pin on my thumb print reader that pricks the skin and takes a blood sample. It DNA matches this to the sample kept on record (first 22,000 in the DNA sequence) to prove my ID. Happy to do it. Simple and easy.

    • Alex Moore

      This is the kind of innovative thinking we need in this industry! Please send a verified blood sample to our office, and pending approval you will be added to our non-bot list.

  • http://smekdigital.com Sean Juan

    LMAO great article…but even RSA tokens have been hacked!

  • http://www.clck.com.au Damien Elsing

    Too funny … this ranks first on Google for “google analytics exclude bot traffic”. It wasn’t until point 7 I started wondering if something funny was going on. Main thing I was looking for was the info in point 1. Great article 🙂

  • John Peterson

    so important yet so easy .. was looking for this info… also if you can cover 301 redirects to block traffic from unwanted sites in details it would be a good idea

  • Vitor Pinto

    How can this be the first thing in Google search? What’s going on?

    Nice article by the way 😉

    Let me continue my search to avoid bots now…

  • Janet Lingel Aldrich

    Man-of-War only a robot would make that mistake? LOL.

    • Xxx Xxx

      I can’t even see the blimming thing myself as I suffer from binocular and stereo vision impairment! (not a very rare condition either… so a large part of people would be classed as a robot this way).

  • Adebowale Samson

    My Google analytics data doesn’t show after I tried to use filter, please help!!

  • tedipost

    To know more about these referrer spam here are some articles

    Blogging Tips For Removing Referrer Spam Using Google Analytics

    To know about Bot programs:-

    Bot is Bad or Good Program – Blogging Tips About Bot

    To know about what is Referrer Spam:-

    Referrer Spam is not a real traffic – Blogging Tips About Referrer Spam

  • http://WWW.TEDIPOST.COM tedipost

    To know more about these referrer spam here are some articles

    Blogging Tips For Removing Referrer Spam Using Google Analytics

    To know about Bot programs and its types:-

    Bot is Bad or Good Program – Blogging Tips About Bot

    To know about what is Referrer Spam and faq about referrer spam:-

    Referrer Spam is not a real traffic – Blogging Tips About Referrer Spam

  • Deepak Rana

    https://blazeservr.com buy cheap rdp

  • http://www.springboardseo.com/ Springboard SEO

    If I understand correctly, I can filter by IP in GA, but if I want to filter user-agents in GA, I need to use GA Tag Manager? If that’s the case, I’m surprised.

  • http://stepin2buy.com Discounts & Coupons

    I am using google analytics for years for my deals web site http://discounts2buy.com & http://stepin2buy.com where I could see always few more hits and access in “Real Time” reporting but its really wired that no one is converted. My doubts is whether they are real humans traffic or bots. The “Source” (origin) of traffic is always set to “none”. This post really helped me to filter out bots and I set the filter right away. So waiting for its result so let me be patient for weeks.

  • Alex M

    Thanks for the interesting read, Alex.

    Bot traffic is indeed a huge problem. It’s also a billion dollar industry all striving to suck as much money from advertisers as possible. https://moz.com/blog/online-advertising-fraud
    Various researches I’ve read on the topic indicate the amount of bot traffic purchased to be anywhere up to 80%.

    And I couldn’t agree more with what you’ve written here “developers are expanding into once-inconceivable realms of human imitation”. I’ve recently bumped into a website that claimed to sell personalized bots capable of acting so human that you wouldn’t even tell that by a high bounce rate in GA. For further reading on this topic I’d suggest this – http://www.fipp.com/news/insightnews/what-are-the-nine-types-of-digital-ad-fraud

    As for me – I’ve tried eliminating bots with GA but it took a lot of my time. I had a rather small-scale project and decided to try out heat maps and video session recordings. It was quite interesting to view some of those as I couldn’t even imagine what a human would do to the device or mouse in order for it to move so erratically. But I’m not sure how you can actually scale that to a large project that would buy thousands of users daily from various sources.

    Another thing I’ve found was fingerprinting technology which is essentially an alternative to cookie tracking. I’ve signed up to http://www.fraudhunt.net to try it out. Am yet to test it on a scale

    I’d like to hear your stand on fingerprinting and the potential of this technology in combating bots. Haven’t seen a lot of discussions on this issue.
