Eliminating Dumb Ghost Referral Traffic in Google Analytics


Since I wrote about bot and spider filtering last August, additional posts have been written on this topic, such as:

http://viget.com/advance/removing-referral-spam-from-google-analytics

http://www.analyticsedge.com/2014/12/removing-referral-spam-google-analytics/

Today, I’ll be talking about the newest kid on the block, “Ghost Referral” Traffic, and how to block it from your Google Analytics.

What is Ghost Referral Traffic?

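This is traffic that never actually visits your website. The Google Analytics Measurement Protocol lets anyone send hits straight to Google’s collection servers, so a spammer can record fake visits against your property without ever loading one of your pages. Unlike ordinary bots or spiders, which you can generally block on your site, these hits can’t be blocked there because they never reach it.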

They often leave a trail in your hostname or referral reports to entice you to click through to their spam-ridden websites.

Filter it out

Several others have suggested blocking this traffic by creating an include filter on Hostname (basically your own website domain) because the current Ghost Referrals are coming from other hosts. If you include ONLY your site hostname, it would prevent some other hostname’s data from showing up in your reports. Here’s the problem though… You can set the hostname in the measurement protocol.

If a spammy spammer is intentionally sending you bad traffic, they can intentionally overwrite the hostname, and suddenly your filters including only your hostname aren’t helping you. The Ghost Referral traffic becomes all but indistinguishable from your normal site traffic and those filters do nothing.
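To make the weakness concrete, here is a minimal sketch of a hostname-only include filter applied to three hits. The hit objects, the example domains, and the `hostnameIncludeFilter` function are all hypothetical; the field names loosely follow the Measurement Protocol parameters `dh` (document hostname) and `dr` (document referrer).

```javascript
// Hypothetical model of a hostname include filter: keep only hits whose
// hostname matches our own site. Not how GA is implemented internally.
function hostnameIncludeFilter(hits, allowedHostname) {
  return hits.filter(function (hit) {
    return hit.dh === allowedHostname;
  });
}

// Illustrative hits; all domains are placeholders.
var hits = [
  { dh: "www.example.com", dr: "https://www.google.com/", label: "real visitor" },
  { dh: "spam.example.net", dr: "http://spam.example.net/", label: "ghost, foreign hostname" },
  { dh: "www.example.com", dr: "http://spam.example.net/", label: "ghost, spoofed hostname" }
];

var kept = hostnameIncludeFilter(hits, "www.example.com");
// The ghost hit that copies our hostname is still in `kept`.
```

The filter drops the ghost that uses a foreign hostname, but the one that copies your hostname passes straight through.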

Here’s how you can really eliminate (most of) it

I’d like to suggest a more surefire way to block out traffic from outside of your website with a combination of tracking changes and filters that will work with Universal Analytics.

Step 1 – Set a cookie

The first thing you’re going to want to do is set a cookie on your website. You can do this manually on your site or through something like Google Tag Manager. What matters is that anyone who reaches your website gets the cookie.

Let’s give it a nondescript name like “dev-status” with a value of “march2015”, and make the expiration date well into the future. Whenever someone hits any page on your site you can automatically update this cookie with “dev-status=march2015” and extend its lifespan.
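A minimal sketch of setting that cookie with plain JavaScript is shown below. The helper name `buildCookieString` and the 1000-day lifetime are assumptions made for illustration; only the cookie name and value come from the text above.

```javascript
// Build the cookie string separately so it is easy to inspect.
// `now` is passed in so the expiry date is predictable.
function buildCookieString(name, value, days, now) {
  var expiresAt = new Date(now.getTime() + days * 24 * 60 * 60 * 1000);
  return name + "=" + encodeURIComponent(value) +
    "; expires=" + expiresAt.toUTCString() +
    "; path=/";
}

// In a browser, assigning to document.cookie creates the cookie or, if it
// already exists, rewrites it with the new expiry date.
if (typeof document !== "undefined") {
  document.cookie = buildCookieString("dev-status", "march2015", 1000, new Date());
}
```

Running this on every page load both sets the cookie for new visitors and pushes the expiration date forward for returning ones.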

Step 2 – Create a Custom Dimension

Within Google Analytics, be sure to set up the Custom Dimension at the Property level. You’ll want to set this at the User Scope just to be safe. Make sure you write down the index number.

Step 3 – Grab the cookie value

We’ll be implementing this solution through Google Tag Manager, so we’ll create a new Variable/Macro for a 1st Party Cookie with the name of “dev-status”.


Step 4 – Pass in the cookie value

We can now reference {{dev-status}} in Google Tag Manager and we’ll get the value of “march2015”. Take this Variable/Macro and place it in a Custom Dimension within your Google Analytics pageview Tag.
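If you are not using Google Tag Manager, roughly the same thing can be done by hand. The sketch below assumes the standard analytics.js snippet is on the page and that the Custom Dimension was created at index 1; `readCookie` is a hypothetical helper, not part of any Google library.

```javascript
// Return the value of a named cookie from a "k=v; k2=v2" style string,
// or undefined if the cookie is not present.
function readCookie(cookieHeader, name) {
  var pairs = cookieHeader ? cookieHeader.split("; ") : [];
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    var key = eq === -1 ? pairs[i] : pairs[i].slice(0, eq);
    if (key === name) {
      return eq === -1 ? "" : decodeURIComponent(pairs[i].slice(eq + 1));
    }
  }
  return undefined;
}

// Browser only: attach the cookie value to the tracker before the pageview.
// "dimension1" assumes index 1; use the index you wrote down in Step 2.
if (typeof document !== "undefined" && typeof ga === "function") {
  var devStatus = readCookie(document.cookie, "dev-status");
  if (devStatus !== undefined) {
    ga("set", "dimension1", devStatus);
  }
  ga("send", "pageview");
}
```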


Step 5 – Filter out the bad traffic

Back in Google Analytics, create a new filter and include only traffic where your specific Custom Dimension is set to your specific value.
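Conceptually, the include filter behaves like the sketch below. This is a simplified model with hypothetical hit objects, not Google Analytics’ actual processing: real filter patterns are regular expressions, and a user-scoped dimension is applied during processing rather than checked hit by hit. The key `cd1` assumes the dimension sits at index 1.

```javascript
// Keep only hits that carry the expected custom dimension value.
function includeByCustomDimension(hits, index, expected) {
  var key = "cd" + index;
  return hits.filter(function (hit) {
    return hit[key] === expected;
  });
}

// Illustrative hits; the domain is a placeholder.
var hits = [
  { dh: "www.example.com", cd1: "march2015", label: "real visitor" },
  { dh: "www.example.com", label: "ghost, spoofed hostname, no dimension" }
];

var kept = includeByCustomDimension(hits, 1, "march2015");
// Only the real visitor remains, even though both hostnames match.
```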


The Result

Now even if the Ghost Referral is spoofing your hostname (which is relatively easy to do without human intervention), it isn’t hitting your site and passing that relatively innocuous Custom Dimension value of “march2015” so it gets filtered out.

It just makes sense: if someone visits your website, they get the cookie and get counted in your Google Analytics. If someone doesn’t visit your site, they don’t get the cookie, so they don’t get counted.

Keep in mind – if you’re actually passing in offline hits on purpose, you can also pass in this secret key to include that traffic as well.
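As a rough illustration, a legitimate offline hit could carry the key as a `cd<index>` parameter in its Measurement Protocol payload. The property ID, client ID, hostname, path, and dimension index below are placeholders, and `buildHitPayload` is a hypothetical helper. The sketch only builds the payload string; it does not send anything.

```javascript
// URL-encode a set of Measurement Protocol parameters into a payload string.
function buildHitPayload(params) {
  return Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
  }).join("&");
}

var payload = buildHitPayload({
  v: "1",                  // protocol version
  tid: "UA-XXXXX-Y",       // placeholder property ID
  cid: "555",              // placeholder client ID
  t: "pageview",           // hit type
  dh: "www.example.com",   // placeholder hostname
  dp: "/offline-signup",   // placeholder page path
  cd1: "march2015"         // the key, assuming custom dimension index 1
});
```

For Universal Analytics, a payload like this would be POSTed to https://www.google-analytics.com/collect by your own offline system.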

What about the “dumb” part of your title?

This only works if the Ghost Referral doesn’t pass that Custom Dimension value. If a spammer comes to your website, inspects the hits you’re sending, and mimics the Custom Dimension value, the fake hit will pass through even with the above method, because it carries the same value as your real hits.

That’s why I’m calling it “Dumb Ghost Referral” Traffic. It’s traffic that doesn’t come from your site, sent by someone who isn’t even looking at your site or trying too hard. “Smart Ghost Referrals” that mimic even your custom dimensions will be harder to detect, because they’ve been smart enough to scan your site and copy more than a generic page hit.

We do have a method to do that as well… But that’s for another blog post.

Sayf is a former LunaMetrician and contributor to our blog.

  • http://cadastru.biz Chetraru Ioan Alexandru

    Very helpful article! I think we hub of problems with my website

  • http://www.supath.net Suren

    Thank you very much. However, I find it difficult to follow certain things. For example

    place it in a Custom Dimension within your Google Analytics pageview Tag.

    I tried finding the location where to add this.

    Thank you for helping.

  • ranyere

    In terms of implementation, why not just create a custom dimension of hit scope and deploy it in all pages with the pageview tag / tracking? Would it have the same effect?

  • Sayf Sharif

    Suren, in a pageview tag within Google Tag Manager there are “more settings” listed underneath the main tag options where you can set Custom Dimension values to pass with the hit.

  • Sayf Sharif

    ranyere, you absolutely could send it on each hit, and it would essentially perform the same. I probably would err on the side of session or user scope just so that if they get the cookie once, they’re good, and if we miss it somewhere, or there is some technical snafu, then we don’t lose data. If they’re passing it on one hit, then we’re probably good for the dumb traffic.

  • hoyeon jung

    Thank you for helping 🙂 However, I wonder if there is any simpler way to eliminate (most of) it.

  • Sayf Sharif

    Well as we point out in our original blog that I linked to in the first sentence, there are plenty of easy ways to eliminate the bulk of these bots, including checking a checkbox.

    This is more focused on not the bulk, but the nasty stuff that remains afterwards and isn’t easy to detect or remove.

  • http://www.elitestays.co.in Kathy Rose

    thanks a lot and i think my website’s long time problem will be rectified using this method.

  • http://www.jajananfavorit.com Jajanan Favorit

    Thank you!!!

  • http://www.carlbomphotography.com/ Danne

    Good approach Sayf! Looking forward to implement this and test a few things myself. Hats off.

  • slii

    Thank you for the article. I’m trying to follow its instructions and also use Tag Manager for the cookie, but am having trouble following it to the letter.

    In “Let’s give it a nondescript name like ‘dev-status’ with a value of ‘march2015’”, what does ‘value’ refer to? I.e. is it something I set in Tag Manager? The only custom field I could find is the ‘Dimension Value’ in the Universal Analytics tag’s custom dimension field.

    In the article, the above mentioned field seems to have the value {{dev-status}}, so I’m guessing I haven’t quite cracked it 🙂

    • Sayf Sharif

      Those aren’t required, they’re just what we named them. I called it dev-status but you could easily give it any name, and any value. We just need to be able to reference it.

  • Joe

    Hey Sayf, really good article. I’m interested in using this solution to combat the ever relentless referral spam, but I’ve got one question.

    I’d prefer to create the cookie manually. How could I reference this cookie as the custom dimension’s value inside of Google Analytics?
    I have never used Google Tag Manager before and I am quite new with Google Analytics, so I don’t know if the {{cookie_name}} placeholder is a Google Analytics cookie referencing feature, or something exclusive with Google Tag Manager.

    Cheers for reading,

    Joe

    • Sayf Sharif

      Joe, you can create a variable/macro in Google Tag Manager that references a 1st party cookie value. So if you create the cookie manually, you can then access its value in GTM via the 1st Party Cookie variable.

  • SurferHB714

    I see 23 in real time right now. Will this take care of that as well? It freaks me out to see that. I feel like hackers are trying to drain the resources. LOL

    • Michael Dance

      I’m moving to MixPanel to avoid this stuff it’s a pain in the ass.

      • Sayf Sharif

        You’ll find these problems via every analytics platform including MixPanel.

    • Sayf Sharif

      It’s not hackers, and generally not malicious. The VAST majority can be eliminated via the methods outlined above.

      • StefsterNYC

        Yeah after more research I come to that conclusion but they are seriously annoying me. Not to mention screw up my numbers. Thanks Sayf.

        I actually found a really good way of blocking all bad bots in my htaccess. It’s on github. I gotta find the link to share. I’ve seen the bots diminish 95% all week since adding it.

        • Sayf Sharif

          Nice. I’ll pass that along to our internal team. Best to never even track this traffic rather than filter it.

          • StefsterNYC

            Exactly. We had a client, she has a million visits a month. When I came aboard I noticed they weren’t actual real hits, they were bad bots.

            Also, that blocking of the IPs at the bottom, never really works in my experience. They use proxies every time and change it up.

          • Sayf Sharif

            Yeah, I’d be hesitant to block on those IPs simply based on that page, but the user-agents I wouldn’t have a problem with at all.

          • StefsterNYC

            Totally agree.

  • http://www.develare.com/ John Peterson

    Thank you for not only explaining the problem, but gave a solution that will help keep ghost referral spammers at bay. What a pain. Look forward to the day when this is automated.

  • Ryc

    Hi Sayf, thanks for your post. We have implemented your method, setting the cookie manually without Tag Manager, and it seems to work with most of the spammers. Now, there is one thing that I do not understand. Why do we need to use a cookie? Couldn’t we send the custom dimension to all visitors without setting a cookie first? What do we avoid by setting the cookie? Thank you!

    • Sayf Sharif

      There are lots of ways to do it. We wanted to make it quick and easy, without stressing your system too much or slowing things down, so we leveraged the cookie to do that.

      • Laura Kainulainen

        Hi Sayf, thanks for the post. I actually used the same approach on a mobile application tracking and just passed a random string with a custom dimension (well, no possibility for cookies there) with every hit, made the custom dimension user level, and applied an include filter. So, if I just pass a random string, do you see that this approach stresses the system more than the cookie approach? Because setting that up is very easy.

  • shesselmans

    i’m afraid these hackers might have been reading your blog as well. In our case your solution did not work. Only solution that for the moment seems to help is blocking the specific source domains.

  • Veronica

    Thanks Sayf! If you’re doing this from GTM, do you still need to do a custom filter in GA? I created the variable and added it as a custom dimension in the GA pageview tracking tag, as mentioned above, and published it. What’s the next step? (this is deployed in a testing GA account in case I mess this up 🙂 )

    • Sayf Sharif

      If you do it in GTM then in theory you don’t need to also do it in GA as the traffic would never reach GA in the first place. Filter in GTM for the custom dimension and you shouldn’t be seeing any traffic in your account that doesn’t run through your GTM. If you still see bots at that point they’re the ones that are running javascript and actively hitting your website and loading GTM.

      • Veronica

        Sayf – can the filter pattern in “custom filter” in GA be anything? You have march-2015, is there any logic to that pattern? Or just something you made up?

        • Sayf Sharif

          Just something you make up.

  • Jan

    I am totally new to this. Just launched my first website and in a couple days I have over 100 unique visits according to google analytics. I have only told 1 person about the website and I’m pretty sure they haven’t told anyone. How can I tell if those 100 visitors are real and stumbling across my site somehow or if they are some kind of ghost bot?

  • http://www.develare.com/ John Peterson

    It’s been almost a month since we implemented the cookie method. It’s working beautifully. As a side effect, it looks like I’m also discovering a tremendous amount of click fraud when advertising on Google Display Network. I’m still tracking all of this down, and my claims are by no means final. It’s just a suspicion at this time. However, without the cookie method I would have never even known to look. Double thank you.

    • Sayf Sharif

      Well that’s interesting. How is it revealing click fraud? (or at least how do you suspect it’s revealing it?)

      • http://www.develare.com/ John Peterson

        When we advertise on Google Display Network. It looks like we’re seeing more fraudulent bot traffic. I suspect this is from some websites using bots to artificially click on Google Ads and get more ad revenue. Here are some numbers. During the last two weeks of May we were running mostly Google Display advertising. The unfiltered view for the Paid Traffic segment showed 213 sessions. The filtered view using the method described in this article showed 14. I’m not exactly sure why so many cookie-less sessions came through on unfiltered, however, that seems consistent with fraudulent bot traffic.

        • Sayf Sharif

          Actually that wouldn’t surprise me at all. I’ve seen similar things in the past with Facebook as well, both in liking pages, and clicking on ads, to make fake users seem legitimate and get around algorithm detection.

  • Salino

    I wonder if your method of detecting “Smart Ghost Referrals” relies on detecting a series of UI actions on a screen, which then fires a custom dimension! it’s a little more involved and needs careful planning when thinking of all possible UI actions!

    • Sayf Sharif

      It doesn’t, but setting a custom dimension on the mouse moving wouldn’t be the worst idea. Some bots might still spoof that though.

      • Salino

        haha…it’s probably a never ending battle and I can’t imagine a fail proof solution that stays valid for long time. bot makers, will find ways round it 🙂

        I can think of a number of other solutions but each with its own draw backs. I am excited to see what you came up with and discuss my solution with you.

  • Mark Hazeldine

    Hi Sayf. I’d really love to implement this, but i’m struggling with step one! How do I create a cookie using Google Tag Manager? I literally have no clue where to even start. Do I need to write some custom Javascript or something? I’m not a developer or coder, so if there are any step by step instructions you could point me to, that would be much appreciated.

    Thanks, Mark

    • Sayf Sharif

      It can be as simple as one line of javascript on a page, or in a custom html tag.

      document.cookie = "username=John Doe";

      You can get pretty complicated with them, but they can also be as simple as a single line. Google “how to create a javascript cookie” and most of the top results are pretty good, and if you read a few of them, you should be able to muddle through it.

      • Mark Hazeldine

        Sayf, thanks. I got some help on the Google+ GTM community page and someone created this code that I could use. It’s meant to set an expiry date of 1000 days in the future. I’m hoping that this will work.

        function createCookie(name, val, days) {
          if (days) {
            var date = new Date();
            date.setTime(date.getTime() + (days * 24 * 60 * 60 * 1000));
            var expires = "; expires=" + date.toGMTString();
          } else var expires = "";
          document.cookie = name + "=" + val + expires + ";domain=.yourdomain.com; path=/";
        }

        createCookie('dev-status','march2015',1000);

        • Mark Hazeldine

          It doesn’t work :(. I got someone to help me a bit, and he suggested changing the If statement to make the code valid and removing the . before the domain, as that’s not needed any more. This is the new code I’m trying:

          function createCookie(name, val, days) {
            var expires = "";
            if (days) {
              var date = new Date();
              date.setTime(date.getTime() + (days * 24 * 60 * 60 * 1000));
              expires = "; expires=" + date.toGMTString();
            }
            document.cookie = name + "=" + val + expires + ";domain=ezesoft.com;path=/";
          }
          createCookie('dev-state','sept2015',1000);

          Unfortunately even this doesn’t work. I went to view my cookies in Chrome’s advanced settings and couldn’t find one named “dev-state”, so I’m guessing the cookie is not being set properly. Should it work in Preview and Debug mode, or would I actually have to publish live?

          • Mark Hazeldine

            Ok, I seem to have answered my own question. I have published the above code live, and I’m now seeing the correct custom dimension value in GA. Hopefully others reading this will find it helpful!

          • Philip Huynh

            How did you integrate the cookie with the Google Analytics from there?

            By following step 5?

  • Mark Hazeldine

    Sayf, I have a second question. I have created my Dev Status variable, created a GA Universal Analytics pageview tag where I set the custom dimension and reference the variable for the dimension value. Do I now set this tag to fire on all pages or do I need to create a trigger for it? Do we want to attempt to grab the cookie value for every page view or just those where there is actually a value to pass through?

  • David Ulkei

    Hi Sayf,

    I have implemented it via GTM exactly as in your post and have encountered a problem: because both the pageview tag and the custom HTML tag that sets the cookie fire on gtm.js, the first pageview is always missing (because the cookie value is still undefined in the custom dimension at that moment), and thus only the second pageview gets recorded in the filtered view. Pushing an event along with setting the cookie, and firing the pageview tag on that event, seems like a solution to me. Sending the custom dimension with a non-interaction GA event and filtering on that is also worth considering, so that the pageview tag’s firing doesn’t depend on a later event (or does it have to be in the pageview tag to make it work?). Maybe there is an easier solution that I didn’t think of? I really appreciate your help, and the article is awesome!
    Thanks, David

    • Mark Hazeldine

      Hmm, interesting. I’d love to hear more on this too. Sayf, is David’s concern a valid one?

    • Mark Hazeldine

      Sayf, I’m really keen to know the answer to this as i’m finding a lot of valid sessions where the custom dimension has not been set and would be incorrectly filtered out. Could that be due to this issue?

      • Sayf Sharif

        The key is in creating the cookie at the right time, in the right way.

        If you have a session scope custom dimension, when that data is processed the session scope custom dimensions will be applied to every hit in that session. So if you hit a page, and the custom dimension isn’t applied, then hit another page and it gets applied… That first page WILL have that session scope value applied on processing. Same as a User scope custom dimension. It won’t go back to PREVIOUS sessions for the user, because those are already processed.

        The problem is when you don’t process the cookie on the first hit on the page, so if you get 40% bounces on your home page, but the cookie doesn’t get created until after the initial pageview is sent, then those bounces will get filtered out of your data because they never had the opportunity to get the session scoped custom dimension applied.

        I didn’t really go into a ton of detail on creating the cookie, so that’s my bad, but what I might recommend if you’re using Google Tag Manager is that you have a custom html tag on all pages that reads the value of the cookie, and if it doesn’t exist, create the cookie, then fire a datalayer push with an event in that tag, and have your pageview tags fire on that event rather than on the standard page load. It’ll delay the hit only slightly, and it will guarantee the cookie is created before the first hit on the page.
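The approach described in the reply above could be sketched roughly as follows. This is an illustration under assumptions, not the author’s exact implementation: the event name `devStatusReady`, the function name, and the 1000-day lifetime are all hypothetical. In Google Tag Manager the code would sit inside a script element in a Custom HTML tag that fires on all pages, and the pageview tag would use a Custom Event trigger matching the same event name.

```javascript
// Make sure the cookie exists, then tell GTM it is safe to fire the pageview.
function ensureCookieThenNotify(doc, dataLayer) {
  var hasCookie = /(?:^|; )dev-status=/.test(doc.cookie);
  if (!hasCookie) {
    var expires = new Date(Date.now() + 1000 * 24 * 60 * 60 * 1000);
    doc.cookie = "dev-status=march2015; expires=" + expires.toUTCString() + "; path=/";
  }
  // Pushed only after the cookie is in place, so a tag triggered by this
  // event can read the cookie on the very first hit.
  dataLayer.push({ event: "devStatusReady" });
}

// Browser only: use the real document and GTM's dataLayer array.
if (typeof document !== "undefined" && typeof window !== "undefined") {
  window.dataLayer = window.dataLayer || [];
  ensureCookieThenNotify(document, window.dataLayer);
}
```

Because the pageview waits for the event, visitors who bounce on their first page still carry the custom dimension value.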

        • Mark Hazeldine

          Sayf, thanks for your reply. That sounds great, except I haven’t a clue how to actually implement your suggestion. I’m assuming I’d need to write some complex Javascript to insert in the custom HTML tag, but I’m not a coder, so I’m a bit stuck. Also, would you use an “if” statement and then more code to actually write the cookie value all within Javascript code in that first custom HTML tag or is there some way to do that “if-then-else” logic within GTM and point to my other “write cookie” tag if there was no cookie value found?
          Finally, what should be the trigger for that first tag that reads the cookie value?

        • David Ulkei

          Hi Sayf!

          Thanks for the clarification! This issue is really about measuring proper bounce rates while excluding the ghost traffic. Firing the pageview tag on a custom event should work too, but I am using a non-interaction event to send the cookie value as a custom dimension (I just don’t feel comfortable delaying my pageview tag or making it dependent on a different tag). Then I exclude the traffic that has ‘undefined’ for the dimension value. Although the dimension is set with a tag that fires after the pageview, it seems to work fine. I’ve tested it back and forth in real-time mode.

    • Vibhor Jain

      – I created a custom JS variable (function) that sets the cookie

      – a custom HTML Tag that calls this function

      – use GTM’s Tag Sequencing to ensure the custom HTML tag fires (and the cookie is set!) before the first pageview tag is fired

  • Mark Hazeldine

    Ok, I have another issue with this method. I’ve added a slight tweak whereby I set my custom dimension value to “None” by default, and change it to a different value if a cookie was set. Instead of adding a filter, I’ve created two segments: One to show only “real users” and one to show only “fake users”. My results are showing quite a high number of sessions with a custom dimension value of “None” (i.e. from fake users), but I’m wondering if this method could falsely identify real users as fake, because some of the “fake” users have visit lengths of multiple minutes and have come from within my own company (I can tell from the Service Provider name)! Is this method reliable or is it possible that the cookie might not get set or the value not be picked up for real users?

    • Mark Hazeldine

      In addition to the above, yet another phenomenon i’ve noticed is that because I’ve set my custom dimension with the Scope of “User”, I’m still seeing visits come in from users that came to my site before I added the custom dimension in, and those users are coming in with a blank value. So I now have “Real Users”, “Fake Users” and “Nothing” users. How do I turn those “nothing” users into either real or fake? Maybe changing the scope to “Session” would be more appropriate?

  • Ian Harmon

    Hi Sayf,

    Thanks for putting this post together. It was recommended to me to get rid of this annoying referrer spam.

    I’m stuck at where i assign the value to the variable though. When I create the 1st party cookie in GTM there is nowhere to add a value, just the cookie name. Can you point me to where I’m going wrong please? Thanks

  • http://thespareroomproject.com Jenn | The Spare Room Project

    I’m attempting to implement this method, and I’ve run into some problems. I’ve created a cookie using this function:
    add_action('wp_head', 'add_cookie');
    function add_cookie() { ?>

    function createCookie(name, val, days) {
      var expires = "";
      if (days) {
        var date = new Date();
        date.setTime(date.getTime() + (days * 24 * 60 * 60 * 1000));
        expires = "; expires=" + date.toGMTString();
      }
      document.cookie = name + "=" + val + expires + ";domain=thespareroomproject.com;path=/";
    }
    createCookie('dev-status','may2016',1000);

    <?php }

    I've never had a problem with other wp_head hooks before. I also placed the Tag Manager container in the header through the same type of hook and function. I know it's supposed to go in the body, but WordPress is not very cooperative on that front. Now, while it seems to be catching traffic, it is also doubling and even quadrupling pageviews data. I know that when I've played around with session timeouts, instead of just registering the session duration, it also shows a second pageview. Is that what's happening here, and if so, is there any way to stop it from doing that? Thanks!
