
How to Solve Google Analytics Sampling: 8 Ways to Get More Data

What do Bald Eagles have to do with Data sampling? Nothing. They're just awesome.


What is Sampling?

Sampling is a tried and true statistical technique. You see it every time you hear about a political poll, or anything like that. 68% of people prefer dogs to cats. 28% of Americans think that God created Doritos. The pollsters didn’t actually ASK everyone. They asked a small subset of people that is hopefully large enough to make what they’re reporting accurate. So if they’re saying “All Americans,” they might interview 1,000 people out of the 300 million who live here, and pretend it’s a legitimate sample.

It also leads to political polls all coming out differently, and to arguments about whose poll is better. Those arguments are generally about what sort of sample each poll takes: how many people were asked, what their makeup is, how many men versus women, etc.

The basic idea is that you simply can’t ask 100% of people certain questions, so you ask a subset, a.k.a. a sample, and that subset then represents the larger group.
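For a rough sense of how well 1,000 people can stand in for 300 million, here is a minimal sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96); the function name is just for illustration.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# A 1,000-person poll, worst case (a 50/50 split)
moe = margin_of_error(1000)
print(f"+/- {moe * 100:.1f} percentage points")
```

That works out to roughly three points either way, but only if the 1,000 people really are a random draw, which is exactly what the pollsters argue about.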

How does it work in Google Analytics?


The default sample is 250,000 visits (not pageviews… visits. It’s session based), but you can normally adjust a slider on the page (located in the top right… it looks like a checkerboard grid) to make the sample more precise, covering 500,000 visits at the cost of longer processing. Whenever you’re looking at a Standard Report you’re going to see unsampled data. So if you were to go look at All your Traffic Sources, you’ll see the correct numbers. However, if you are looking at a set of visits over 250k (or 500k if you up your limit) and you create a custom report, filter the report in any way, add a segment, etc., then you’re going to be looking at a sample.

How do I know if What I’m Seeing is Sampled?

google analytics sampling

Keep your eyes peeled for the yellow sampling box in the top right of your Google Analytics interface. It lets you know how many visits are part of the sample, and what percent that is. Keep in mind that this is the percent of the PROPERTY, not the Profile you’re in. It’s a sample from ALL the visits that come into that web property, regardless of what profile you are in. Say the profile you are in shows only 50,000 visits over the time span you’re looking at, but the property as a whole has 20 websites in it, each tracking 50k in traffic. The property itself then has about a million visits. Even though you’re looking at a profile filtered to show just that single website, applying a segment to those 50,000 visits will cause sampling to kick in, at best at 50% if you have it on the highest precision.
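The arithmetic behind that 50% can be sketched in a few lines. This assumes the sample rate is simply the sample limit divided by the property's total visits, using the 250k and 500k limits quoted above; the function name is made up for this example.

```python
def best_case_sample_rate(property_visits, sample_limit=500_000):
    """Fraction of the property's visits that fit inside the sample."""
    if property_visits <= 0:
        return 1.0
    return min(1.0, sample_limit / property_visits)

sites = 20
visits_per_site = 50_000
property_visits = sites * visits_per_site  # 1,000,000 visits in the property

print(best_case_sample_rate(property_visits))           # highest precision
print(best_case_sample_rate(property_visits, 250_000))  # default precision
```

The size of the profile you happen to be in never enters the calculation; only the property total does.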

Why is Sampling a Problem?

Well, it’s not always. If you’re in one profile, and it accounts for 99+% of the total traffic to the web property, and you apply a segment and you’re looking at an 80-90% sample, then it can be generally pretty accurate. You have to be aware that they’re not precise numbers, so if you need to pay authors based on pageviews of their articles, you definitely don’t want to do it based on a sampled report. However if you’re just looking at keywords or general trends an 80%+ sample rate is very usable.

At what point can you be comfortable? Well, when you see particularly small sample rates, like under 1%, you can be pretty sure that it’s junk data. Once the sample reaches a certain point, it’ll become obvious. You’ll see the same number repeated over and over and over. 330 visits on this campaign, 330 visits on that campaign, etc. But even before that point, the data can be very inaccurate.
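One plausible explanation for the repeated numbers is that counts seen in the sample get multiplied back up by the inverse of the sample rate, so every sampled visit stands for the same number of real ones. The sketch below assumes exactly that scaling, with a hypothetical sample rate chosen so that one sampled visit becomes 330.

```python
def scaled_count(sampled_visits, sample_rate):
    """Scale a count seen in the sample back up to an estimate for all traffic."""
    return round(sampled_visits / sample_rate)

sample_rate = 1 / 330  # hypothetical: about a 0.3% sample

# Campaigns that show up 1, 2, or 3 times in the sample
for seen in (1, 2, 3):
    print(seen, "sampled ->", scaled_count(seen, sample_rate), "reported")
```

Under that assumption, a report full of 330s is really a report full of campaigns that each appeared once in the sample.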

data sampling gone wrong

With samples under 10% we’ve seen huge swings. Once, a premium client’s third party vendor was worried about the revenue from paid ads because a sampled report showed they had generated “only” $900,000 in revenue on the site. When we looked at the unsampled data (rather than a less than 1% sample), however, the ads had actually generated $1.6 million in revenue, a difference of nearly 80%.

Even at higher sample rates we’ve noticed issues. Once we detected a swing of about 10 percentage points at a nearly 50% sample. A client site, when compared month to month, year over year, showed a 5% increase with a 48% sample. However, when we looked at the data unsampled, we were able to show that instead of improving by 5% it had actually decreased by 5%. That’s an issue.

Even though the math doesn’t line up, I once described the Google Analytics sample rate as akin to my personal confidence in the data. At 90% sample, I’m 90% confident that the data is close to correct. At 50% I’m 50% confident. At 1% sample, I’m 1% confident. I don’t generally base opinions on things when I’m as close to a coin flip in regards to my confidence.
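For a sense of what the math does say, here is a sketch under a deliberately simplified model: every visit lands in the sample independently with probability equal to the sample rate. That is an assumption for illustration, not a description of how Google Analytics actually picks its sample. Under it, a scaled-up count for a segment with N real visits has a relative standard error of sqrt((1 - r) / (N * r)).

```python
import math

def relative_standard_error(true_visits, sample_rate):
    """Relative standard error of a scaled-up count under Bernoulli sampling."""
    return math.sqrt((1 - sample_rate) / (true_visits * sample_rate))

# A hypothetical campaign with 2,000 real visits at different sample rates
for rate in (0.9, 0.5, 0.1, 0.01):
    rse = relative_standard_error(2_000, rate)
    print(f"{rate:>5.0%} sample: about +/- {rse:.1%}")
```

The error depends on how many of the segment's visits end up in the sample, not on the rate alone, so small segments and rare events like conversions would be expected to swing the most.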

What are My Alternatives?

1. Google Analytics Premium

Well, if you’ve got so much data that you’re sampling all the time, you might want to consider Google Analytics Premium. Premium brings a number of benefits, including the ability to export unsampled reports with up to 3 million rows of data (you can actually have more than 3 million rows, it’ll just aggregate the extra), and up to 100 million visits. If you’re constantly relying on sampled reports under 50% for your reporting, then you really might want to consider better data accuracy. Premium brings a number of other benefits, but when you’ve got so much data that you can’t do anything with your data on a quarterly, monthly, and/or especially daily basis, then you really want to consider it. (Disclosure: We’re a Google Analytics Premium Reseller)

Possible within GA Interface Right Now

2. Change Your Date Range

What date range are you looking at? If you’re trying to look at a 3 year period, then you might want to consider changing that to a smaller time span, to get the total visits to the property under that 500k level. If you are under 500k visits per month, then you could look month to month, and aggregate the data yourself in a spreadsheet, rather than use the Google Analytics interface.
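If you go the month-by-month route, generating the date ranges is easy to script. This is a minimal sketch using only the Python standard library; pulling each month's report and summing the results in your spreadsheet is left to you.

```python
from datetime import date, timedelta

def month_ranges(start, end):
    """Yield (first_day, last_day) pairs covering start..end one calendar month at a time."""
    current = start
    while current <= end:
        # First day of the following month
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        last_day = min(next_month - timedelta(days=1), end)
        yield current, last_day
        current = next_month

for first, last in month_ranges(date(2013, 1, 1), date(2013, 3, 31)):
    print(first.isoformat(), "to", last.isoformat())
```

Keep in mind that only additive metrics such as visits or pageviews can be summed across months; unique visitors cannot, because the same person can show up in more than one month.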

3. Use Standard Reports

The standard reports in Google Analytics are never sampled. You’ll know this also because when you’re on a standard report you won’t see the yellow boxed sample message. Sometimes you might be applying a segment, or attempting to use a custom report, when you can get the same information from a standard report.

4. Create New Profiles with different filters

Sometimes you want to dig a little deeper in your reports than the standard reports allow. Maybe you want to look at the content reports for just your organic visitors, and applying the organic medium advanced segment forces you into a sample. You can create a new profile with a filter that only allows organic traffic. If you apply any segments in that profile it’ll get sampled, but the standard reports will hold the unsampled information for just that organic traffic.

Requires Some Programming

5. Limit the # of sites tracking into the same Web Property

Another thing you can do is reduce the amount of traffic into that property. It is very common to aggregate all your tracking into a single account and property, and then look at the different websites by creating filters on various profiles, based on the hostname/domain. If you’re over the traffic level that generates sampled reports, you could consider breaking those websites out into different properties and tracking them separately. 20 websites with 30k visits a month will create sampled data on the monthly report, but if you were to break those 20 websites out into their own properties, it won’t even sample it when you look at a full year.
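To put numbers on that example, the sketch below counts how many whole months of traffic fit under the sampling limit. It assumes the 500k high-precision limit mentioned earlier and steady monthly traffic; the function name is made up for this example.

```python
def months_before_sampling(monthly_visits, sample_limit=500_000):
    """Whole months of traffic that fit under the sampling limit."""
    return sample_limit // monthly_visits

sites = 20
visits_per_site = 30_000

combined = months_before_sampling(sites * visits_per_site)  # one shared property
separate = months_before_sampling(visits_per_site)          # one property per site

print("shared property:", combined, "full months")
print("separate properties:", separate, "full months")
```

The shared property is over the limit before a single month is up, while each separate property can cover more than a year.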

6. Set Sampling yourself to record fewer visits

You can set your own sample rate on your website via the _setSampleRate() method, for example _gaq.push(['_setSampleRate', '80']); in the asynchronous ga.js snippet.

The thing to keep in mind here is that this samples what traffic gets sent into your Google Analytics account. So if you set your sample rate at 80, you’re going to be, by default, looking at sampled data in Google Analytics, even though nothing says it’s being sampled. It lets you use the GA interface without sampling happening on the server side; instead the sampling happens automatically when visitors come to your site. I’m not a huge fan of this because it hides the sample rate from the user of Google Analytics, but I’ll mention it here in fairness.
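Because the interface won't tell you about this kind of sampling, anyone reading the reports has to scale the numbers back up themselves. A minimal sketch, assuming the collected share of traffic matches the configured rate exactly (in practice it will only be approximate):

```python
def estimate_actual_visits(reported_visits, client_sample_rate_percent):
    """Scale visits recorded under client-side sampling back up to an estimate."""
    if not 0 < client_sample_rate_percent <= 100:
        raise ValueError("sample rate must be in (0, 100]")
    return round(reported_visits * 100 / client_sample_rate_percent)

# GA reports 400,000 visits while the site sends data for only 80% of visitors
print(estimate_actual_visits(400_000, 80))
```

The result is still an estimate, and nobody looking at the report will know to make the adjustment unless the rate is written down somewhere.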

7. Server Side Solution with Alternative Tracking

Another interesting option is to have a second tracker used only with a certain subset of your visitors, delivered dynamically. You can create a second web property entirely, and then send different visitors different tracking code depending on who they are. Maybe you are a university, and you want to track your prospective students differently than your current students. As long as you’re able to identify these people, you can use a cookie to record which tracking code they should be sent. Your main tracker still gets sent out on every single page, but you only fire the second tracker on those specific users. Separating out your cohorts of visitors to different trackers will get you more accurate data with that second tracker because you’ll have smaller numbers. Maybe you have a million visitors a month, but your premium members which you recognize via a cookie you’ve set on their computer, only number about 20,000 per month. By sending that secondary tracker ONLY to your premium members, you now have a second web property that contains JUST those members’ visits, which you can analyze without sampling even over a whole year.
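On the server, the decision could look something like the sketch below. The property IDs are placeholders, and the cookie name and value are hypothetical; substitute whatever your site actually uses to recognize these visitors.

```python
MAIN_PROPERTY = "UA-XXXXX-1"     # placeholder ID for the site-wide tracker
PREMIUM_PROPERTY = "UA-XXXXX-2"  # placeholder ID for the premium-only tracker

def properties_for_request(cookies):
    """Return the web property IDs whose tracking code this page view should include."""
    properties = [MAIN_PROPERTY]  # the main tracker goes out on every page
    if cookies.get("member_tier") == "premium":  # hypothetical cookie name and value
        properties.append(PREMIUM_PROPERTY)
    return properties

print(properties_for_request({}))
print(properties_for_request({"member_tier": "premium"}))
```

Your page template would then emit one tracking snippet per ID in the returned list.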

8. GA API

Another option is to hook into the Google Analytics API itself, and pull your data out to your own spreadsheets. You can make up to 50,000 requests per day to the API with 10,000 rows per request. Depending on the amount of data you have, this might require you to perform a number of different calls to Google Analytics via the API to capture all the data you need. In theory if you had a heavily sampled site, you could pull your data for every day of the month into a spreadsheet by making a different call (or more) for every day, depending on your data levels. This would give you the unsampled data in your own reports that you could use outside of the Google Analytics UI.
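To get a feel for how many calls a day-by-day pull takes, the sketch below builds the query parameters for each request without sending anything. The parameter names follow the Core Reporting API v3 query parameters; the profile ID is made up, and expected_rows_per_day is a guess you would supply from your own data.

```python
from datetime import date, timedelta

ROWS_PER_REQUEST = 10_000  # per-request row cap quoted above

def daily_queries(profile_id, start, end, expected_rows_per_day):
    """Build one set of query parameters per page of results per day."""
    pages = -(-expected_rows_per_day // ROWS_PER_REQUEST)  # ceiling division
    queries = []
    day = start
    while day <= end:
        for page in range(pages):
            queries.append({
                "ids": f"ga:{profile_id}",
                "start-date": day.isoformat(),
                "end-date": day.isoformat(),
                "metrics": "ga:visits",
                "dimensions": "ga:source,ga:medium",
                "start-index": page * ROWS_PER_REQUEST + 1,
                "max-results": ROWS_PER_REQUEST,
            })
        day += timedelta(days=1)
    return queries

queries = daily_queries("12345678", date(2013, 6, 1), date(2013, 6, 30), 25_000)
print(len(queries), "requests")
```

A month at 25,000 rows a day comes to 90 requests, far below the daily quota. A single day can still come back sampled if that day alone is over the limit, so it is worth checking each response for a sampling indicator before trusting it.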

9. Analytics Canvas

http://www.analyticscanvas.com/

I’ve never personally used Analytics Canvas, but I’ve heard good things. Much of what they do involves using the Google Analytics API for reporting, and they can help companies looking to use the GA API as described above: they pull their hair out for you, rather than you going bald yourself.

Requires an Invitation (for now)

10. BigQuery

Last but not least is Google BigQuery. It’s the new hotness.

https://developers.google.com/bigquery/

Do you want to analyze terabytes of data with the click of a button? Interactively analyze BILLIONS of rows of data? Sign up for the beta. It’s not open to the public yet, but if you’re interested in crunching some really big numbers, BigQuery is going to be an option that you might want to consider in the near future.

Wait…This blog post title says 8 ways to get more from your data, and you just listed 10

It’s a sampled title.

I still don’t understand why you had a picture of a bald eagle

Life is a mystery sometimes, isn’t it?

Sayf Sharif

About Sayf Sharif

Sayf Sharif is a Web Analyst, and expert in Usability and UX, who has worked with businesses large and small to maximize their online presence since the beginning of the Web, winning numerous awards along the way. Sayf has studied human tool use from the stone age (he went to graduate school for Archaeology) to the information age (he started programming on his father’s TRS-80), and is always interested in what goals people wish to accomplish using their tools, and how successful that experience was.

http://www.lunametrics.com/blog/2013/06/24/solutions-google-analytics-sampling-problems-8-ways-data/

27 Responses to “How to Solve Google Analytics Sampling: 8 Ways to Get More Data”

eywu says:

Great article! GA sampling is usually a pretty hard topic to explain and get your head around.

Before we went to GA Premium we used technique #4 a lot simply to get ad hoc unsampled views. It’s really easy to set up.

I’ll add a +1 for Analytics Canvas, which is what we used to get to “near” unsampled data, and it was nice because you can easily aggregate across multiple properties and/or accounts. But the best thing about Analytics Canvas is that it acts as an ETL of sorts. You can perform filters, aggregations, and other data munging operations across accounts and then have it build you a simple Excel Spreadsheet with multiple tabs if needed.

The documentation is a little lacking, but after a couple of days of playing with it … it’s become a pretty valuable time saving tool.

Tim Wilson says:

Great post, Sayf! This is the most comprehensive explanation and list of workarounds I’ve seen on GA sampling.

One note that I think is missing is that sampling tends to cause more of an issue with low % metrics. For instance, conversion rate is more likely to swing wildly when sampling is in play — even if you are north of 50% with the sampling. I think this is because you’re taking two variables (in this case, orders, which are a relatively small number compared to visits, and visits) and combine the noisiness in both of them — it magnifies the variation from sampling.

I’ve been meaning to come up with a way to visually depict why that is. Luckily, this post covers every other aspect of sampling imaginable!

Andre says:

So you could say that if GA uses less than 25% of the real data to calculate the metrics, and considering the fact that most people look back at the last 30 days: if you have more than (4 * 500.000) = 2.000.000 visits per month, you shouldn’t use GA anymore.

Lauren Linn says:

Thank you for this thoughtful explanation. I have a hard time verbalizing this to clients.

Mike Belasco says:

Hi Sayf,
Thanks for the tips!

Quick question: if the ‘precision’ selection widget is present, but there is no yellow box, is the data being sampled? Here is a screen shot…

https://www.evernote.com/shard/s61/sh/10c438d0-379d-4f64-90fa-70804bce80c5/28def9edc7677c7a9dd82ee785611276/deep/0/Screenshot%206/25/13%2010:06%20PM.png

Thanks Mike!

Sayf Sharif says:

Eywu, #4 is also a personal favorite of mine with even Premium clients. Great way to get an unsampled look at a regular view.

Sayf Sharif says:

Tim, Thanks! We agree with what you’re saying. We were actually trying to crunch some numbers on it ahead of this post, and couldn’t come up with anything hard that I could report on, but I agree. When you’re dealing with smaller variables like a conversion, or even some sort of smaller visit percentage segment, the accuracy seems MUCH lower than the sample rate would indicate. Not sure if the math backs that up or not though, but it sure seems that way.

Sayf Sharif says:

Andre, if you are at 2 million visits a month you absolutely can use GA. You just need to follow some of the steps above such as #4 to make it easier to see different cohorts of data without applying advanced segments, or filtering within the data. If every time you’re looking at your data though you’re applying segments and filters that force you to only be seeing less than 25% of a sample, then either you can refine your responses to sampling a bit more, or you can consider upgrading to GA Premium for more unsampled reporting across a wider time range. With 2 million visits a month I would hope that the GA Premium price tag would be within reach, and it’s still less than Omniture/SiteCatalyst.

Sayf Sharif says:

Mike, That’s interesting. In general that should not be there unless there is sampling, but you’re not getting a sampling message. That could be a bug, or something going on with the page cache. Do you see that regularly? If you’re seeing it on a standard report, with no segments and filters applied, then you’re definitely not being sampled. If you’re only seeing it when you’re using filters and segments then I might be concerned that it’s a bug, and for whatever reason your browser isn’t showing you the sampling numbers. Insert default “Try clearing your cache, and restarting your browser, and logging out and back into analytics” line here.

Nathan says:

Sayf,

I like your style. Thanks for the helpful post.

Nathan

Henrik says:

Well, I really don’t understand your concerns about GA using samples and all the effort you are putting into some good workarounds… they are still just estimated guesses. Tools such as Adobe Analytics and Webtrends can be acquired for less than $100 per day, so why not just go for a professional tool that will save your day and give you correct data? I can understand that if you are a small business, and the conversions online or whatever business value your digital communication brings you are very small, it can be a good thing to use GA. But buying GA Premium for $150,000 and still having issues with sampled data? I really don’t get it.

Henrik says:

I am not sure I got your point correctly regarding “How do I know if What I’m Seeing is Sampled?” You mention a property with 20 webpages, etc., but Google themselves define a Property as something that can very easily be a web domain. I think you are mixing up what a Google Analytics account is and what a Google Analytics property is?
As far as I understand it, yes, a property has a limit of 250k/500k visits before sampling kicks in on anything that is not a standard report, and the workaround is not just to create a segmented/filtered profile, because it will still belong to the same property that is over the 250k/500k limit for sampling?

Sayf Sharif says:

Henrik,

Certainly Adobe Analytics and Webtrends are valid alternatives. Many sites only see sampling on very wide date ranges given the 500,000 hit sample rate. At that point it remains accurate and free, compared to the alternatives. Depending on the traffic level above that you could be paying significantly more for other services, including your own servers, etc. If you are getting a million hits a day you are not paying $100 a day when all is said and done with the other systems.

Recently at the GA Summit Google also announced that sampling would be getting better soon moving that half a million hits mark of sampling upwards of 10 or even 25 million hits within the interface. That will significantly change things.

Per your other question, the web property is basically the UA-XXX-X number, which you could theoretically put on many different websites. Many companies do this for cross domain tracking. When the hits come into GA they are sampled at that property level, not at the profile level. So if you had the web property ID on 2 sites, and 1 site had 10% of the traffic of the other site, then it’s key to understand that the sample rate assumes that there is an even amount of traffic. That will affect the accuracy of the sample in those cases, as it samples everything and then only delivers what the profile can see afterwards. I hope that makes sense.

Abdo says:

The best article I have read regarding GA Sampling. Thanks for this Sayf!

Sayf Sharif says:

You’re welcome Abdo!

Eric says:

Very helpful overview, thanks so much for this. Solid advice I can forward to folks who ask about their sampled data.
My personal favorite: “Wait…This blog post title says 8 ways to get more from your data, and you just listed 10. Answer: It’s a sampled title.”
Brilliant.

cemana says:

Great it!. Do you know if GA is using sampling for real time reporting?

Sayf Sharif says:

Real time reports are not sampled to my knowledge.

Shuki Mann says:

Hi Sayf,

I didn’t understand something: you said that the 250,000 limitation is for one PROPERTY, but then in the 4th tip you said that separate filtered profiles can help.
I don’t understand how separate profiles will solve the problem.
Even if I have less than 250K in one (filtered) profile and another 500K in the other profiles, how many visits will I see in the filtered one?

Sayf Sharif says:

Shuki, standard reports are never sampled, so if you look at a standard report in any view you won’t get sampling unless you add a secondary dimension, or a segment, etc.

So if you’re always looking at that segment, it’ll be sampled. But if, instead of using a segment, you build a new view using the same definition as the custom segment, it will show you the same thing but unsampled in the standard reports.

If I make a custom segment for traffic from Canada, on a property with say 5,000,000 visits, it’s going to be heavily sampled. If I make a view which only shows traffic from Canada, those standard reports will NOT be sampled, but show just Canadian traffic.

Attila Nemeth says:

Hi Sayf,
Thanks for the tips!

Two questions:
- I use three profiles to track a site (Master, Test, Raw data). Will the precision increase if I delete Raw data or Test profile?
- A bit out of topic but do you know how the 10 million hits limit is calculated, by profile or by property? So the question is same, could I increase my precision by deleting Raw data or Test profile?

Sayf Sharif says:

No, the number of profiles/views in a property has no effect on sampling.

The hits are calculated by property, not by profile/view.

Ben says:

Do you know if Events count as Sessions for the calculation of Sampling? I can’t seem to find a straight answer in the Google help files. I have some fairly high frequency events that are set as non-interactive, but I’m afraid they’re causing my property to get sampled prematurely.

I’m considering the pros and cons of moving all my events into another property. Thoughts?

Sayf Sharif says:

Yes, events count toward sessions. The sample is based on sessions, and you can absolutely have a session that is nothing but events.

You could have 10 million events and they wouldn’t be sampled, as long as your session count was under the sampling limit.

Alexander says:

Hi Sayf,

A very good post, as I was often wondering how exactly this is calculated. So is my understanding right that if you use the Google Analytics API, e.g. via a Google spreadsheet, you get unfiltered data, even though you might use a filter for report generation, e.g. ga:medium==organic?
Thanks very much for your good article
Alexander from Germany

Sayf Sharif says:

Alexander,

It will be sampled if that same report would be sampled in the interface:

https://developers.google.com/analytics/devguides/reporting/core/v3/reference#sampling

Hi Sayf,

I’d love you guys at LunaMetrics to test out a new tool we created to “unsample” data. It isn’t perfect but works faster (in some cases) than using the feature in premium and is completely free.

Essentially all it does is merge Excel files together, and some magical calculation then gives you an unsampled csv to download.

beamusup.com/unsample-google-analytics-reports-easy-tutorial/

Let me know what you think.