Google Analytics API v4: Histogram Buckets

Back in April of last year, Google released version 4 of their reporting API. One of the new features they’ve added is the ability to request histogram buckets straight from Google, instead of binning the data yourself. Histograms allow you to examine the underlying frequency distribution of a set of data, which can help you make better decisions with your data. They’re perfect for answering questions like:

  • Do most sessions take about the same amount of time to complete, or are there distinct groups?
  • What is the relationship between session count and transactions per user?

Try It Yourself

We’ve put together a simple demo that you can use to do a little exploring.

How It Really Works

Here’s how to use this new Histogram feature yourself with the API.

Note: we’re assuming you’ve got the technical chops to handle authorizing access to your own data and issuing the requests to the API.

Here’s what a typical query looks like with the new version of the API:
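As a rough sketch of such a query, the request body below asks for users broken down by hour of day. It is written as a Python dict that serializes to the JSON body of a v4 reports:batchGet call. The view ID, the date range, and the helper name build_hourly_request are placeholders chosen for illustration, not values taken from the original query.

```python
import json


def build_hourly_request(view_id, start_date="7daysAgo", end_date="today"):
    """Build a Reporting API v4 request body: users per hour of day."""
    return {
        "reportRequests": [
            {
                "viewId": view_id,  # placeholder; use your own view ID
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "metrics": [{"expression": "ga:users"}],
                "dimensions": [{"name": "ga:hour"}],
            }
        ]
    }


# Print the JSON that would be POSTed by an authorized client.
print(json.dumps(build_hourly_request("XXXXXXXX"), indent=2))
```

An authorized client would POST this body to https://analyticsreporting.googleapis.com/v4/reports:batchGet; authorization itself is left out here, as in the rest of the post.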

This query returns one row per hour, each with the number of users who generated a session during that hour; simplified, it’d be something like this:
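One way to get that simplified view out of a raw v4 response is to flatten each row into a (hour, users) pair. The sketch below assumes a single report with one dimension and one metric. The helper name rows_by_dimension is hypothetical, and the two sample rows are invented placeholders, not real data.

```python
def rows_by_dimension(response):
    """Return (dimension value, first metric value) pairs from the first report."""
    report = response["reports"][0]
    rows = report.get("data", {}).get("rows", [])
    return [
        # Metric values arrive as strings, so convert the count to int.
        (row["dimensions"][0], int(row["metrics"][0]["values"][0]))
        for row in rows
    ]


# Invented placeholder response, trimmed to two rows; the numbers are NOT real data.
sample = {
    "reports": [
        {
            "data": {
                "rows": [
                    {"dimensions": ["00"], "metrics": [{"values": ["3"]}]},
                    {"dimensions": ["01"], "metrics": [{"values": ["5"]}]},
                ]
            }
        }
    ]
}

for hour, users in rows_by_dimension(sample):
    print(hour, users)
```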

Wouldn’t this data be more useful if it were dayparted? Let’s use the histogram feature to bucket our data into traditional TV dayparts:

Early Morning 6:00 AM – 10:00 AM
Daytime 10:00 AM – 5:00 PM
Early Fringe 5:00 PM – 8:00 PM
Prime Time 8:00 PM – 11:00 PM
Late News 11:00 PM – 12:00 AM
Late Fringe 12:00 AM – 1:00 AM
Post Late Fringe 1:00 AM – 2:00 AM
Graveyard 2:00 AM – 6:00 AM

To request our data be returned in these new buckets, we’ll need to make two modifications to our query from before. The first change is to add a histogramBuckets array to the ga:hour object in our dimensions array. We’ll populate this with ["0", "1", "2", "6", "10", "17", "20", "23"], the starting hour of each daypart. Each number in this sequence marks the beginning of a new histogram bin.

The end of each bin is inferred from the number that follows it, and the final bin is open-ended. If values exist below the first bin’s minimum, an additional bin will be tacked on for us at the beginning to contain those values. For example, if we had started our histogramBuckets with "2" instead of "0", the API would add a new bucket to the beginning named "<2", and it would contain the values for matching rows where the ga:hour dimension was 0 or 1.

The second change we need to make is to add "orderType": "HISTOGRAM_BUCKET" to the orderBys portion of our request, so that rows come back in bucket order rather than in alphabetical order of the bucket names.
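Putting the two modifications together, a request body might look roughly like the sketch below. The bin starts are derived from the start hour of each daypart, with midnight written as 0. The view ID, the date range, and the helper name build_daypart_request are placeholders for illustration.

```python
import json

# Start hour (0-23) of each daypart, read off the daypart table; midnight is 0.
DAYPART_STARTS = {
    "Late Fringe": 0,
    "Post Late Fringe": 1,
    "Graveyard": 2,
    "Early Morning": 6,
    "Daytime": 10,
    "Early Fringe": 17,
    "Prime Time": 20,
    "Late News": 23,
}


def build_daypart_request(view_id, start_date="7daysAgo", end_date="today"):
    """Build a v4 body that bins ga:hour into dayparts and sorts by bucket."""
    # histogramBuckets takes increasing integers encoded as strings.
    buckets = [str(hour) for hour in sorted(DAYPART_STARTS.values())]
    return {
        "reportRequests": [
            {
                "viewId": view_id,  # placeholder; use your own view ID
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "metrics": [{"expression": "ga:users"}],
                # Change 1: attach the bin starts to the ga:hour dimension.
                "dimensions": [{"name": "ga:hour", "histogramBuckets": buckets}],
                # Change 2: sort rows in bucket order.
                "orderBys": [
                    {
                        "fieldName": "ga:hour",
                        "orderType": "HISTOGRAM_BUCKET",
                        "sortOrder": "ASCENDING",
                    }
                ],
            }
        ]
    }


print(json.dumps(build_daypart_request("XXXXXXXX"), indent=2))
```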

Here’s what the response for that query looks like for some data from a personal site:
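In a bucketed response, the ga:hour values are bucket labels rather than raw hours. As a sketch of which labels to expect, the helper below mimics the naming scheme the API reference describes for histogramBuckets: "<N" for values below the first boundary, "A-B" for a closed range, a bare number for a one-value bin, and "N+" for the open-ended last bin. It is a local approximation for reading responses, not a call into the API. The names bucket_label and LABEL_TO_DAYPART are hypothetical, and the boundaries are the daypart start hours.

```python
from bisect import bisect_right


def bucket_label(value, boundaries):
    """Label the bucket an integer falls into, given sorted integer boundaries."""
    i = bisect_right(boundaries, value)
    if i == 0:
        # Below the first boundary: the extra bucket added at the beginning.
        return f"<{boundaries[0]}"
    low = boundaries[i - 1]
    if i == len(boundaries):
        # At or above the last boundary: the open-ended final bucket.
        return f"{low}+"
    high = boundaries[i] - 1
    return str(low) if high == low else f"{low}-{high}"


BOUNDARIES = [0, 1, 2, 6, 10, 17, 20, 23]
DAYPARTS = [
    "Late Fringe", "Post Late Fringe", "Graveyard", "Early Morning",
    "Daytime", "Early Fringe", "Prime Time", "Late News",
]

# Map each expected bucket label back to its daypart name.
LABEL_TO_DAYPART = {
    bucket_label(start, BOUNDARIES): name
    for start, name in zip(BOUNDARIES, DAYPARTS)
}

for label, name in LABEL_TO_DAYPART.items():
    print(label, name)
```

With a lookup like this, the labels in a response can be swapped for daypart names before charting.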

Some Downsides

As of this writing, the chief advantage of this feature is that it can save you a little logic and time when your own application wants to use histograms with your Google Analytics data. There’s no “give me X buckets” though – you have to know the range of your data ahead of time. Additionally, data is coerced into an integer, so floats are out.

That means if you want to generate bins dynamically (like we’re doing in our example), you need to first get the range of the data from Google Analytics, then calculate those buckets and send a second request. You may wish to simply request the raw data and calculate the histogram yourself.
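The second step of that two-request flow can be sketched as follows. It assumes the minimum and maximum of the dimension have already been read from a first request, and it spreads roughly equal-width integer bins across that range. The function name integer_bucket_starts and the equal-width choice are assumptions for illustration; the API itself offers no such helper.

```python
def integer_bucket_starts(minimum, maximum, bins):
    """Return up to `bins` increasing integer bin starts covering [minimum, maximum].

    The starts are returned as strings, ready to use as histogramBuckets.
    """
    if bins < 1:
        raise ValueError("bins must be at least 1")
    if maximum < minimum:
        raise ValueError("maximum must not be smaller than minimum")
    span = maximum - minimum + 1
    # Ceiling division; widths below 1 make no sense for integer data.
    width = max(1, -(-span // bins))
    return [str(start) for start in range(minimum, maximum + 1, width)]


# Example: hours 0-23 split into four equal bins.
print(integer_bucket_starts(0, 23, 4))
```

When the range is narrower than the requested number of bins, this returns one bin per integer value instead of padding with empty bins.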

Hopefully Google will add some more functionality to this feature to simplify dynamic binning, too. I’d also welcome the ability to create histograms within the Google Analytics interface! Hopefully this API feature is a sign that something like that is in the works.

Only a limited set of dimensions can be queried in this manner; here’s a complete list:

Count of Sessions ga:sessionCount
Days Since Last Session ga:daysSinceLastSession
Session Duration ga:sessionDurationBucket
Days to Transaction ga:daysToTransaction
Year ga:year
Month of the year ga:month
Week of the Year ga:week
Day of the month ga:day
Hour ga:hour
Minute ga:minute
Month Index ga:nthMonth
Week Index ga:nthWeek
Day Index ga:nthDay
Minute Index ga:nthMinute
ISO Week of the Year ga:isoWeek
ISO Year ga:isoYear
Hour Index ga:nthHour
Any Custom Dimension ga:dimensionX (where X is the Custom Dimension index)
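If your application builds requests dynamically, a small guard can catch unsupported dimensions before a request is sent. The sketch below simply encodes the list above, so it reflects the state of the API as of this writing. The names HISTOGRAM_DIMENSIONS and supports_histogram are hypothetical.

```python
import re

# Dimensions listed above as supporting histogramBuckets (as of this writing).
HISTOGRAM_DIMENSIONS = {
    "ga:sessionCount", "ga:daysSinceLastSession", "ga:sessionDurationBucket",
    "ga:daysToTransaction", "ga:year", "ga:month", "ga:week", "ga:day",
    "ga:hour", "ga:minute", "ga:nthMonth", "ga:nthWeek", "ga:nthDay",
    "ga:nthMinute", "ga:isoWeek", "ga:isoYear", "ga:nthHour",
}

# Custom Dimensions look like ga:dimension1, ga:dimension12, and so on.
CUSTOM_DIMENSION = re.compile(r"ga:dimension\d+")


def supports_histogram(name):
    """Return True if the dimension name appears in the supported list."""
    return name in HISTOGRAM_DIMENSIONS or CUSTOM_DIMENSION.fullmatch(name) is not None


print(supports_histogram("ga:hour"))      # listed dimension
print(supports_histogram("ga:pagePath"))  # not in the list
```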

Great Example Use Cases

Wondering how you might use this feature? Here are some more examples to get your juices flowing:

  • Use Events to capture more accurate page load times and store the time in the label, then bin the times using the API.
  • Capture blog publish dates and see when blog posts peak in engagement.
  • Look at months and transactions to identify seasonality.
  • Compare Session Count and Revenue to see, in general, the number of sessions required to drive your highest revenue.
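As a sketch of the last idea, the helper below builds a bucketed request for any supported dimension and checks that the boundaries are strictly increasing integers, which is what the API expects. The helper name histogram_request, the view ID, and the session-count boundaries are made up for illustration; pick boundaries that suit your own data.

```python
import json


def histogram_request(view_id, dimension, boundaries, metrics,
                      start_date="30daysAgo", end_date="today"):
    """Build a v4 body that bins `dimension` at the given integer boundaries."""
    if not all(isinstance(b, int) for b in boundaries):
        raise ValueError("boundaries must be integers")
    if list(boundaries) != sorted(set(boundaries)):
        raise ValueError("boundaries must be strictly increasing")
    return {
        "reportRequests": [
            {
                "viewId": view_id,  # placeholder; use your own view ID
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "metrics": [{"expression": m} for m in metrics],
                "dimensions": [
                    {"name": dimension,
                     "histogramBuckets": [str(b) for b in boundaries]}
                ],
                "orderBys": [
                    {"fieldName": dimension, "orderType": "HISTOGRAM_BUCKET"}
                ],
            }
        ]
    }


# Revenue by session-count bucket; the boundaries here are arbitrary examples.
body = histogram_request(
    "XXXXXXXX", "ga:sessionCount", [1, 2, 3, 5, 10], ["ga:transactionRevenue"]
)
print(json.dumps(body, indent=2))
```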

Have a clever use case of your own? Let me know about it in the comments.

Dan Wilkerson is a former LunaMetrician and contributor to our blog.

  • John Samaris

    At the start of the article you say it is possible to obtain “What percentage of page loads happen in under two seconds?”
    I have not managed to figure out how to do this.

    How would you suggest going about doing this since the site speed API category does not have a specific dimension for the page load time bucket.

    Basically I want to reproduce the page Behavior -> Site Speed -> Site Speed Page Timings -> Distribution but it does not seem possible.

    • Dan Wilkerson

      Ah, sorry. That’s my fault; a bad example for sure. We recommend using Events instead of Page Timings because Page Timings are often very heavily sampled:

      https://www.lunametrics.com/blog/2016/07/28/increase-google-analytics-page-speed-hit-limit/

      I’d strongly recommend you use the above approach; timing data can be spurious, otherwise.

      As a result, I’m used to having a dimension storing those timings handy, which is what we use to define our bins. Since ‘Page Load Time’ is a metric, we can’t use histogramBuckets with the API like above. If you’ve got a user-level Custom Dimension (e.g. Client ID), you could hypothetically hack around it, but yes, I’d say you can’t do what you’re after w/ the Page Timing metric. I’ll remove that example.

      Dan

      • John Samaris

        Thanks for the reply. If I understand correctly to be able to use the events I would need to add code in my webpages to send data to GA on page load.

        This is indeed better because I don’t rely on GA sampling my data but involves requesting development changes.

        I am considering just periodically getting from GA the average page load time and then on my side historically track the time buckets (i.e. x number of cases where it was slower than 2 seconds, etc).
        I guess even the average page load is based on a sample but it’s better than nothing.

        Thanks

        • Dan Wilkerson

          If you haven’t already considered it, you should ask your development team to add Google Tag Manager to your site – that would enable you to implement the event tracking in that article as well as many other kinds of tracking without requesting development time.

          Dan
