Google Analytics & BigQuery: The Whys and Hows
Back in May, Google announced that GA Premium customers would be able to export analytics data to BigQuery. It’s now rolling out to all Premium customers. What does this really mean? What’s it let you do beyond what you could before?
How do you access the data?
BigQuery stores your GA data in what is basically a giant table. It gives you a SQL-like interface to query that data, either through a web interface or programmatically.
Why SQL-“like”? Well, this isn’t exactly a relational database. Although I described it as a table, the records are actually hierarchical (each session contains many pageviews, for example). So BigQuery has some special features for dealing with those structures, but overall, if you’re familiar with SQL, you’ll find BigQuery easy to use.
As Google Analytics shifts its focus from the web to — well, the universe — with Universal Analytics, they want you to send ALL THE DATA into your Google Analytics account. But in some cases, that may not be practical or desirable. A tool like BigQuery allows you to join up data outside of Google Analytics as well, using SQL and your other databases and sources of information.
How is BigQuery different from GA data through the API?
Google Analytics has APIs to access data. These allow you to access GA data to build your own reports and so on. But the data available to you there is the same data as in your reports: it’s the processed, aggregate web data you see. In fact, the API and the Custom Reports feature in Google Analytics essentially use the same structures to access data.
BigQuery is different. Google Analytics will export raw session data to a query-able table in BigQuery. In basic terms, there’s one row in this table for each visit to your website. Each row contains a laundry list of data about the visit: the source, all the pages viewed, the visitor ID, etc.
Because this is session-level data, we can get at the kind of effects you can usually only get at with Advanced Segments in GA (including the new user segments). For example, finding all the visits by users who have viewed Product A. And using the power of BigQuery’s processing engine — voila, no sampling, no matter how big the data set you start with.
How do I get started?
First, BigQuery export is available only for Google Analytics Premium customers.
You can have the BigQuery export turned on through your Premium account manager. Note that there are costs for both data storage and processing in BigQuery, but GA Premium users get a $500/month credit to use toward those charges.
In many cases, that $500 will take you a long way. For reference, I took a look at one of our Premium customers using BigQuery. Their site has about 6M visits and 50M pageviews per month. Data has been exporting since September, and this month their storage charges will be about $12.86.
You also pay for processing. Again, for ad hoc reporting queries, this will be quite inexpensive, although if you are using automated reporting that is updated often, you could rack up charges more quickly.
Hopefully this has sparked your interest in digging into your data with BigQuery. Stay tuned: soon I’ll be posting a case study with some real-life examples of how we’ve used BigQuery in data mining Google Analytics data for customers.
About Jonathan Weber
Jonathan Weber is the Data Evangelist at LunaMetrics. He spreads the principles of analytics through our training seminars all over the East coast. The next seminar he'll be leading will be a Google Analytics training in Boston. Before he caught the analytics bug, he worked in information architecture. He holds a Master’s degree from the University of Pittsburgh School of Information Sciences. Jonathan’s breadth of knowledge – from statistics to analysis to library science – is somewhat overwhelming.