Combining Google Analytics with Other Data Sources

Google Analytics can collect quite a lot of data on its own, from user behavior, to traffic sources, to interactions, to demographics. It can also integrate with other Google products, allowing for easy and seamless combination of data.

But sometimes you’ll have another source of data about the visitors to your website, whether it’s from your customer database, a third-party survey tool, a campaign management tool, or anything else. And naturally, you’ll want to combine that with the rich interaction data available in Google Analytics. Maybe you’ll want to build user segments in GA based on survey results, or maybe you’ll want your CRM to include a customer’s original traffic source or how often they visit the site.

Despite the breadth and variety of data sources, there is a general approach that allows you to combine your Google Analytics data with almost any other data source you may have available. Specific products may have their own best practices or gotchas, but almost all of them follow a similar pattern. Setting up a connection with your favorite third-party tool (hereafter “Tool X”) requires answering the following three questions:

  1. How is the data being combined?
  2. What is the “key” that connects the data sources?
  3. How do I put the data from one system into the other?

How Is the Data Being Combined?

Visitors interact with websites in complicated ways, and as a result, Google Analytics data is complicated. The GA interface does a good job of getting you the information you need without bogging you down in details, but when you’re dealing with data connections, you need to pay more attention to the nuts and bolts than you otherwise would. Getting this right is the most important step to making sure that your combined reports are sensible and accurate.

Do you want data from Tool X in your GA reports, or data from GA in your Tool X dashboards, or both? Sometimes one system is a “source of truth” (often a Business Intelligence tool), and data flows into it. Other times, you want to take advantage of the unique reporting and analytic capabilities of both tools. Decide which direction(s) you need to pull your data.

What Is the Scope of Your Data Connection?

Google Analytics has four scopes at which data can live: user, session, hit (page and/or event), and product. A data connection will also exist at one of these four scopes. Picking the right scope is critical to making your reports work correctly.

Marketing tools like campaign management software or email remarketing will almost always want to connect at the Session level. In Google Analytics, traffic sources and campaign data are session-scoped.

User data such as a CRM or a customer database will almost always want to connect at the User level. A/B tests are usually user-scoped as well, since the same user should be served the same test on consecutive visits. Surveys may be user-scoped or session-scoped, depending on the type of questions being asked and whether they are specific to the user’s current visit to the site.

Data about content on your site, such as from your CMS or ad-serving platform, will almost always be hit-scoped. Most tools are not hit-scoped, either because they have their own notion of a session, or because your users don’t interact with them on every page view. For example, information about form submissions should usually connect at the session level. While the form only exists on one particular page, the goal of the data connection is usually to understand the whole series of interactions leading up to the form submission, which is session-level data.

Data about products should be product-scoped. Occasionally you may want product data to be scoped to a pageview hit, but if you have Enhanced Ecommerce it’s usually better for such data to be scoped to the product of a product detail view.

If you are pulling GA data into a business intelligence tool, you may want to combine data at several scopes, such as session-scoped traffic data and user-scoped customer lifetime value. It’s usually best to do this by setting up separate connections for each scope. GA may give surprising and inaccurate results if you attempt to combine several scopes into a single report or export.

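Tool X may or may not have its own concept of scope. Check its documentation to find out whether its records describe users, visits, individual interactions, or products, and match each to the closest GA scope.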

What Is the “Key” That Connects the Data Sources?

In database terminology, a “key” is a value in a data store that uniquely identifies a single record. If another data store holds a reference to that key, then those two data stores can be “joined,” meaning combined at the level of individual records. For example, your social security number is a “key” that uniquely identifies you. This lets other data sources like taxes, bank records, and credit scores be uniquely associated with you as an individual, rather than some other person who might have the same name or birthdate.

The easiest and best way to combine data sources is to find a key in one data source that you can import into the other data source. A unique key helps prevent a lot of problems that arise from data not matching up exactly, or different tools using different definitions of “page” or “user.” It also gives the flexibility to adjust your connection later on. As long as the key exists in both data sets, you can always pull down more data from one tool and upload it into the other.
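To make the idea of joining on a key concrete, here is a minimal sketch in Python. It assumes you have already exported rows from GA and from Tool X as lists of dictionaries; the function name `join_on_key` and the field names (`client_id`, `sessions`, `plan`) are made up for illustration and do not come from either tool.

```python
def join_on_key(ga_rows, tool_rows, key):
    """Left-join Tool X rows onto GA rows using a shared key column."""
    # Index the Tool X records by key. This assumes each key value
    # appears at most once in tool_rows (that is what makes it a key).
    tool_by_key = {row[key]: row for row in tool_rows}
    joined = []
    for ga_row in ga_rows:
        match = tool_by_key.get(ga_row[key], {})
        # GA fields win on name collisions; GA rows with no match
        # keep only their GA fields.
        joined.append({**match, **ga_row})
    return joined


# Hypothetical sample data: GA rows keyed by Client ID, plus a CRM export.
ga_rows = [
    {"client_id": "111.222", "sessions": 4},
    {"client_id": "333.444", "sessions": 1},
]
crm_rows = [
    {"client_id": "111.222", "plan": "premium"},
]
combined = join_on_key(ga_rows, crm_rows, "client_id")
```

Because the key exists in both data sets, adding another CRM field later only means re-running the join with a richer export.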

Above, you answered the question about what direction your data is flowing. The “key” in your data must go in that same direction. So if you are pulling data from Tool X into GA, then you need to find a key value in Tool X, and bring that value into Google Analytics.

Choosing the Right Key

There are two considerations for choosing the right key: scope and granularity.

It’s important to make sure your key exists at the right scope. A key may be unique at one scope but not another, or it may be ambiguous at the wrong scope. For example, campaign ID is unique at the session level but not the user level; and a product SKU is ambiguous at the session level if a user purchases more than one product.
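One way to check a candidate key against a scope is to count how many distinct key values each user or session carries. The sketch below is only an illustration: the helper `entities_with_multiple_keys` and the field names `user`, `session`, and `campaign` are hypothetical, and the rows stand in for whatever export you have on hand.

```python
from collections import defaultdict


def entities_with_multiple_keys(rows, entity_field, key_field):
    """Return the entities (users, sessions, ...) that carry more than one key value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[entity_field]].add(row[key_field])
    return sorted(entity for entity, keys in seen.items() if len(keys) > 1)


# Hypothetical export with one row per session.
sessions = [
    {"user": "u1", "session": "s1", "campaign": "spring_sale"},
    {"user": "u1", "session": "s2", "campaign": "newsletter"},
    {"user": "u2", "session": "s3", "campaign": "spring_sale"},
]

# Every session has exactly one campaign, so the key fits session scope.
session_conflicts = entities_with_multiple_keys(sessions, "session", "campaign")

# User u1 arrived through two campaigns, so the key is ambiguous at user scope.
user_conflicts = entities_with_multiple_keys(sessions, "user", "campaign")
```

An empty result means the key is unambiguous at that scope in the sample you checked; it does not prove it will stay that way for all future data.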

Granularity asks: What are you tracking? If you are tracking campaigns, then you want your key to refer to an individual campaign. If you are tracking A/B tests, then you want your key to refer to a specific variation within a specific experiment. Page-level data usually refers to individual pages and product-level data to individual products, but sometimes it refers to groups or categories of these.

How Do I Put the Data from One System into the Other?

This is the part that varies the most between tools and may require some coding. Google Tag Manager is an awesome help for pulling data from one location and pushing it to another.

Pulling Data out of Google Analytics

Google Analytics stores a value called the Client ID in a first-party cookie named _ga. This is the perfect value to use as a user-scoped key because it’s the same key that Google uses in its own processing… except that it’s not available in the reports! Fortunately, it’s easy enough to pull the value of this cookie from Tag Manager and store it in a Custom Dimension.
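If the raw cookie value ends up somewhere other than Tag Manager (say, in server logs or a form submission), the Client ID can be recovered from it afterwards. The sketch below assumes the commonly seen `_ga` value layout of a version prefix followed by two numeric parts, such as `GA1.2.1033501218.1368477899`, where the last two dot-separated fields form the Client ID. The function name is hypothetical, and the cookie layout is an assumption you should verify against your own cookies.

```python
def client_id_from_ga_cookie(cookie_value):
    """Extract the Client ID from a raw _ga cookie value.

    Assumes the layout '<version>.<depth>.<random>.<timestamp>' and
    returns '<random>.<timestamp>', or None if the value is too short.
    """
    parts = cookie_value.strip().split(".")
    if len(parts) < 4:
        return None
    return ".".join(parts[-2:])


example_id = client_id_from_ga_cookie("GA1.2.1033501218.1368477899")
```

In the browser, the same split-and-join logic could live in a Tag Manager variable that feeds the Custom Dimension.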

Google Analytics does not provide a session ID value to the browser. A session can be uniquely identified by combining the Client ID with another piece of data, like Visit Number from the old utm cookies. You can also approximate sessions by combining Client ID with Date. Fortunately, very few tools have the concept of a session, so this issue tends not to show up in practice.
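As a rough sketch of the Client ID plus Date approximation, the helper below builds a composite key. The function name and the `|` separator are arbitrary choices for illustration; the date is formatted as `YYYYMMDD`, which is an assumption about how the date appears in your export.

```python
from datetime import date


def approximate_session_key(client_id, visit_date):
    """Combine Client ID and visit date into an approximate session key."""
    # Two sessions by the same client on the same day will share a key,
    # which is why this is only an approximation.
    return f"{client_id}|{visit_date.strftime('%Y%m%d')}"


session_key = approximate_session_key("1033501218.1368477899", date(2016, 3, 5))
```

Whatever format you choose, build the key the same way on both sides of the connection so the values match exactly.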

The key for most hit-level data is the Page Path, and the key for most product-level data is the SKU. If you are using these as keys, it’s important to be aware of any transformations you may be applying, either through GTM or through Filters in GA. For example, if you are removing certain query parameters, or applying a lower-case filter, then the URL that GA reports is not the exact same one that the visitor saw. You will need to apply the same transformation in your other tool to get the data to match up.
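Here is a sketch of reproducing such transformations outside of GA so that Tool X's URLs line up with GA's Page Path. It assumes two transformations, a lower-case filter and the removal of a `sessionid` query parameter; both are examples only, so substitute whatever your own filters and GTM setup actually do. The function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def normalize_page_path(url, drop_params=("sessionid",)):
    """Mimic example GA transformations: drop some query parameters, then lower-case."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in drop_params
    ]
    path = parts.path.lower()
    query = urlencode(kept).lower()
    # GA's Page Path has no scheme or hostname, so only path and query are kept.
    return f"{path}?{query}" if query else path


normalized = normalize_page_path(
    "https://example.com/Products/Shoes?Color=Red&sessionid=abc123"
)
```

Running Tool X's URLs through the same function before joining avoids near-misses such as `/Products/Shoes` versus `/products/shoes`.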

Once you have the key from Google Analytics, what you do with it depends on the tool. If your key is a URL or SKU, it probably already exists in Tool X. If you are using the Client ID or something else, you will have to figure out a way to pass it along. Common solutions include adding it as a field in a tag in GTM, appending it as a query parameter to a URL, or inserting it into a hidden field in a form.
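The query parameter approach can be sketched as follows. The parameter name `ga_cid` and the survey URL are invented for this example; Tool X will dictate which parameter names it can actually read.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_key_to_url(url, key_value, param_name="ga_cid"):
    """Return the URL with the key added (or replaced) as a query parameter."""
    parts = urlsplit(url)
    # Drop any existing copy of the parameter so it is never duplicated.
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != param_name
    ]
    params.append((param_name, key_value))
    return urlunsplit(parts._replace(query=urlencode(params)))


survey_link = append_key_to_url(
    "https://survey.example.com/start?lang=en", "1033501218.1368477899"
)
```

The same idea applies to a hidden form field: the key travels alongside the user's submission so Tool X stores it with the record.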

Once the key is in your other tool, then you can create a Custom Report in Google Analytics based on that key, and export the data you want. Then upload it into Tool X, and follow Tool X’s instructions for how to match data.

Putting Data into Google Analytics

First, unless your key is already a built-in dimension in Google Analytics (such as product SKU), you will need to create a custom dimension. You should already know what scope to configure it to.

Second, you need to populate that key. How you do this depends on the specific tool and how it makes that key available. Common approaches are URL query parameters for campaign data or A/B tests, or cookies for most types of customer management systems. Some systems have an API that you need to interface with using custom JavaScript.

Finally, you need to use this key to integrate the rest of your data. The easiest way to do this is with Data Import. For extra style points, this process can be automated, making the data connection appear seamless after you set it up.
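As a sketch of preparing an upload file for Data Import, the helper below writes a CSV whose first column is the key and whose remaining columns are the fields to attach to it. The header names `ga:dimension1` and `ga:dimension2` are assumptions: they stand for whichever custom dimension indexes you created, and the exact header should be taken from the schema GA shows for your data set. The function and field names are hypothetical.

```python
import csv
import io


def build_import_csv(rows, key_field, key_dimension, field_to_dimension):
    """Build CSV text with the key column first, then the mapped fields."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([key_dimension] + list(field_to_dimension.values()))
    for row in rows:
        # Missing fields are written as empty cells rather than raising an error.
        values = [row.get(field, "") for field in field_to_dimension]
        writer.writerow([row[key_field]] + values)
    return out.getvalue()


# Hypothetical CRM export keyed by Client ID.
crm_export = [
    {"client_id": "111.222", "plan": "premium"},
    {"client_id": "333.444", "plan": "free"},
]
upload_text = build_import_csv(
    crm_export, "client_id", "ga:dimension1", {"plan": "ga:dimension2"}
)
```

Generating the file from a script like this is one way to automate the connection: schedule the export from Tool X, rebuild the CSV, and upload it on a regular basis.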

Related Reading

This general outline should guide how you set up connections between different platforms. If this sounds familiar, it should! We’ve outlined the specific process for several systems, and you’ll notice a lot of crossover between posts. Here’s a quick rundown of related posts:

Conclusion

Congratulations! Now your web data lives in the same tool as other data that you’ve been using! This allows for much more powerful reports, such as end-to-end tracking that ties campaign impressions to conversions that happen on your website, or connecting your traffic source data in Google Analytics with offline customer acquisition reports from your CRM.

Logan Gordon is a Consultant with seven years of experience in Web Analytics, where he has helped many companies become more data-driven and organized. He drove to Pittsburgh from California and is making up for not playing in the snow as a kid. His interests include Bulgarian language (due to his wife), cooking, Bulgarian cooking, and science fiction. His only pet is Bob, a dead shark in a jar.

  • Hi, Logan.

    Can you give an example how can Visit Number be extracted from _ga cookie?

    • Cecio82

      As far as I know, it’s only available as a parameter of utma cookie for the old ga.js protocol.

      • Logan Gordon

        You are correct, I got my cookies confused. I have updated the post.

    • Logan Gordon

      Cecio82’s reply is correct, I got my cookies confused. You can approximate a session key by combining client ID with date.

      • Thanks. I thought that I miss something new about _ga cookie:)

  • Raúl Galve

    Interesting post. BigQuery works perfectly for us as a place where to carry out the combination of data sources.
