Extending Google Analytics with Programmatic Data Import

/

ga-programmatic-data-import
There are lots of external data sources with juicy information you might like to see modeled in Google Analytics. Maybe you’ve got cost data from non-Google ad networks you’d like to see accounted for in your reports. Maybe you have data from a CRM that you want to use to enhance your Google Analytics reporting. For these and many other cases, Data Import is your huckleberry.

My colleague Jim Gianoglio has already written a great post detailing how to use Data Import via the web interface, where you can quickly and easily upload a CSV of custom data for Google Analytics to use. But what about when you’ve got data that updates frequently? Can we automate that process?

The answer is yes; we can do programmatic Data Import with Google Analytics. Today we’ll look at two examples – Salesforce lead information and Facebook Ads Cost Data.

Step 0: Prep Work

In order to bring in our data to Google Analytics, we have to be able to join the data in Google Analytics with a shared key. If you’re not familiar with this concept, think of a coat check – you hand in your coat and get a ticket. The coat check employee puts a matching ticket on your coat and hangs it.

Later, you can exchange your ticket for the same jacket. Your ticket is the key – both the coat and the owner have the same number. This allows the coat check employee to not have to remember all the names and faces of jacket owners, and the jacket owners to not have to recite all the details about their coat when they try and claim it.

Coat Check

For our Facebook data, we’ll be using our Campaign Source and Medium as our coat check ticket – on all of our Facebook ad destination URLs, we’ll add campaign parameters like this:

http://oursite.com/ads/facebook-ad-landing-page?utm_source=facebook&utm_medium=cpc&utm_campaign=facebook-ads&utm_content=banana+smoothie+post

For our Salesforce data, we’ll be using our Google Analytics Client ID. We’ll have to have stored the Client ID in Salesforce, too. I’d recommend utilizing a hidden field on all Salesforce forms to capture Client IDs. You’ll also need to have created a Custom Dimension in Google Analytics and stored Client IDs there, too.

Editor’s Note: This post will walk through the programmatic data import process for Google Analytics, but won’t go into the specifics required to create your data sets, set up automated scripts, and handle authorization.

We’re assuming you’re technically savvy enough to gain authorized access to Google Analytics, Salesforce, and Facebook (or your data source of choice). For more on how to access your Salesforce data, check out their documentation on their web server OAuth flow. You’ll find Facebook’s docs on authorization here.

Fair warning: all of these authorization schemes are relatively tool-specific and confusing. For Salesforce, make sure your App Client has the refresh permission configured, and for Facebook, you’ll need to get the ads_read permission from a user managing those ads and trade up for a long-lived token. And for Google Analytics, you’ll want to follow the service-type authorization flow. Hopefully those tips save you some time!

One more note before we jump in – we’re going to assume that you’ve got a server somewhere that you can use to host a service that handles the orchestration of everything we’ll discuss below. If you’re looking for something on the cheap and don’t anticipate high volumes of data, I’d suggest checking out Google App Scripts. They’re a great fit for this situation; you can set up triggers and make HTTP calls to external services. Just be aware that your scripts will die if they go over the six minute execution limit.

Step 1: Creating Our Data Sets

In order to send data into Google Analytics, we’ll need to create a Data Import Data Set. We can create and configure our Data Sets from within the Data Import configuration interface, which you can find under the Property column in our Admin Tab:

Data Import Option

Inside the Data Import interface, click New Data Set.

New Data Set

Then, we’ll select the Cost Data data set type and click Next Step.

Cost Data Set Option

We’ll give our data set a name, and select which views we would like the data set to bring data into.

Select Views

Finally, we will configure the schema for our data set. Depending on what data we have available, we might use more or fewer dimensions and metrics. Then we’ll be ready to save our data set.

Once we’ve saved our data set, we’ll need to do two things:

  • Click Get Schema and copy down what we see
  • Click Get Custom Data Source ID and copy down the ID of our data set

Setting up Our Facebook Cost Data

I’m going to add the Campaign, Ad Content, and Destination URL dimensions. I’ll also add the Impressions, Cost, and Clicks metrics. Finally, I’ll select Overwrite for my cost data import behavior. This means that if I upload two sets of data with duplicate keys, Google Analytics will overwrite the values from the first set of data with the values from the second set. You may prefer the other option, Summation, which will sum up the values of duplicate keys instead. Then we’ll click Save.

Define Data Import Schema

And that’s it! Our Facebook Cost Data data set is ready to go. Make sure you copy down the schema and data source ID, as outlined above.

Setting up Our Salesforce Data Set

For our Salesforce data set, we’re going to first need to create a Custom Dimension named Salesforce ID. Which ID you use (Opportunity, Lead, Contact, etc.) is up to you – you’ll have to ask yourself how you want to model your data in Google Analytics. For our example, I’ll be using the Lead ID.

Make sure you’ve got your Client ID dimension set up too, as outlined in Step 0. If you want to include any other data from Salesforce, e.g. Department Contacted, you’ll need to create additional Custom Dimensions for that data. We’re going to keep it simple for today.

Once you’ve got that in place, head over to our Data Import interface and create a new Data Set. You might be tempted to pick User Data, but that relies on the User ID dimension as our joining key, which we may or may not have. We want to use our Client ID as our joining key, a Custom Dimension, so we’ll select Custom Data.

Next, we’ll select Client ID as our Key, which we’ll find nested under Custom Dimensions in the dropdown. Then, we’ll select Salesforce ID as our Imported Data.

At the end of the creation wizard, we have the option to select whether we’d like to overwrite dimension values if a hit contains values for the same dimensions. If you’re planning on sending in this data along with hits from another source and you have reason to believe the value might change, you might select No, and defer to the value with your hit. If you believe your automatic upload will always contain the most correct data, you can select ‘Yes’. We’ll go with Yes, because we don’t expect our Salesforce ID to ever show up on a hit and if it did, it would be better to defer to our uploaded data. You should end up with something like this:

Salesforce Data Import Schema

Save the data set, copy down the Data Source ID and schema, and we’re ready to start sending in data!

Importing the Data

Now that our data sets are in order, it’s time to begin uploading the data to Google Analytics. Remember the schema that Google Analytics provided us for a data sets? E.g., our Facebook Cost data set has this schema:

Once we have the data, we’ll need to format it to follow the schema provided by Google and then stuff it into a CSV file. Ultimately, the file should look something like this:

Or for our Salesforce data set:

To upload the data to Google Analytics, simply POST your CSV to:

https://www.googleapis.com/upload/analytics/v3/management/accounts/accountId/webproperties/webPropertyId/customDataSources/customDataSourceId/uploads

Use the MIME Type application/octet-stream, where the webPropertyId is our Property ID (a.k.a. UA number or Tracking ID), the accountId is the middle numbers of our UA number (e.g. UA-XXXXXX-YY), and the customDataSourceId is our Data Source ID from earlier. Once we’ve posted our data, Google Analytics will return an uploads resource with data specific to your freshly uploaded data set, or any errors that occurred.

A few quick notes on limits – in addition to the normal Google API restrictions, you’re only able to create 50 data sets per property, and upload 50 CSVs of data per day, of a max of 1GB in size. You’re also limited to importing 100MB of data per ga:date value in Cost Data imports.

Using The Data in Reports

Once you’ve uploaded your cost data, you should be able to see the results in the Cost Analysis report, as well as in the MCF reports. You can also create a Custom Report to analyze your shiny new cost data.

Your Salesforce data will behave a little differently; Google Analytics will populate the data you’ve provided in association with your Client IDs you specified only after a session from said Client ID is recorded. So, if your visitor never returns after submitting a Salesforce lead, their Salesforce ID will not show up in the reports.

This applies to other keys, too; your data will not be populated retroactively. For this reason, it’s often a better idea to use a non-interaction Event via the Measurement Protocol in order to send in custom data to Google Analytics.

What data sources would you like to import into Google Analytics? Share in the comments below.

Dan Wilkerson is a Software Engineer at LunaMetrics. He is passionate about web technology, measurement, and analysis. Dan is the winner of the 1999 Forge Road Elementary School Science Fair for his groundbreaking report on how magnets work. (ICP, take note.) Dan has worked at LunaMetrics in social media, as our marketing manager, and now in our analytics department.

  • Vignesh

    Is it possible to upload Industry data instead of SalesforceID by following the same steps?

    • Dan Wilkerson

      Hi Vignesh – see above 🙂

      Dan

  • Vignesh

    Hey Dan, Great post.
    I was wondering if it is possible to upload Industry data instead of salesforceID by following the same steps?

    • Dan Wilkerson

      Hi Vignesh,

      Absolutely; the only bounds you’ve got are the models that GA imposes on the data and that Salesforce imposes on their data; just negotiate those and you’ll be set. In your example, Industry maps nicely to a User-level Custom Dimension in Google Analytics; I’d create a User-level CD named ‘Industry’, then extract the requisite data from Salesforce (client ID, industry), and then upload that data as part of my data import.

      [Edit] You also might like to pursue one of the other strategies I outlined in my post about Salesforce integrations with Google Analytics:
      http://www.lunametrics.com/blog/2016/02/01/integrating-google-analytics-and-salesforce/

      Dan

      • Vignesh Sairam

        Hey Dan,

        Sorry for the late reply. Thanks for the suggestion.
        I tried importing Industry data with SalesforceID CD as the unique key. The import was successful but I was not able to find the data in Custom Reports with SalesforceID & Industry data.
        Will the Industry data be mapped retroactively to the SalesforceID?

        • Dan Wilkerson

          Hi Vignesh,

          The data will not be populated retroactively – the only “exception” to this is Cost Data (and Query Time Data Import, if you have Google Analytics 360). Since you’re using Processing Time Data Import, you’ll see the data appear in the reports whenever a user who has been identified by your Salesforce ID / Client ID combination next returns to your site. Hence:

          So, if your visitor never returns after submitting a Salesforce lead, their Salesforce ID will not show up in the reports.

          Hope this helps!

          Dan

          • Vignesh Sairam

            Yes Dan it helps. Thanks for the quick reply. 🙂

  • https://www.cucina.li Andrea Rapanaro

    Hi Dan, thanks for the nice post. I have some questions around both topics.
    Considering that old hits are not impacted, how do you get a realistic picture considering that you usually know the cost data after your campaign has run… do you upload an estimation of it based on your daily budget before the campaign starts?
    Regarding the ID: do you upload it as soon as you match client ID and salesforce ID? What happens when the client ID changes / cookies are deleted? Any best practices to recommend in that area?

    • Dan Wilkerson

      Hi Andrea!

      Actually, your cost data will fill in retroactively – cost data sets behave a little differently than the other data set types. Once you’ve uploaded it, you’ll see the data appear for the dates you specify in your upload.

      For Salesforce, it comes down to how you want to model your data. Typically, you store your client ID with your lead form, then match your lead form to your Contact, which in turn returns a Salesforce ID. It all boils down to how you’d like to model the data. You can upload your Salesforce ID as it happens, but you’ll want to use a different method for that; check out my post on integrating Salesforce and Google Analytics for some ideas:

      http://www.lunametrics.com/blog/2016/02/01/integrating-google-analytics-and-salesforce/

      If the client ID changes/cookies are deleted, there’s not much you can do. When/if the user submits another Salesforce lead form, though, you should be able to retrieve the same Salesforce ID.

      Dan

      • https://www.cucina.li Andrea Rapanaro

        Thanks for the reply, I’ll check your other post!

Contact Us.

LunaMetrics

24 S. 18th Street, Suite 100,
Pittsburgh, PA 15203

Follow Us

1.877.220.LUNA

1.412.381.5500

getinfo@lunametrics.com

Questions?
We'll get back to you
in ONE business day.