5 Ways to Get Data on PDFs and Other Downloads
December 9, 2013
One of the notable limitations of Google Analytics (GA) is that it does not provide data on non-html pages out-of-the-box. Thus, if your website has PDF files, Word docs, .wmv files, or other downloads, you’ll face a black hole of data.
But there are ways around that.
Recently, we started a project with a client that had a substantial portion of their site’s content in PDF form. We went through our checklist for SEO for PDFs and determined the following:
- The PDFs were worth keeping in PDF format
- The PDFs needed to be optimized for search, including adding internal links to other pages on their site
- We lacked data on PDF usage to help our client determine what users were interested in
To the last point, because so many types of content (reports, magazine articles, studies, etc.) were in PDF form, the client really struggled to understand what content performed the best, which made content strategy extremely difficult. So we had to implement workarounds to obtain as much data as possible.
We’ve written about many applicable workarounds in the past, but today I want to get them together in one place for you for easy reference if you want data on your downloadables. So, using our PDF-focused project as an example, below are 5 ways to get data on non-html files.
1. Use Google Webmaster Tools data to examine Google clicks
There are a few ways to get straight to the non-html pages. In the screenshot above, I simply used “Ctrl+F” to find anything on the screen containing “.pdf”. You could also export the data into Excel and separate out the non-html pages that way.
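If you would rather script that last step than do it in Excel, here is a minimal Python sketch of the same filtering idea. The column names (“Page”, “Clicks”), the sample rows, and the list of extensions are assumptions made up for illustration, so adjust them to match your actual export.

```python
import csv
import io
from typing import Dict, List
from urllib.parse import urlparse

# Assumed set of download extensions; adjust to the file types on your site.
DOWNLOAD_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx", ".wmv")


def is_download(url: str) -> bool:
    """Return True when the URL path ends in one of the download extensions."""
    path = urlparse(url).path.lower()  # ignores any query string or fragment
    return path.endswith(DOWNLOAD_EXTENSIONS)


def filter_downloads(csv_text: str, url_column: str = "Page") -> List[Dict[str, str]]:
    """Keep only the rows whose URL column points at a non-html file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if is_download(row[url_column])]


# Hypothetical export: column names and numbers are invented for this example.
sample = """Page,Impressions,Clicks
http://www.example.com/about.html,500,40
http://www.example.com/reports/annual-report.pdf,300,25
http://www.example.com/media/intro.wmv?ref=home,120,8
"""

for row in filter_downloads(sample):
    print(row["Page"], row["Clicks"])
```

Matching on the URL path rather than the whole URL keeps a page like “pdf-guide.html” out of the results.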
An even more user-friendly way to use GWT data is to incorporate it into GA, which is super easy. All it requires is for the admin of both GA and GWT to log into GA and, in the left nav, go to Acquisition -> Search Engine Optimization -> Landing Pages. If you’ve never connected GA and GWT you’ll see a screen that states “This report requires Webmaster Tools to be enabled.” Simply click the set-up button and follow the easy instructions.
Once you connect GWT and GA, you can see a report like this, which will enable you to easily look at types of organic landing pages.
One limitation of connecting the accounts is that you can only connect one GWT account to one GA account, and a GWT account can only be for one subdomain. This may be an issue if you have multiple subdomains.
2. Use Google AdWords’ Paid & Organic Report for additional data on Google clicks
Another limitation of GWT data on organic clicks is that it will only ever show 90 days’ worth of data. This is one reason I like to download GWT data every month.
If you didn’t save GWT data, you may be happy to find that Google AdWords’ Paid & Organic Report goes back further than 90 days. While not as robust as the Search Queries report, this Paid & Organic Report can help you fill in a few blanks. One caveat, however, is that this AdWords report only goes back to the date at which you enabled the report by connecting AdWords and GWT.
Fortunately, enabling this report is also super easy. Learn more about this report at www.lunametrics.com/blog/2013/09/23/find-organic-keyword-data-pdfs.
Another limitation of GWT search query data is that the click numbers are always rounded off (in an idiosyncratic way at that). Note that the AdWords Paid & Organic Report has no rounding.
3. Use event tracking in GA to see how often docs are downloaded from your site
So far, we’ve shown how to see traffic from Google to your non-html documents. But we assume you also want to know how often the documents are accessed from your own site. We can do this using event tracking, which also shows which documents were downloaded and which html pages users were on when they accessed them.
It was when it came time to set up event tracking for every link to a PDF on a site that I truly realized Google Tag Manager was the greatest invention since the coffee machine, as we were able to set up tracking for hundreds of links in just a few short minutes.
To learn how, go to www.lunametrics.com/blog/2013/10/03/google-tag-manager-auto-event-tracking. Note that Jonathan points out exactly how to track PDF link clicks in his section “Example:Tracking Link Clicks.”
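The linked post covers the actual Google Tag Manager setup. As a rough offline companion, the Python sketch below scans a page’s HTML for links to downloadable files, which is handy for checking how many links the tracking needs to cover. It also shows one possible event naming scheme. The extension list, the function names, and the category/action/label choices are all assumptions for illustration, not the configuration from the linked post.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed list of document extensions worth tracking as downloads.
DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".wmv")


class DownloadLinkFinder(HTMLParser):
    """Collects the href of every <a> tag that points at a document file."""

    def __init__(self):
        super().__init__()
        self.download_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if urlparse(href).path.lower().endswith(DOC_EXTENSIONS):
            self.download_links.append(href)


def find_download_links(html):
    """Return the document links found in an HTML string, in page order."""
    finder = DownloadLinkFinder()
    finder.feed(html)
    return finder.download_links


def build_event(href, page_path):
    """Hypothetical naming scheme: the document as action, the page as label."""
    return {"category": "Download", "action": href, "label": page_path}


page = '<p><a href="/reports/study.pdf">Study</a> <a href="/about.html">About</a></p>'
for link in find_download_links(page):
    print(build_event(link, "/research/"))
```

Putting the document in the action and the referring page in the label mirrors the two questions above: which documents were downloaded, and from which pages.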
Once that’s all set up you can get GA reports like this:
4. Campaign tag links inside documents to see how they send traffic to html pages
None of the PDFs for our PDF-heavy client had links back to other pages on the site. Albeit a common issue, this lack of internal linking from PDFs is a big site architecture no-no. From an SEO perspective, it meant that no link equity from the PDFs was passed on to the site. Lack of internal links is also detrimental to user experience and conversion optimization. Thus, we had to edit every PDF so that it links back to the rest of the site.
When we added links to the PDFs, we coded them with campaign tags so we can have data on traffic from PDFs to the rest of the site. This enables us to gain insight on the quality of PDF traffic and the contribution of PDFs to website goals.
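For anyone tagging a large number of links, here is a minimal Python sketch of building a campaign-tagged URL with the standard GA parameters utm_source, utm_medium and utm_campaign. The function name and the example values (“annual-report”, “pdf”, “pdf-links”) are hypothetical; pick a naming scheme that fits your own reports.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

CAMPAIGN_KEYS = ("utm_source", "utm_medium", "utm_campaign")


def add_campaign_tags(url, source, medium, campaign):
    """Return the URL with campaign tags appended to its query string."""
    parts = urlparse(url)
    # Keep existing query parameters, but drop any old campaign tags
    # so they are not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in CAMPAIGN_KEYS
    ]
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunparse(parts._replace(query=urlencode(query)))


# Example values are made up: source names the PDF, medium marks the link type.
print(add_campaign_tags(
    "http://www.example.com/services/?ref=1",
    source="annual-report",
    medium="pdf",
    campaign="pdf-links",
))
```

Using a different source value per PDF is one way to see which document sent each visit.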
5. Track all hits to documents server-side
Using the 4 techniques above, you can view data on Google organic clicks, data on how people access documents from your site, and data on how people access your html pages from your documents.
However, we still don’t know total “hits” to your documents. We don’t know how often non-Google, external traffic sources are sending visits to your non-html documents.
Alex Moore has a solution to this issue at: www.lunametrics.com/blog/2013/06/04/tracking-pdfs-google-analytics-server-side/. Basically, it is a PHP library that you integrate with Google Analytics using an .htaccess rule.
This solution will enable us to see inside Google Analytics how many times each PDF is downloaded (viewed). Note that only “hits” will be displayed, not visitors, bounce rate, time on page or other metrics. Note also that you’ll need to create a separate Google Analytics property to implement this solution, since it could interfere with data on your existing property. But you’ll finally get easy readings on total traffic in GA.
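To give a feel for what a server-side hit looks like, here is a minimal Python sketch. It is not the PHP library from Alex’s post. It assumes the Universal Analytics Measurement Protocol parameters (v, tid, cid, t, dp, dr), and the function name and the property ID “UA-XXXXX-Y” are placeholders. The sketch only builds the payload and does not send anything.

```python
import uuid
from urllib.parse import urlencode


def build_pageview_payload(property_id, document_path, client_id=None, referrer=None):
    """Build a URL-encoded pageview hit for one document request."""
    params = {
        "v": "1",               # protocol version
        "tid": property_id,     # the separate GA property used for documents
        "cid": client_id or str(uuid.uuid4()),  # anonymous client ID
        "t": "pageview",        # hit type
        "dp": document_path,    # path of the requested document
    }
    if referrer:
        params["dr"] = referrer  # where the request came from, if known
    return urlencode(params)


# Placeholder property ID and made-up request details.
payload = build_pageview_payload(
    "UA-XXXXX-Y",
    "/reports/study.pdf",
    referrer="http://www.example.org/links.html",
)
print(payload)
# A server-side handler would POST this payload to the Measurement Protocol
# collection endpoint each time the document is requested.
```

Because this sketch generates a random client ID for every request, it can only count hits, which matches the limitation noted above.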
Just because a page isn’t in html doesn’t mean you can’t get data. With the techniques above, you can get:
- detailed data on how people reach your non-html documents from Google.
- detailed data on how people access non-html documents from your site.
- detailed data on visits from non-html documents to your html pages.
- total hits to non-html documents.