Tracking PDFs and Other Downloads Inside Google Analytics… Server-Side!
This is all fine and dandy – except that the world doesn’t always work that way. People sometimes hotlink to PDFs, Word docs and images and visit them directly. And thank goodness! Can you imagine a world without direct links to imgur.com memes?
In situations where visitors access a non-HTML resource directly, Google Analytics is not the tool for the job: its JavaScript tracking snippet only runs inside HTML pages, so a direct request for a PDF never triggers a hit. An analyst would have to view raw server logs to determine how many times a PDF was requested. For example, we determined that nearly half of a particular client’s PDFs were downloaded directly from an email blast campaign. Since there was no visit to the website involved, Google Analytics was clueless. Yet important data was sitting idle inside server logs, buried and inaccessible.
Enter server-side Google Analytics. Note: This is a PHP-only solution. Conceptually this can work in other environments, but PHP/Apache is my flavor of choice.
By integrating with the php-ga library, we can 1) set an .htaccess rule to reroute all PDF downloads through 2) a custom download.php script, which hits the library and 3) fires off a Google Analytics call. You can keep your same folder structure and you won’t need to move any of your existing PDFs! And no additional cookies required! Let’s dig in.
Tools You Need:
- Apache server with PHP 5.3 or greater
- Notepad++, TextMate or a similar quality text editor
- FTP Client like FileZilla
- Google Analytics account
A Cautionary Word:
Download the php-ga library. Look inside the folder labeled src and move autoload.php and the GoogleAnalytics folder to your website’s root directory.
Create a new PHP file called download.php. This is where the magic happens. The script 1) loads up the php-ga library, 2) creates a new visitor hit to GA, 3) tracks a virtual pageview for the PDF, 4) uses cURL to set a custom user-agent called LunaMetrics123 (you’ll see why later on), and 5) fetches the PDF and sends it to the browser.
Try it out! Include the following code (be sure to add your GA-ID and domain below):
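Here’s a minimal sketch of what download.php could look like. It assumes the Tracker, Visitor, Session and Page classes as shown in the php-ga README, that autoload.php sits next to the script, that the PDF path arrives in a query parameter named `file` (a hypothetical name that your rewrite rule has to match), and that the site is served over plain http. The GA-ID and domain are placeholders, and the helper function names are made up for illustration.

```php
<?php
// download.php: an illustrative sketch under the assumptions stated above.
use UnitedPrototype\GoogleAnalytics;

const GA_ACCOUNT    = 'UA-XXXXXXXX-X';  // placeholder: your GA-ID
const GA_DOMAIN     = 'example.com';    // placeholder: your domain
const LOOP_GUARD_UA = 'LunaMetrics123'; // user-agent the .htaccess rule lets through

// Accept only .pdf paths without "..", and return them with a leading slash.
// Returns null when the request should be rejected.
function cleanPdfPath($requested)
{
    $path = '/' . ltrim((string) $requested, '/');
    if (strpos($path, '..') !== false || strpos($path, "\0") !== false) {
        return null;
    }
    return preg_match('/\.pdf$/i', $path) ? $path : null;
}

// Send a virtual pageview for the PDF to Google Analytics via php-ga.
function trackPdfPageview($path)
{
    require_once __DIR__ . '/autoload.php';
    $tracker = new GoogleAnalytics\Tracker(GA_ACCOUNT, GA_DOMAIN);

    $visitor = new GoogleAnalytics\Visitor();
    $visitor->setIpAddress($_SERVER['REMOTE_ADDR']);
    $visitor->setUserAgent(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');

    $session = new GoogleAnalytics\Session();

    $page = new GoogleAnalytics\Page($path); // php-ga expects a path starting with "/"
    $page->setTitle(basename($path));

    $tracker->trackPageview($page, $session, $visitor);
}

// Fetch the real PDF from our own server, identifying ourselves with the
// custom user-agent so the request is not rerouted back to this script.
function fetchPdf($path)
{
    $ch = curl_init('http://' . GA_DOMAIN . $path);
    curl_setopt($ch, CURLOPT_USERAGENT, LOOP_GUARD_UA);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ($body !== false && $status === 200) ? $body : null;
}

// Only run the request handling when served by the web server.
if (PHP_SAPI !== 'cli') {
    $path = cleanPdfPath(isset($_GET['file']) ? $_GET['file'] : '');
    $pdf  = ($path !== null) ? fetchPdf($path) : null;

    if ($pdf === null) {
        header('HTTP/1.1 404 Not Found');
        exit;
    }

    try {
        trackPdfPageview($path);
    } catch (Exception $e) {
        // A tracking failure should not block the download.
    }

    header('Content-Type: application/pdf');
    header('Content-Length: ' . strlen($pdf));
    echo $pdf;
}
```

One design choice to note: this sketch fetches the PDF before firing the pageview, so requests for missing files don’t show up in GA. Swap the order if you’d rather count every attempt.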
In the root of your website, open the .htaccess file. This is a special system file that may be hidden on some machines. You may need to create a new blank .htaccess file. Don’t forget the leading dot!
(If working locally on Mac OS X, you may not see hidden system files by default. This link can fix that.)
.htaccess provides instructions for the Apache server when a visitor tries to access files. Our job then is to intercept the request for a PDF and reroute it to download.php. Remember the LunaMetrics123 custom user-agent string from earlier? When download.php fetches the real PDF with cURL, that request would normally be rerouted right back to download.php. Exempting the custom user-agent from the rule is a handy hack to make sure that the server doesn’t enter an endless loop… without setting a cookie!
Inside .htaccess, include the following lines:
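A rule along these lines should do the trick. Treat it as a sketch: it assumes mod_rewrite is enabled, that download.php lives in the website’s root, and that the script reads the PDF path from a query parameter named `file` (a hypothetical name; use whatever your script expects).

```apache
RewriteEngine On

# Let requests carrying the custom user-agent through untouched,
# so download.php can fetch the real PDF without being rerouted again.
RewriteCond %{HTTP_USER_AGENT} !^LunaMetrics123$

# Send every other request for a .pdf (any case) to download.php,
# passing the requested path along in the "file" parameter.
RewriteRule ^(.+\.pdf)$ /download.php?file=$1 [L,NC]
```

Because the rule only matches on the file extension, your existing PDFs can stay exactly where they are.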
Once your code is in place, go to your website and try to download a PDF… directly. Don’t visit the website first. If you don’t have a PDF handy, here’s one.
Then… drumroll. Open up your Google Analytics and view the Real-Time reports.
Take a look! You should see the PDF listed as an active page.
This is just a start. There are so many things we can do with server-side Google Analytics. Why stop at PDF downloads? We could track direct image downloads, Word documents, 404 error pages, etc. Share some other uses in the comments below!