Easy Cohort Analysis for Blogs and Articles


It’s now easier than ever to track and compare performance between articles and blogs. While Google Analytics shows you pageviews and other key metrics, frequent content comparisons are made difficult by the shifting time frames.

How can I compare a blog post that was published this month vs. a blog post that was published last month? Sure, we can run two different reports, pull them into Excel and start crunching the numbers, but there’s gotta be a better way!

Enter Cohort Analysis. You may have heard this term thrown around before, usually in relation to users on your site and when they first became users. The idea here is to group users or sessions into common groups, like who first visited in January or first-month visitors. Avinash and Justin Cutroni both love cohorts, so obviously we should, too!

In this case, we’re going to use Google Tag Manager to put content into cohorts so we can analyze how they performed in similar time frames. We’ll pass these into Google Analytics as Custom Dimensions so they’re available for analysis. It’s actually much easier than it sounds!

Step One: Find the Published/Posted Date

Like the title says, this post is really geared towards things that get published on a certain date. I originally started with blog posts or articles in mind, but this could apply to other things that are published, for instance, if you’re a deal site and you have new deals go up each day.

We need to find a way to get that publish date into Google Tag Manager. Here are three options, in order of my preference.

1. Data Layer
2. URL Structure
3. Element on Page

Data Layer

If you’re using Google Tag Manager, you likely love the data layer. Put all of your important information into one great place so GTM can snag that data and use it. Add this to your site if it’s not currently there, or ask your developer to help out. For us, it was as easy as adding a short PHP snippet to our Header template in WordPress.

The ultimate goal is to get the full date with time information on the data layer. If you have more questions about this step, check out this article that explains more about how the data layer works.
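
As a rough sketch of the end result, the page your template renders might contain something like the snippet below. This is not the original template code: the key name `postedDate` and the sample timestamp are assumptions for illustration. The timestamp uses the ISO 8601 format, which is what WordPress should output for `get_the_date('c')`.

```javascript
// Sketch of what the rendered page might contain (not the original template code).
// The key name "postedDate" and the sample timestamp are assumptions.
var dataLayer = dataLayer || [];

dataLayer.push({
  // ISO 8601 with a UTC offset, so the time zone travels with the date.
  postedDate: '2014-08-05T09:30:00-04:00'
});
```

For the value to be available when the Pageview Tag fires, this would normally sit above the GTM container snippet.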

If you get this to the data layer, setting up the Data Layer Variable macro is pretty easy.


URL Structure

If the data layer step is going to take too long or just isn’t technically possible, it’s time to start getting creative. Where else can you find the publish date? Check out our blog URL structure up in the address bar. For us, we actually have the date available! There’s no time available, but it’s better than nothing!

Our URLs look like this:

/blog/YEAR/MONTH/DAY/blogtitle/

We can use a Custom JavaScript Macro to extract the date from the URL Path.
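
The extraction might look something like this minimal sketch, which assumes the /blog/YEAR/MONTH/DAY/blogtitle/ structure shown above. The function name `postedDateFromPath` is hypothetical; inside GTM the same logic would live in the macro’s anonymous function and read the path from your URL path macro instead of taking an argument.

```javascript
// Pulls YEAR/MONTH/DAY out of a path shaped like /blog/YEAR/MONTH/DAY/blogtitle/.
// The function name is hypothetical; adapt the pattern to your own URL structure.
function postedDateFromPath(path) {
  var match = /^\/blog\/(\d{4})\/(\d{1,2})\/(\d{1,2})\//.exec(path);
  if (!match) {
    return undefined; // not a blog post URL, so there is no publish date
  }
  // Rebuild as YEAR/MONTH/DAY; the URL carries no time of day.
  return match[1] + '/' + match[2] + '/' + match[3];
}
```

Returning undefined for non-blog pages keeps pages like the home page out of the cohort.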


Element on the Page

Lastly – I’ll mention using an Element on the page. Look to see if the date is somewhere that you can steal it from the page itself. Find the date, then right-click on it and inspect it to see if it’s wrapped in a span or other HTML tag with a unique ID.


If you’re fortunate enough to have this available to you, this DOM Element macro is pretty easy to set up as well!


Alternately, try viewing your source code and doing a CTRL+F for variations of your publish date. It may appear in a hidden field or somewhere else on the page in a uniquely identified tag that you can use.

Note of Caution: There’s a reason this is my least preferred method. If you use a DOM Element, there’s a good chance it might not be available when the Pageview Tag fires. Use the highest DOM element on the page that you can, but if that’s not working reliably, you may have to alter the Rule for your Pageview tag to wait until gtm.dom. Any time you delay your Pageview Tag, you may lose a few Pageviews, so keep this in mind!
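
If you would rather read the element yourself in a Custom JavaScript Macro, a defensive version might look like the sketch below. The function name and the element ID `publish-date` are hypothetical. The point it illustrates is the caution above: when the element has not been parsed yet, the lookup comes back empty, so the code returns undefined instead of throwing.

```javascript
// Reads the publish date text from an element, or returns undefined when the
// element is not (yet) in the DOM. Both names here are hypothetical.
function publishDateFromElement(doc, elementId) {
  var el = doc.getElementById(elementId);
  if (!el) {
    return undefined; // element missing or not parsed yet
  }
  var text = el.textContent || el.innerText || '';
  // Trim surrounding whitespace; treat an empty element as no value.
  return text.replace(/^\s+|\s+$/g, '') || undefined;
}
```

In the browser this would be called as `publishDateFromElement(document, 'publish-date')`.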

Step Two: Calculating Days/Weeks/Months Since Posted

Now that we have the date that the blog/article was posted, we can quickly calculate the number of days, weeks and… months? Perhaps.

Again, I’ll reemphasize that it’s best to have the time that the content was published AND the time zone! If your visitor is coming from a different time zone, we want to accurately count how long it has been since the content has actually been on the site.
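
A small illustration of why the time zone matters: when the offset is part of the date string, the same moment parses to the same instant no matter how it is written or where the visitor is. The sample timestamps are made up for the example.

```javascript
// The same moment written with two different offsets parses to one instant.
var eastern = new Date('2014-08-05T09:30:00-04:00');
var utc = new Date('2014-08-05T13:30:00Z');

var sameInstant = eastern.getTime() === utc.getTime(); // true
```

Without an offset, the visitor’s browser has to guess, and the elapsed time can drift by several hours.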

Set Up Your Custom JavaScript Macros

Now that we have the publish date, let’s grab today’s date and take the difference.


daysSincePosted – I’m going to round up here, so the first 24 hours will count as Day 1, and so on.

weeksSincePosted – We’ll just take days and divide by 7. Again, we’ll start in Week 1.

monthsSincePosted – Months are really the toughest thing to do. Some months have 31 days, some have 30, February just hates consistency. If we’re talking about time passed, then months just don’t work well. My advice here is to just go with buckets of 30 days. bucketsOf30DaysSincePost doesn’t have quite the same ring though, so call it months and add an asterisk to your reports.
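
Here is one way the three calculations described above could be written. It is a sketch and not the original macro code. The functions take the current time as an argument so they are easy to test; in a GTM macro you would pass `new Date()` and your posted date macro. Guarding against unparseable or future dates is my own addition.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days since posting, rounded up so the first 24 hours count as Day 1.
function daysSincePosted(postedDate, now) {
  var elapsed = now.getTime() - new Date(postedDate).getTime();
  if (isNaN(elapsed) || elapsed < 0) {
    return undefined; // unparseable or future publish date
  }
  return Math.max(1, Math.ceil(elapsed / MS_PER_DAY));
}

// Days divided by 7, rounded up so counting starts in Week 1.
function weeksSincePosted(postedDate, now) {
  var days = daysSincePosted(postedDate, now);
  return days === undefined ? undefined : Math.ceil(days / 7);
}

// "Months" are really buckets of 30 days, starting at 1.
function monthsSincePosted(postedDate, now) {
  var days = daysSincePosted(postedDate, now);
  return days === undefined ? undefined : Math.ceil(days / 30);
}
```

With these definitions, a post published eight days and one hour ago lands in Day 9, Week 2 and month 1.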

Step Three: Passing this Information In

Now that you have your Macros up and running, it’s time to pass these in as Custom Dimensions (Universal Analytics only). I created my Custom Dimensions in Property Settings and then added them onto the Pageview Tag under Custom Dimensions.


I also passed in the posted date itself as a Custom Dimension. This is mostly for flexibility, just in case we need it for something else down the line!
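
In GTM this step is all configuration, but if it helps to see the shape of the data, here is a sketch of the equivalent field object for a Universal Analytics pageview. The function name and the index numbers (dimension1 through dimension4) are assumptions; use whichever indexes Property Settings assigned to you. The values are converted to strings because the Google Analytics debugger warns when a dimension arrives as a number.

```javascript
// Builds the custom dimension fields for a pageview hit.
// The index numbers are assumptions; match them to your Property Settings.
function buildCohortDimensions(postedDate, days, weeks, months) {
  return {
    dimension1: String(postedDate),
    dimension2: String(days),   // strings avoid the debugger's type warning
    dimension3: String(weeks),
    dimension4: String(months)
  };
}

// With analytics.js loaded directly (outside GTM), the hit might be sent as:
// ga('send', 'pageview', buildCohortDimensions('2014-08-05T09:30:00-04:00', 9, 2, 1));
```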

Step Four: Custom Reports

Now that you have the info in Google Analytics, you can create all kinds of custom reports. For example, you can set up two simple custom reports that use a longer time span but only include data from an article’s first week or first month.


Or, after enough time has passed, it will be easy to export the full list and pivot it into a triangle chart with blog title down the left side and week or month across the top.
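
If you would rather script the pivot than build it in a spreadsheet, the reshaping step might look like the sketch below. The row shape (`title`, `week`, `pageviews`) is a hypothetical export format and not something Google Analytics produces under those names.

```javascript
// Pivots exported rows into { title: { week: pageviews } }, which is the
// triangle layout: one row per post, one column per week since posting.
// The row shape is a hypothetical export format.
function pivotByWeek(rows) {
  var table = {};
  for (var i = 0; i < rows.length; i++) {
    var row = rows[i];
    if (!table[row.title]) {
      table[row.title] = {};
    }
    // Add to any pageviews already recorded for this post and week.
    var current = table[row.title][row.week] || 0;
    table[row.title][row.week] = current + row.pageviews;
  }
  return table;
}
```

Newer posts simply have fewer week columns filled in, which is what gives the chart its triangle shape.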


Note of caution: Because these are Dimensions and not Metrics, we won’t be able to do anything inside the Google Analytics interface that resembles Greater than or Less than selections. If you wanted to get everything before 60 days, you could use a regular expression like ^[1-5]?[0-9]$, or export the data to another program to crunch.
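
To check what that regular expression actually matches before relying on it in a report filter, you can try it against a few values. The helper name `isUnder60Days` is hypothetical.

```javascript
// The pattern from the text: matches whole numbers 0 through 59.
var UNDER_60_DAYS = /^[1-5]?[0-9]$/;

// Dimension values are strings in Google Analytics, so compare as text.
function isUnder60Days(daysValue) {
  return UNDER_60_DAYS.test(String(daysValue));
}
```

Here `isUnder60Days('59')` is true while `isUnder60Days('60')` and `isUnder60Days('100')` are false.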

About

Jon Meck is our Technical Marketing Manager, promoting our services and trainings to the world. He has a jack-of-all-trades background, working for companies large and small in social media, website design and maintenance, and analytics. He is an Excel enthusiast, he loves efficiency, and he is a strong proponent of the “Work Smarter, Not Harder” mantra. Jon is also the author of two number puzzle books.

  • http://roma.net.ua Roman Rybalchenko

    Thanks for interesting example of cohort analysis.

    Will it work with russian blogs when not AM / PM is echo’ed but ДП/ПП? If I change the output to:
    echo get_the_date('Y/m/d H:i:s T')
    will it work?

  • http://www.lunametrics.com Jon Meck

    Hi Roman,

    I think that would work – using H would give you the 24 hour format of the hour and avoid the AM/PM issue. As always, I would recommend testing it before implementing it.

    If you’re familiar with JavaScript, I would say try adding in the post date via PHP and then taking that value into jsfiddle.net and see if you can get these functions to work properly.

    Let me know how it goes!

  • http://www.analyticsedge.com Mike Sullivan

    Jon, on your final pivot table, you should use maximum rather than sum for the grand total column, especially if you have a lot of periods of data. By using sum, a long-standing article with 50 pageviews per period will rank higher than a new one with 200 pageviews in its first period.

    I did like your concept, though, and I built my own solution, but relying on post-processing of historical data rather than playing with javascript up front. It is limited to monthly or maybe daily resolution if your blog article includes the day like yours does. Mine only has the year/month.
    analyticsedge.com/2014/08/cohort-analysis-blog-articles/

    Mike

    • http://www.lunametrics.com Jon Meck

      Hi Mike – thanks for the link back! I somewhat disagree about using Max instead of Sum, but it’s really based on what you’re trying to measure. The flip side is that a single post with a crazy spike will rank higher than valuable content with consistent traffic. In our case, we’re always looking for more consistent and repeat traffic rather than a flash in the pan.

      Post-processing is always an option, but especially in your case, will be deceiving. If your URL structure only includes year/month, then a blog post published at the beginning of the month will heavily outweigh an article published on the last day of the month.

      My solution proposed here is to set up good data going forward, and to hopefully enable simpler reporting in the GA interface.

      Thanks for the comment!
      -Jon

  • http://roma.net.ua Roman Rybalchenko

    I tried echo get_the_date('c') and it is counting.

    But getting in google analytics debugger message:

    Expected a string value for field: “dimension2”. but found: “number”.

    Is it OK?

  • http://www.lunametrics.com Jon Meck

    Hi Roman – it will still work inside of Google Analytics! The debugger just gives you the warning, but everything will still work correctly.

    Thanks,
    -Jon
