Put Your Blog Under the Microscope with SEO Crawling Tools/
June 18, 2014
In my role here at LunaMetrics I talk a lot about blogs. I love advising our clients on creating engaging blogs using unique, clever content. Our team here at LunaMetrics has been having lots of discussions lately about our own site. Everyone at our company contributes to our blog.
When I came across Matthew Barby’s awesome method for scraping websites to identify link prospects, I immediately wondered what trends I could identify from our own blog using this method. In this post, I’ll examine our blog with third-party tools to extract some actionable insights.
Scraping websites is an awesome way to collect data (provided you’re not violating anyone’s Terms of Service…). In this example I used Screaming Frog to crawl an entire website (Yours!) as well as SEO Tools for Excel to crawl elements of the site’s pages and an API to identify social shares. When it comes to competitor research or building a list of potential press outlets and authors to contact, this technique can’t be beaten because you don’t even need Google Analytics access to amass this data. We’re automatically taking it right from the page.
Here’s what I used to look at this data: Screaming Frog, SEO Tools Add-On for Microsoft Excel, Microsoft Excel. Read Matthew’s post for a full exploration of this method if you’d like to do content scraping for link opportunities or press research. Let’s turn our attention to scraping your blog to analyze your posts.
First I crawled our blog with Screaming Frog. You could also export a list of URLs from Google Analytics if you’d like. I was most intrigued by this method because it sidesteps GA and gives you some extra information. I then exported my crawl to excel and deleted all of the columns except for the following: URL, Wordcount, Title Length.
Sort by URL, then delete any non-blog URLs (keep the /blog/ URLs if that’s where your blog lives). For my test of this method, I just kept blog article URLs that were published in 2014 and put them in my first column. I made a second column called Category and manually tagged blog posts by topic: Analytics, SEO, PPC.
I really wanted to provide some insights on specific author stats. To do this I used the XPathOnURL function listed under Scraping on the SEO Tools ribbon. This will go out to the URL in our first column and comb the page for specific data. To establish what data we want to return, we need to add an XPath.
Sidenote: XPath is great for websites with uniform page structure, where the majority of the pages have the same elements in the same place, or are uniquely/consistently identified. Read more about using Xpath with webpages here.
To find the XPath, I right clicked the author’s name in the top section of a blog post, clicked Inspect Element and the used another right click to select Copy XPath then pasted this into your Excel cell and changed the double quotes to single quotes in the XPath portion of my formula.
The Xpath portion for our blog is [@id=”author’]. This will let us pull this repeated value from page to page. Yours might look different! Try doing this on two separate pages and make sure the Xpath is the same. If it’s not, you may need to look more into writing an Xpath expression.
My final Excel cell looked like this, where A2 refers to the specific page URL:
Social Shares & Actions were calculated with the SharedCount.com API. These fields will populate automatically based on your URL in Column 1 using the JSONPathOnURL function of SEO Tools, also located under Scraping. I made a new column and added a JSON command for each network I wanted to track. My Facebook Shares column pulled that data using this function:
The value in the formula may change depending on which network you want data from:
Reddit, StumbleUpon, Delicious, Diggs, Pinterest, Facebook.click_count, Facebook.comment_count, Facebook.like_count, Facebook.total_count, Twitter, LinkedIn, GooglePlusOne.
I also created a Total Social column that would sum all of my other social action columns. Finally, in another worksheet, I used COUNTIF and SUMIF to match author names and total social actions by number of posts and then made the graphs below.
This is a handy way to see some SEO-centric information about your blog with tools most SEOs likely already have installed. What did you learn? Are any ideal patterns emerging? Let me know in the comments below. I’m happy to share my spreadsheet if anyone wants to see it. Now, if you’ll excuse me, I need to go back and crawl our entire historical blog data.
Here’s what I learned about our blog half-way through 2014:
-Longer posts didn’t necessarily correlate with more shares (See Social Actions by Wordcount), which I didn’t expect.
-In terms of quantity, our posts get shared most on Twitter, followed by LinkedIn (surprise to me!) and then Facebook.
-I’d like to work on getting more Google+ shares for posts I author (Social Actions by Author).