
SEO for PDFs

As my partner in crime Travis recently pointed out, misconceptions abound in the SEO industry. Here’s another misconception: “PDF pages are so SEO-unfriendly that you can’t rank for any halfway competitive keywords with them.”

Some SEOs are still so set against Portable Document Format (PDF) pages that they don’t think PDFs should even be landing pages. Some of them recommend replacing all PDFs with HTML pages, or building additional HTML landing pages that target the same keywords as the PDFs.

The truth is: the biggest reason PDF pages often rank so horribly is that they are rarely properly optimized.

Don’t get me wrong. In an overall SEO showdown, I’d still pick HTML over PDFs any day of the week, and you’re not likely to catch me creating brand-new web content for my clients in Adobe Acrobat. The real reason HTML is SEO-superior in 2013 is user experience. Most people are more comfortable with HTML, and HTML pages freeze and load slowly less often. It’s easier to incorporate interactivity and social functionality into HTML pages. People also link to and share HTML pages more frequently than PDFs (this is big).

Why Use PDFs then?

Don’t get me wrong twice: there are still reasons to keep PDFs as SEO landing pages. Below are a few common use cases:

  • When you already have many PDF pages on your site that people consider valuable. Before replacing PDFs, check whether they have backlinks, decent engagement metrics, and good traffic.
  • When you have really sexy PDFs that would be difficult to turn into equivalently sexy and user-friendly HTML pages.
  • When you have content that is meant to be printed or downloaded, like spec sheets, MSDSs, product manuals, brochures, forms meant to be printed and filled out by hand, etc.
  • When the cost-benefit ratio just isn’t in favor of replacing PDFs. This might be the case if you only have a few PDFs and you don’t want to spend the upfront time or money converting the pages into HTML and redirecting the URLs. (That said, a good PDF-to-HTML converter may be worth the investment if you’ve got a lot of un-uploaded PDFs lying around.)

The Best Practices in SEO for PDF Files

The old belief that search engines can’t digest PDF content was true years ago, but the search engines have come a long way, baby. So if you have reason to stick with your PDFs, just follow the simple tips below. I’ve listed the important stuff first.

Always use text-based PDFs

Search engines understand text waaay better than images (though the engines do have rudimentary optical character recognition capabilities), so make sure the words in your PDF are basic copy-and-paste-able text, not pictures of words. Most of the big PDF creators, like those in Adobe Creative Suite, have your back here. If you have a scanned document you want to turn into a solid SEO landing page, you’ll need to run OCR on it yourself to convert the document into text.

Set your title in the document properties

This is such a common and easy-to-fix error that it drives me crazy. It’s common knowledge that the title tag is a huge ranking factor, and a PDF’s equivalent is the Title field in its document properties. Almost all PDF creators let you set it, including Adobe applications such as InDesign. As usual, use keywords smartly and optimize your titles just as you would HTML title tags.

[Screenshot: PDF Document Properties]
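To make the Title field concrete: in the PDF file format, properties such as Title, Author, Subject, and Keywords live in a document information dictionary that the file’s trailer points to through its /Info entry (newer tools also mirror them in XMP metadata). The Python sketch below is a hypothetical, standard-library-only helper, not a feature of any PDF tool: `build_minimal_pdf` writes a one-page PDF with whatever properties you pass. It assumes plain-ASCII values; in practice you would set these fields in Acrobat, InDesign, or another PDF editor.

```python
from typing import Dict


def escape_pdf_string(text: str) -> str:
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_minimal_pdf(info: Dict[str, str], body_text: str) -> bytes:
    """Return a one-page PDF whose Info dictionary holds the given properties."""
    content = f"BT /F1 18 Tf 72 720 Td ({escape_pdf_string(body_text)}) Tj ET".encode("ascii")
    info_entries = " ".join(
        f"/{key} ({escape_pdf_string(value)})" for key, value in info.items()
    )
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        f"<< {info_entries} >>".encode("ascii"),  # object 6: the Info dictionary
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded in the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    # The trailer's /Info entry is what points readers and crawlers at the properties.
    out += b"trailer\n<< /Size %d /Root 1 0 R /Info 6 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

For example, `build_minimal_pdf({"Title": "Model X-200 Spec Sheet"}, "Model X-200")` returns bytes you can write to a .pdf file; its /Title entry is the value search engines may show as the result title.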

Set an SEO-friendly URL/filename

Typically, the PDF filename becomes part of the URL, so give your document a good, keyword-rich filename. When no title is set, search engines often fall back on the filename/URL for the title shown in the results. Also, some document creators default the title to the filename. So please set a descriptive title and filename. I’m sick of search results that look like this:

[Screenshot: a crappy PDF title in the SERPs]
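If you name files in bulk, a small helper can turn a human-readable title into a lowercase, hyphenated filename. This is a minimal Python sketch; `seo_pdf_filename` is a hypothetical name, and lowercase words joined by hyphens is a common convention rather than a search-engine requirement.

```python
import re
import unicodedata


def seo_pdf_filename(title: str) -> str:
    """Turn a document title into a lowercase, hyphen-separated .pdf filename."""
    # Decompose accented letters and drop the accents ("Café" -> "Cafe").
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of characters other than a-z and 0-9 into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    # Fall back to a generic name if nothing usable is left.
    return f"{slug or 'document'}.pdf"
```

A title like “Model X-200 Spec Sheet (2013)” becomes model-x-200-spec-sheet-2013.pdf. Keep the filename stable once the PDF is live, since changing it changes the URL.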

Do good SEO

What do keyword-rich title tags and descriptive URLs have in common, besides being PDF SEO best practices? They’re also standard SEO best practices. So apply your other usual SEO best practices to your PDFs as well. These include:

  • internal linking to the PDF page to give it some link juice and authority (I see high-potential PDFs unnecessarily buried too deep in many websites). Speaking of internal linking and common pitfalls, please link from your PDF page to your other pages when relevant. It helps your SEO efforts and the user experience, and it isn’t done enough (seriously, I cringe when I have to copy and paste a URL from a PDF into the browser).
  • good keyword selection
  • keywords in body copy
  • image optimization (note: you can set alt text in many PDF tools)
  • human optimizing (Good content is good SEO, friend)

Keep the file size light

Huge files load more slowly, which hurts both user experience and the search engines’ crawl. Adobe Acrobat’s “PDF Optimizer” function lets you reduce file size, and you’ll want to use it for heavy PDFs. Learn the nitty-gritty on reducing PDF file size here.
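Before reaching for the optimizer, it can help to find out which PDFs are actually heavy. This minimal Python sketch walks a local copy of your site’s files and lists PDFs over a size cutoff, largest first. The function name and the 5 MB cutoff are hypothetical choices for illustration; the post doesn’t name a threshold.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical cutoff: the post gives no number, so 5 MB is just an example.
HEAVY_PDF_BYTES = 5 * 1024 * 1024


def find_heavy_pdfs(root: str, limit: int = HEAVY_PDF_BYTES) -> List[Tuple[str, int]]:
    """List (path, size in bytes) for every PDF under root larger than limit."""
    heavy = []
    for path in Path(root).rglob("*"):
        # Compare suffixes case-insensitively so "Manual.PDF" is not skipped.
        if path.is_file() and path.suffix.lower() == ".pdf":
            size = path.stat().st_size
            if size > limit:
                heavy.append((str(path), size))
    # Largest first: these are the best candidates for the optimizer.
    return sorted(heavy, key=lambda item: item[1], reverse=True)
```

Calling `find_heavy_pdfs("site-export")` on a hypothetical export folder would return the (path, size) pairs worth running through PDF Optimizer first.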

Avoid duplicate content

Having both HTML and PDF versions of the same content can sometimes be a wise choice, but only if you take measures to prevent duplicate content issues. Also, if you tweak a PDF and re-upload it, don’t create a duplicate by accidentally changing the filename and, with it, the URL.
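One such measure, which comes up in the comments below, is declaring rel="canonical" in an HTTP Link header, because a PDF has no HTML head section to hold a link element; Google documents this header form for non-HTML files. On Apache it is usually set in .htaccess with mod_headers. The sketch below shows the same idea as hypothetical Python WSGI middleware that makes the HTML version canonical; the path and URL in `CANONICAL_FOR` are placeholders.

```python
# Hypothetical mapping: PDF path on your server -> the HTML page that should rank.
CANONICAL_FOR = {
    "/downloads/white-paper.pdf": "https://www.example.com/white-paper.html",
}


def canonical_link_middleware(app, canonical_for=CANONICAL_FOR):
    """Wrap a WSGI app so mapped paths get a Link: <url>; rel="canonical" header."""

    def wrapped(environ, start_response):
        target = canonical_for.get(environ.get("PATH_INFO", ""))

        def start_with_link(status, headers, exc_info=None):
            if target is not None:
                # Same header format Google describes for non-HTML files.
                headers = list(headers) + [("Link", f'<{target}>; rel="canonical"')]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_link)

    return wrapped
```

With the standard library’s wsgiref, `make_server("", 8000, canonical_link_middleware(your_app))` would serve this locally, where `your_app` is whatever already serves your files.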

Set the other document properties too

Hey, while you’re in there (setting the title)… you might as well complete the other properties, such as Author, Subject, and Keywords. I honestly couldn’t tell you how much impact this will have, but I keep reading on the Internets that it’s worth it. So fill out all the properties you can; I just wouldn’t spend all day on it. Some sources say the Subject will become the meta description, but I have yet to verify this.
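If you have a pile of PDFs and want to see which ones are missing these properties, a rough audit can search each file’s raw bytes for the Info entries. This is a heuristic Python sketch with real blind spots: it cannot see an Info dictionary stored inside a compressed object stream (so it may report set properties as missing), and a bookmark’s /Title can make Title look present. The function and constant names are hypothetical; treat the output as a to-do list to check in your PDF editor, not as ground truth.

```python
import re
from typing import List

RECOMMENDED_PROPERTIES = ("Title", "Author", "Subject", "Keywords")


def missing_document_properties(pdf_bytes: bytes) -> List[str]:
    """Heuristically list recommended Info properties that are absent or empty."""
    missing = []
    for name in RECOMMENDED_PROPERTIES:
        # Match "/Name (" or "/Name <" followed by at least one character of value;
        # "/Name ()" and "/Name <>" count as empty, so they are reported missing.
        pattern = rb"/" + name.encode("ascii") + rb"\s*(?:\((?!\))|<(?!>))"
        if re.search(pattern, pdf_bytes) is None:
            missing.append(name)
    return missing
```

Run it on the bytes of each file (for example, `missing_document_properties(Path(p).read_bytes())`) and fix the flagged PDFs by hand.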

Touchup the Reading Order

Use Acrobat’s “TouchUp Reading Order” tool to fix the reading order, set alternate text, and tag headings. Search engines are said to handle these headings much as they handle header tags in plain HTML.

Don’t save as the latest Acrobat version

Many readers might not have the latest version of Adobe Reader (and no one wants to install it just for your stupid page). Search engines sometimes fall behind the times too, so save your PDF to be compatible with an older version.
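To check which version a file was saved as, look at its header: a PDF normally begins with a `%PDF-1.x` comment naming the format version. The hypothetical Python sketch below reads that header and flags files newer than a ceiling you choose. The default ceiling of (1, 4) is only an example, not a recommendation from the post, and the sketch ignores the catalog’s /Version entry, which a later incremental save can use to raise the version.

```python
import re
from typing import Optional, Tuple


def pdf_header_version(pdf_bytes: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the %PDF-x.y header, or None if there is none."""
    # Search only the first 1 KB, allowing for stray bytes before the header.
    match = re.search(rb"%PDF-(\d+)\.(\d+)", pdf_bytes[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_older_save(pdf_bytes: bytes, newest_allowed: Tuple[int, int] = (1, 4)) -> bool:
    """True if the header version is newer than the (hypothetical) ceiling."""
    version = pdf_header_version(pdf_bytes)
    # Tuples compare element by element, so (1, 7) > (1, 4).
    return version is not None and version > newest_allowed
```

Only the first kilobyte matters, so reading `open(path, "rb").read(1024)` is enough for each file.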

Write-protect your document

If you don’t write-protect your document, then someone can upload the whole file to their site and change it however they want (including editing out your links.)

TL;DR

Ok, ok. Look, PDF SEO ain’t too hard. Just follow this checklist:

  • Always use text-based PDFs
  • Set your title in the document properties
  • Set an SEO-friendly URL/filename
  • Do good SEO
  • Keep the file size light
  • Avoid duplicate content
  • Set the other document properties too
  • Touchup the Reading Order
  • Don’t save as the latest Acrobat version
  • Write-protect your document

Let me know in the comments if I missed any PDF SEO FAQs.

–TTFN.

Reid Bandremer

About Reid Bandremer

Reid Bandremer is a Senior Search Project Manager. His background before joining LunaMetrics in 2011 includes eCommerce marketing experience and a pair of business degrees. He is a rabid fan of music and of holistic, ROI-driven search marketing strategy. Other strengths include organic search marketing segmentation, migrations, and metrics. Contrary to popular theory, Reid is not homeless; he just often stays at the office late because he is obsessed with increasing traffic value to clients’ sites.

http://www.lunametrics.com/blog/2013/01/10/seo-pdfs/

34 Responses to “SEO for PDFs”

Good article – something I actually needed this morning so it was nice that it was conveniently right on the front page of Inbound. :)

Re: duplicate content. Can you set the canonical for the HTML page to be the PDF address, e.g. rel="canonical" href="yoursite.com/this.pdf"? I don’t even know if that would work. Would Google acknowledge that? Just something I don’t think I’ve ever tried.

Reid Bandremer says:

Great question, Matt, and one I don’t know the answer to for sure. In my research I have yet to find a reason to believe you can’t point the rel=”canonical” to the PDF, but I haven’t confirmed that you can, either. I can only say for sure that you can do it the other way around: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139066#5. Sorry I can’t give you a more definitive answer; please let me know if you find one.

Also, thanks for the heads up on Inbound.org!

Sahil says:

Good one, Reid. With rel=”canonical”, you can only set the PDF version as the canonical (preferred) version from the HTML page, not the HTML page as the preferred version from a PDF, as I don’t think any PDF editor has an option to set a canonical version (HTML page).

I think the best way to avoid duplication would be just to link from the HTML page to the PDF with something users can understand, like “download the PDF version of this content here”. Search engines are so intelligent these days that they can understand almost everything users can.

Great post. Most people don’t know how PDF files can support or undermine their SEO work (especially through duplicate content, keyword stuffing, etc.). Thanks, Reid.

Reid Bandremer says:

Thank you for the comment, Mikolaj!

Reid Bandremer says:

Sahil,

Actually, you can set the html page as the target of rel=”canonical”, but you need server access to adjust the .htaccess file. See http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139066#5 and http://www.seomoz.org/blog/how-to-advanced-relcanonical-http-headers.

Brian Makas says:

One big question though has always been how do you (easily) prove that Google is driving traffic to the PDFs when the clients’ measurement tool (often Google Analytics) doesn’t show the PDF downloads?

Without proof of actual traffic many times a client will prefer less actual traffic in favor of more tracked traffic.

Reid Bandremer says:

You’re right. To my knowledge, you cannot track visits to a PDF landing page in Google Analytics. Thanks for pointing that out. In fact, it may be the biggest obstacle to getting more SEO love for oft-neglected PDF pages. One thing you can do (which is no substitute for full GA data): if your PDF links to a conversion page (for example, your PDF whitepaper has a call-to-action linking to a request-a-trial page), you can campaign-tag the URL in that link and track the qualified traffic the PDF sends.
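Tagging a link inside a PDF works like tagging any other campaign URL: append Google Analytics’ `utm_source`, `utm_medium`, and `utm_campaign` query parameters. The Python sketch below does that with the standard library; the `campaign_tag` name and the example values are hypothetical, so substitute whatever naming scheme your reporting uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def campaign_tag(url: str, source: str, medium: str, campaign: str) -> str:
    """Return url with utm_source/utm_medium/utm_campaign set, keeping other params."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Drop any existing utm_* values so the link carries exactly one set.
    query = [(key, value) for key, value in query if not key.startswith("utm_")]
    query += [("utm_source", source), ("utm_medium", medium), ("utm_campaign", campaign)]
    # Rebuild the URL; scheme, host, path, and #fragment are left untouched.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For instance, tagging https://www.example.com/request-a-trial with source “whitepaper-pdf”, medium “pdf”, and campaign “q1-whitepaper” yields https://www.example.com/request-a-trial?utm_source=whitepaper-pdf&utm_medium=pdf&utm_campaign=q1-whitepaper, which is the URL you would paste into the PDF’s call-to-action link.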

Teresa C says:

Reid,

Thank you for your prompt and helpful response.

It is much appreciated.

Even if you cannot track the number of views of a PDF, you can add tracking code to count clicks on the buttons/links on your site that lead to your PDF. But that doesn’t fully solve the tracking problem, since you can’t know how many people reach it from other sites (via external links).

Reid Bandremer says:

Correct Roman. Thanks.

Guaranteed SEO says:

Hi Reid,

Great Article!!! Thanks for sharing…

This seems to be vital information for me, and it will surely help me clarify these issues with my clients. They always ask me whether HTML pages or PDFs are more helpful for their respective pages.

Thanks once again…

tom says:

Question: we sell a PDF product that often ends up on customer websites. I’m thinking we could use these to generate backlinks, but would we risk over-optimisation penalties unless we varied the backlink anchor text? What do you think?

Reid Bandremer says:

Hard to say, Tom.

Seems like the backlink here would be treated as if it were from an infographic or widget. Some may disagree with me, but I think it is possible to trigger a penalty, especially if: 1) it’s the same piece of content (rather than different kinds of products), 2) it’s found on many low-authority, untrusted, or dodgy sites, 3) the anchor text is the same for each link, and 4) the anchor text is keyword-rich (rather than generic or branded). Also, Google may simply place less value on those links than on other types. I think the more of those factors you can reduce, the more value and the lower risk you’d get from this backlink tactic. But if you use branded (like “buyaplan.com”) or generic anchor text (like “here”), there’s less need to vary the anchor text.

Peter Demel says:

Very interesting post.
I did not know PDFs needed their own SEO. Now I know. Thank you.

Patricia Henkel says:

Great article. Lots of helpful tips. I would like more control over what Google selects for the snippet on SERPs. It appears that Google will scrape the site and produce text from the PDF that matches the terms that were searched. But I sure wish I could get Google to use what I have composed in the Description tag in Doc Properties. Any suggestions? Or any other way(s) to control this?

Reid Bandremer says:

Patricia, you’ve kind of stumped me. I’ve read that the Subject can serve as the meta description (i.e., the description in the SERP snippet), but I have no proof. Also, text near the top of the document is more likely to be used as the description, so if it would be appropriate to lead with an abstract or a detailed subheading that summarizes the doc, try putting that in; there’s a decent chance it becomes the SERP description.

Those are things you can try – but certainly no guarantees.

Patricia Henkel says:

Thanks Reid.

I have read that the Subject in the Document Properties can serve as the SERP snippet, too, but I have not actually seen that happen in practice (albeit on a really small test group of PDFs). I have spent time adding tasty keyword descriptions to the Subject field in Document Properties, but so far I have not seen them show up in SERPs. (But like I said, I haven’t tested a large group, and it could take several weeks for the bots to catch on to our newly tagged PDFs.)

I did look at some old, previously tagged PDFs that had the Subject property filled out, though, and I noticed that it was not picking up the Subject field as the snippet, either.

One article (http://www.jm-seo.org/seo-tutorial/adobe-pdf-seo.html) indicated that it was actually the Keywords in the Document Properties that would show up in the description snippet in SERPs (!). The author of that article said that Keywords in Doc Properties was equivalent to the Meta Description field in HTML. Is this true? I tried to test this by looking at old PDFs that had both the Subject and Keywords fields filled in on Document Properties, but neither one was used for the snippet! Has anyone else tested or experienced this?

What I have seen is that in a Google search, it seems to scrape the whole PDF looking for a snippet of text close to the terms that were searched, even if those terms were in a footnote or at the bottom of the page. It seems logical that it would look for the best match at the top of the page, like you said, Reid, but I have not been able to confirm it.

Thoughts? I’m just trying to get an attractive description/snippet in SERPs for our PDFs, so it helps users decide whether it’s worth the time to open the PDF, and helps our click-through rate (CTR)!

Thanks!

Reid Bandremer says:

As for the article, I’m skeptical that the keywords in the doc properties would serve as the description (because of the age of the article, because it seems arbitrary, and because the author offers no proof). It couldn’t hurt to test, though. I’d look forward to hearing anyone’s results on that experiment.

(Also, I’d stay away from the author’s recommendation to create an HTML landing page for the PDF; that’s not going to help in 2013.)

Descriptions are always tricky: even for HTML pages, you can’t force Google to use the meta description 100% of the time. Google will use meta descriptions when it feels they are relevant. From what I’ve been able to tell, words at the top of the page are much more likely to be used in descriptions than words at the bottom. But the text must contain the words the user is searching for.

Dan Craddock says:

Thanks Reid for a great article. Good to know that I’m doing the right thing with the PDFs (and Word docs) on our website!

It is a WCAG 2.0 AA requirement that we not only make PDFs accessible (tagging, set language, etc) but also provide any downloadable content in PDF (i.e. all PDF content) in an alternative, downloadable format such as Word or plain text (preferably Word, due to semantic structure etc).

So, my question is would the two versions of the same content trigger a penalty? Even if the Word document includes document properties and is optimised for accessibility and SEO?

Much obliged.

Reid Bandremer says:

Thanks for commenting Dan.

The short answer is “yeah, you’re going to want to set a canonical URL in the HTTP header.”

Long answer:
This would likely inhibit your SEO a bit, but it wouldn’t technically “trigger a penalty”. I only call something a penalty when Google reduces your site’s ranking power because of a violation of its quality guidelines, which usually results in a sudden drop in traffic.

But having two versions of a page instead of one will constitute non-malicious duplicate content and carry the associated issues.

It sounds like you’ll want to set the PDF as the canonical version, which tells the search engines that the Word doc is just a duplicate of the PDF. These links should tell you how:
https://support.google.com/webmasters/answer/139066?hl=en#5
http://moz.com/blog/how-to-advanced-relcanonical-http-headers

Dan Craddock says:

Thanks again, Reid; very helpful advice and links.

Dan.

Willem-Siebe says:

Hi, I have an existing PDF document that was scanned from a magazine. I can select the text (copy and paste), so it was scanned ‘text-based’. But which program can I use to add the title, author, subject, and keywords?

Reid Bandremer says:

Hi William,

Most PDF creators/editors will let you do this. This includes Adobe and even some free PDF editors like http://www.pdfescape.com.

Paul Whelan says:

Great article, thanks! It’s very reassuring to see that all my hard work and time spent optimising my PDFs is worth it. I do everything on the list except write-protect the document (and I often forget hyperlinks, so I cringed when you said you hate having to copy and paste a link from a PDF; I promise to do better).
Adobe Acrobat’s Touch-Up Reading Order feature could be a lot easier to use; I find that you have to save your work regularly in case Acrobat makes a mess of it. I sell hundreds of products and have decent PDF specifications for almost all of them. I know that PDF optimisation works beautifully because sometimes, when I am in a hurry, I will spend ages optimising the PDF, upload it, and then forget to create an actual webpage for the product, but I still get calls for the product, and people ask me how much the item is; then I realise I forgot to create the page! Another way to check is to Google specific phrases that you entered only in the PDF, and you will see the PDF listed in the Google results.

I don’t know what “rel canonical=” means, but I will look it up.
My website ranks No. 1 in Google for just about every product I do, so I’m obviously doing something right.

Reid Bandremer says:

Thanks Paul. Glad to see PDFs working for you.

Two suggestions on tracking results:
1. consider event tracking on links from PDF to the site
2. Check out http://www.lunametrics.com/blog/2013/06/04/tracking-pdfs-google-analytics-server-side/

Cheers,
Reid

Hey Reid,

Really nice post. But I’ve got a question for you. Is there any site that you’d recommend for submitting PDFs only?

Reid Bandremer says:

Thanks Avinash. Can you clarify your question a bit?

Jerzy says:

Hey Reid,

Good post. I have one PDF file on my website. Is Google passing juice through the links in my PDF? How can I make these links nofollow?

Reid Bandremer says:

Hi Jerzy. Thanks. Sorry I’m just back to you; your question slipped past the goalie.

PDF links pass link equity. Unfortunately, I do not know of any way to make the links nofollow. I’m actually hard-pressed to come up with a good use-case where you’d really want to make the links nofollow, unless they’re linking to pages you do not want crawled. If that is the case(?), there may be other things you can do.

CMaddie says:

Hi Reid,

Great, informative article! Do you think submitting your PDFs to document-sharing sites like SlideShare, Scribd, etc. would be considered duplicate content?

Reid Bandremer says:

Thanks CMaddie,

It technically is duplicate content, but it should only be an issue for you if the content on those sites comes up in the search results instead of yours. If your site and the page you host the PDF on are authoritative, this shouldn’t be a problem. It is also less likely to be a problem if the PDF has been up on your site for a while before you submit it to those other sites.

Hope that helps.

nafew says:

Mr.Reid Bandremer
Thanks for the good post and the nice PDF file on your website. Thank you.
http://seofor24.wordpress.com

AW says:

Reid,

I enjoyed your article.
I was wondering if you had any ideas on how best to use the rel=”author” tag with PDF files. The normal method is to link to the author’s Google+ profile, but that obviously only works with regular webpages.
Any help is appreciated.
-AW