SEO for PDFs
As my partner in crime Travis recently pointed out, misconceptions abound in the SEO industry. Here’s another misconception: “PDF pages are so SEO-unfriendly that you can’t rank for any halfway competitive keywords with them.”
Some SEOs are still so set against Portable Document Format (PDF) pages that they don’t think PDFs should be landing pages at all. Some of them recommend replacing every PDF with an HTML page, or building additional HTML landing pages that target the same keywords as the PDFs.
The truth is: the biggest reason PDF pages often rank so horribly is that they are rarely properly optimized.
Don’t get me wrong. In an overall SEO showdown, I’d still pick HTML over PDFs any day of the week, and you’re not likely to catch me creating brand new web content for my clients in Adobe Acrobat. The real reason HTML is SEO-superior in 2013 is the user experience. Most people are more comfortable with HTML, and HTML pages freeze and load slowly less often. It’s easier to incorporate interactivity and social functionality into HTML pages. People also link to and share HTML pages more frequently than PDFs (this is big).
Why Use PDFs then?
Don’t get me wrong twice: there are still reasons to keep PDFs as SEO landing pages. Below are a few common use cases:
- When you already have many PDF pages on your site that people consider valuable. Before replacing PDFs, check whether they have backlinks, decent engagement metrics, and good traffic.
- When you have really sexy PDFs that would be difficult to turn into equivalently sexy and user-friendly HTML pages.
- When you have content that is meant to be printed or downloaded, like spec sheets, MSDSs, product manuals, brochures, forms meant to be printed and filled out by hand, etc.
- When the cost-benefit ratio just isn’t in favor of replacing PDFs. This might be the case if you only have a few PDFs and don’t want to spend the upfront time or money converting them into HTML and redirecting the URLs. (That said, a good PDF-to-HTML converter may be worth the investment if you’ve got a lot of not-yet-uploaded PDFs lying around.)
The Best Practices in SEO for PDF Files
The belief that search engines can’t digest PDF content was accurate years ago, but the search engines have come a long way, baby. So if you have reason to stick with your PDFs, just follow the simple tips below. I’ve listed the important stuff first.
Always use text-based PDFs
Search engines understand text waaay better than images (though the engines do have rudimentary optical character recognition capabilities), so make sure the words in your PDF are basic copy-and-paste-able text, not pictures of words. Most of the big PDF creators, like those in Adobe Creative Suite, have your back here. If you happen to have a scanned document you want to turn into a solid SEO landing page, you’ll need to use a little OCR yourself and convert the document into text.
Set your title in the document properties
This is such a common and easy-to-fix error that it drives me crazy. It’s common knowledge that the title tag is a huge ranking factor, and the way to give a PDF a title tag is to set the title in its document properties. Almost all PDF creators support this, including Adobe applications such as InDesign. As usual, use keywords smartly and optimize the title just as you would an HTML title tag.
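For one-off files, the document properties dialog in your PDF creator is the way to go. If you generate PDFs in bulk, it helps to know that the title lives in the PDF’s document information dictionary under the /Title key, alongside /Author, /Subject, and /Keywords. The sketch below uses only the Python standard library to write a blank one-page PDF with that dictionary filled in. The function names and example values are hypothetical, and it is only meant to show where the title is stored; for editing existing PDFs you would use a proper PDF library instead.

```python
def pdf_text_string(text: str) -> str:
    """Encode text as a PDF string object (hypothetical helper)."""
    if text.isascii():
        # Literal string: escape backslashes first, then the parenthesis delimiters.
        escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    # Non-ASCII text: UTF-16BE with a byte-order mark, written as a hex string.
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


def build_pdf_with_info(info: dict) -> bytes:
    """Build a blank one-page PDF whose /Info dictionary holds the given properties."""
    info_entries = " ".join(f"/{key} {pdf_text_string(value)}" for key, value in info.items())
    objects = [
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
        f"<< {info_entries} >>",  # object 4: the document information dictionary
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed for the xref table
        out += f"{number} 0 obj\n{body}\nendobj\n".encode("ascii")
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # mandatory free-list head; every entry is 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R /Info 4 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)


# Hypothetical example values.
pdf_bytes = build_pdf_with_info({"Title": "Widget 3000 Spec Sheet", "Author": "Example Co."})
```

Writing pdf_bytes to a .pdf file and opening its document properties in a viewer should show the title and author you set.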
Set an SEO-friendly URL/filename
Typically, the PDF filename becomes part of the URL, so give your document a good keyword-rich filename. When the title is not set, search engines often fall back on the filename/URL for the title shown in search results. Also, some document creators default the title to the filename. So please set a descriptive title and filename. I’m sick of search results cluttered with untitled PDFs and cryptic filenames.
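If you rename PDFs in bulk, a small helper can turn a human-readable title into a lowercase, hyphen-separated filename. This is a minimal Python sketch; the function name and the eight-word cap are arbitrary choices for illustration, not any kind of standard.

```python
import re
import unicodedata


def seo_pdf_filename(title: str, max_words: int = 8) -> str:
    """Turn a human-readable title into a lowercase, hyphenated .pdf filename."""
    # Split accented letters into base letter + accent, then drop the non-ASCII accents.
    ascii_text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Keep runs of letters and digits as words; punctuation and spaces separate them.
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    # Cap the number of words so the URL stays short and readable.
    slug = "-".join(words[:max_words])
    return f"{slug or 'document'}.pdf"


print(seo_pdf_filename("Widget 3000 Spec Sheet (Rev. B)"))  # widget-3000-spec-sheet-rev-b.pdf
```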
Do good SEO
What do keyword-rich title tags and descriptive URLs have in common, besides being PDF SEO best practices? They’re standard SEO best practices, too. So apply the rest of your usual SEO basics to your PDFs as well. This includes:
- internal linking to the PDF page to give it some link juice and authority (I see high-potential PDFs unnecessarily buried too deep in many websites). Speaking of internal linking and common pitfalls, please link from your PDF page to your other pages when relevant. It helps your SEO efforts and the user experience, and it isn’t done enough (seriously, I cringe when I have to copy and paste a URL from a PDF into the browser).
- good keyword selection
- keywords in body copy
- image optimization (note: you can set alt text in many PDF tools)
- human optimizing (Good content is good SEO, friend)
Keep the file size light
Huge files load more slowly, which hurts both the user experience and the search engines’ crawl. Adobe Acrobat’s “PDF Optimizer” feature lets you reduce file size, and you’ll want to use it on heavy PDFs. It’s worth learning the nitty-gritty of reducing PDF file size.
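To decide which files to run through the optimizer first, you can list the heaviest PDFs on your site. A minimal Python sketch follows; the 5 MB budget is a hypothetical number (nothing above prescribes one), so pick whatever suits your site.

```python
from pathlib import Path

# Hypothetical size budget; adjust to your own site.
SIZE_LIMIT_BYTES = 5 * 1024 * 1024


def find_heavy_pdfs(root: str, limit: int = SIZE_LIMIT_BYTES) -> list:
    """Return (path, size_in_bytes) pairs for PDFs under root larger than limit, largest first."""
    heavy = []
    for path in Path(root).rglob("*"):
        # Match the .pdf extension case-insensitively so REPORT.PDF is included too.
        if path.is_file() and path.suffix.lower() == ".pdf":
            size = path.stat().st_size
            if size > limit:
                heavy.append((path, size))
    return sorted(heavy, key=lambda item: item[1], reverse=True)
```

For example, calling find_heavy_pdfs("public/downloads") (a hypothetical directory) and printing each result gives you a worklist, biggest offenders first.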
Avoid duplicate content
Having both HTML and PDF versions of the same content can sometimes be a wise choice, but only if you take measures to prevent duplicate content issues, for example by using rel="canonical" to tell search engines which version you prefer. Also, if you tweak a PDF and re-upload it, don’t accidentally create a duplicate by changing the filename, and with it the URL.
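A PDF has no HTML head to put a canonical tag in, but Google’s documentation describes sending rel="canonical" as an HTTP Link header instead, which is how a PDF can point to its HTML twin. On Apache that is usually an .htaccess rule; as an illustration, here is a minimal Python WSGI middleware that adds the same header. The path-to-URL mapping, the URLs, and the function names are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical mapping: PDF path on your server -> URL of the HTML page you want to rank.
CANONICAL_PAGES = {
    "/downloads/widget-3000-spec-sheet.pdf": "https://www.example.com/products/widget-3000",
}


def canonical_link_header(path: str) -> Optional[Tuple[str, str]]:
    """Build a rel="canonical" Link header for a PDF path, or None if it has no HTML twin."""
    target = CANONICAL_PAGES.get(path)
    if target is None:
        return None
    return ("Link", f'<{target}>; rel="canonical"')


def with_canonical_headers(app: Callable) -> Callable:
    """WSGI middleware that appends the Link header to responses for mapped PDF paths."""
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        header = canonical_link_header(environ.get("PATH_INFO", ""))

        def start_with_link(status, headers, exc_info=None):
            if header is not None:
                headers = list(headers) + [header]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_link)
    return wrapped
```

Wrapping your existing WSGI app with with_canonical_headers leaves every other response untouched; only the PDFs listed in the mapping get the extra header.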
Set the other document properties too
Hey, while you’re in there (setting the title), you might as well complete the other properties, such as Author, Subject, and Keywords. I honestly couldn’t tell you how much impact this has, but I keep reading on the Internets that it’s worth it. So fill out all the properties you can; I just wouldn’t spend all day on it. Some sources say the Subject becomes the meta description, but I have yet to verify this.
Touch Up the Reading Order
Use Acrobat’s “TouchUp Reading Order” tool to fix the reading order and to set alternate text and headings. The headings are said to be handled by the search engines much like header tags in plain HTML.
Don’t save as the latest Acrobat version
Many readers might not have the latest version of Adobe Reader (and no one wants to download it just for your stupid page). Search engines sometimes fall behind the times too, so save your PDF in an older version.
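The compatibility level you save at shows up as the version number in the file’s %PDF-x.y header, so you can audit a batch of files with a few lines of Python. This sketch reads only that header; the function names and whatever cutoff you pass in are hypothetical. It also ignores the /Version override that PDF 1.4 and later allow in the document catalog, so treat its answer as approximate.

```python
import re
from typing import Optional


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from a PDF's %PDF-x.y header, or None if there isn't one."""
    # Some files have a little junk before the header, so scan the first 1 KB.
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None


def _version_key(version: str) -> tuple:
    # Compare versions as (major, minor) integers rather than as strings.
    return tuple(int(part) for part in version.split("."))


def is_newer_than(data: bytes, max_version: str) -> bool:
    """True if the header version is newer than max_version, e.g. "1.5"."""
    version = pdf_header_version(data)
    if version is None:
        raise ValueError("no %PDF header found")
    return _version_key(version) > _version_key(max_version)
```

To check a file, read its first kilobyte in binary mode (open(path, "rb").read(1024)) and pass those bytes to pdf_header_version.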
Write-protect your document
If you don’t write-protect your document, someone can upload the whole file to their own site and change it however they want (including editing out your links).
TL;DR
Ok, ok. Look, PDF SEO ain’t too hard. Just follow this checklist:
- Always use text-based PDFs
- Set your title in the document properties
- Set an SEO-friendly URL/filename
- Do good SEO
- Keep the file size light
- Avoid duplicate content
- Set the other document properties too
- Touch up the Reading Order
- Don’t save as the latest Acrobat version
- Write-protect your document
Let me know in the comments if I missed any PDF SEO FAQs.
–TTFN.
About Reid Bandremer
Reid Bandremer is a Search Analyst. He brings strong analytical abilities, a penchant for strategy, and a robust business background highlighted by an MBA at Robert Morris and experience in eCommerce marketing. Contrary to popular theory, Reid is not homeless; he just likes staying at the office late because he is passionate about increasing organic search traffic to clients’ sites.




Good article – something I actually needed this morning so it was nice that it was conveniently right on the front page of Inbound.
Re: duplicate content. Can you set the canonical for the HTML to be the PDF address? I don’t even know if that will work: rel="canonical" href="yoursite.com/this.pdf"? Would Google acknowledge that? It’s just something I don’t think I’ve ever tried.
Great question, Matt, and one I don’t know the answer to for sure. In my research I haven’t found any reason to believe you can’t point rel="canonical" at the PDF, but I haven’t confirmed that you can, either. I can only say for sure that you can do it the other way around: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139066#5. Sorry I can’t give you a more definitive answer; please let me know if you find one.
Also, thanks for the heads up on Inbound.org!
Good one, Reid. If you use rel="canonical", you can only set the PDF version as the canonical (preferred) version from the HTML page, not the HTML as the preferred version from a PDF, as I don’t think any PDF editor has an option to set a canonical version (HTML page).
I think the best way to avoid duplication would be to just link from the HTML page to the PDF with something users can understand, like “download the PDF version of this content here.” Search engines are so intelligent these days that they can easily understand almost everything users can.
Great post. Most people don’t know how PDF files can support or bring down their SEO work (especially because of duplicate content, keyword stuffing, etc.). Thanks, Reid.
Thank you for the comment, Mikolaj!
Sahil,
Actually, you can set the html page as the target of rel=”canonical”, but you need server access to adjust the .htaccess file. See http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139066#5 and http://www.seomoz.org/blog/how-to-advanced-relcanonical-http-headers.
One big question though has always been how do you (easily) prove that Google is driving traffic to the PDFs when the clients’ measurement tool (often Google Analytics) doesn’t show the PDF downloads?
Without proof of actual traffic, clients will often prefer less traffic that they can track over more traffic that they can’t.
You’re right. To my knowledge, you cannot track visits to a PDF landing page in Google Analytics. Thanks for pointing that out. In fact, it may be the biggest obstacle to getting more SEO love for oft-neglected PDF pages. One thing you can do (which is no substitute for full GA data): if your PDF links to a conversion page (for example, your PDF whitepaper has a call-to-action linking to a request-a-trial page), you can campaign-tag the URL in that link and track the qualified traffic the PDF sends.
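Campaign tagging means adding utm_source, utm_medium, and utm_campaign parameters to the link before you put it in the PDF. Here is a minimal Python sketch for building such a link; the medium value "pdf" and the example source and campaign names are hypothetical conventions, so use whatever naming scheme your analytics reports expect.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_pdf_link(url: str, source: str, campaign: str, medium: str = "pdf") -> str:
    """Append Google Analytics campaign parameters to a link placed inside a PDF."""
    parts = urlsplit(url)
    # Keep the link's existing query parameters, but drop any old utm_* tags.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.startswith("utm_")]
    query += [("utm_source", source), ("utm_medium", medium), ("utm_campaign", campaign)]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(tag_pdf_link("https://www.example.com/request-a-trial", "widget-whitepaper", "pdf-cta"))
# https://www.example.com/request-a-trial?utm_source=widget-whitepaper&utm_medium=pdf&utm_campaign=pdf-cta
```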
Reid,
Thank you for your prompt, and helpful response.
It is much appreciated.
Even if you cannot track the number of views of a PDF, you can add tracking code to count clicks on the buttons/links on your site that lead to it. But that doesn’t fully solve the tracking problem, since you can’t know how many people reach the PDF from other sites (via external links).
Correct Roman. Thanks.
Hi Reid,
Great Article!!! Thanks for sharing…
This seems to be vital information for me, and it will surely help me clarify these issues with my clients. They always ask me whether HTML pages or PDFs are better for particular pages.
Thanks once again…