Building the Ultimate XML Sitemap

An XML Sitemap is a sitemap created for search engines. It lists all the URLs on your site that you want search engines to crawl and index. The Sitemap also provides information on when pages get updated and how important they are relative to other pages on your site. Search engines do not guarantee they will fully abide by the Sitemap, but they do use XML Sitemaps for assistance in crawling the web.

Many webmasters and SEOs have reported improved traffic simply from submitting the Sitemap. In addition, the Sitemap can greatly assist in diagnosing indexation shortcomings. Submitting a proper, up-to-date XML sitemap, therefore, is an SEO best practice.

Building a Sitemap can be very easy or quite complex. The return on investment with Sitemaps varies tremendously from case to case; indeed, many people waste time getting bogged down in Sitemap technicalities when it is unwarranted. The amount of time you put into Sitemaps should depend on your needs. This guide focuses on thinking critically about how to utilize Sitemaps as an SEO tool to address the specific needs of your website and SEO campaigns.

There are 10 steps to building the Ultimate XML Sitemap:
1. Understand XML Sitemaps
2. Identify what types of Sitemaps you need
3. Pick Sitemap Generation Method
4. Figure out Sitemap content and structure
5. Build Sitemap
6. Check and validate Sitemap
7. Submit Sitemap
8. Check & Monitor Sitemap
9. Learn and Act
10. Rinse and Repeat Steps 8 and 9

Step 1: Understand XML Sitemaps

While you won’t need to hand-code XML Sitemaps, I really recommend you get a basic understanding of how search engines use them, the protocol for Sitemaps, size limits, and the tags used in Sitemaps. Get it straight from the horse’s mouth by checking out Google’s section on Sitemaps.
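
If it helps to see the tags in action, here is a minimal sketch in Python that writes a one-URL Sitemap using the four tags defined in the sitemaps.org protocol (loc, lastmod, changefreq, priority). The URL, date, and values are placeholders, and build_sitemap is a hypothetical helper name made up for this illustration, not part of any generator mentioned in this guide.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return Sitemap XML (as a string) for a list of URL entry dicts."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is required by the protocol; the other three tags are optional hints.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Placeholder entry; swap in your own URLs and values.
example = build_sitemap([
    {
        "loc": "http://www.example.com/",
        "lastmod": "2012-12-13",
        "changefreq": "weekly",
        "priority": 0.8,
    },
])
print(example)
```

In practice your generator writes this file for you; the point is only to recognize what each tag looks like when you open the result.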

Step 2: Identify what types of Sitemaps you need

In addition to the standard XML Sitemap, there is a Sitemap index and four more specialized Sitemaps (the code search Sitemap is now basically useless, since Google Code Search was deprecated this year). If you want to improve traffic to videos, images, your mobile site, or news articles, use specialized Sitemaps (Sitemap extensions).

The 6 Types of Sitemaps:

  • Video
  • Images
  • Mobile
  • News
  • Sitemap Index if you have multiple sitemaps
  • Standard Sitemap

Step 3: Pick Sitemap generation method

There are a few ways to build Sitemaps and tons of tools to help do so.

Needs analysis:
First, perform some needs analysis to figure out the best way for you to go about building the Sitemap by asking the following questions:

  • What CMS do you have? (There are many generators specifically made for certain CMSs, like WordPress.)
  • Approximately how many pages will you be submitting? (Unless your Site is really tiny, you’ll want a Sitemap generator. Also, most free generators will only crawl up to a certain number of pages)
  • Do you suspect you are having notable issues with the search engines properly crawling and indexing your site?
  • Do you want a simple solution so you can set and forget? OR…,
  • Do you want to really dig in and optimize indexation?

At this point, you should have a good idea how much effort you should put into your Sitemaps.

Find a Sitemap generator:
Unless you have a very small site and/or a desire to hand code the Sitemap, you’re probably going to use a Sitemap generator tool to build the Sitemap file. The generator will look at the pages in your site and list them on the Sitemap, according to XML Sitemap protocol and how you configure the generator.

There are basically two types of generators — those that crawl your site (like Googlebot does), and those that look at your site from the back-end. I find that CMS-specific back-end based generators usually make life easier. (That said, I do highly recommend running a crawler on your site from time to time to see all existing URLs, broken links, http status codes, and such.)

Here is a list of Sitemap generators. You may also want to perform a separate search for generators specifically for your CMS.

Sitemap generator selection criteria: Pick a generator that:

  • will generate the types of Sitemaps you need.
  • fits your needs identified in needs analysis.
  • updates dynamically. Updating static Sitemaps is usually an unnecessary and undesirable chore.
  • lets you set the Sitemap tags for individual pages and groups of pages
  • lets you break up Sitemaps into pieces however you want
  • *Bonus: finds orphan pages (pages that no other pages link to)

Step 4: Figure out the Sitemap content and structure

Next, we need to figure out which URLs are going into the Sitemap, and, if your site is large, which URLs go into which Sitemaps.

Sitemap content:
To decide which URLs to include in your Sitemap, you need to figure out which pages you want the search engines to crawl and index. Remember, we’re only going to list one URL for each page. We will also leave out pages that should be private.

Sitemap structure:
You may need multiple Sitemap files if:

  • You want special sitemaps for specialized content (images, videos, etc…)
  • You suspect certain sections of your site are at risk for indexation shortcomings and you’ll want to analyze those sections
  • You have a large site (each Sitemap file can hold at most 50,000 URLs and must be no larger than 50MB uncompressed)
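
For the large-site case, the splitting can be sketched roughly like this in Python: cut the URL list into files of at most 50,000 URLs, then list those files in a Sitemap index. The product URLs and file names are invented for the example, and chunk_urls and build_sitemap_index are hypothetical helper names; the sketch also only enforces the URL count, not the file size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file URL limit from the Sitemap protocol

def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive pieces of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap_index(sitemap_urls):
    """Return Sitemap index XML that points at each individual Sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")

# Made-up site section with 120,000 product pages.
urls = [f"http://www.example.com/products/{n}" for n in range(120000)]
chunks = chunk_urls(urls)
names = [
    f"http://www.example.com/sitemap-products-{i + 1}.xml"
    for i in range(len(chunks))
]
index_xml = build_sitemap_index(names)
print(len(chunks), "Sitemap files listed in the index")
```

Splitting by site section first (products, blog, and so on) and only then by size keeps each file meaningful when you later compare submitted versus indexed counts.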

When deciding how to structure Sitemap files of the same type, section the Sitemaps in terms of what is most helpful in diagnosing indexation issues. Ask yourself 2 questions:

  • What parts of your site are not getting indexed that should be?
  • What pages are not getting indexed frequently enough?

Here is a great article on structuring Sitemaps to assist in diagnosing indexation issues.

Step 5: Build Sitemap

You can either let the generator go and do its thing or you can tweak settings to generate the Sitemap that shows the engines exactly how you want your site crawled.

Things to tweak in your Sitemap:

  • Sitemap tags
  • Sitemaps segmentation — divvy up individual Sitemaps by type and by a structure that will best help you diagnose indexation shortcomings. Give them descriptive names as well.
  • Exclude URLs that should NOT be indexed
    • Exclude URLs disallowed in robots.txt (a good time to make sure you’re disallowing the right URLs)
    • Exclude URLs disallowed via meta noindex tags
    • Exclude duplicate URLs
    • Exclude private pages
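
If your generator can’t apply these exclusions for you, part of the idea can be sketched in Python with the standard library’s robots.txt parser. This is only an illustration under some assumptions: the robots.txt rules and URLs are made up, filter_urls is a hypothetical helper, duplicates are detected only when the URL strings match exactly, and meta noindex tags are not checked at all (that would require fetching each page).

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cart
"""

def filter_urls(candidates, robots_txt, user_agent="*"):
    """Drop exact duplicates and URLs that robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    kept, seen = [], set()
    for url in candidates:
        if url in seen:
            continue  # exact duplicate of a URL already handled
        seen.add(url)
        if parser.can_fetch(user_agent, url):
            kept.append(url)
    return kept

candidates = [
    "http://www.example.com/",
    "http://www.example.com/about",
    "http://www.example.com/about",
    "http://www.example.com/private/report",
    "http://www.example.com/cart",
]
sitemap_urls = filter_urls(candidates, ROBOTS_TXT)
print(sitemap_urls)
```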

Uploading Sitemap:
After you generate the Sitemap, upload it to your site, ideally at the root directory, like so: www.example.com/sitemap.xml. Technically, you don’t have to place it at the root, but a Sitemap placed in a subdirectory can only list URLs in that directory and below it.
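
As a rough illustration of the location rule in the sitemaps.org protocol, the Python sketch below checks whether a page URL falls within the scope of a Sitemap at a given location (same protocol, same host, and at or below the Sitemap’s directory). allowed_in_sitemap is a hypothetical helper, the URLs are placeholders, and the sketch ignores the cross-submission exceptions the protocol describes.

```python
import posixpath
from urllib.parse import urlsplit

def allowed_in_sitemap(sitemap_url, page_url):
    """Return True if page_url is within the scope of a Sitemap at sitemap_url."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # The protocol (http vs. https) and the host must match the Sitemap's.
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    base_dir = posixpath.dirname(sitemap.path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    # Treat a bare host such as http://www.example.com as the root path.
    return (page.path or "/").startswith(base_dir)

print(allowed_in_sitemap(
    "http://www.example.com/catalog/sitemap.xml",
    "http://www.example.com/images/photo.png",
))
```

A root-level Sitemap passes this check for every URL on the same host, which is why the root directory is the simplest place to put it.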

Congratulations! If you’ve completed Step 5, you’ve now built your XML Sitemap. However, the next steps will ensure you get the most out of your Sitemap. Let’s proceed…

Step 6: Check and validate Sitemap

Now, it’s time to make sure the Sitemap follows protocol and says what you want it to say.

Validation:
There are a few tools to validate that your Sitemap follows protocol and will be fully usable by search engines. Since we should be submitting the Sitemaps to Google Webmaster Tools anyway, I like to use it for validation too. If you don’t use Google Webmaster Tools, now is the time to start; GWT may be second only to Google Analytics in terms of being the best free tool for webmasters. To test your Sitemap, simply go into Google Webmaster Tools and click the “Sitemaps” link under “Optimization” in the left navigation section of your site’s dashboard. Then click the big red button and test away. If you’re feeling warm and fuzzy inside, then I presume Google can read your Sitemap.

Manual check:
But wait! Just because your Sitemap follows protocol doesn’t mean it looks the way you want it to. Best to check the following…

  • Does it list the pages you want indexed?
  • Does it exclude pages you don’t want indexed? (like duplicate urls or private pages)
  • Do the tags describe the URLs like they should?
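
Part of this manual check can be automated. The Python sketch below is a rough, incomplete sanity check, not a substitute for the Webmaster Tools test: it assumes a standard Sitemap (not an index or an extension), and check_sitemap and the sample data are invented for the illustration. It flags a wrong root element, too many URLs, missing or relative loc values, duplicate URLs, and priority values outside the 0.0 to 1.0 range.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def check_sitemap(xml_text, max_urls=50000):
    """Return a list of human-readable problems found in a standard Sitemap."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != NS + "urlset":
        problems.append("root element is not <urlset> in the sitemaps.org namespace")
    urls = root.findall(NS + "url")
    if len(urls) > max_urls:
        problems.append(f"{len(urls)} URLs exceeds the limit of {max_urls}")
    seen = set()
    for url in urls:
        loc = (url.findtext(NS + "loc") or "").strip()
        if not loc.startswith(("http://", "https://")):
            problems.append(f"missing or relative <loc>: {loc!r}")
        elif loc in seen:
            problems.append(f"duplicate <loc>: {loc}")
        seen.add(loc)
        priority = url.findtext(NS + "priority")
        if priority is not None:
            try:
                in_range = 0.0 <= float(priority) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append(f"<priority> out of range for {loc}: {priority}")
    return problems

# Deliberately flawed sample: one relative URL, one duplicate, one bad priority.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><priority>1.0</priority></url>
  <url><loc>/relative-page</loc></url>
  <url><loc>http://www.example.com/</loc><priority>1.5</priority></url>
</urlset>"""

problems = check_sitemap(SAMPLE)
for problem in problems:
    print(problem)
```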

Step 7: Submit Sitemap

Once the Sitemap is all checked out, it’s time to make sure the engines know about it. I recommend submitting the Sitemap directly to Google and to Bing (which powers Yahoo). I recommend doing this through Google and Bing Webmaster Tools, because you should use those tools to analyze your Sitemaps anyways. (In Bing Webmaster Tools, go to your website dashboard and click the “Sitemaps” link in the “Configure My Site” dropdown in the left nav.)

Also, be sure to list your Sitemap in your robots.txt file to ensure it gets found by all search engines.
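
To show what that robots.txt entry looks like, the Python sketch below parses a made-up robots.txt containing two Sitemap lines and reads them back with the standard library parser. The file contents and URLs are placeholders, and the site_maps() method assumes Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt; each Sitemap line gives the full URL of one Sitemap file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap-images.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed Sitemap URLs, or None if there are none.
listed = parser.site_maps()
print(listed)
```

Running the same check against your live robots.txt is a quick way to confirm the Sitemap lines are actually being picked up.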

Step 8: Check & Monitor Sitemap

I strongly discourage you from setting and forgetting when it comes to your Sitemaps. There’s a lot of information you can get in Bing and Google Webmaster Tools (especially the latter). In fact, in many cases the diagnostic assistance is the biggest benefit of Sitemaps.

Be sure to check on pages indexed versus URLs submitted. It’s rare that the number of pages indexed matches what you submitted, but something is amiss if there is a big difference in the numbers.
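
If you track several Sitemap files, a tiny Python sketch like the following can make the comparison routine. The counts are invented for illustration (you would copy the real ones from Webmaster Tools), indexation_report is a hypothetical helper, and the 80% threshold is an arbitrary assumption of mine, not a figure from the search engines.

```python
def indexation_report(stats, threshold=0.8):
    """For each Sitemap, return (name, indexed/submitted rate, needs_attention)."""
    report = []
    for name, (submitted, indexed) in stats.items():
        rate = indexed / submitted if submitted else 0.0
        report.append((name, round(rate, 2), rate < threshold))
    return report

# Invented numbers: (URLs submitted, pages indexed) per Sitemap file.
stats = {
    "sitemap-products.xml": (1200, 1140),
    "sitemap-blog.xml": (300, 120),
}
report = indexation_report(stats)
for name, rate, flagged in report:
    print(name, rate, "check this section" if flagged else "ok")
```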

Also pay attention to Sitemap errors and warnings. The fewer errors and warnings you have, the more likely the search engines will listen well to your Sitemap. Sitemap errors and warnings may reveal problems with robots.txt, meta robots tags, duplicate content, or other issues.

If you have a Sitemap that 100% accurately represents how you want your site crawled, then you can gain a ton of insight by comparing Sitemap stats to other data points. Consider cross referencing the following:

  • ‘site:’ search: Enter site:www.yourwebsite/subdirectory in Bing and Google to find what is indexed. This may not be 100% accurate, but it can help the investigative process.
  • Analytics: See which pages received search engine visits in Google Analytics.
  • GWT Index Status:  Compare the stats on crawling and indexation, and pay attention to spikes in number of pages crawled and indexed.
  • BWT Site Activity: Compare the stats on crawling and indexation, and pay attention to spikes in number of pages crawled and indexed.
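
One way to do the Analytics cross-reference is a simple set comparison, sketched below in Python. It assumes you have two plain lists of full URLs: the ones in your Sitemap and the landing pages that received search engine visits (for example, from an Analytics export). Both lists here are made up, and cross_reference is a hypothetical helper.

```python
def cross_reference(sitemap_urls, search_landing_pages):
    """Compare Sitemap URLs with pages that received search engine visits."""
    sitemap_set = set(sitemap_urls)
    landing_set = set(search_landing_pages)
    return {
        # Listed in the Sitemap but got no search visits: possibly not indexed.
        "in_sitemap_no_search_visits": sorted(sitemap_set - landing_set),
        # Getting search visits but missing from the Sitemap: possibly
        # duplicates or pages that were left out by mistake.
        "search_visits_not_in_sitemap": sorted(landing_set - sitemap_set),
    }

# Made-up data for illustration.
sitemap_urls = [
    "http://www.example.com/",
    "http://www.example.com/services",
    "http://www.example.com/contact",
]
search_landing_pages = [
    "http://www.example.com/",
    "http://www.example.com/services",
    "http://www.example.com/services?ref=footer",
]
result = cross_reference(sitemap_urls, search_landing_pages)
print(result)
```

Neither list proves anything by itself, but each one gives you a short set of URLs worth checking by hand.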

Step 9: Learn and Act

A perfect Sitemap can be your best tool in diagnosing indexation shortcomings. If you’re having problems diagnosing indexation issues, consider tweaking your Sitemap to perfection.

The goal is to get the right pages indexed, and new content on those pages indexed as soon as possible. Use the information from your Sitemap analysis to diagnose issues with site structure, duplicate content, and internal linking to help you reach this goal.

Step 10: Rinse and Repeat Steps 8 and 9

Definitely keep checking in on those Sitemaps every once in a while to see if you’re having any addressable indexation issues. Be sure to check in after major updates to your site or any major traffic pattern changes. Remember, your XML Sitemap is a great tool, so use it.

Reid Bandremer

About Reid Bandremer

Reid Bandremer is a Senior Search Project Manager. His background before joining LunaMetrics in 2011 includes eCommerce marketing experience and a pair of business degrees. He is a rabid fan of data, music, and holistic ROI-driven search marketing strategy. Other strengths include SEO metrics, migrations, and searcher segmentation (keyword research 2.0). Contrary to popular theory, Reid is not homeless – he just often stays at the office late because he is obsessed with maximizing the value of clients’ search traffic.

http://www.lunametrics.com/blog/2012/12/13/building-ultimate-xml-sitemap/

14 Responses to “Building the Ultimate XML Sitemap”

ضى مصر says:

I want to create a large sitemap with more than 500 links.

As I understand from your article, I need to create a separate sitemap for the images in my website. In that case, shall I open a new folder in the root directory of the website and how should it be named to distinguish it from the pages sitemap?
Thanks,
Raffi

Reid Bandremer says:

Hi Raffi,

You can add image info to the standard Sitemap, or you can add a separate Sitemap just for images, as detailed at http://support.google.com/webmasters/bin/answer.py?hl=en&answer=178636. If you create an additional image Sitemap, you can name it whatever you’d like, as long as it is at the root directory and you inform the search engines of the names of your Sitemaps, either in the robots.txt file, in a Sitemap index, or by submitting it to Bing/Google Webmaster Tools (I recommend all 3 methods for multiple Sitemaps). The protocol on Sitemaps, including location, is here: http://www.sitemaps.org/protocol.html

Will1968 says:

Can you recommend a dynamic sitemap creator that works with asp.net sites?

Will1968 says:

Also does the order of the pages on the sitemap have any relevance? I was sort of wondering if I should have the important pages at the top. My head says it makes no difference.

Reid Bandremer says:

Will,

I’ve never used an asp.net-specific dynamic Sitemap generator, so I can’t recommend one. However, this one was located on Google’s XML Sitemap list and there is another one that may be what you’re looking for.

Perhaps you can test those out.

Reid Bandremer says:

Good question, Will. The answer appears to be it “likely makes no difference” as seen at http://www.sitemaps.org/faq.html#faq_url_position. What I think is that the order of the URLs never matters except in one possible condition – where the spider hits the crawl budget and must stop crawling your Site before it is complete. That scenario might occur for sites with index bloat and URLs at the end may be crawled less frequently – note that this is just my educated guess.

Also, note that you can indicate the priority of a page with the <priority> tag (http://www.sitemaps.org/protocol.html#prioritydef).

alex says:

I really want to submit a sitemap, and this article helped me understand what I need to do. But I have a question: how can I submit a sitemap for the mobile version of a website?

Reid Bandremer says:

Alex,
Check out http://support.google.com/webmasters/bin/answer.py?hl=en&answer=34648
You can put it up at the mobile subdomain’s root directory (for example: m.example.com/Sitemap), submit it to Google/Bing Webmaster Tools, and include it in the Sitemap index file.

Muhammad Wajid says:

I need a sitemap with unlimited links.
If there is one, let me know.

Muhammad Wajid says:

How can we edit a sitemap? Is there any tool?
How do I create a txt sitemap?

Antoine says:

Thank you for this link: http://toolspot.org/xml-sitemap-generator.php. I’ve been looking for a sitemap generator that will pick up all of the pages for my website, http://tova.co.za
I run my WordPress website off a database and it generates URLs like http://tova.co.za/venue-information/?venuesID=48, and for some reason I can’t figure out, the sitemap generators don’t pick up these pages. Maybe someone can explain why?

Reid Bandremer says:

I’m not sure why the Sitemap generators don’t catch those URLs with the venuesID parameter. However, even if they did, and the search engine crawlers crawled those pages, they would probably not index them because the target of rel=canonical for those pages is http://tova.co.za/venue-information. So, first make sure the pages are indexable, then make sure they are in the Sitemaps. Maybe those issues are related…

Good luck Antoine