New sitemap features in robots.txt

Apr 11, 2007

Google, Microsoft and Yahoo have just released the first result of their joint work on the sitemap protocol. For a while now, you have been able to log in to Google’s webmaster tools and specify the location of your XML sitemap there. By doing so, you tell Google where all the pages on your site are, which helps make sure that everything gets crawled and indexed.

Yahoo and Microsoft wanted the same functionality, but instead of creating their own format, they joined forces with Google to create a standardized one. Now Ask.com has announced that it will support the format as well.
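The shared format is the XML schema published at sitemaps.org (version 0.9). As a rough illustration only, and not how BlogEngine.NET actually generates its sitemap, here is a minimal Python sketch that builds such a file from a list of page URLs. The function name build_sitemap and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the shared sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap XML document (as a string) listing `urls`.

    Hypothetical helper: each URL becomes a <url><loc>...</loc></url> entry.
    Optional tags such as <lastmod> or <changefreq> are left out to keep
    the sketch small.
    """
    # Declare the sitemap namespace as the default xmlns on the root element.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url_el, "loc")
        loc.text = url  # ElementTree escapes &, < and > when serializing
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(["http://example.com/", "http://example.com/about"]))
```

The protocol also allows optional per-URL tags such as lastmod, changefreq and priority; a real generator would likely fill those in from the blog's post dates.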

The result is a set of extensions, the most important of which is autodiscovery: you no longer have to log in manually to submit or update the sitemap. Because all search engines already read the robots.txt file, they decided to let you specify the sitemap location there.

Basically, you just add a single line to robots.txt that specifies the full URL of the sitemap. Here is the line from my own robots.txt:

sitemap: http://blog.madskristensen.dk/sitemap.axd

If you use BlogEngine.NET, the sitemap is called /sitemap.axd.
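To make autodiscovery a bit more concrete, here is a rough Python sketch of what a crawler might do with a robots.txt file: scan each line for a Sitemap field and collect the URL. The function name find_sitemaps is hypothetical, and real crawlers certainly handle more edge cases than this.

```python
def find_sitemaps(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file.

    Hypothetical helper. The field name is matched case-insensitively, so
    "Sitemap:" and "sitemap:" are both accepted. The directive is not tied
    to any User-agent group, so it is collected wherever it appears.
    """
    sitemaps = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, so the "http:" in the URL survives.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            url = value.strip()
            if url:
                sitemaps.append(url)
    return sitemaps


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow:\n\nsitemap: http://blog.madskristensen.dk/sitemap.axd\n"
    print(find_sitemaps(robots))  # ['http://blog.madskristensen.dk/sitemap.axd']
```

Splitting on the first colon is the key design choice here, since the value itself is a URL containing a colon. On Python 3.8 and later, the standard library's urllib.robotparser.RobotFileParser also collects these lines and exposes them through its site_maps() method.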

Comments (3)

Brian
4/14/2007 9:06:19 PM #

My only confusion stems from Yahoo and Google's difference in sitemap protocols. Have they joined forces on sitemaps as well, or does the dynamic sitemap need to differentiate the crawlers that are accessing it?

Mads Kristensen Denmark
4/14/2007 11:26:20 PM #

Brian, the biggest search engines now support the same sitemap format as well as the same autodiscovery method. That means there is only one format to produce, and there is no need to differentiate between crawlers anymore.

Duncan Halley United Kingdom
4/15/2008 7:33:51 PM #

Hi,

I'm wondering how well the blogengine.net sitemap would live with my existing sitemap.

For example, my main website at duncanhalley.co.uk has a sitemap.txt file which has links to pages in my main website as well as in my blog.

If a spider came along and read the robots.txt, it would get directed to the blog sitemap file in /blog/sitemap.axd. Does this mean it would ignore the sitemap.txt in the root?

Does that make sense?

About the author

Mads Kristensen

Program Manager at the Microsoft Web Platform team and founder of BlogEngine.NET.

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.