SEO Egghead by Jaimie Sirovich: A blog about SEO, written for nerds, by a nerd.

Fri, 21 Jul '06

Google Robots.txt Snafu: Part II

I decided to test what I believe is an inconsistency in how the various implementors cited here interpret the robots.txt specification.

I created a robots.txt file for this site that is contrived to test how various spiders interpret the specification.  Here it is:

User-agent: *
Disallow: /blog/seo/automatically-highlighting-internal-links-p51.html
Disallow: /blog/seo/msn-search-p5.html
User-agent: googlebot
Disallow: /blog/seo/msn-search-p5.html
User-agent: msnbot
Disallow: /blog/seo/using-referers-http_referer-to-increase-conversions-and-perceived-relevance-p9.html
User-agent: slurp
Disallow: /blog/seo/yahoo-hostings-lack-of-htaccess-support-p8.html

Since I'd like to make sure all these pages are indexed well in the first place (and because I'm a link-whore), please link to this post for me :)

We already know that Google will exclude only the page about MSN search (msn-search-p5.html), but I'm curious how Yahoo and MSN will interpret the file. In theory, at least according to Google's interpretation, a crawler that finds a record naming it specifically obeys only that record. The "*" section should therefore be ignored entirely by all three of Google, Yahoo, and MSN, and unless you care about the other search engines, it has effectively no effect here.
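To make the two readings concrete, here is a minimal Python sketch (not how any search engine actually implements it) that parses the robots.txt above into per-agent groups. It then computes which pages each bot would skip under Google's reading, where a bot with its own User-agent record ignores the "*" record entirely, and under an alternative reading where the "*" rules also apply. The function names are hypothetical. Matching user-agent names exactly in lowercase is a simplifying assumption.

```python
# Hypothetical sketch: two ways a crawler might apply the robots.txt above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/seo/automatically-highlighting-internal-links-p51.html
Disallow: /blog/seo/msn-search-p5.html
User-agent: googlebot
Disallow: /blog/seo/msn-search-p5.html
User-agent: msnbot
Disallow: /blog/seo/using-referers-http_referer-to-increase-conversions-and-perceived-relevance-p9.html
User-agent: slurp
Disallow: /blog/seo/yahoo-hostings-lack-of-htaccess-support-p8.html
"""


def parse_groups(text):
    """Split robots.txt into {user-agent: [disallowed paths]}.

    A User-agent line that follows a Disallow line starts a new group,
    so the file parses even without blank lines between records.
    """
    groups = {}
    current_agents = []
    seen_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a rule was seen, so this starts a new record
                current_agents = []
                seen_rule = False
            agent = value.lower()
            current_agents.append(agent)
            groups.setdefault(agent, [])
        elif field == "disallow" and current_agents:
            seen_rule = True
            if value:  # an empty Disallow blocks nothing
                for agent in current_agents:
                    groups[agent].append(value)
    return groups


def blocked_specific_only(groups, bot):
    """Google's reading: use the bot's own record; fall back to '*' only if none exists."""
    return set(groups.get(bot.lower(), groups.get("*", [])))


def blocked_union(groups, bot):
    """Alternative reading: the bot's own record plus the '*' record."""
    return set(groups.get(bot.lower(), [])) | set(groups.get("*", []))


if __name__ == "__main__":
    groups = parse_groups(ROBOTS_TXT)
    for bot in ("googlebot", "msnbot", "slurp"):
        print(bot)
        print("  specific only:", sorted(blocked_specific_only(groups, bot)))
        print("  union with *: ", sorted(blocked_union(groups, bot)))
```

Under the "specific only" reading, googlebot skips just msn-search-p5.html, and neither msnbot nor slurp is kept away from the two pages listed under "*". Under the union reading, every bot also skips both "*" pages. A spider with no record of its own gets the "*" rules either way.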

We'll know soon.

One Response to “Google Robots.txt Snafu: Part II”

  1. SEO Egghead » Blog Archive » Google Robots.txt Snafu: Part III (Conclusion) Says:

    [...] We finally have a conclusion on how exactly to interpret a robot.txt file for the edge cases mentioned here.  Someone started a WebmasterWorld thread on the subject of contention. [...]