I decided to test what I believe is an inconsistency in how the various implementors cited here interpret the robots.txt specification.

I created a deliberately contrived robots.txt file for this site to test how various spiders interpret the specification. Here it is:

User-agent: *
Disallow: /blog/seo/automatically-highlighting-internal-links-p51.html
Disallow: /blog/seo/msn-search-p5.html
User-agent: googlebot
Disallow: /blog/seo/msn-search-p5.html
User-agent: msnbot
Disallow: /blog/seo/using-referers-http_referer-to-increase-conversions-and-perceived-relevance-p9.html
User-agent: slurp
Disallow: /blog/seo/yahoo-hostings-lack-of-htaccess-support-p8.html
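As a point of comparison, here's a quick sketch that runs the file above through Python's standard-library urllib.robotparser. To be clear, that parser is only a stand-in and not what any of the search engines run. It checks the named groups first and falls back to the "*" group only when no named group matches, which is the same reading Google uses. The name SomeOtherBot is made up to represent any spider that has no group of its own.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt file from this post, verbatim (no blank lines between groups).
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/seo/automatically-highlighting-internal-links-p51.html
Disallow: /blog/seo/msn-search-p5.html
User-agent: googlebot
Disallow: /blog/seo/msn-search-p5.html
User-agent: msnbot
Disallow: /blog/seo/using-referers-http_referer-to-increase-conversions-and-perceived-relevance-p9.html
User-agent: slurp
Disallow: /blog/seo/yahoo-hostings-lack-of-htaccess-support-p8.html
"""

# Every page mentioned in the file.
PAGES = [
    "/blog/seo/automatically-highlighting-internal-links-p51.html",
    "/blog/seo/msn-search-p5.html",
    "/blog/seo/using-referers-http_referer-to-increase-conversions-and-perceived-relevance-p9.html",
    "/blog/seo/yahoo-hostings-lack-of-htaccess-support-p8.html",
]


def blocked_pages(agent: str) -> list[str]:
    """Return the pages Python's parser says `agent` may not fetch."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the file as read
    return [page for page in PAGES if not parser.can_fetch(agent, page)]


if __name__ == "__main__":
    # "SomeOtherBot" is hypothetical: any spider without its own User-agent group.
    for agent in ("Googlebot", "msnbot", "Slurp", "SomeOtherBot"):
        print(agent, blocked_pages(agent))
```

With this parser, Googlebot is blocked only from msn-search-p5.html, msnbot and Slurp are each blocked only from the page listed in their own group, and SomeOtherBot picks up the two "*" rules.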

Since I'd like to make sure all these pages are indexed well in the first place (and because I'm a link-whore), please link to this post for me :)

We already know that Google will exclude only the msn-search-p5.html page, but I'm curious how Yahoo and MSN will interpret the file. In theory, at least according to Google's interpretation, Google, Yahoo, and MSN should all ignore the "*" section completely, since each of them has its own section. Unless you care about the other search engines, that makes the "*" section effectively useless here.
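The inconsistency I'm testing boils down to two ways of reading a file like this one. Here's a small sketch of both, under labels of my own choosing: "replace" is Google's reading, where a spider with its own User-agent group ignores the "*" group entirely, and "merge" is the alternative, where the spider obeys its own group plus the "*" group. The parser is a deliberately simplified, hypothetical one: it only understands User-agent and Disallow lines, and it matches agent names exactly (lower-cased), which real spiders don't necessarily do.

```python
# The robots.txt file from this post, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/seo/automatically-highlighting-internal-links-p51.html
Disallow: /blog/seo/msn-search-p5.html
User-agent: googlebot
Disallow: /blog/seo/msn-search-p5.html
User-agent: msnbot
Disallow: /blog/seo/using-referers-http_referer-to-increase-conversions-and-perceived-relevance-p9.html
User-agent: slurp
Disallow: /blog/seo/yahoo-hostings-lack-of-htaccess-support-p8.html
"""


def parse_groups(text: str) -> dict[str, list[str]]:
    """Map each lower-cased User-agent value to its Disallow paths, in file order."""
    groups: dict[str, list[str]] = {}
    current: list[str] = []  # agents named in the group being read
    seen_rule = False        # True once the current group has a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                current, seen_rule = [], False
            agent = value.lower()
            current.append(agent)
            groups.setdefault(agent, [])
        elif field == "disallow" and current:
            seen_rule = True
            if value:  # an empty Disallow blocks nothing
                for agent in current:
                    groups[agent].append(value)
    return groups


def disallowed(groups: dict[str, list[str]], agent: str, mode: str) -> list[str]:
    """Paths blocked for `agent` under the hypothetical "replace" or "merge" reading."""
    star = groups.get("*", [])
    own = groups.get(agent.lower())
    if own is None:        # no group of its own: fall back to "*" either way
        return list(star)
    if mode == "replace":  # Google's reading: the named group replaces "*"
        return list(own)
    if mode == "merge":    # alternative reading: named group plus "*"
        return own + [path for path in star if path not in own]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    groups = parse_groups(ROBOTS_TXT)
    for agent in ("googlebot", "msnbot", "slurp"):
        for mode in ("replace", "merge"):
            print(agent, mode, disallowed(groups, agent, mode))
```

Under "replace", Googlebot stays out of msn-search-p5.html only; under "merge" it would also skip automatically-highlighting-internal-links-p51.html. Which pages drop out of each engine's index should reveal which reading that engine follows.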

We'll know soon.
