Jul 21

Google Robots.txt Snafu: Part II

Posted by Jaimie Sirovich on Jul. 21st, 2006.


I decided to test what I think is an inconsistency in how the various implementors cited here interpret the robots.txt specification.

I created a robots.txt file for this site that is contrived to test how various spiders interpret the specification.  Here it is:

User-agent: *
Disallow: /blog/seo/automatically-highlighting-internal-links-p51.html
Disallow: /blog/seo/msn-search-p5.html
User-agent: googlebot
Disallow: /blog/seo/msn-search-p5.html
User-agent: msnbot
Disallow: /blog/seo/using-referers-http_referer-to-increase-conversions-and-perceived-relevance-p9.html
User-agent: slurp
Disallow: /blog/seo/yahoo-hostings-lack-of-htaccess-support-p8.html

Since I'd like to make sure all these pages are indexed well in the first place (and because I'm a link-whore), please link to this post for me :)

We already know that Google will exclude only the page on MSN Search, but I'm curious how Yahoo and MSN will interpret the file. In theory, at least according to Google's interpretation, the "*" section should be ignored completely by all three (Google, Yahoo, and MSN), because each of them has its own User-agent section. Unless you care about the other search engines, that section has no effect here.
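For anyone who wants to poke at this locally, here is a minimal sketch using Python's standard-library `urllib.robotparser`. To be clear, this is not Google's, Yahoo's, or MSN's code. It is just one implementation that happens to apply the same rule as Google's interpretation, where a section naming the bot takes precedence over the "*" section. The short page labels and the `blocked_pages` helper are made up for illustration, as is the `someotherbot` agent.

```python
from urllib.robotparser import RobotFileParser

# The test file from this post, reproduced verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/seo/automatically-highlighting-internal-links-p51.html
Disallow: /blog/seo/msn-search-p5.html
User-agent: googlebot
Disallow: /blog/seo/msn-search-p5.html
User-agent: msnbot
Disallow: /blog/seo/using-referers-http_referer-to-increase-conversions-and-perceived-relevance-p9.html
User-agent: slurp
Disallow: /blog/seo/yahoo-hostings-lack-of-htaccess-support-p8.html
"""

# Hypothetical short labels for the four pages named in the file.
PAGES = {
    "internal-links": "/blog/seo/automatically-highlighting-internal-links-p51.html",
    "msn-search": "/blog/seo/msn-search-p5.html",
    "referers": "/blog/seo/using-referers-http_referer-to-increase-conversions-and-perceived-relevance-p9.html",
    "yahoo-hosting": "/blog/seo/yahoo-hostings-lack-of-htaccess-support-p8.html",
}


def blocked_pages(user_agent):
    """Return the labels of the pages this parser considers off-limits to user_agent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return sorted(
        label for label, path in PAGES.items()
        if not parser.can_fetch(user_agent, path)
    )


if __name__ == "__main__":
    # "someotherbot" stands in for any spider without its own section.
    for agent in ("googlebot", "msnbot", "slurp", "someotherbot"):
        print(agent, blocked_pages(agent))
```

With this parser, each named bot is blocked only from the single page listed in its own section, and only a bot without a section of its own falls back to the two "*" rules. Whether Yahoo and MSN actually behave this way is exactly what the experiment is meant to find out.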

We'll know soon.








"Only One Wise Comment Banged Out Somewhere On The Internet ..."


SEO Egghead » Blog Archive » Google Robots.txt Snafu: Part III (Conclusion)

[...] We finally have a conclusion on how exactly to interpret a robot.txt file for the edge cases mentioned here.  Someone started a WebmasterWorld thread on the subject of contention. [...]


