- Jul. 21st, 2006
I decided to test what I think is an inconsistency in how the various implementors cited here interpret the robots.txt specification.
I created a robots.txt file for this site that is contrived to test how various spiders interpret the specification. Here it is:
User-agent: *
Disallow: /blog/seo/automatically-highlighting-internal-links-p51.html
Disallow: /blog/seo/msn-search-p5.html

User-agent: googlebot
Disallow: /blog/seo/msn-search-p5.html

User-agent: msnbot
Disallow: /blog/seo/using-referers-http_referer-to-increase-conversions-and-perceived-relevance-p9.html

User-agent: slurp
Disallow: /blog/seo/yahoo-hostings-lack-of-htaccess-support-p8.html
Since I'd like to make sure all these pages are indexed well in the first place (and because I'm a link-whore), please link to this post for me.
We already know that Google will exclude only the page about MSN Search, but I'm curious how Yahoo and MSN will interpret the file. In theory, at least according to Google's interpretation, the "*" section should be ignored completely by all three (Google, Yahoo, and MSN), because each of them has its own User-agent section. So unless you care about the other search engines, the "*" section has no effect here.
We'll know soon.
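For reference, the precedence rule described above (a crawler that has its own section ignores the "*" section entirely) can be sketched with Python's standard-library `urllib.robotparser`, which happens to resolve sections that way. This is only an illustration of that one interpretation, not a statement about how Yahoo or MSN actually behave. The helper name `allowed` and the agent name `otherbot` are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The test file from this post, with blank lines between sections.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/seo/automatically-highlighting-internal-links-p51.html
Disallow: /blog/seo/msn-search-p5.html

User-agent: googlebot
Disallow: /blog/seo/msn-search-p5.html

User-agent: msnbot
Disallow: /blog/seo/using-referers-http_referer-to-increase-conversions-and-perceived-relevance-p9.html

User-agent: slurp
Disallow: /blog/seo/yahoo-hostings-lack-of-htaccess-support-p8.html
"""

HIGHLIGHT = "/blog/seo/automatically-highlighting-internal-links-p51.html"
MSN_SEARCH = "/blog/seo/msn-search-p5.html"


def allowed(agent, path):
    """Hypothetical helper: may `agent` fetch `path` under ROBOTS_TXT?"""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # can_fetch() checks agent-specific sections first and only falls
    # back to the "*" section when no specific section matches.
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("googlebot", "msnbot", "slurp", "otherbot"):
        for path in (HIGHLIGHT, MSN_SEARCH):
            verdict = "allowed" if allowed(agent, path) else "blocked"
            print(f"{agent:10s} {verdict:8s} {path}")
```

Under this interpretation, googlebot is blocked only from the MSN Search page, while msnbot and slurp may fetch both pages listed in the "*" section. Only a crawler without a section of its own (here `otherbot`) is bound by the "*" rules.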
















