Jul 21
Google Robots.txt Snafu: Part II
I decided to test what I think is an inconsistency in how the various implementors cited here interpret the robots.txt specification. I created a robots.txt file for this site that is contrived to test how various spiders interpret the specification. Here it is:

User-agent: *

Since I'd like to make sure all these pages are indexed well in the first place (and because I'm a link-whore), please link to this post for me.

We already know that Google will only exclude the page on MSN search, but I'm curious how Yahoo and MSN will interpret the file. In theory, at least according to Google's interpretation, the "*" section should be completely ignored by all three: Google, Yahoo, and MSN. Unless you care about the other search engines, that section is therefore completely without effect here. We'll know soon.
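To make the two competing readings concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample robots.txt, its file names, and the helper functions `parse_groups` and `is_blocked` are made up and are not the file used for this test. It only shows how a spider with its own section could either ignore the "*" section (Google's reading, as described above) or merge it in (the alternative reading).

```python
# Hypothetical robots.txt for illustration only; NOT the file from this post.
SAMPLE = """\
User-agent: *
Disallow: /star-only.html

User-agent: googlebot
Disallow: /googlebot-only.html
"""


def parse_groups(text):
    """Map each lowercased user-agent token to its list of Disallow prefixes."""
    groups = {}
    agents = []            # agents named by the group currently being read
    reading_rules = False  # True once the current group has a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if reading_rules:  # a User-agent line after rules starts a new group
                agents, reading_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            reading_rules = True
            if value:  # an empty Disallow excludes nothing
                for agent in agents:
                    groups[agent].append(value)
    return groups


def is_blocked(groups, agent, path, merge_star=False):
    """Return True if `path` is disallowed for `agent`.

    merge_star=False: a spider with its own section ignores the "*" section.
    merge_star=True:  the "*" rules are applied in addition to its own.
    Simplification (an assumption): agent names are matched exactly.
    """
    agent = agent.lower()
    if agent in groups:
        rules = list(groups[agent])
        if merge_star:
            rules += groups.get("*", [])
    else:
        rules = groups.get("*", [])
    return any(path.startswith(prefix) for prefix in rules)


groups = parse_groups(SAMPLE)
for merge in (False, True):
    print("merge_star =", merge,
          "| googlebot blocked from /star-only.html:",
          is_blocked(groups, "googlebot", "/star-only.html", merge_star=merge))
```

Under the first reading, the page listed only in the "*" section stays crawlable for a spider that has its own section; under the second, it is excluded. That difference is exactly what a contrived test file can expose.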
"Only One Wise Comment Banged Out Somewhere On The Internet ..."

SEO Egghead » Blog Archive » Google Robots.txt Snafu: Part III (Conclusion)
[...] We finally have a conclusion on how exactly to interpret a robot.txt file for the edge cases mentioned here. Someone started a WebmasterWorld thread on the subject of contention. [...]