Aug 16

Google Robots.txt Snafu: Part III (Conclusion)

Posted by Jaimie Sirovich on Aug. 16th, 2006.


We finally have a conclusion on exactly how to interpret a robots.txt file in the edge cases mentioned here. Someone started a WebmasterWorld thread on the point of contention.

Indeed, according to the specification, the rules for a specific matching user agent entirely override the "User-agent: *" rules.  Therefore, any rule under "User-agent: *" that should also be applied to a specific bot must be repeated under the "User-agent:" for that specific bot.  In other words, the more specific set of directives takes precedence over the default, and only one set is applied.
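To make the override concrete, here is a minimal sketch using Python's standard-library urllib.robotparser, which happens to resolve groups this way. The robots.txt content, the paths, and the name "SomeOtherBot" are made up for illustration, and this shows only how that one parser behaves, not how any particular search engine is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /private/ is blocked under "*" but NOT repeated
# under the Googlebot group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/

User-agent: Googlebot
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A bot with no group of its own falls back to the "*" rules.
generic_private = parser.can_fetch("SomeOtherBot", "/private/page.html")

# Googlebot has its own group, so the "*" rules are ignored entirely for it:
# /private/ is allowed because it was not repeated in the Googlebot group.
googlebot_private = parser.can_fetch("Googlebot", "/private/page.html")
googlebot_tmp = parser.can_fetch("Googlebot", "/tmp/file.html")

print(generic_private, googlebot_private, googlebot_tmp)
```

If /private/ should also be off limits to Googlebot here, the "Disallow: /private/" line has to be repeated inside the Googlebot group.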

Googleguy says in the thread that he "… believes most/all search engines interpret robots.txt this way …" This is also consistent with my testing.

However, some recommend placing the "*" rule last just in case: a bot that does not follow the specification may take the first matching set of rules, even if it is "*". With "*" last, such a bot reaches its own specific rules first, so the intended result is achieved either way.
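The ordering advice can be illustrated with a rough sketch of such a non-compliant bot. Everything here is hypothetical: parse_groups and first_match_allows are names invented for this example, the parsing is deliberately simplified (only User-agent and Disallow lines, prefix matching), and no real crawler is claimed to work this way.

```python
def parse_groups(robots_txt):
    """Split robots.txt text into (user_agents, disallow_paths) groups, in file order."""
    groups = []
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # rules already seen, so this line starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field == "disallow" and agents:
            rules.append(value)
    if agents:
        groups.append((agents, rules))
    return groups


def first_match_allows(groups, bot, path):
    """Hypothetical non-compliant bot: obey the FIRST matching group, "*" included."""
    for agents, rules in groups:
        if any(a == "*" or a in bot.lower() for a in agents):
            return not any(r and path.startswith(r) for r in rules)
    return True  # no group matched, so nothing is disallowed


STAR_FIRST = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /private/
Disallow: /tmp/
"""

STAR_LAST = """\
User-agent: Googlebot
Disallow: /private/
Disallow: /tmp/

User-agent: *
Disallow: /private/
"""

# With "*" first, the first-match bot stops at "*" and never sees /tmp/ blocked.
star_first_result = first_match_allows(parse_groups(STAR_FIRST), "Googlebot", "/tmp/x")
# With "*" last, it reaches the Googlebot group first and is blocked as intended.
star_last_result = first_match_allows(parse_groups(STAR_LAST), "Googlebot", "/tmp/x")

print(star_first_result, star_last_result)
```

A compliant parser gives the same answer for both files, which is why putting "*" last costs nothing.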





"2 Wise Comments Banged Out Somewhere On The Internet ..."


wangarific

Hey SEO Egghead, your link to part 2 is broken. Cheers.

Jaimie Sirovich

Fixed. I had accidentally flagged it private — so I saw it, but nobody else did. Thanks.


