We finally have a conclusion on exactly how to interpret a robots.txt file in the edge cases mentioned here. Someone started a WebmasterWorld thread on the point of contention.
Indeed, according to the specification, the rules for a specific matching user agent entirely override the "User-agent: *" rules. Therefore, any rule under "User-agent: *" that should also apply to a specific bot must be repeated under the "User-agent:" line for that bot. In other words, the more specific set of directives takes precedence over the default, and only one set is applied.
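To see the override in action, here is a minimal sketch using Python's standard urllib.robotparser module. The bot names "examplebot" and "otherbot" and the paths are made up for illustration, and this only shows how that one parser behaves, not how any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a default group plus a group for one specific bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: examplebot
Disallow: /drafts/
"""


def allowed(robots_txt, agent, path):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# examplebot has its own group, so the "*" rules are ignored for it entirely.
print(allowed(ROBOTS_TXT, "examplebot", "/private/page.html"))  # True
print(allowed(ROBOTS_TXT, "examplebot", "/drafts/page.html"))   # False

# A bot with no group of its own falls back to the "*" rules.
print(allowed(ROBOTS_TXT, "otherbot", "/private/page.html"))    # False
```

To keep examplebot out of /private/ as well, the "Disallow: /private/" line would have to be repeated inside the examplebot group.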
Googleguy says in the thread that he " ... believes most/all search engines interpret robots.txt this way ... " This is also consistent with my testing.
However, some recommend placing the "*" rule last just in case, because some bots may not follow this specification and instead take the first match -- even if it's a "*". With the "*" rule last, the intended result is achieved under either interpretation.
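The difference between the two interpretations can be sketched with a toy parser. Everything here is hypothetical: parse_groups, spec_match and first_match are illustrative helpers that only understand "User-agent" and "Disallow" lines, and "examplebot" is a made-up bot name. No real crawler is being modeled.

```python
def parse_groups(text):
    """Split robots.txt text into (agents, disallow_paths) groups, in file order."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field == "disallow":
            rules.append(value)
    if agents:
        groups.append((agents, rules))
    return groups


def spec_match(groups, bot):
    """Spec behaviour: a group naming the bot wins; "*" is only a fallback."""
    for agents, rules in groups:
        if any(a != "*" and a in bot.lower() for a in agents):
            return rules
    for agents, rules in groups:
        if "*" in agents:
            return rules
    return []


def first_match(groups, bot):
    """Assumed non-compliant bot: takes the first matching group, "*" included."""
    for agents, rules in groups:
        if any(a == "*" or a in bot.lower() for a in agents):
            return rules
    return []


STAR_FIRST = "User-agent: *\nDisallow: /private/\n\nUser-agent: examplebot\nDisallow: /drafts/\n"
STAR_LAST = "User-agent: examplebot\nDisallow: /drafts/\n\nUser-agent: *\nDisallow: /private/\n"

for name, text in (("star first", STAR_FIRST), ("star last", STAR_LAST)):
    groups = parse_groups(text)
    print(name, spec_match(groups, "examplebot"), first_match(groups, "examplebot"))
```

With the "*" group first, the two strategies disagree about which rules examplebot gets; with it last, both pick the examplebot group.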

August 16th, 2006 at 1:37 pm
Hey SEO Egghead, your link to part 2 is broken. Cheers.
August 16th, 2006 at 1:52 pm
Fixed. I had accidentally flagged it private -- so I saw it, but nobody else did. Thanks.