- Jun. 13th, 2006
To be honest, I'm not even sure this matters much anymore, but I thought I'd mention it. Like the parameter-ordering issue (?a=1&b=2 vs. ?b=2&a=1) I mentioned here, a slash at the end of a URL can pose a similar ambiguity problem. Fortunately, at least for non-rewritten pages, Apache takes care of this issue: if the resource is a directory, a request for the-url gets 301-redirected to the-url/.
But when mod_rewrite is used, we're not dealing with the file system, and nothing is done automatically. Ideally, we'd do the same thing manually in our scripts and 301-redirect a URL "missing" a trailing slash to the one with a slash (or vice versa). It's a minor issue, and it's dealt with automatically if you use URL functions to generate your URL strings (which have a slash on them) and redirect whenever the current URL doesn't match the generated URL. That's what I do. Either way, I make sure both URLs work, with or without the trailing slash.
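A minimal sketch of that compare-and-redirect idea, in Python purely for illustration. The function names canonical_path and redirect_target are hypothetical, and the sketch assumes the site's convention is that every page URL ends in exactly one slash; the script would send a 301 to whatever redirect_target returns.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_path(path: str) -> str:
    """Return the path with exactly one trailing slash (assumed site convention)."""
    return path.rstrip("/") + "/"


def redirect_target(requested_url: str):
    """Return the URL to 301-redirect to, or None if the request is already canonical."""
    parts = urlsplit(requested_url)
    wanted = canonical_path(parts.path)
    if parts.path == wanted:
        return None  # already canonical, serve the page normally
    # Rebuild the URL with the canonical path, keeping the query string intact.
    return urlunsplit((parts.scheme, parts.netloc, wanted, parts.query, parts.fragment))


if __name__ == "__main__":
    for url in ("http://example.com/blah", "http://example.com/blah/"):
        print(url, "->", redirect_target(url))
```

Because the comparison is against the generated URL, the same check would also catch other mismatches (a doubled slash, for instance), not just a missing one.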
GoogleGuy, on the Search Engine Watch Forums, states " … [these URLs] are almost always are the same, but technically according to the HTTP standards I don't think that they have to be the same." I agree. They don't have to be the same. Ideally, just take care of it and don't give the spider anything to worry about. He later states that "even if we get duplicate content for two nearly identical urls, we have heuristics that normally detect that sort of thing. That's why the search collapses those two urls together unless you do "&filter=0"." So it's really a non-issue with regard to duplicate content. I admit it.
There is another, more important issue I will mention. Some search engines normalize URLs and chop off the trailing slash, regardless of the fact that your web server serves the URL with the trailing slash. Yahoo does this, at least sometimes. Therefore, at a minimum, make sure your rewrite rules allow for either URL, not just one or the other. Ideally you would also redirect to the URL with the trailing slash, as I said, but it's probably not a major issue. And because of this normalization, expect people to link to your URLs both ways, even if you're consistent on the site itself, because the search engines themselves (or at least Yahoo) will link incorrectly.
RewriteRule ^.*blah/?$ /pages.php
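To see what that pattern accepts, here is a quick check using Python's re module as a stand-in for mod_rewrite's regex matching (an assumption that should hold for a pattern this simple). The sample paths are made up for illustration.

```python
import re

# Same pattern as the RewriteRule above: the "/?" makes the trailing slash optional.
PATTERN = re.compile(r"^.*blah/?$")


def rule_matches(path: str) -> bool:
    """Return True if the rewrite pattern would match this request path."""
    return PATTERN.search(path) is not None


if __name__ == "__main__":
    for path in ("blah", "blah/", "section/blah/", "blah/extra", "notblah"):
        print(f"{path!r}: {rule_matches(path)}")
```

Both blah and blah/ match, which is the point. Note that because of the leading .*, the pattern also matches a path such as notblah, so a real rule may want a tighter prefix.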
Adding the trailing slash via mod_rewrite is also possible, but if you take my advice and use functions to create your URLs, it's much easier and more logical to take care of it at the application level.
If you take a look at http://search.yahoo.com/search?p=site%3Awww.lawyerseek.com&sm=Yahoo%21+Search&fr=FP-tab-web-t&toggle=1&cop=&ei=UTF-8, you'll notice something even stranger (assuming it hasn't changed since this posting). Yahoo appears to index the URLs with and without the trailing slash more or less at random. Notably, I can assert that some of those URLs are only linked internally, and I have never linked them without a slash, so it's only normalizing sometimes (wouldn't that defeat the purpose of normalization?). Either way, the URLs get resolved to the proper content.
So regardless, both of these URLs will work, as the mod_rewrite regex matches either, and they never deliver (mostly trivial) duplicate content, as the former gets 301-redirected to the latter.
I apologize for boring you :)