- Nov. 30th, 2007
Authored By: Cristian Darie
I'm writing this article to warn you about an implementation detail of the ASP.NET 2.0 session management mechanism, which, if handled incorrectly, can potentially remove your ASP.NET 2.0 web site from Google's index.
The communication protocol our web browsers use when navigating the Internet - HTTP - was designed to be a stateless protocol. Unless special tracking techniques are used, a web server that serves many requests at the same time cannot know if those requests come from different users, or from a single user performing multiple requests. Using an IP address to identify visitors is particularly useless when the requests come from NAT-based LANs, where many users share a single public address.
State management: Sessions
Needless to say, there's not much a website can do for you if it doesn't know who you are. Consequently, several state-management mechanisms have been developed on top of HTTP to allow web developers to implement the features their websites require.
The two significant mechanisms for handling user sessions are:
1. URL-based sessions. In this case, when a user visits a website for the first time, he or she is redirected to the URL of that web site, with a session ID appended to the query string, such as http://www.example.com?SESSION_ID=123123. Each subsequent request to that web site will contain that ID, so the web server will know who the request came from.
2. Cookie-based sessions. In this case, the first time a user visits a web site, that web site will save a cookie on the user's browser. On each request, the web site checks for the presence of that cookie, and depending on its value is able to determine who the visitor is.
URL-based sessions have proven to be quite problematic. Search engines sometimes have trouble spidering web sites that use them and they can pose security problems (a hacker obtaining "your" URL could potentially hijack your identity on the website). On the other hand, cookie-based sessions don't work for users whose web browsers don't support cookies, or have disabled the support for cookies.
Modern web development technologies, such as ASP.NET and PHP, have built-in support for both URL-based sessions and cookie-based sessions. In both cases, by default the session is handled using cookies.
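To activate URL-based sessions in ASP.NET you need to set the cookieless attribute of the sessionState element in your web.config file so that session IDs are carried in the URL.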
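As a minimal sketch, assuming the standard sessionState element of an ASP.NET 2.0 web.config file, forcing URL-based session IDs for every visitor could look like this. UseUri is one of the values the cookieless attribute accepts; the rest of the file is omitted.

```xml
<!-- web.config (fragment): force URL-based session IDs for all visitors -->
<configuration>
  <system.web>
    <!-- cookieless="UseUri" carries the session ID in the URL instead of a cookie -->
    <sessionState cookieless="UseUri" />
  </system.web>
</configuration>
```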
Details about this mechanism are covered in our book, Professional Search Engine Optimization with ASP.NET, in Chapter 5: Duplicate Content. The same chapter explains why you don't want to use URL-based sessions unless you really need them: they generate numerous pages with different URLs but the same content. The implications are detailed in the book, but in short, such pages complicate the spidering of your web site and may lead to direct or indirect penalties, and consequently to lower performance with the search engines.
By default, ASP.NET requires cookies for session state management and for user logins. As you can imagine, a typical ASP.NET web site will not work well (or at all!) if the user's browser doesn't support cookies, or if the cookie support is disabled.
Losing cookieless visitors isn't a significant problem for most web sites since almost all web browsers do support (and are configured to support) cookies. However, for certain businesses losing those customers is not an option.
To overcome this problem, ASP.NET 2.0 introduced a new session handling option named "AutoDetect." This feature is very well explained here: http://msdn2.microsoft.com/en-us/library/aa479315.aspx.
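A sketch of switching this option on, assuming the sessionState element in web.config (all other settings omitted):

```xml
<!-- web.config (fragment): let ASP.NET choose between cookies and URLs -->
<configuration>
  <system.web>
    <!-- AutoDetect: use cookies when the browser appears to support them,
         otherwise fall back to a session ID carried in the URL -->
    <sessionState cookieless="AutoDetect" />
  </system.web>
</configuration>
```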
By default, the cookieless attribute has the value "UseCookies", so ASP.NET web applications will never automatically generate URL-based session IDs that alter your URLs. If, however, you change the cookieless value to "AutoDetect" and load the website with a cookieless browser, you're automatically redirected to a URL that contains an automatically generated session ID.
The side effect with ASP.NET 2.0's cookie support autodetection is that it works for web spiders as well (not that I think it was designed to do so, though). More specifically, Google's spider uses a user agent definition that is interpreted by ASP.NET 2.0 to be that of an old browser that doesn't support cookies:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
If you switch your session's cookieless setting to AutoDetect, Google will be served with URLs that contain automatically generated session IDs. (More specifically, Google will be served with 302 redirects to URLs that contain session IDs.)
Technically, ASP.NET's behavior is correct. If you configure it to use URL-based sessions for cookieless users, it happily does so, and Google's web spider indeed does not support cookies. In practice, however, you don't want to feed Google such URLs, because this is likely to hurt your performance with the search engines.
1. The easiest solution to the problem is to stop using ASP.NET's AutoDetect session mode.
2. If you need to use that feature, though, you can simply configure ASP.NET to recognize Google's spider as supporting cookies. This article shows how. (http://www.kowitz.net/archive/2006/12/11/asp.net-2.0-mozilla-browser-detection-hole.aspx)
3. You can implement automatic support for URL-based sessions yourself. This takes some time to implement, and the benefits may not be worth the implementation cost. It works like this:
- Use cloaking to generate session IDs only if the visitor is not a web spider.
- Start generating session IDs only when the session is really needed for tracking (for example, after the visitor adds items to his or her shopping cart). This way you don't feed your users URL-based session IDs unless you really need to.
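For the second option above, one possible shape of such a configuration is a custom browser definition file placed in the site's App_Browsers folder. The sketch below is an assumption rather than the linked article's exact fix: the file name, the id value, and the match pattern are illustrative, and parentID="Mozilla" assumes the spider's user agent is matched by the built-in Mozilla definition. Verify on your own site that Googlebot actually stops receiving session-ID redirects before relying on it.

```xml
<!-- App_Browsers/googlebot.browser (hypothetical file name) -->
<browsers>
  <!-- Refine the built-in Mozilla definition for user agents containing "Googlebot" -->
  <browser id="GooglebotWithCookies" parentID="Mozilla">
    <identification>
      <userAgent match="Googlebot" />
    </identification>
    <capabilities>
      <!-- Report the spider as cookie-capable so it is not treated as a cookieless browser -->
      <capability name="cookies" value="true" />
    </capabilities>
  </browser>
</browsers>
```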