- Jun. 18th, 2006
I've been digesting this for a while. Barry Schwartz of Search Engine Roundtable writes that the New York Times is cloaking content so that Google is able "to access, crawl, index and rank content that would require a username and password by a normal Web user."
This may sound OK to most, but I fail to see the fairness in it. It implies that, as in the BMW affair, Google is once again proving that it gives preferential treatment to large companies. BMW.de was reincluded in what, a few days? Good luck, mom and pop, with getting that type of service from Google. Unless you confuse form letters with love letters, you won't be pleasantly surprised.
There is no doubt about it. Without special accommodations from Google, or at least its complicity, what the New York Times is doing is what Google calls a black hat technique when anyone else does it. Other search engines are less quick to vilify cloaking, so long as it is not used to spam. I agree with them, but Google is in a pickle here.
What they do is detect the search engine bot and serve it the complete content instead of the abstract and a request to open your wallet. Then they set the "noarchive" option in the robots meta tag, which prevents savvy users from resorting to the cache. In this case, Google probably gives them a guarantee that this will not get them kicked out of the index, and probably sets some sort of flag in its mysterious database to the effect that Googlebot will not lie about its user agent when spidering the site. Thus, I doubt they need to resort to techniques like Fantomaster's IP database to accomplish what they want, since Google has already tipped its hat to this violation of its webmaster guidelines.
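For the curious, here is a rough Python sketch of the kind of user-agent cloaking described above. It is not the New York Times' actual code; the function names, the list of bot tokens, and the paywall prompt are all made up for illustration.

```python
# Hypothetical crawler User-Agent substrings; an assumption for illustration only.
BOT_TOKENS = ("googlebot", "slurp", "msnbot")


def is_search_bot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BOT_TOKENS)


def render_article(user_agent: str, abstract: str, full_text: str) -> str:
    """Build the page: full text for crawlers, abstract plus a paywall prompt otherwise.

    The abstract and full_text arguments are assumed to be HTML-safe already.
    """
    if is_search_bot(user_agent):
        # noarchive asks the engine not to offer a cached copy of the page.
        head = '<meta name="robots" content="noarchive">'
        body = full_text
    else:
        head = ""
        body = abstract + "\n<p>Subscribe to read the full article.</p>"
    return f"<html><head>{head}</head><body>{body}</body></html>"
```

Note that this trusts the User-Agent header, which anyone can fake. That only works if the bot is honest about who it is, which is why sites without a special arrangement fall back on IP lists.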
Another "offender" that comes to mind is Experts Exchange.
This is not meant to imply that I'm against cloaking. I'm pretty ambivalent, and I'll post cloaking-related topics on this blog. But if Google wants to apply its draconian policies on cloaking to me, it must also apply them to everyone else. Matt Cutts himself has called this problematic in his blog right here.
The blog says:
"Googlebot and other search engine bots can only crawl the free portions that non-subscribed users can access. So, make sure that the free section includes meaty content that offers value."
And he says something similar again in a comment here.
So much for Google arguing for net neutrality. This policy is far from neutral. If Google wants to offer this sort of preferred treatment, it must allow others to apply for it. And I don't mean empty promises responded to with form letters. Sorry, Google: this needs explanation.
Update: Interestingly enough, someone sent me a comment about my last post, wondering whether a link to an excluded page counts at all. He asked me why I wouldn't 301 the bot to the non-excluded page using IP delivery. To Google, this would be cloaking, but personally, I feel it should be OK. If it's OK for the New York Times to profit from cloaking, it should be fine for me to use it innocently to cure architecture woes.
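To make the idea concrete, here is a minimal Python sketch of IP delivery with a 301. Everything in it is an assumption for illustration: the network below is a reserved documentation range standing in for a real crawler IP list, and the URL mapping and function names are hypothetical.

```python
import ipaddress

# Placeholder network from the reserved documentation range, NOT real crawler IPs.
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

# Hypothetical mapping from excluded URLs to their non-excluded equivalents.
EXCLUDED_TO_CANONICAL = {"/print/article-1": "/article-1"}


def is_crawler_ip(remote_ip: str) -> bool:
    """Return True if the client address falls inside a listed crawler network."""
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in CRAWLER_NETWORKS)


def respond(remote_ip: str, path: str):
    """Return (status, headers): a 301 for crawlers on excluded pages, else 200."""
    target = EXCLUDED_TO_CANONICAL.get(path)
    if target is not None and is_crawler_ip(remote_ip):
        # Only the bot is redirected; human visitors still get the original page.
        return 301, {"Location": target}
    return 200, {}
```

With these placeholder values, `respond("192.0.2.10", "/print/article-1")` yields a 301 to `/article-1`, while any other visitor to the same URL gets a plain 200.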
I also posted a cloaking library written in PHP here.