- Jul. 11th, 2006
- 5 comments
People have too much faith in Google – even when doing so implies a violation of the principles of computer science. Many Google-oglers have contended that Google can find applications of JavaScript redirect cloaking with ease. I'm not a PhD in Computer Science, but I doubt there is any easy way to find this stuff in the general case. Yes, even for Google. Clever spammers will be doing this sort of thing for a while.
Google will nail the lousy spammers that cut and paste pre-made scripts with a common signature, just as the Feds catch the script kiddies running precompiled exploit scripts on Windows. Google will catch the idiots. But it won’t happen en masse, or to everyone. Here’s why:
First, there is provably an infinite number of ways to do it. I can even do this sort of proof using induction. So there's no signature. This makes it much harder to detect, even if it's on the page. Any number of trivial or non-trivial changes can be made to a program to make it different. I’m not putting the proof here, because it’s boring, and it is common sense anyway. If Google tried to do it by signature, they would also get a lot of false positives and negatives; and both are bad.
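To make the signature problem concrete, here is a minimal sketch of a naive signature check. The regular expression, the `matchesSignature` helper, and the `example.com` target are all made up for illustration; nothing here is known to resemble what Google actually runs. Each variant would send a browser to the same place, but only the textbook form matches.

```javascript
// Hypothetical signature: flags only the most common literal redirect form.
const SIGNATURE = /location\.href\s*=/;

// Four snippets with the same effect in a browser (example.com is a placeholder).
const variants = [
  "location.href='http://example.com/';",
  "window['loc' + 'ation']['hr' + 'ef'] = 'http://example.com/';",
  "var w = window, k = 'location'; w[k].replace('http://example.com/');",
  "self.location = 'http://' + 'example.com/';",
];

// Returns true when the source text matches the signature; nothing is executed.
function matchesSignature(source) {
  return SIGNATURE.test(source);
}

const flagged = variants.filter(matchesSignature);
console.log(flagged.length + ' of ' + variants.length + ' variants flagged');
// prints "1 of 4 variants flagged"
```

Widening the pattern to catch the other three starts flagging innocent pages that merely mention `location`, which is the false-positive half of the problem.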
And executing the code won't necessarily work, either, as there are several pretty simple, devious ways to confuse it. I'm not even a black hat, and I can think of some cute ways off the top of my head. Not to mention the fact that executing code takes a massive amount of computing time, even if the crawler uses some nice heuristics to decide which pages it would like to audit more closely. And since there's no general method to prove if or when a program terminates, I can probably even do something very mean to Mr. Google and waste lots of its time.
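Here is a rough sketch of what "executing the code" with a time budget might look like, assuming the auditor runs on Node.js and uses its built-in `vm` module. The `audit` function, the fake `location` object, and the 50 ms budget are assumptions for this example, and `vm` is not a security boundary, so a real crawler would need much stronger isolation.

```javascript
const vm = require('vm');

// Hypothetical time budget per page, in milliseconds.
const BUDGET_MS = 50;

// Runs script text against a fake `location` object and reports what happened.
function audit(source) {
  const sandbox = { location: { href: '' } };
  try {
    vm.runInNewContext(source, sandbox, { timeout: BUDGET_MS });
  } catch (err) {
    // Node reports an exceeded budget with this error code.
    if (err && err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT') {
      return { status: 'timeout', redirect: null };
    }
    return { status: 'error', redirect: null };
  }
  const target = sandbox.location.href;
  return target
    ? { status: 'redirect', redirect: target }
    : { status: 'clean', redirect: null };
}

console.log(audit("location.href = 'http://example.com/';").status); // redirect
console.log(audit('while (true) {}').status);                        // timeout
```

A `timeout` result tells the auditor nothing: the script may be broken, or it may simply be stalling longer than the budget before it redirects. Raising the budget just raises the cost per page.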
How about using a simple "rot13" function to "encrypt" some redirect code? Then do the reverse and append it to the document. Or how about using an AJAX script that fetches code from a server, which in turn uses IP cloaking to serve different code to different visitors, and having the document execute that? Seeing the synthesis of technology to do something creative like this makes me glad I am alive.
Google is better off spending their time devaluing these spam sites based on their poor content or spam backlinks. I believe that's what's going on anyway. Analyzing copy for quality is a better and more attainable target metric in my opinion, but I'm not an expert here. Feel free to comment.
Here’s an example of redirection code employing rot13 (or is it ebg13?) that Google would actually have to run in order to detect:
<script language="JavaScript">
<!--
function ebg13(string) {
    var aCode = 'a'.charCodeAt();
    var zCode = 'z'.charCodeAt();
    var ACode = 'A'.charCodeAt();
    var ZCode = 'Z'.charCodeAt();
    var result = '';
    for (var c = 0; c < string.length; c++) {
        var charCode = string.charCodeAt(c);
        if (charCode >= aCode && charCode <= zCode)
            charCode = aCode + (charCode - aCode + 13) % 26;
        else if (charCode >= ACode && charCode <= ZCode)
            charCode = ACode + (charCode - ACode + 13) % 26;
        result += String.fromCharCode(charCode);
    }
    return result;
}
document.write(ebg13("<fpevcg>ybpngvba.uers='uggc://jjj.paa.pbz';</fpevcg>"));
// -->
</script>
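To see why a crawler that only reads the source would miss this, here is a small sketch that can be run outside the browser (in Node.js, for instance). The `rot13` function is a compact rewrite of the same transformation as `ebg13` above, and `looksLikeRedirect` is a hypothetical static check invented for this example.

```javascript
// Same transformation as ebg13 above: rotate each ASCII letter by 13 places.
function rot13(text) {
  return text.replace(/[a-zA-Z]/g, function (ch) {
    const base = ch <= 'Z' ? 65 : 97; // char code of 'A' or 'a'
    return String.fromCharCode(base + (ch.charCodeAt(0) - base + 13) % 26);
  });
}

const payload = "<fpevcg>ybpngvba.uers='uggc://jjj.paa.pbz';</fpevcg>";
const decoded = rot13(payload);

// Hypothetical static check that only reads text and never runs it.
function looksLikeRedirect(text) {
  return /location\.href/.test(text);
}

console.log(looksLikeRedirect(payload)); // false
console.log(looksLikeRedirect(decoded)); // true
console.log(rot13(decoded) === payload); // true
```

The redirect only becomes visible in `decoded`, which exists only after the page's own script has run. Applying rot13 twice returns the original string, which is why the same function can both hide and reveal the code.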
Of course, this is not nearly the end of the road as far as obfuscation goes. I'll leave the AJAX example suggested above up to you. If anyone is going to nail spam, it's Google, but even the Google PhDs have their limitations.
Update: Black_Knight, a moderator on Cre8asiteforums, pointed out in his post here that behavioral data, such as the data Google collects from the Google Toolbar, can be used to detect things like this. It's an interesting approach, because it lets Google leverage the (free) computational resources of all of its toolbar users to detect JavaScript redirection: their browsers run the scripts anyway. In the aggregate, this gives a high degree of statistical confidence. I must admit, it's a powerful idea; I'm just not sure Google actually does this.
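As a purely illustrative sketch of the aggregate idea, suppose each toolbar report contained the URL a user requested and the URL the browser finally landed on. The report format, the `.example` hostnames, and the `offHostRates` function are all assumptions of mine; I have no idea what the toolbar actually records.

```javascript
// Hypothetical reports: what the user asked for, and where the browser ended up.
const reports = [
  { requested: 'http://spam.example/', landed: 'http://offers.example/buy' },
  { requested: 'http://spam.example/', landed: 'http://offers.example/buy' },
  { requested: 'http://spam.example/', landed: 'http://offers.example/buy' },
  { requested: 'http://blog.example/', landed: 'http://blog.example/' },
  { requested: 'http://blog.example/', landed: 'http://blog.example/' },
  { requested: 'http://blog.example/', landed: 'http://blog.example/' },
  { requested: 'http://blog.example/', landed: 'http://ads.example/click' },
];

function hostOf(url) {
  return new URL(url).host;
}

// For each requested URL, the fraction of visits that ended on another host.
function offHostRates(rows) {
  const stats = new Map();
  for (const row of rows) {
    const s = stats.get(row.requested) || { visits: 0, moved: 0 };
    s.visits += 1;
    if (hostOf(row.requested) !== hostOf(row.landed)) s.moved += 1;
    stats.set(row.requested, s);
  }
  const rates = {};
  for (const [url, s] of stats) rates[url] = s.moved / s.visits;
  return rates;
}

console.log(offHostRates(reports));
```

With this made-up data, every visit to `spam.example` ends up elsewhere, while `blog.example` loses only one visit in four (someone clicked an ad). No single report proves anything; the pattern across many users is what would carry the weight.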
"5 Wise Comments Banged Out Somewhere On The Internet ..."
I fully agree. I think it's funny that people think Google or any search engine can stop spammers. And if they start trying to detect bad content, spammers will just start using better content. Anyway, well said.
I removed the spammer's stuff FYI. But anyone who registers domains with GoDaddy, and then violates copyright law obviously needs his head checked. My God. I let the first few go, but it's not funny anymore.
You make some very interesting points. I particularly agree with your update - Google's toolbar users are contributing a huge amount of behavioural data which will, no doubt, be used more and more in the future.
ha.ckers.org web application security lab - Archive » Sometimes it Sucks Being a Search Engine Spammer [...] Somehow I ended up on the dumb side of a search engine spammer. I have no idea why anyone would think this would be a good site to rip off - you have to be a serious newbie to think that's a good idea. Anyway, there I was, getting pingbacks and referring URLs and people telling me that my site was being ripped off by some dumbass. The only vaguely amusing side of this is that other SEO blogs have been hit by this recently too. [...]
SEO Egghead » Blog Archive » Microsoft’s Spam Project [...] I said awhile back in my post, Google Violates Computer Science that people have too much faith in Google. I said that even light obfuscation of Javascript redirect code, such as rot13ing the code would likely even trick the formidable Google. I may have changed my mind. This stuff may work near-term, but I have my doubts as to the future. [...]