SEO Egghead by Jaimie Sirovich: A blog about SEO, written for nerds, by a nerd.


Thu 16 Aug '07

How To Guide: Prevent Google Proxy Hacking

One day your traffic comes to a grinding halt. What happened? Check the index. Google may have found all your reciprocal links from i-hump-sheep.info and white-castle-coupons.biz. But it's also possible that you have been "proxy hacked." That's the term being tossed around by a few people who have kept mum about it for a while -- Alan Perkins, Danny Sullivan, Bill Atchison, and Brad Fallon. A few other people, whom we don't know, are actually exploiting this hole right now.

And it's likely the reason Google (http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html), Yahoo (http://www.ysearchblog.com/archives/000460.html), MSN (http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx), and Ask (http://about.ask.com/en/docs/about/webmasters.shtml#21) all published guidelines for detecting whether a bot is, indeed, an authentic bot. I wondered why when I first heard about those guidelines. Now it all comes together.
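For the nerds: the check those guidelines describe boils down to a reverse DNS lookup on the visiting IP, a look at the domain of the resulting hostname, and a forward lookup to confirm the hostname points back at the same IP. Here is a minimal Python sketch of that idea. The function name, the injectable lookup parameters, and the hostname suffixes are my own assumptions for illustration; check each engine's guidelines for the domains it actually uses.

```python
import socket

# Assumed hostname suffixes for illustration only; each search engine
# documents its own crawler domains.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, suffixes=TRUSTED_SUFFIXES,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Return True if ip passes a reverse-then-forward DNS check."""
    try:
        # Step 1: IP -> hostname (PTR record).
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False

    # Step 2: the hostname must sit under a trusted crawler domain.
    if not hostname.lower().endswith(tuple(suffixes)):
        return False

    try:
        # Step 3: hostname -> IPs; the original IP must be among them,
        # otherwise anyone could fake the PTR record for their own IP.
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

The lookups are passed in as parameters so the logic can be tested without touching real DNS. The User-Agent header is deliberately not consulted, since anyone can set it to whatever they like.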

So what's going on? Dan Thies has a summary up about it, and he'll probably do a much better job explaining it since, unlike me, he's not a programmer at heart. I'll end up speaking in pseudocode ... so read Dan's summary over here. Here's a tidbit that explains a lot of it:

With the introduction of "Big Daddy," Google crawls from many different data centers; they also changed the algorithm substantially at the same time. According to Dan, "It appears that the changes include moving some of the duplicate content detection down to the crawlers. [This is problematic. In short:]

1. The original page exists in at least some of the data centers.
2. A copy (proxy) gets indexed in one data center, and that gets sync'd across to the others.
3. A spider visits the original, checks to see if the content is duplicate, and erroneously decides that it is.
4. The original is dropped or penalized."

So ... the problem is that if you flood Google with massive amounts of duplicate content, it exposes a vulnerability. Eventually the algorithm makes a mistake, and your content is no longer authoritative.

Oops!

How To Fight Back -- Code implementations

Well, that's where I come in. I have two implementations in beta (read: they work according to my tests, but I'm going to be testing more) that address the problem based on the verification methods the search engines cite. Essentially, we're using a benign form of cloaking (yes, cloaking!) to make it more difficult for bad bots, proxies, etc. to exploit us.
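To make "benign cloaking" concrete, here is one plausible way such a policy could look, sketched in Python rather than the PHP of the actual implementations. This is an assumption about the general idea, not the beta code itself: verified search engine spiders get an indexable page, and everyone else, including any proxy relaying the page, gets a "noindex,nofollow" robots meta tag. The names `robots_meta_tag`, `render_head`, and the `is_verified_crawler` callable are hypothetical.

```python
import html

INDEX_TAG = '<meta name="robots" content="index,follow">'
NOINDEX_TAG = '<meta name="robots" content="noindex,nofollow">'


def robots_meta_tag(client_ip, is_verified_crawler):
    """Pick the robots meta tag for one request.

    is_verified_crawler is a hypothetical callable (ip -> bool), for
    example a reverse/forward DNS check. It is injected so the policy
    can be tested without network access.
    """
    return INDEX_TAG if is_verified_crawler(client_ip) else NOINDEX_TAG


def render_head(title, client_ip, is_verified_crawler):
    """Build a minimal <head> whose robots tag depends on the visitor."""
    tag = robots_meta_tag(client_ip, is_verified_crawler)
    return f"<head><title>{html.escape(title)}</title>{tag}</head>"
```

The reasoning behind the sketch: when a spider crawls the page through a proxy, the request reaches our server from the proxy's IP, which fails verification, so the copy the proxy hands back carries "noindex" and should not get indexed. Human visitors also receive the "noindex" tag, but browsers ignore it.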

The code is located here

I'll expand the explanation in that documentation to make it easier to comprehend/install. But if you know PHP, dive right in.

The code and concepts are primarily based on the book I coauthored with Cristian Darie, "Search Engine Optimization with PHP." It is my view that most SEOs need to be more aware of technology than they think -- hence the book. This is just one example.

You can see it in action here:
http://www.seoegghead.com/tools/test-simple-cloak.php


23 Responses to “How To Guide: Prevent Google Proxy Hacking”

  1. Steve Goddard Says:

    It's a very interesting article. When you think of the whole proxy thing, it kind of makes sense that it could cause content to be duplicated. Google really needs to get this sorted out ASAP.

    Let's hope that bringing this proxy problem out in the open doesn't cause too much damage.

    Anyway, great site! Your book is currently in the mail heading my way!

  2. Friday Tea Time - 8/17/07 » TheMadHat Says:

    [...] it’s easy to do so you certainly need to check it out. SEO Egghead has the solution on how to defend against proxy hacking which is heavy into PHP but doesn’t look too hard to implement. It’s sad the SEO [...]

  3. jose peru Says:

    I don't really understand this. Could somebody explain it to me, please? I am not an English speaker, that's why.
    plz email to dateameperu at yahoo dot com

  4. Google Proxy Issue - Any Third Party Can De-Index you! | Reviewer of Sites Says:

    [...] Implementation Guide is available on Jaime Sirovich’s blog to walk you through some possible preventative [...]

  5. HassleFreeWebSites.com » Blog Archive » Google Backlinks Update in Progress Says:

    [...] Google Proxy HackingThis is the first time I’ve heard of Google Proxy Hacking so I thought I would post some info: What is it? It’s a method of using Google’s “remove duplicate content” feature to get sites removed from Google’s index or penalized. Find out more and steps to prevent it. [...]

  6. Pushing WordPress SEO Boundaries | Andy Beard - Niche Marketing Says:

    [...] is also an SEO worth reading (though he has been taking some R&R due to illness) and wrote the code to fix the proxy [...]

  7. HassleFreeWebSites.com » Blog Archive » Big Google SERP Changes Says:

    [...] Google Proxy HackingThis is the first time I’ve heard of Google Proxy Hacking so I thought I would post some info: What is it? It’s a method of using Google’s “remove duplicate content” feature to get sites removed from Google’s index or penalized. Find out more and steps to prevent it. [...]

  8. HassleFreeWebSites.com » Blog Archive » Big Google SERP Changes Says:

    [...] duplicate content” feature to get sites removed from Google’s index or penalized. Find out more and steps to prevent it. Tags: ecommerce web hosting, web hosting solution, free domain name, linux web hosting, free web [...]

  9. Forrest Says:

    Very interesting article. I'm going to have to do some more research into this. Google says there's no duplicate content penalty, but they also say landing in the supplementals isn't a bad thing. I would hate for that to happen purely because two bots crawl the same page and naturally see the same thing...!

  10. Kirk Says:

    Hi Jaimie, Dan Thies pointed me in your direction as I have several websites that all run on a Windows server and primarily use HTM, with some pages being ASP. However, one of my websites is built entirely in ASP.

    Dan mentions in his blog that you may have been working on a fix for those of us dirty enough to use ASP and windows servers. I just wanted to see if this was still the case and if you have any updates on this yet.

    Thanks for your efforts so far. As someone who isn't all that savvy in web development and manages to just get by, this is a real help.

    Kirk

  11. erwin Says:

    Hi All,

    My site URL has been hacked! Normally my site shows up when you type "sms4niets" in Google search. Now if you type "sms4niets" in Google, it redirects to a completely different site (not mine!). I'm not very good at PHP, but could somebody help me implement a solution? Please, I need some help on this subject. Thanks!!

    Erwin dus (@) casema.nl

  12. HassleFreeWebSites.com » Blog Archive » New MSN Live Search Webmaster Portal Says:

    [...] Google Proxy HackingThis is the first time I’ve heard of Google Proxy Hacking so I thought I would post some info: What is it? It’s a method of using Google’s “remove duplicate content” feature to get sites removed from Google’s index or penalized. Find out more and steps to prevent it. [...]

  13. Nancy P Redford’s Practical Marketing Tips » Can Proxy Hacking Remove Your Site From ? Says:

    [...] How To Guide: Prevent Google Proxy Hacking [...]

  14. Internet Marketing Campus » Archive » Can Proxy Hacking Remove Your Site From ? Says:

    [...] How To Guide: Prevent Google Proxy Hacking [...]

  15. Internet Marketing Campus » Archive » Can Proxy Hacking Remove Your Site From Search Engine Results? Says:

    [...] How To Guide: Prevent Google Proxy Hacking [...]

  16. Chris Says:

    I'm with Kirk,
    we have a mix of ASP and HTML and have been fighting this and scrapers for the last couple of years. We really need help. Please keep me posted on your IIS solution.

  17. Proxy Hi.Jack Says:

    I wrote a small article and some scripts that might come in handy when fighting this including a small explanation.

    Dedicated to the PHP coders!

  18. HassleFreeWebSites.com » Blog Archive » Top Paying AdSense Keywords Says:

    [...] duplicate content” feature to get sites removed from Google’s index or penalized. Find out more and steps to prevent it. Tags: ecommerce, web hosting provider, seo tool, web hosting company, ecommerce shopping cart, seo [...]

  19. HassleFreeWebSites.com » Blog Archive » Google Updates Says:

    [...] Google Proxy HackingThis is the first time I’ve heard of Google Proxy Hacking so I thought I would post some info: What is it? It’s a method of using Google’s “remove duplicate content” feature to get sites removed from Google’s index or penalized. Find out more and steps to prevent it. [...]

  20. HassleFreeWebSites.com » Blog Archive » Reports of Google Dropping Indexed Pages Says:

    [...] Google Proxy HackingThis is the first time I’ve heard of Google Proxy Hacking so I thought I would post some info: What is it? It’s a method of using Google’s “remove duplicate content” feature to get sites removed from Google’s index or penalized. Find out more and steps to prevent it. [...]

  21. » 任何人都可以将你的网站从搜索引擎结果中删除 SERPS.CN: 分享SEO经验、资源、技巧 Says:

    [...] to get the code: An implementation guide is provided on Jaimie’s blog, along with a testing environment that you can use to check [...]

  22. HassleFreeWebSites.com » Blog Archive » WiseNut is gone for good Says:

    [...] Google Proxy HackingThis is the first time I’ve heard of Google Proxy Hacking so I thought I would post some info: What is it? It’s a method of using Google’s “remove duplicate content” feature to get sites removed from Google’s index or penalized. Find out more and steps to prevent it. [...]

  23. Phoenix Says:

    I've read Dan's post too and wondered how he implemented his proxy-hack-solution. Thanks Jaimie for sharing yours :)
