Jun 29

CSS Spam and Robots.txt

Posted by Jaimie Sirovich on Jun. 29th, 2006.

What really stops anyone from using a CSS-based layout, throwing up some spammy content, then using CSS with all sorts of positioning to hide it? Spiders can read CSS now, and you'll get busted, right? Perhaps.

But what if one were to place the CSS in a file and exclude that file via robots.txt? To make it look even less like subterfuge, simply place the CSS file in your images folder or something. Assuming Google actually honors the robots exclusion protocol, wouldn't that permit one to spam with impunity?

Again, I'm not a black hat, but isn't that a logical conclusion?

I don't see it as a spam signature, either. Obviously, don't exclude the CSS file explicitly; rather, exclude the directory it resides in. Also, do not use the same directory name every time. It just "happens" that you put it there. Some examples would be /includes, /resources, /images, /pics, /tumadre, etc. Someone could rat you out, but that would take human time — it couldn't be automatic.
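To make the mechanics concrete, here is a minimal sketch of how a well-behaved spider would read such a rule, using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` host, the `/includes/layout.css` path, and the `is_fetchable` helper are all made up for illustration; real engines may interpret robots.txt differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the directory is excluded, not the CSS file itself.
ROBOTS_TXT = """\
User-agent: *
Disallow: /includes/
"""


def is_fetchable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if a compliant crawler may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The stylesheet lives in the excluded directory; the page itself does not.
css_allowed = is_fetchable(ROBOTS_TXT, "http://example.com/includes/layout.css")
page_allowed = is_fetchable(ROBOTS_TXT, "http://example.com/index.html")
print(css_allowed, page_allowed)
```

Under these assumed rules the page stays crawlable while the stylesheet does not, which is exactly the situation described above: the spider sees the markup but not the rules that position or hide it.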

I just think it's rather odd that you can categorically remove the possibility of getting nailed by an automated CSS spam checker simply by not permitting the spider to access the CSS itself. In fact, if I were a search engine, I'd assert that it's part of the page and therefore not a valid exclusion within the protocol. But they (Google, Yahoo, MSN) do appear to honor robots.txt for CSS as well. Theoretically it could be a red flag, but I'm sure many sites exclude their CSS just because it happens to reside in an excluded directory. So it's not much of a red flag, either.
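For what it's worth, the "red flag" idea can be sketched from the engine's side. The snippet below is only an illustration, not how any search engine is known to work: it collects the stylesheets a page links to and reports the ones robots.txt forbids fetching. The `StylesheetCollector` class, the `blocked_stylesheets` function, and the sample page are hypothetical, and it assumes every stylesheet sits on the same host as the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class StylesheetCollector(HTMLParser):
    """Collect href values of <link rel="stylesheet"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "stylesheet" in rel and attr.get("href"):
            self.hrefs.append(attr["href"])


def blocked_stylesheets(page_url, html, robots_txt, agent="*"):
    """Return absolute URLs of linked stylesheets that robots.txt disallows.

    Assumption: all stylesheets are on the same host as the page, so a
    single robots.txt applies to every one of them.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    collector = StylesheetCollector()
    collector.feed(html)
    urls = [urljoin(page_url, href) for href in collector.hrefs]
    return [url for url in urls if not rules.can_fetch(agent, url)]


# Hypothetical page with one stylesheet hidden in an excluded directory.
PAGE = (
    '<html><head>'
    '<link rel="stylesheet" href="/images/site.css">'
    '<link rel="stylesheet" href="/css/main.css">'
    '</head><body></body></html>'
)
ROBOTS = "User-agent: *\nDisallow: /images/\n"

flagged = blocked_stylesheets("http://example.com/index.html", PAGE, ROBOTS)
print(flagged)
```

A check like this is cheap to run, but as noted above it would also flag every innocent site whose CSS merely happens to live in an excluded directory, so on its own it says very little.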

Is it really that simple, though? People get banned for CSS spam every day by automated checks; that seems pretty pathetic if those checks are this easy to avoid. Manual bans and penalties are rare for the most part, so this leaves the door wide open for spamming.

"2 Wise Comments Banged Out Somewhere On The Internet ..."


alex

Nice / nasty thought. Couldn't you nail even the ~possibility~ that a spider would read your excluded CSS by IP delivering it? So everyone bar the spiders gets the spammy CSS and the spiders get a "vanilla" file to munch on?

ha.ckers.org security lab - Archive » SEO Spamming Using robots.txt

[...] This week Jaimie Sirovich, QuadsZilla and I had an interesting thread where Jaimie proposed the idea of putting CSS inside a robots.txt protected directory. The CSS, of course, would hide spam on a page from the eyes of anyone who just visited the page, thus allowing you to SEO (Search Engine Optimization) spam the spiders visiting your site. Any robots that respected this would then spider with the spam intact. I believe you would have a better chance with /includes than /images if you are just trying to social engineer people out of being curious. Also, by absolute linking, versus relative linking, it wouldn't matter if one of the engines cached it or someone used a language translation service, as all of the ones I've tested preserve hard links and call them directly or pass them through a proxy unchanged (in the case of the anonymizer). [...]


