Jun 29
CSS Spam and Robots.txt
What really stops anyone from using a CSS-based layout, throwing up some spammy content, then using CSS positioning tricks to hide it? Spiders can read CSS now, and you'll get busted, right? Perhaps. But what if one were to place the CSS in a file and exclude that file via robots.txt? To make it look even less like subterfuge, simply place the CSS file in your images folder or something. Assuming Google actually honors the robots exclusion protocol, wouldn't that permit one to spam with impunity? Again, I'm not a black hat, but isn't that a logical conclusion?

I don't see it as a spam signature, either. Obviously, don't exclude the CSS file explicitly; rather, exclude the directory it resides in. Also, do not use the same directory name every time. It just "happens" that you put it there. Some examples would be /includes, /resources, /images, /pics, /tumadre, etc. Someone could rat you out, but that would take human time; it couldn't be automatic.

I just think it's rather odd that you can categorically remove the possibility of getting nailed by an automated CSS spam checker simply by not permitting the spider to access the CSS itself. In fact, if I were a search engine, I'd assert that the CSS is part of the page and therefore not a valid exclusion within the protocol. But they (Google, Yahoo, MSN) do appear to honor robots.txt for CSS as well. Theoretically the exclusion could be a red flag, but I'm sure many sites exclude their CSS just because it happens to reside in an excluded directory. So it's not much of a red flag, either.

Is it really that simple, though? People get banned for CSS spam every day by automated checks; that seems pretty pathetic if it's this easy to avoid them. For the most part, manual bans and penalties are rare, and that's a wide-open door for spamming.
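To make the "red flag" question concrete, here is a minimal sketch of how a checker might notice that a page's stylesheet sits in an excluded directory. It uses Python's standard `urllib.robotparser`; the robots.txt contents, the example.com URLs, the `ExampleBot` user agent, and the `blocked_stylesheets` helper are all hypothetical, and this is not a claim about how any search engine actually does it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the directory is excluded, not the CSS file itself.
ROBOTS_TXT = """\
User-agent: *
Disallow: /includes/
"""


def blocked_stylesheets(robots_txt, stylesheet_urls, user_agent="ExampleBot"):
    """Return the stylesheet URLs a robots.txt-compliant crawler may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url for url in stylesheet_urls
        if not parser.can_fetch(user_agent, url)
    ]


# Hypothetical stylesheet URLs referenced by some page.
urls = [
    "http://example.com/includes/layout.css",
    "http://example.com/css/main.css",
]
print(blocked_stylesheets(ROBOTS_TXT, urls))
```

A spider that honors the exclusion can still see that the stylesheet is off limits without reading it, which is all the "red flag" would amount to. As noted above, plenty of innocent sites would trip the same check.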
"2 Wise Comments Banged Out Somewhere On The Internet ..."
Nice / nasty thought. Couldn't you nail even the [...]

ha.ckers.org security lab - Archive » SEO Spamming Using robots.txt
[...] This week Jaime Sirovich, QuadsZilla and I had an interesting thread where Jaime proposed the idea of putting CSS inside a robots.txt protected directory. The CSS, of course, would hide spam on a page from the eyes of anyone who just visited the page, thus allowing you to SEO (Search Engine Optimization) spam the spiders visiting your site. Any robots that respected this would then spider with the spam intact. I believe you would have a better chance with /includes than /images if you are just trying to social engineer people into not being curious. Also, by absolute linking, versus relative linking, it wouldn't matter if one of the engines cached it or someone used a language translation service, as all of the ones I've tested preserve hard links and call them directly or pass them through a proxy unchanged (in the case of the anonymizer). [...]