SEO Egghead by Jaimie Sirovich: A blog about SEO, written for nerds, by a nerd.


Tue, 18 Mar '08

SEO Egghead, Inc. - Now Taking Web Development Projects

We're currently looking for clients in the following realms:

Custom Application Development
1. Custom Content Management System Development
2. Custom eCommerce Web Site Development
3. Intranet Development

Our applications are designed to give users full control over content and require no knowledge of HTML or advanced web design skills. All applications are designed to be SE-friendly at the architectural level. We apply the principles that we've written about in our book.

Application Enhancement
4. Modification of PHP and ASP.NET web applications to fix architectural issues and increase search engine visibility. Few firms will take on this challenge, because they lack the resources and skills to get the job done. We do. Services include: removing duplicate content problems, implementing sIFR, providing more control over titles and content, and URL modification, including rewrites.

Note: Currently we are not accepting new SEO clients, but that will change in the next few months.

The SEO Egghead, Inc. Difference
Our applications are developed using our custom application framework and designed from the ground up to be SE-friendly and easy to manage. Our small, highly trained staff is knowledgeable and enthusiastic about search marketing, especially as it pertains to web site architecture. If you wish to have a sound basis for your company's future, speak to us.

Please contact us at jaimie@seoegghead.com or eli@seoegghead.com for details and a demo of things we've developed.

Wed, 23 Jan '08

MySpace Negligent: Pedophile IQ > MySpace IQ

The average IQ of the internet pedophile is apparently much higher than the aggregate of IQs at MySpace.

Let me make a prediction: MySpace will be found liable for several incidents of child exploitation going forward -- even more so than before. Wired's blog has made the latest hole public.

I was blissfully unaware of this until now (not being a pedophile and all...), but I could always tell that MySpace was a poorly conceived application. The idea was good, but that is where it ended.

The problem is that MySpace was started (and hence programmed) by a team that never knew how big it would become -- and that was quite possibly in over its head. And of course every programmer or IT manager knows that it's orders of magnitude harder to fix applications while they're being used. Often, the application must be refactored and/or rewritten. That's even worse.

That said, it should have been done. There is enough money flowing into the enterprise that some of it should be devoted to fixing all obvious flaws, especially where it pertains to child exploitation.

The frightening part of the most recent bug is that, assuming I understand it, it was trivial to fix. Setting aside criticisms of the inanity of letting users have full control over content (and hence allowing for creative XSS attacks and phishing), this flaw wasn't anything complex. Let me explain:

Typically, a web site displaying anything (whether products, files, or, in this case, provocative photos of 14-year-olds) has two levels of navigation:

1. List multiple items in a catalog.
2. View a particular item.

A simplified request to a web server for level "1." looks like this:

http://www.socialsite.com/album/?user=bob (view all photos in album for "bob").

And for 2:

http://www.socialsite.com/album/?picture_number=12345 (view photo 12345).

When MySpace decided to protect your 14-year-old daughter from the "Army of Pedophiles," it initially prevented only level "1." from being viewed. So if a user contrived a method to guess or derive the latter type of URL, nothing stopped the photo from being displayed. And that's just what this "army" figured out how to do.

This incensed me -- not because I've ever been this sloppy in implementing an application, but because MySpace was under scrutiny, knew that it had facilitated pedophilia in the past, and still didn't audit its application for this sort of obvious, sloppy security hole until it had been exploited.

I would assert that the fix for this security hole was no more than 2 lines of program code and would take about 1 week to exhaustively test and deploy.
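To make the "2 lines" claim concrete, here is a minimal PHP sketch of the kind of check that was missing on the level-2 (single photo) page. Everything in it is hypothetical: the `$photos` array stands in for a database query, the privacy rule (private profiles visible only to the owner and friends) is an assumed simplification, and `can_view_photo()` and `view_photo()` are illustrative names, not MySpace's code. The point is that the rule guarding the album listing has to be enforced again on the single-photo URL.

```php
<?php
/**
 * Hypothetical privacy rule: a private profile's photos are visible only to
 * the owner and the owner's friends; public profiles are visible to everyone.
 */
function can_view_photo(?string $viewer, string $owner, bool $ownerIsPrivate, array $ownerFriends): bool
{
    if (!$ownerIsPrivate) {
        return true;
    }
    return $viewer === $owner
        || ($viewer !== null && in_array($viewer, $ownerFriends, true));
}

/**
 * Level-2 page, e.g. /album/?picture_number=12345.
 * $photos maps a photo number to its owner's record (assumed structure;
 * a real site would query its database here).
 */
function view_photo(?string $viewer, int $photoNumber, array $photos): string
{
    if (!isset($photos[$photoNumber])) {
        return '404 Not Found';
    }
    $p = $photos[$photoNumber];
    // The missing check: reuse the SAME rule that protects the album listing.
    if (!can_view_photo($viewer, $p['owner'], $p['private'], $p['friends'])) {
        return '403 Forbidden';
    }
    return 'photo ' . $photoNumber;
}
```

With `$photos = [12345 => ['owner' => 'bob', 'private' => true, 'friends' => ['alice']]]`, an anonymous visitor who guesses the URL gets `403 Forbidden`, while Bob and his friend Alice see the photo.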

Meanwhile the self-proclaimed "pedo-army" figured it out, masturbated to pictures of your daughter, and will probably continue to see more of her when the pedophiles outfox MySpace again ...

Fri, 30 Nov '07

ASP.NET 2.0 Setting Dangerous for Google Indexing

Authored By: Cristian Darie
http://www.cristiandarie.ro/books.

I’m writing this article to warn you about an implementation detail of the ASP.NET 2.0 session management mechanism, which, if handled incorrectly, can potentially remove your ASP.NET 2.0 web site from Google's index.

The Background

The communication protocol our web browsers use when navigating the Internet, HTTP, was designed to be a stateless protocol. Unless special tracking techniques are used, a web server that serves many requests at the same time cannot know whether those requests come from different users or from a single user performing multiple requests. Using the IP address to tell users apart is particularly unreliable, because many users behind a NAT-based LAN reach the web server from the same address.

State management: Sessions

Needless to say, there’s not much a website can do for you if it doesn’t know who you are. Consequently, several state-management mechanisms have been developed on top of HTTP to allow web developers to implement the features their websites need.

The two significant mechanisms for handling user sessions are:

1. URL-based sessions. In this case, when a user visits a website for the first time, he or she is redirected to the URL of that web site, with a session ID appended to the query string, such as http://www.example.com?SESSION_ID=123123. Each subsequent request to that web site will contain that ID, so the web server will know who the request came from.

2. Cookie-based sessions. In this case, the first time a user visits a web site, that web site will save a cookie on the user's browser. On each request, the web site checks for the presence of that cookie, and depending on its value is able to determine who the visitor is.

URL-based sessions have proven to be quite problematic. Search engines sometimes have trouble spidering web sites that use them and they can pose security problems (a hacker obtaining "your" URL could potentially hijack your identity on the website). On the other hand, cookie-based sessions don’t work for users whose web browsers don’t support cookies, or have disabled the support for cookies.

Modern web development technologies, such as ASP.NET and PHP, have built-in support for both URL-based sessions and cookie-based sessions. In both cases, by default the session is handled using cookies.
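In PHP, cookie-only sessions can also be enforced explicitly, so that session IDs never leak into URLs a spider might index. A minimal sketch using PHP's standard session settings `session.use_cookies`, `session.use_only_cookies`, and `session.use_trans_sid`; the wrapper name `configure_cookie_only_sessions()` is hypothetical. The settings must be applied before `session_start()` is called.

```php
<?php
/**
 * Forces cookie-based sessions and disables URL-based (transparent) session IDs.
 * Call before session_start(). Returns the resulting settings for inspection.
 */
function configure_cookie_only_sessions(): array
{
    ini_set('session.use_cookies', '1');       // keep the session ID in a cookie
    ini_set('session.use_only_cookies', '1');  // never accept a session ID from the URL
    ini_set('session.use_trans_sid', '0');     // never rewrite links to carry the session ID
    return [
        'use_cookies'      => ini_get('session.use_cookies'),
        'use_only_cookies' => ini_get('session.use_only_cookies'),
        'use_trans_sid'    => ini_get('session.use_trans_sid'),
    ];
}
```

The trade-off is the one described above: visitors whose browsers refuse cookies simply won't have a working session.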

To activate URL-based sessions in ASP.NET, you need to set the cookieless attribute of the sessionState element in Web.config (see http://msdn2.microsoft.com/en-us/library/h6bb9cz9.aspx). In that case, your ASP.NET application will automatically generate URLs such as:
http://www.example.com/(S(c3hvob55wirrnwzbeicoo355))

Details about this mechanism are covered in our book, Professional Search Engine Optimization with ASP.NET, in Chapter 5: Duplicate Content. The same chapter explains why you don't want to use URL-based sessions unless you really need them: they generate numerous pages with different URLs but the same content. The implications are detailed in the book, but in short, such pages complicate the spidering of your web site, may lead to direct or indirect penalties, and consequently lower your performance with the search engines.
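If such URLs have already leaked into links or an index, a small helper can map them back to one canonical URL, for instance to build a 301 redirect target. A hedged PHP sketch: `strip_cookieless_token()` is a hypothetical name, and the pattern assumes the token looks like the `/(S(...))` segment shown above. ASP.NET uses similar `A(...)` and `F(...)` tokens for its other cookieless features, possibly combined in one pair of parentheses, so the pattern removes those as well.

```php
<?php
/**
 * Removes an ASP.NET cookieless token segment such as "/(S(c3hv...))" from a URL.
 * Assumption: tokens use the S(...), A(...) and F(...) forms, possibly combined
 * inside a single pair of parentheses, e.g. "/(A(x)S(y))".
 */
function strip_cookieless_token(string $url): string
{
    return preg_replace('#/\((?:[SAF]\([^()]*\))+\)#', '', $url, 1);
}
```

A page that receives a tokenized URL from a spider could then answer with `header('Location: ' . $canonical, true, 301)`, where `$canonical` is the stripped URL.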

More Background

By default, ASP.NET requires cookies for session state management and for user logins. As you can imagine, a typical ASP.NET web site will not work well (or at all!) if the user’s browser doesn't support cookies, or if the cookie support is disabled.

Losing cookieless visitors isn’t a significant problem for most web sites since almost all web browsers do support (and are configured to support) cookies. However, for certain businesses losing those customers is not an option.

To overcome this problem, ASP.NET 2.0 introduced a new session handling option named "AutoDetect." This feature is very well explained here: http://msdn2.microsoft.com/en-us/library/aa479315.aspx.

By default, the cookieless attribute has the value "UseCookies", so by default ASP.NET web applications will never automatically generate URL-based session IDs that alter your URLs. If, however, you change the cookieless value to "AutoDetect" and load the website with a cookieless browser, you’re automatically redirected to a URL such as:
http://www.example.com/(S(esy20aeudrpvr1555nhmvb45))/

The Problem

The side effect of ASP.NET 2.0’s cookie support autodetection is that it applies to web spiders as well (though I don’t think it was designed to do so). More specifically, Google’s spider uses a user agent string that ASP.NET 2.0 interprets as that of an old browser that doesn’t support cookies:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

If you switch your session’s cookieless setting to AutoDetect, Google will be served URLs that contain automatically generated session IDs. (More specifically, Google will be served 302 redirects to URLs that contain session IDs.)

Technically, ASP.NET’s behavior is correct. If you configure it to use URL-based sessions for cookieless users, it happily does so, and Google’s web spider indeed does not support cookies. In practice, however, you don’t want to feed Google such URLs, because this is likely to hurt your performance with the search engines.

Possible solutions

1. The easiest solution to the problem is to stop using ASP.NET’s AutoDetect session mode.
2. If you do need that feature, you can simply configure ASP.NET to recognize Google’s spider as supporting cookies. This article shows how: http://www.kowitz.net/archive/2006/12/11/asp.net-2.0-mozilla-browser-detection-hole.aspx
3. You can implement automatic support for URL-based sessions yourself. This takes some time to implement, and the benefits may not be worth the implementation cost. It works like this:
- use cloaking to generate session IDs only if the visitor is not a web spider
- start generating session IDs only when the session is really needed for tracking (for example, after the visitor adds items to his or her shopping cart). This way you don’t feed your users URL-based session IDs unless you really need to.
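The decision behind option 3 can be sketched in a few lines of PHP. This is an illustration under stated assumptions, not how ASP.NET works internally: `needs_url_session_id()` is a hypothetical helper, the spider list is a small, incomplete sample of user agent substrings, and `$cookiesWork` is assumed to come from an earlier test cookie. User agents can be spoofed, so a production version would verify spiders more carefully.

```php
<?php
/**
 * Decides whether to put a session ID in the URL for this request.
 * Assumptions: spiders are recognized by a hypothetical, incomplete list of
 * user agent substrings; $cookiesWork is known from an earlier test cookie.
 */
function needs_url_session_id(string $userAgent, bool $cookiesWork, bool $sessionNeeded): bool
{
    $spiderSignatures = ['googlebot', 'slurp', 'msnbot', 'teoma']; // illustrative only
    foreach ($spiderSignatures as $signature) {
        if (stripos($userAgent, $signature) !== false) {
            return false; // cloaking: spiders never receive URL-based session IDs
        }
    }
    // Fall back to URL sessions only when cookies fail AND tracking is really
    // needed, e.g. once the visitor has put something in the shopping cart.
    return !$cookiesWork && $sessionNeeded;
}
```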

Wed, 31 Oct '07

WP HTML Taint Check

<?php

/*
Plugin Name: HTMLTaintCheck
Plugin URI: http://www.seoegghead.com/
Description: Checks for suspicious links in posts. MAKE SURE TO REPLACE YOUR EMAIL ADDRESS IN THE CODE BELOW -- ALSO ONLY LEAVE THIS ON TO CHECK, THEN TURN IT BACK OFF!!!
Author: Jaimie Sirovich
Version: 0.1
Author URI: http://www.seoegghead.com/
*/

// Runs on every page load while the plugin is active (hence: turn it off afterwards).
if (true) {
    check_posts();
}

// Scans every post for links to the known spam domains and emails a report.
function check_posts()
{
    global $wpdb, $table_prefix;

    $items = $wpdb->get_results("
        SELECT post_title, ID, post_name, post_content
        FROM {$table_prefix}posts
        WHERE TRUE
    ");

    $copy = '';

    foreach ($items as $i) {
        // Signature of the injected links seen so far.
        if (preg_match('#adshelper|softicana#i', $i->post_content)) {
            $copy .= $i->ID . ' ' . "IS SUSPICIOUS.\r\n";
        } else {
            $copy .= $i->ID . ' ' . "OK.\r\n";
        }
    }

    // Replace with your own address before activating the plugin.
    mail('YOUREMAIL@ADDRESS.com', 'test', $copy);
}

?>

Wed, 31 Oct '07

Latest WordPress 2.3.1 Vulnerable To Hackers

Update: WP developers are looking into this now . . .

The current version of WordPress, 2.3.1 (versions 2.1 through 2.3.1 verified so far), is apparently vulnerable to an HTML-tainting attack. I first noticed it on this blog, in the next-to-top post. I've actually been on a vacation of sorts, but I monitor changes to my web site carefully. WordPress.org has been notified, but I feel that publicly disclosing only the existence of the potential vulnerability is ethical. I have also created a tool to audit for this attack (see "How Do You Know If You're Affected?" below). Others' equity is at stake here as well!

Though I don't know the exact mechanism yet, I have some ideas based on my logs, and I have a high degree of confidence that it's a WordPress-specific hack (or one targeting a very popular plug-in) for the following reasons:

0. The links are clearly pathological and deliberately concealed visually using CSS.
1. All exploited sites are running WordPress.
2. The sites are on various shared and dedicated hosts around the internet (it's not a particular hosting company).
3. The HTML-tainting appears in the actual database record (at least in my case) for the post; and that's not generally the easiest approach for this sort of attack.

Example HTML Insertion Attack:

<div id="mnu1" style="overflow: auto; position: absolute; z-index: 1; left: -500px; text-indent: -500px; width: 600px; text-align:left">
Where <a href="http://www.adshelper.com/">download mp3 music</a>? It's obvious.<br /> Fast downloads and super high quality... <br />Try to guess, what's the best site to <a href="http://www.my-movie-download.com/">download movies</a>? Yes, that's right.<br /> I recently downloaded few films with super high quality.
</div>
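Because the injected block relies on CSS to push its links off-screen (reason 0 above), a scan does not have to depend on the two domain names. A rough PHP heuristic, assuming future variants keep using inline `style` attributes with negative `left` or `text-indent` offsets as in this example; `has_offscreen_inline_style()` is a hypothetical name. It will also flag some legitimate image-replacement markup, so treat hits as candidates for manual review.

```php
<?php
/**
 * Heuristic: returns true if the HTML contains an inline style attribute that
 * moves content off-screen with a negative left or text-indent offset.
 */
function has_offscreen_inline_style(string $html): bool
{
    return preg_match('/style\s*=\s*"[^"]*\b(?:left|text-indent)\s*:\s*-\d+/i', $html) === 1;
}
```

In the HTMLTaintCheck plugin, a check like this could run alongside the domain-name `preg_match()` on each post's content.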


The outbound links appear to point to the following domains:
ADSHELPER.COM (WHOIS RECORD NOT PRIVATE)
SOFTICANA.COM (WHOIS RECORD PRIVATE; GODADDY)

How Do You Know If You're Affected?

I've written a quick-and-dirty WordPress plugin, HTMLTaintCheck, that scans your blog for the signature of this HTML tainting. Install it, and it will email you a report flagging the affected post IDs if you've been hit (Note: IT SHOULD WORK NOW ...).

What Can We Learn From this?

Your domain and URLs have equity, and black hatters are always looking for ways to exploit it. You must be vigilant in protecting this equity by monitoring for attacks like this one, which can be particularly harmful to your rankings. HTML insertion attacks in general are also documented in my book:



Search Engine Optimization with PHP

I have a high degree of confidence that this individual is involved: this guy. He has been on DigitalPoint forums. More information about him is located here.

Affected WordPress Versions

2.1-2.3.1

Prominent Affected Sites

via http://siteexplorer.search.yahoo.com/search?p=http%3A%2F%2Fwww.adshelper.com&bwm=i&bwms=p&bwmf=u&fr=yfp-t-471&fr2=seo-rd-se

http://warpspire.com/hemingway : WordPress 2.3-alpha
http://www.cartoonbrew.com/ : WordPress 2.1
http://www.powazek.com/ : WordPress 2.2
http://www.freekareem.org/ : WordPress 2.2.1
http://www.smallbiztrends.com/ : WordPress 2.2.1
http://blog.modernmechanix.com/ : WordPress 2.1.2
http://www.bittbox.com/ : WordPress 2.2.1
http://www.ethanzuckerman.com/blog/ : WordPress 2.2.1
http://www.tjcenter.org/ : WordPress 2.1.2
http://www.cato-at-liberty.org/ : Signature Removed
http://www.smstextnews.com/ : WordPress 2.1.3
http://blog.ianbicking.org/ : WordPress 2.2.1
http://www.searchviews.com/ : WordPress 2.2
http://www.dreammanifesto.com/ : WordPress 2.3.1

via http://siteexplorer.search.yahoo.com/advsearch?p=http%3A%2F%2Fsofticana.com&bwm=i&bwmf=a&bwms=p

http://www.zeldman.com/ : Signature Removed
http://www.mysqlperformanceblog.com/ : WordPress 2.3.1
http://blog.oup.com/ : WordPress 2.2
http://weblog.philringnalda.com/ : WordPress 2.3.1
http://blog.everythingflex.com/ : WordPress 2.1.3
http://www.darfur-awareness.org/ : WordPress 2.2

Wed, 31 Oct '07

Protected: WP Spammer Information

Thu, 23 Aug '07

Professional SEO with ASP.NET Released!

And it's not just a patched-together port of Search Engine Optimization with PHP, either. Cristian and I did it right. The examples are programmed using typical ASP.NET design patterns, and we built reusable components for your .NET applications. So if you're an ASP.NET programmer and you need to know how to effectively use rewriting tools, implement cloaking and geo-targeting, or just want a good reference, buy the book.

Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO teaches you, with step-by-step coding examples:

* To understand the most important criteria that influence search engine rankings
* How to create keyword-rich URLs by implementing URL rewriting using ASP.NET, ISAPI_Rewrite, and UrlRewriter.NET
* How to use HTTP headers to properly indicate the status of web documents
* How to create optimized content and cope with duplicate content effectively
* How to avoid being the victim of black hat SEO techniques
* How to implement geo-targeting and cloaking
* How to use sitemaps effectively, for users as well as for search engines
* How to create interactive link bait applications
* Some SEO enhancements that can easily be applied to existing sites
* And lastly, we build a search engine friendly e-commerce catalog

* If you are a search marketing blogger and would like a copy for review, contact me. Please indicate which blog you write for and I'll get back to you if you are accepted.

Thu, 16 Aug '07

How To Guide: Prevent Google Proxy Hacking

One day your traffic comes to a grinding halt. What happened? Check the index. Google may have found all your reciprocal links from i-hump-sheep.info and white-castle-coupons.biz. But it's also possible that you have been "proxy hacked." That's the term used by the few people who have been mum on it for a while: Alan Perkins, Danny Sullivan, Bill Atchison, Brad Fallon, and a few others. Then there are the people actually exploiting this hole right now, whom we don't know.

And it's likely the reason Google (http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html), Yahoo (http://www.ysearchblog.com/archives/000460.html), MSN (http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx), and Ask (http://about.ask.com/en/docs/about/webmasters.shtml#21) all published guidelines to detect whether a bot is, indeed, an authentic bot. I was wondering about that when I heard about it. Now it all comes together.
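The method those guidelines describe is a reverse DNS lookup on the requesting IP, a check that the host name belongs to the engine's domain, and a forward lookup confirming that the host name maps back to the same IP. A minimal PHP sketch using PHP's `gethostbyaddr()` and `gethostbynamel()`; `is_verified_crawler()` is a hypothetical name, the default suffixes assume Google's crawler domains (the other engines publish their own), and the resolvers can be swapped out for testing.

```php
<?php
/**
 * Reverse-then-forward DNS verification of a claimed crawler IP.
 * Assumption: default suffixes cover Google only; add other engines' domains
 * as needed. $reverse/$forward default to PHP's DNS functions.
 */
function is_verified_crawler(string $ip, array $allowedSuffixes = ['.googlebot.com', '.google.com'],
                             ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false; // no PTR record (gethostbyaddr returns the IP unchanged) or bad input
    }
    $host = strtolower(rtrim($host, '.'));

    $domainOk = false;
    foreach ($allowedSuffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $domainOk = true;
            break;
        }
    }
    if (!$domainOk) {
        return false; // e.g. a proxy that merely relays Googlebot's user agent
    }

    // Forward-confirm: the host name must resolve back to the original IP.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

The forward lookup matters because anyone who controls the reverse DNS of their own IP block can make it claim a googlebot.com name.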

So what's going on? Dan Thies has a summary up about it, and he'll probably do a much better job explaining it since he's not a programmer at heart like me. I'll end up speaking in pseudocode ... so read Dan's summary over here. Here's a tidbit that explains a lot of it away:

With the introduction of "Big Daddy," Google crawls from many different data centers; it also changed the algorithm substantially at the same time. According to Dan, "It appears that the changes include moving some of the duplicate content detection down to the crawlers. [This is problematic. In short:]

1. The original page exists in at least some of the data centers.
2. A copy (proxy) gets indexed in one data center, and that gets sync'd across to the others.
3. A spider visits the original, checks to see if the content is duplicate, and erroneously decides that it is.
4. The original is dropped or penalized."
So ... the problem is that if you flood Google with massive amounts of duplicate content, it exposes a vulnerability. Eventually the algorithm makes a mistake, and your content is no longer authoritative.

Oops!

How To Fight Back -- Code implementations

Well, that's where I come in. I have two implementations in beta (read: they work according to my tests, but I'm going to test them more) that address the problem using the verification methods the search engines cite. Essentially, we're using a benign form of cloaking (yes, cloaking!) to make it more difficult for bad bots, proxies, etc. to exploit us.
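One way such benign cloaking can work, sketched in PHP under stated assumptions (this is not the author's linked beta code): a request that claims a search engine user agent but fails DNS verification is likely a proxy relaying the spider, so the copy it receives is marked noindex,nofollow. `robots_meta_for_request()` is a hypothetical name, the user agent pattern is an illustrative sample, and `$verifiedCrawler` is assumed to come from a reverse/forward DNS check like the one the engines' guidelines describe.

```php
<?php
/**
 * Returns the robots meta tag to emit for this request.
 * Assumption: $verifiedCrawler is the result of a reverse/forward DNS check.
 * A claimed spider that fails verification is treated as a proxy or scraper
 * and gets a noindex,nofollow copy; everyone else gets the normal page.
 */
function robots_meta_for_request(string $userAgent, bool $verifiedCrawler): string
{
    $claimsToBeSpider = preg_match('/googlebot|slurp|msnbot|teoma/i', $userAgent) === 1;
    if ($claimsToBeSpider && !$verifiedCrawler) {
        return '<meta name="robots" content="noindex,nofollow" />';
    }
    return ''; // ordinary visitors and verified spiders see the page unchanged
}
```

The result would be echoed inside the page's `<head>`, so a proxied copy that gets crawled tells the search engine not to index it.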

The code is located here

I'll expand the explanation in that documentation to make it easier to comprehend/install. But if you know PHP, dive right in.

The code and concepts are primarily based on the book I coauthored, "Search Engine Optimization with PHP." I believe most SEOs need to understand technology more than they think, hence the book I wrote with co-author Cristian Darie. This is just one example.

You can see it in action here:
http://www.seoegghead.com/tools/test-simple-cloak.php
