I wrote this script so that I can run it over arbitrary HTML. It adds rel="nofollow" to every link whose domain is not in an array of whitelisted domains. That sounds simple, but it gracefully handles several cases that most people do not bother with.

1) As mentioned above, it has a whitelist. Obviously, you will want to whitelist your own site!
2) It appends the value to a preexisting rel attribute where applicable, e.g. rel="foo" becomes rel="foo,nofollow".
3) It works on unquoted attribute values, e.g. rel=foo.

If you don't know why this is important, read this. So here it is:

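<?php

// Handles one <a ...> tag: returns it unchanged, or with nofollow added.
function _nfl($_)
{
    global $_nfl_wl;
    $whitelist = $_nfl_wl;

    // Leave the tag alone if its rel attribute already contains nofollow.
    if (preg_match('#\srel\s*=\s*([\'"])?(?(1)[^\'">]*|[^\'">\s]*)\bnofollow\b#i', $_)) {
        return $_;
    }

    // Extract the link target; tags without an href are returned untouched.
    if (!preg_match('#\shref\s*=\s*[\'"]?([^\'"\s>]*)#i', $_, $captures)) {
        return $_;
    }
    $href = $captures[1];

    // Only absolute http(s) links are considered; relative links are left alone.
    if (!preg_match('#^https?://#i', $href)) {
        return $_;
    }

    // Links to whitelisted hosts are returned untouched.
    $host = parse_url($href, PHP_URL_HOST);
    if (is_string($host) && in_array($host, $whitelist)) {
        return $_;
    }

    // Append to an existing rel attribute, quoted or not: rel="foo" becomes rel="foo,nofollow".
    $_x = preg_replace('#(\srel\s*=\s*)([\'"])?((?(2)[^\'">]*|[^\'">\s]*))(?(2)\2)#i', '$1$2$3,nofollow$2', $_, 1, $count);
    if ($count > 0) {
        return $_x;
    }

    // No rel attribute at all: add one right after the tag name.
    return preg_replace('#^<a\b#i', '$0 rel="nofollow"', $_, 1);
}

function noFollowLinks($str, $whitelist = array())
{
    global $_nfl_wl;
    $_nfl_wl = $whitelist;

    // create_function() was removed in PHP 8, so the callback is a closure.
    return preg_replace_callback('#<a\s[^>]*>#i', function ($matches) {
        return _nfl($matches[0]);
    }, $str);
}

?>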

Simply call noFollowLinks() and pass in a buffer and an array of whitelisted domains. Wasn't that easy? :) Of course, you can also use it on a single link, but it works on a whole document if needed.
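
One detail worth knowing when you build the whitelist: the script compares each link's host against the array exactly, so example.com and www.example.com count as two separate entries. Here is a small sketch of that check in isolation. The helper name isWhitelistedHost and the example.com domains are made up for illustration and are not part of the script above.

```php
<?php

// Hypothetical helper (not part of the script above): reports whether the
// host of a URL appears in the whitelist, using an exact string comparison.
function isWhitelistedHost($href, array $whitelist)
{
    // parse_url() gives null (or false) when no host can be extracted.
    $host = parse_url($href, PHP_URL_HOST);
    return is_string($host) && in_array($host, $whitelist, true);
}

$whitelist = array('example.com', 'www.example.com');

var_dump(isWhitelistedHost('http://www.example.com/page', $whitelist)); // bool(true)
var_dump(isWhitelistedHost('http://blog.example.com/', $whitelist));    // bool(false)
```

So if your site answers on several subdomains, list each one you want to leave untouched.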
