- Jul. 7th, 2006
I wrote this script so that I can run it over arbitrary HTML. It adds rel="nofollow" to every link whose domain is not in an array of whitelisted domains. It sounds simple, but it gracefully handles many cases that most people do not bother with.
1) As mentioned before, it has a whitelist. Obviously, you may want to whitelist your own site!
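2) It appends to a preexisting rel attribute where there is one, e.g. rel="foo" becomes rel="foo nofollow" (rel takes a space-separated list of values).
3) It works on unquoted attribute values, e.g. rel=foo becomes rel="foo nofollow".
If you don't know why this is important, read this.
So here it is:
<?php
// Handle a single <a ...> tag: return it unchanged, or with nofollow added.
function _nfl($tag, $whitelist)
{
    // A rel attribute whose value is double-quoted, single-quoted, or unquoted.
    $relPattern = '#(?<=\s)rel\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>"\']+))#i';
    $hasRel = preg_match($relPattern, $tag, $rel);
    // Only one alternative can match, so joining the groups yields the value.
    $value = $hasRel ? implode('', array_slice($rel, 1)) : '';
    if (preg_match('#\bnofollow\b#i', $value)) {
        return $tag; // already marked
    }

    // Only absolute http(s) URLs can leave the site; skip everything else.
    if (!preg_match('#(?<=\s)href\s*=\s*[\'"]?\s*(https?://[^\'"\s>]*)#i', $tag, $captures)) {
        return $tag;
    }
    $host = parse_url($captures[1], PHP_URL_HOST);
    if (is_string($host) && in_array($host, $whitelist)) {
        return $tag;
    }

    if ($hasRel) {
        // Keep single quotes if the tag used them; otherwise emit double
        // quotes, which also keeps a formerly unquoted value valid.
        $quote = substr($rel[0], -1) === "'" ? "'" : '"';
        $newRel = 'rel=' . $quote . trim($value . ' nofollow') . $quote;
        // Replace only the rel attribute that was matched above.
        return preg_replace_callback($relPattern, function ($m) use ($newRel) {
            return $newRel;
        }, $tag, 1);
    }
    // No rel attribute yet: add one right after the tag name.
    return preg_replace('#^<a\b#i', '$0 rel="nofollow"', $tag);
}

function noFollowLinks($str, $whitelist = array())
{
    // Match <a ...> tags only, so <abbr>, <address>, etc. are left alone.
    return preg_replace_callback(
        '#<a\s[^>]*>#i',
        function ($matches) use ($whitelist) {
            return _nfl($matches[0], $whitelist);
        },
        $str
    );
}
?>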
Simply call noFollowLinks() and pass in a buffer and an array of whitelisted domains. Wasn't that easy? :) Of course, you could also use it on a single link, but it works on a whole document if needed.
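One thing to keep in mind when building the whitelist: the script compares the host that parse_url() extracts against the array with in_array(), so entries have to be exact host names. The sketch below illustrates that kind of check in isolation. The helper isWhitelistedHost() and the example.com hosts are made up for the illustration and are not part of the script above.

```php
<?php
// Hypothetical helper for illustration only: reports whether the host of
// $url appears verbatim in $whitelist.
function isWhitelistedHost($url, array $whitelist)
{
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() gives null or false when no host can be extracted.
    return is_string($host) && in_array($host, $whitelist);
}

$whitelist = array('example.com', 'www.example.com');

var_dump(isWhitelistedHost('http://www.example.com/about', $whitelist)); // bool(true)
var_dump(isWhitelistedHost('http://blog.example.com/', $whitelist));     // bool(false)
var_dump(isWhitelistedHost('/relative/path', $whitelist));               // bool(false)
```

In other words, if your site answers on both example.com and www.example.com (or on other subdomains), each of those hosts probably needs its own entry in the array.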
"Only One Wise Comment Banged Out Somewhere On The Internet ..."
Sweet… nicely done. I'll be sure to incorporate it into my site. Thanks!