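This neat little class returns the HTTP status code of a URL. It uses cURL and reads only the response headers. Pass the string returned by getHeader() to parseResponseCode() and check whether it's a 200. Depending on your requirements, a 301 or 302 may also be a satisfactory answer. You may instead want to update the stored URL (at least in the case of a 301), or follow the redirect and check again. The class stops reading after the first block of headers, so a redirect is reported rather than followed. If the answer is a 404, you know you've found trouble.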
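As a rough illustration of that decision, here is a small helper that maps a status code to a suggested action. It's a sketch under my own assumptions and not part of the class below. The name classifyStatus() and its labels are hypothetical, and grouping 308 with 301 and 303/307 with 302 is one reasonable policy among several.

```php
<?php
// Hypothetical helper, not part of LinkChecker: map a status code (for
// example the string returned by parseResponseCode()) to a suggested action.
function classifyStatus($code): string
{
    $code = (int) $code; // "200" -> 200; null (no response) -> 0

    if ($code >= 200 && $code < 300) {
        return 'ok';       // success
    }
    if (in_array($code, [301, 308], true)) {
        return 'update';   // permanent redirect: update the stored URL
    }
    if (in_array($code, [302, 303, 307], true)) {
        return 'redirect'; // temporary redirect: accept it, or follow and re-check
    }
    if (in_array($code, [404, 410], true)) {
        return 'dead';     // not found / gone
    }
    return 'check';        // anything else (403, 5xx, no response) needs a look
}

echo classifyStatus("404"), "\n"; // prints "dead"
```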

It's important to check for dead links. Too many of them can hurt your site's ranking, and they annoy your users. If it's too complicated (or perhaps impossible) to remove them automatically, simply email yourself a log of the failing URLs and fix them manually at regular intervals. Bill Slawski discusses the detrimental effect of "web decay" on his blog, and I think it's an important point.
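If you go the email route, something along these lines could turn the results into a plain-text log. This is a hypothetical sketch. The function buildDeadLinkReport() and its input format are my assumptions. It expects an array of URL => status code, for example collected with parseResponseCode(). The mail() call is commented out because it needs a configured mail transport and a real address.

```php
<?php
// Hypothetical helper: turn an array of [url => status code] into a
// plain-text log of every link that did not answer with 200.
function buildDeadLinkReport(array $results): string
{
    $lines = [];
    foreach ($results as $url => $code) {
        if ((int) $code === 200) {
            continue; // healthy link, nothing to report
        }
        // An empty or null code means no response at all (DNS error, timeout, etc.)
        $label = ($code === null || $code === '') ? 'none' : (string) $code;
        $lines[] = $label . "\t" . $url;
    }
    if (count($lines) === 0) {
        return ''; // nothing to report
    }
    return "Links needing attention:\n" . implode("\n", $lines) . "\n";
}

$report = buildDeadLinkReport([
    'http://example.com/'        => '200',
    'http://example.com/missing' => '404',
]);

if ($report !== '') {
    // mail() requires a configured mail transport; the address is a placeholder.
    // mail('you@example.com', 'Dead link report', $report);
    echo $report;
}
```

Returning an empty string when everything is fine makes it easy to skip the email entirely on a clean run.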

Below is the code in PHP.

<?php

$LINKCHECKER_total_str = '';

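// +----------------------------------------------------------------------+
// | LinkChecker                                                          |
// | Gets headers using Curl                                              |
// +----------------------------------------------------------------------+
// | Copyright (c) 2003 Jaimie Sirovich                                   |
// +----------------------------------------------------------------------+
// | Author: Jaimie Sirovich <jsirovic@gmail.com>                         |
// +----------------------------------------------------------------------+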

class LinkChecker
{

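    // cURL write callback. cURL hands over the response in chunks; we
    // accumulate them and, as soon as the first header block (terminated by
    // an empty line) is complete, print it and abort the transfer so the
    // body is never downloaded. Declared static because it is registered
    // as a class-level callback in getHeader().
    static function CURLOPT_WRITEFUNCTION($ch, $str)
    {
        global $LINKCHECKER_total_str;
        $LINKCHECKER_total_str .= $str;
        if (preg_match('/^(.*?)\r\n\r\n/s', $LINKCHECKER_total_str, $matches)) {
            echo $matches[1];
            // Returning anything other than strlen($str) tells cURL to stop.
            return -1;
        } else {
            return strlen($str);
        }
    }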
        
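    // Fetches $url and returns its raw response headers as a string
    // (empty if the request failed before any headers arrived).
    function getHeader($url, $userAgent = "Mozilla/4.0")
    {
        global $LINKCHECKER_total_str;
        $LINKCHECKER_total_str = "";

        // The write callback echoes the headers; capture that output.
        ob_start();
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($ch, CURLOPT_HEADER, 1); // pass headers to the write callback
        // In practice this has little effect: the callback stops after the
        // first header block, so a 301/302 is reported rather than followed.
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_TIMEOUT, 60);
        curl_setopt($ch, CURLOPT_WRITEFUNCTION, array("LinkChecker", "CURLOPT_WRITEFUNCTION"));

        // curl_exec() reports an error here because the callback aborts the
        // transfer on purpose; the headers are already in the output buffer.
        curl_exec($ch);
        curl_close($ch);
        return ob_get_clean();
    }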
    
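    // Returns the three-digit status code from the status line, e.g. "200"
    // for "HTTP/1.1 200 OK" or "HTTP/2 200", or null if there is none.
    function parseResponseCode($str) {
        if (preg_match('/^HTTP\/\d(?:\.\d)? (\d{3})/', $str, $matches)) {
            return $matches[1];
        }
        return null;
    }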
    
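    // Returns the Content-Type value, or null. Header names are
    // case-insensitive, so "Content-Type" and "Content-type" both match.
    function parseMimeType($str) {
        if (preg_match('/^Content-Type:\s*(.*)$/mi', $str, $matches)) {
            return trim($matches[1]); // drop the trailing \r
        }
        return null;
    }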
    
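    // Returns the Content-Length value as a string of digits, or null if
    // the header is absent (chunked responses, for example, omit it).
    function parseContentLength($str) {
        if (preg_match('/^Content-Length:\s*(\d+)/mi', $str, $matches)) {
            return $matches[1];
        }
        return null;
    }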
    
}

?>
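The parse methods above are one-line regular expressions. For comparison, here is a hypothetical standalone function, parseHeaderBlock(), which is not part of LinkChecker. It splits a raw header block, such as the one getHeader() returns, into a numeric status code and an array of headers keyed by lower-cased name. That way "Content-type" and "Content-Type" land under the same key. It's a sketch that assumes a single header block with no folded (continued) lines.

```php
<?php
// Hypothetical standalone parser (not the LinkChecker API): split a raw
// header block into a status code and a map of lower-cased header names.
function parseHeaderBlock(string $raw): array
{
    $lines = preg_split('/\r?\n/', trim($raw));
    $status = null;

    // Status line: "HTTP/1.1 200 OK" or "HTTP/2 200"
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $lines[0], $m)) {
        $status = (int) $m[1];
    }

    $headers = [];
    foreach (array_slice($lines, 1) as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // not a "Name: value" line
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        // Later duplicates overwrite earlier ones in this simple sketch.
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return ['status' => $status, 'headers' => $headers];
}

$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\nContent-Length: 1256";
$parsed = parseHeaderBlock($raw);
echo $parsed['status'], ' ', $parsed['headers']['content-type'], "\n";
// prints "200 text/html; charset=UTF-8"
```

Keeping the parsing separate from the network call also makes it easy to test against canned header strings, without hitting a live server.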
