Microsoft applications have this nasty habit of swapping both your single and double quotes for "smarter" versions.  They curve inwards and look really snazzy in Microsoft Word.  When you cut and paste them, though, they arrive as raw Windows-1252 bytes (145 through 148 for the quotes, 151 for the em dash), since Windows assumes that everyone is using windows-1252 or something.  That's pretty annoying if you're not using a Windows character set, so you may want to alter them using the regular expressions that follow.

ROTD stands for "regex of the day," in case you're wondering.  I was going for clarity here, not efficiency, so don't point out that this could be written more efficiently.  I know.  Here is the code:

<?php

function fixSmartQuotes($str, $replace_single_quotes = true, $replace_double_quotes = true, $replace_emdash = true, $use_entities = false)
{
    // Keys are Windows-1252 byte values: 145/146 are the curly single quotes,
    // 147/148 the curly double quotes, and 151 the em dash.
    $translation_table_ascii = array(
        145 => '\'',
        146 => '\'',
        147 => '"',
        148 => '"',
        151 => '-'
    );

    $translation_table_entities = array(
        145 => '&lsquo;',
        146 => '&rsquo;',
        147 => '&ldquo;',
        148 => '&rdquo;',
        151 => '&mdash;'
    );

    $translation_table = ($use_entities ? $translation_table_entities : $translation_table_ascii);

    // Each pattern is built as '#\x91#', '#\x92#', and so on, matching a single byte.
    if ($replace_single_quotes) {
        $str = preg_replace('#\x' . dechex(145) . '#', $translation_table[145], $str);
        $str = preg_replace('#\x' . dechex(146) . '#', $translation_table[146], $str);
    }

    if ($replace_double_quotes) {
        $str = preg_replace('#\x' . dechex(147) . '#', $translation_table[147], $str);
        $str = preg_replace('#\x' . dechex(148) . '#', $translation_table[148], $str);
    }

    if ($replace_emdash) {
        $str = preg_replace('#\x' . dechex(151) . '#', $translation_table[151], $str);
    }

    return $str;
}

// The test string is written as byte escapes so it stays Windows-1252
// no matter what encoding this file is saved in.
echo fixSmartQuotes("\x93\x94\x92\x91\x97"); // prints ""''-

?>
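
One caveat: the function above is written for text that really is Windows-1252.  If the paste has already been converted to UTF-8 by the time your script sees it (that's an assumption about your setup, so check what your form actually submits), the same five characters arrive as three-byte sequences and need different search strings.  Here's a minimal sketch of that variant.  The name fixSmartQuotesUtf8 is made up for illustration, and it swaps the regular expressions for strtr(), so treat it as a starting point rather than a drop-in replacement:

```php
<?php

// Hypothetical UTF-8 counterpart to fixSmartQuotes(); not part of the original function.
function fixSmartQuotesUtf8($str, $use_entities = false)
{
    // Keys are the UTF-8 byte sequences for U+2018, U+2019, U+201C, U+201D and U+2014.
    $translation_table_ascii = array(
        "\xE2\x80\x98" => '\'',
        "\xE2\x80\x99" => '\'',
        "\xE2\x80\x9C" => '"',
        "\xE2\x80\x9D" => '"',
        "\xE2\x80\x94" => '-'
    );

    $translation_table_entities = array(
        "\xE2\x80\x98" => '&lsquo;',
        "\xE2\x80\x99" => '&rsquo;',
        "\xE2\x80\x9C" => '&ldquo;',
        "\xE2\x80\x9D" => '&rdquo;',
        "\xE2\x80\x94" => '&mdash;'
    );

    $translation_table = ($use_entities ? $translation_table_entities : $translation_table_ascii);

    // strtr() with an array replaces whole substrings, so the multi-byte keys
    // are matched as units rather than byte by byte.
    return strtr($str, $translation_table);
}

echo fixSmartQuotesUtf8("\xE2\x80\x9CHi\xE2\x80\x9D"), "\n"; // prints "Hi"
```

Unlike the original, this sketch has no per-character switches; it only keeps the $use_entities choice.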

Seriously, these things are annoying, especially for Linux users.  I cut and paste Word documents into textareas all the time, and these characters appear everywhere.  This function will save you from having to edit them out manually.  Depending on which of the $replace_single_quotes, $replace_double_quotes and $replace_emdash options you leave switched on, it alters some or all of the "problem" characters.  Setting $use_entities to true lets you use HTML entities to encode and retain the "smart" quotes instead of replacing them with "dumb" quotes.  Happy programming!
