Note: the code for the auditing script is posted separately.

As a programmer, I cannot stress this enough. What is it? Escaping all data processed by your web application's code! It's a common security issue, but these days most people only think about it in the context of SQL. Every programmer worth his salt knows that he must escape or sanitize data sent to a SQL database; otherwise, carefully constructed input can turn the query into a totally unintended one that exposes and/or vandalizes data. Despite this, many programmers forget to escape SQL input, and even more of them forget to do the same for data they write out as HTML!
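To make the SQL case concrete, here is a minimal sketch in Python using the standard-library sqlite3 module rather than PHP and MySQL. Instead of escaping by hand, the safe version uses a placeholder (a parameterized query). That is a swapped-in technique that keeps the input out of the SQL text entirely. The table, the function names, and the payload are hypothetical.

```python
import sqlite3

def find_user_unsafe(conn, name):
    # DANGEROUS: user input is pasted straight into the SQL text
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # The "?" placeholder makes the driver treat `name` as data, never as SQL
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"
print(find_user_unsafe(conn, payload))  # the injected OR clause matches every row
print(find_user_safe(conn, payload))    # no user is literally named that
```

With the payload above, the unsafe query's WHERE clause becomes `name = 'nobody' OR '1'='1'`, which is true for every row.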

Even the terminology reflects the apathy. You "escape" SQL with mysql_real_escape_string(), but you merely "convert special characters" with htmlspecialchars() or htmlentities(). In addition, the PHP manual carries big, glaring warnings about why one must escape SQL.

From the mysql_real_escape_string() manual page:
"This function must always (with few exceptions) be used to make data safe before sending a query to MySQL."

But neither of the manual pages for the HTML escaping functions says anything along the lines of "You must escape your HTML; otherwise, people can use carefully crafted parameters to tell the world you advocate pre-teen sex and link to NAMBLA." Phew! Your site could just as easily be made to show support for sites about legal sex, or to carry boring spam links for casinos. Either way, you probably want to make sure you don't advocate any of the above without knowing it. Or maybe you're a black hat who wants to snoop around for some more free links. I don't care. Just don't advocate NAMBLA.
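The HTML case works the same way. This Python sketch uses html.escape() from the standard library, which is roughly comparable to PHP's htmlspecialchars(). The render_search_page_* functions and the spam URL are hypothetical stand-ins for any page that echoes a request parameter back into its output.

```python
import html

def render_search_page_unsafe(query):
    # Reflects the raw parameter into the page: classic HTML injection
    return "<p>You searched for: " + query + "</p>"

def render_search_page_safe(query):
    # html.escape converts &, < and >, plus " and ' (quote=True is the default)
    return "<p>You searched for: " + html.escape(query) + "</p>"

attack = '<a href="http://spam.example/">cheap casino chips</a>'
print(render_search_page_unsafe(attack))  # a live link appears on your page
print(render_search_page_safe(attack))    # the link is shown as inert text
```

The safe version emits `&lt;a href=&quot;http://spam.example/&quot;&gt;...`, which a browser displays as text instead of rendering as a link.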

I cannot stress enough that this is a major problem that is largely ignored. Fix affected sites, or someone else will eventually make you fix them. I've posted an auditing script separately.

Basically, the script takes a list of pages, parses them, and picks out the forms on each page. Using cURL, it submits each form as is (values, checked checkboxes intact, etc.), except that it sets the first text box, assumed to be a query field of some sort, to "<h1>testing 123</h1>". It then searches the response for that string. If the string appears verbatim and unescaped, the form is potentially vulnerable to HTML injection.
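The procedure above can be sketched in Python as follows. This is not the author's script. It swaps Python's html.parser for Jose Solorzano's PHP parser and urllib for cURL, and it only looks at <input> elements (no <textarea> or <select>). All function and class names are hypothetical.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser

PAYLOAD = "<h1>testing 123</h1>"  # the probe string described in the article
SKIP_TYPES = ("submit", "button", "image", "reset", "file")

class FormCollector(HTMLParser):
    """Collect each <form>'s action, method and submittable <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "form":
            self._current = {"action": a.get("action", ""),
                             "method": a.get("method", "get").lower(),
                             "fields": []}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and a.get("name"):
            kind = a.get("type", "text").lower() or "text"
            if kind in SKIP_TYPES:
                return  # simplification: buttons and uploads are not sent
            if kind in ("checkbox", "radio") and "checked" not in a:
                return  # browsers don't submit unchecked boxes either
            self._current["fields"].append((a["name"], a.get("value", ""), kind))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None

def build_submission(form, payload=PAYLOAD):
    """Keep every field's value, but put the payload in the first text box."""
    data, injected = [], False
    for name, value, kind in form["fields"]:
        if not injected and kind in ("text", "search"):
            value, injected = payload, True
        data.append((name, value))
    return data, injected

def submit(page_url, form, data):
    """Send the form the way a browser would (GET replaces the query string)."""
    target = urllib.parse.urljoin(page_url, form["action"])
    encoded = urllib.parse.urlencode(data)
    if form["method"] == "post":
        req = urllib.request.Request(target, data=encoded.encode("ascii"))
    else:
        parts = urllib.parse.urlsplit(target)
        req = urllib.request.Request(
            urllib.parse.urlunsplit(parts._replace(query=encoded, fragment="")))
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def is_reflected(body, payload=PAYLOAD):
    """A verbatim, unescaped copy of the payload suggests HTML injection."""
    return payload in body

def audit(page_url):
    """Fetch one page, probe each of its forms, and report the results."""
    with urllib.request.urlopen(page_url, timeout=10) as resp:
        html_text = resp.read().decode("utf-8", errors="replace")
    collector = FormCollector()
    collector.feed(html_text)
    print(f"Looking at {page_url}; {len(collector.forms)} form(s) found.")
    for i, form in enumerate(collector.forms, 1):
        data, injected = build_submission(form)
        if not injected:
            continue  # no text box to put the probe in
        if is_reflected(submit(page_url, form, data)):
            print(f"Possible HTML injection: {page_url}; form {i}")
        else:
            print(f"No attack found for: {page_url}; form {i}")
```

Only point audit() at sites you own or are authorized to test. The detection is deliberately naive: it flags exact reflections only, so an unescaped payload that the page rewrites in some other way will be missed.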

I used a slightly modified HTML parser written by Jose Solorzano, plus a few regular expressions. Currently, you must provide a list of the URLs you want to test. These can be derived from your web site logs, or from a call to the Yahoo REST API using the "site:" operator. I will not provide the code for that, to prevent script kiddies from running this script on "". Do what you wish; like anything, this tool can be used for good or evil, and it can probably even get you arrested. </moral rant> Have fun!
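For the "derive it from your logs" route, here is a hedged Python sketch that works on your own access logs only. It assumes your server writes the Apache Common or Combined Log Format. It keeps only successful GET requests, drops query strings and obvious static files, and returns one absolute URL per page path. urls_from_log and the skip list are hypothetical choices, not part of the original script.

```python
import re
from urllib.parse import urljoin, urlsplit

# Assumed Common/Combined Log Format:
# host ident user [time] "METHOD path PROTOCOL" status size ...
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Hypothetical list of extensions that are never pages with forms
SKIP_EXT = (".css", ".js", ".png", ".gif", ".jpg", ".jpeg", ".ico", ".svg")

def urls_from_log(lines, base_url):
    """Return unique page URLs (query strings dropped) seen with a 200 GET."""
    seen, urls = set(), []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["method"] != "GET" or m["status"] != "200":
            continue
        path = urlsplit(m["path"]).path
        if path.lower().endswith(SKIP_EXT) or path in seen:
            continue
        seen.add(path)
        urls.append(urljoin(base_url, path))
    return urls
```

Dropping the query string is a design choice. The auditor fetches each page to find its forms, so `/search?q=a` and `/search?q=b` usually lead to the same form and only need to be tested once.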

Here's some sample output:
Looking at; 2 form(s) found.
HRMM; no attack found for:; form 1
HRMM; no attack found for:; form 2

