Note: the code for the auditing script is located here.
As a programmer, I cannot stress it enough. What is it? Escaping all data processed by your web application's code! It's a common security issue, but these days most people are only accustomed to it in the context of SQL. Every programmer worth his salt knows that he must escape/sanitize data sent to a SQL database. Otherwise, carefully constructed input can form a totally cool query that exposes and/or vandalizes data. Despite this, many programmers forget to escape SQL input, and even more of them forget to do the same for data echoed back into HTML!
Even the terminology reflects the apathy. You "escape" SQL with mysql_real_escape_string(), but you "convert special characters" using htmlspecialchars() or htmlentities(). In addition, the documentation carries huge glaring comments about why one should escape SQL.
From mysql_real_escape_string():
"This function must always (with few exceptions) be used to make data safe before sending a query to MySQL."
But neither of the pages for the HTML escaping functions says anything along the lines of "You must escape your HTML, otherwise people can use carefully crafted parameters to tell the world you advocate pre-teen sex and link to NAMBLA" (note: I nofollowed that link). Phew! Obviously, your site could also be made to show support for other sites engaging in legal sex, or even to carry boring spam links for casinos; but either way, you probably want to make sure you don't advocate any of the above without actually knowing it. Or maybe you're a black hatter who wants to snoop around to find some more benevolent free links. I don't care. Just don't advocate NAMBLA.
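To make the HTML side concrete, here is a minimal sketch in Python rather than PHP, using the standard library's html.escape() as a rough counterpart to htmlspecialchars(). The render_search_heading() function and the spam URL are made up for illustration; the point is only that user-supplied text gets converted to entities before it is written into a page.

```python
import html


def render_search_heading(query: str) -> str:
    # Hypothetical page fragment that echoes the visitor's query back.
    # html.escape() turns &, < and > (and, by default, both quote
    # characters) into entities, so injected markup is shown as text.
    return "<p>Results for: " + html.escape(query) + "</p>"


# A "carefully crafted parameter" that tries to plant a link on the page.
unsafe = '<a href="http://spam.example/">free casino</a>'
print(render_search_heading(unsafe))
```

With the escaping in place the visitor sees the literal text of the tag instead of a working link.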
I cannot stress enough that this is a major problem that is largely ignored. Fix affected sites or someone else will eventually make you fix it. I've posted an auditing script here.
Basically, this code takes a list of pages, parses them, and picks out the forms on each page. Using cURL, it submits each form as is (values intact, checked checkboxes still checked, etc.), except that it sets the first text box, assumed to be a sort of query field, to "<h1>testing 123</h1>". Then it searches for that string in the response to the form. If it's there verbatim and unescaped, it's a potentially valid attack.
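The steps just described can be sketched roughly as follows. This is not the posted script: it is a Python approximation that swaps in the standard library's html.parser and urllib for the modified HTML parser and cURL, and every name in it (FormCollector, build_submission, submit_form, is_reflected) is hypothetical. It only looks at input elements, so select and textarea fields are ignored.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlsplit
from urllib.request import Request, urlopen

PAYLOAD = "<h1>testing 123</h1>"


class FormCollector(HTMLParser):
    """Collects each form's action, method, and named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            field_type = (attrs.get("type") or "text").lower()
            # Browsers do not submit unchecked checkboxes or radio buttons.
            if field_type in ("checkbox", "radio") and "checked" not in attrs:
                return
            self._current["fields"].append(
                {"name": attrs["name"], "type": field_type,
                 "value": attrs.get("value") or ""}
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def build_submission(form, payload=PAYLOAD):
    """Keep every field as is, but put the payload in the first text box."""
    pairs, injected = [], False
    for field in form["fields"]:
        value = field["value"]
        if not injected and field["type"] == "text":
            value, injected = payload, True
        pairs.append((field["name"], value))
    return pairs, injected


def submit_form(page_url, form, pairs):
    """Send the form with urllib (the original uses cURL) and return the body."""
    target = urljoin(page_url, form["action"])
    data = urlencode(pairs)
    if form["method"] == "post":
        request = Request(target, data=data.encode("ascii"))
    else:
        request = Request(urlsplit(target)._replace(query=data).geturl())
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def is_reflected(response_body, payload=PAYLOAD):
    # A verbatim, unescaped copy of the payload suggests a possible attack.
    return payload in response_body
```

For a page you are allowed to test, you would feed its HTML to a FormCollector, call build_submission() on each collected form, pass the result to submit_form(), and run is_reflected() on what comes back. A hit only means the string came back unescaped somewhere in the response, so each one still needs a manual look.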
I used a slightly-modified HTML parser written by Jose Solorzano, and a few regular expressions. Currently, you must provide a list of URLs you want to test. These can be derived from your web site logs somehow, or from a call to the Yahoo REST API using the "site:" command. I will not provide the code for that, to prevent script kiddies from running this script on "site:www.whitehouse.gov". Do what you wish; like anything, this tool can be used for good or evil, and can probably even get you arrested. </moral rant>. Have fun!
Here's some sample output:
Looking at http://www.seoegghead.com; 2 form(s) found.
HRMM; no attack found for: http://www.seoegghead.com; form 1
HRMM; no attack found for: http://www.seoegghead.com; form 2

June 30th, 2006 at 4:04 pm
[...] Jaimie Sirovich just released a tool for doing XSS reflection auditing this morning. The way he describes its function is that it looks for parameters and injects a small snippet of HTML. If that HTML is seen once the server returns data, you know it’s vulnerable to XSS. Of course that’s not always the truth, and there are many other forms of XSS that are missed by this approach, but it’s free, and you can’t beat free. Of course, he is talking about it in the context of Blackhat SEO, where you can raise your own page rank by injecting XSS into pages that have a high page rank. [...]