As a blogger, I write quite a few blog posts. I hate RTEs (Rich Text Editors) so I'll typically do most of any desired HTML by hand. Normally this isn't a big deal. My blogware can handle paragraphs and code formatting. I typically just worry about bold and italics. However, because I'm entering HTML manually, there's always a chance I could screw up. I've got a Preview feature on my blog but I rarely use it.

For a while now I've wondered if there was some way to detect bad HTML via JavaScript. I decided today to take a crack at it using some simple regex. I figured if we could detect all tags, maybe we could use a simple counter to keep track of opening and closing tags. Obviously that's not terribly precise, but for the types of mistakes I make, it would actually work out OK most of the time. I worked on it a bit and came up with the following little demo:

<html>
<head>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>
<script>
$(document).ready(function() {

    $("#testBtn").click(function(e) {
        var code = $.trim($("#code").val());
        if (code == '') return;

        var regex = /<.*?>/g;
        //match() returns null (not an empty array) when nothing matches
        var matches = code.match(regex);
        if (!matches) return;

        var tags = {};

        $.each(matches, function(idx, itm) {
            console.log("Raw tag: " + itm);

            //if the tag is, <..../>, it's self closing
            if (itm.substr(itm.length - 2, itm.length) != "/>") {

                //strip out any attributes
                var tag = itm.replace(/[<>]/g, "").split(" ")[0];
                console.log("Tag : " + tag);
                //start or end tag?
                if (tag.charAt(0) != "/") {
                    if (tags.hasOwnProperty(tag)) tags[tag]++;
                    else tags[tag] = 1;
                } else {
                    var realTag = tag.substr(1, tag.length);
                    console.log("Real tag is -" + realTag);
                    if (tags.hasOwnProperty(realTag)) tags[realTag]--;
                    else tags[realTag] = -1;
                }
            }
        });

        console.dir(tags);

        //any tag whose counter didn't end at zero is a suspect
        var possibles = [];
        for (var tag in tags) {
            if (tags[tag] != 0) possibles.push(tag);
        }
        if (possibles.length) {
            $("#status").text("There appear to be some hanging tags in your textarea: " + possibles.join(","));
        } else {
            //clear any message left over from an earlier test
            $("#status").text("");
        }
    });

});
</script>
</head>

<body>

<div id="status"></div>

<form>
<textarea name="code" id="code" cols="70" rows="30"></textarea><br/>
<input type="button" id="testBtn" value="Test">
</form>
</body>
</html>

Basically, I used a simple regex to find any HTML tag:

var regex = /<.*?>/g;
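
To see what that pattern actually returns, here is a quick sketch run against a made-up sample string (the sample text and variable names are assumptions for illustration, not part of the demo). It also shows why the `?` matters, and that `match()` gives back `null` when there are no tags at all.

```javascript
// Sample input is invented for this example.
var sample = 'Some <b>bold</b> text, a <a href="http://example.com">link</a> and a break<br/>';

// The non-greedy .*? stops at the first ">", so each tag is matched on its own.
var found = sample.match(/<.*?>/g);
console.log(found);

// A greedy .* would instead run from the first "<" to the last ">" on the line.
var greedy = sample.match(/<.*>/g);
console.log(greedy.length);

// With the g flag, match() returns null (not an empty array) when nothing matches.
console.log("no tags here".match(/<.*?>/g));
```

One caveat with a pattern this simple: a literal `>` inside an attribute value would end the match early.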

From there, I loop over the matches and figure out a) the real tag name (ignoring any attributes) and b) whether it is an opening or a closing tag. Each opening tag increments a per-tag counter and each closing tag decrements it, so any tag whose counter doesn't end up at zero gets flagged. I also try to support self-closing tags like <p/> by skipping them entirely.
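
The counting idea can also be sketched as a standalone function with no jQuery or DOM involved, which makes it easier to try in a console. This is only a rough sketch under a couple of assumptions: the name `findHangingTags` and the choice to return an array of tag names are made up for this example.

```javascript
// Hypothetical helper: the name and return shape are assumptions for this sketch.
function findHangingTags(code) {
    var matches = code.match(/<.*?>/g);
    // match() returns null when the input has no tags at all
    if (!matches) return [];

    // no prototype, so tag names can't collide with built-in object properties
    var tags = Object.create(null);

    matches.forEach(function(itm) {
        // skip self-closing tags such as <br/>
        if (itm.slice(-2) === "/>") return;

        // drop the angle brackets, then keep only the name before any attributes
        var tag = itm.replace(/[<>]/g, "").split(" ")[0];
        var closing = tag.charAt(0) === "/";
        var name = closing ? tag.slice(1) : tag;

        // +1 for an opening tag, -1 for a closing tag
        tags[name] = (tags[name] || 0) + (closing ? -1 : 1);
    });

    // any tag whose count is not zero is a suspect
    return Object.keys(tags).filter(function(name) {
        return tags[name] !== 0;
    });
}

console.log(findHangingTags("This is <b>bold</b> and <i>italic</i>."));
console.log(findHangingTags("This is <b>bold and <i>italic</i>."));
console.log(findHangingTags("Line one<br/>Line <b>two</b></i>"));
console.log(findHangingTags("<b><i>text</b></i>"));
```

The last call shows the main blind spot of a pure counter: misnested tags like `<b><i>text</b></i>` still balance out, so nothing is reported. Void tags written without the slash, such as `<br>`, go the other way and get flagged as hanging.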

It's not the most scientific method, but it seems to work well in my testing. Check it out at the demo below.