Most online forms don't allow HTML, or allow a very strict subset of HTML, but for years now my blog form (the one I'm using right now) has allowed for any and all HTML. I figure if I can't trust myself, who can I trust? Of course, from time to time I screw up and forget to close a tag or make some other mistake. For a while now I've wondered - is there an easy way to check for that and potentially catch those mistakes before I save the form?
I currently know of two services that let you check HTML. One is the W3C Validator. I've used this for one of my Brackets extensions (W3CValidation) and it works ok, but on simple tests it seemed to miss obvious things. For example, given input of: <b>moo
it only notices the lack of a root doctype and declaration and basically throws up. Since my input will always be just one part of a full HTML page (the blog content), this didn't seem appropriate. Also, it is a remote service, so every check would be an asynchronous request, and I wanted something I could run entirely client-side.
I then decided to check out HTMLHint. HTMLHint provides rules-based reports on HTML input. It is pretty simple to use and - surprise surprise - I also have a Brackets extension for it (Brackets-HTMLHint). Since I could use it entirely client-side, I thought I'd check it out. I built a simple demo that did validation on keyup.
First, I created a simple HTML page and form.
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
<meta name="description" content="">
<meta name="viewport" content="width=device-width">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js"></script>
<script src="htmlhint.js"></script>
<script src="app.js"></script>
<style>
#blogContent {
width: 500px;
height: 300px;
}
#htmlIssues {
font-weight: bold;
}
</style>
</head>
<body>
<form>
<textarea id="blogContent"></textarea>
<div id="htmlIssues"></div>
</form>
</body>
</html>
Nothing really special here. You can see the textarea as well as an empty div I plan on using for validation. Now let's look at the code.
$(document).ready(function() {
// Declare with var so this doesn't leak out as an implicit global.
var $issueDiv = $("#htmlIssues");
//From docs:
var rules = {
"tagname-lowercase": true,
"attr-lowercase": true,
"attr-value-double-quotes": true,
"doctype-first": false,
"tag-pair": true,
"spec-char-escape": true,
"id-unique": true,
"src-not-empty": true,
"attr-no-duplication": true
};
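function htmlEscape(s) {
// Escape & first so the entities added below are not double-escaped.
s = s.replace(/&/g, "&amp;");
s = s.replace(/</g, "&lt;");
s = s.replace(/>/g, "&gt;");
return s;
}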
$("#blogContent").on("keyup", function() {
var content = $(this).val();
var issues = HTMLHint.verify(content, rules);
if(issues.length === 0) {
$issueDiv.html("");
return;
}
console.dir(issues);
var s = "Possible HTML issues found:<ul>";
for(var i=0, len=issues.length; i<len; i++) {
s += "<li>"+htmlEscape(issues[i].message)+"</li>";
}
s += "</ul>";
$issueDiv.html(s);
});
});
From the top, I'm caching my results div so I can reuse it when doing validation. I modified HTMLHint's default rule set to turn off the doctype requirement. (Because, again, I'm validating a small part of a page, not the entire page.) And that's basically it. I run the library's verify method and just render out the results. Here is an example of it in action:
As you can see, it found the unmatched tag pair, but didn't notice that turd was an invalid tag. To be honest, I'm much more concerned about screwing up tag pairs so that makes me happy enough.
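If unknown tags ever do start to matter, one rough way to catch them is to pull out every tag name and compare it against an allowlist. This is only a sketch and not something HTMLHint does: the `KNOWN_TAGS` list and the `findUnknownTags` function are made up for illustration, and the regex is naive (it will also match things that look like tags inside comments or attribute values).

```javascript
// Hypothetical allowlist, trimmed to tags likely to show up in a blog post.
var KNOWN_TAGS = [
  "a", "b", "blockquote", "br", "code", "div", "em", "h2", "h3", "i",
  "img", "li", "ol", "p", "pre", "span", "strong", "ul"
];

// Returns the unique tag names found in `html` that are not in KNOWN_TAGS.
function findUnknownTags(html) {
  var unknown = [];
  // Captures the name of an opening or closing tag: <b>, </b>, <img src="">.
  var tagPattern = /<\/?([a-zA-Z][a-zA-Z0-9-]*)/g;
  var match;
  while ((match = tagPattern.exec(html)) !== null) {
    var name = match[1].toLowerCase();
    if (KNOWN_TAGS.indexOf(name) === -1 && unknown.indexOf(name) === -1) {
      unknown.push(name);
    }
  }
  return unknown;
}

console.log(findUnknownTags("<b>moo</b> <turd>oops</turd>")); // logs [ 'turd' ]
```

The result could be appended to the same list of issues that the keyup handler already renders.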
If you want to play with it yourself, I set up a demo below.
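One possible refinement to the rendering code above: the issue objects carry more than just a message. As far as I can tell each one also has `line` and `col` properties, but treat those names as an assumption and confirm them against the `console.dir` output. The sketch below moves the list building into a standalone function (`renderIssues` and `escapeHtml` are names I picked for illustration) so it can be tried without a browser, using hand-written sample data in place of real HTMLHint output.

```javascript
// Escape the characters that matter when dropping text into HTML.
// & goes first so the entities added afterwards are not escaped twice.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the list markup from an array of issue objects.
// Assumption: each issue looks like { message, line, col }.
function renderIssues(issues) {
  if (issues.length === 0) {
    return "";
  }
  var s = "Possible HTML issues found:<ul>";
  for (var i = 0, len = issues.length; i < len; i++) {
    var issue = issues[i];
    s += "<li>Line " + issue.line + ", col " + issue.col + ": " +
      escapeHtml(issue.message) + "</li>";
  }
  s += "</ul>";
  return s;
}

// Made-up sample data standing in for what the verify call returns.
var sample = [{ message: "Unclosed tag <b>", line: 1, col: 1 }];
console.log(renderIssues(sample));
```

In the keyup handler this would shrink the rendering down to a single call along the lines of `$issueDiv.html(renderIssues(issues))`.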