Earlier this week I came across a person looking to find a local (to Louisiana) car safety inspection location. I think most states require this but they differ on schedules. Louisiana recently moved to letting you pay more for a two-year sticker which is nice, but it is still a bit of a hassle if you don't know where an inspection location can be found. Turns out - there is a web page for it: http://www.dps.state.la.us/safetydirections.nsf/f3f91999370ccaed862574a20074b158?OpenView.
I looked at this and thought - wouldn't it be cool if we could find the nearest station based on your current location? Turns out it was possible - just not very pretty. I've split this blog entry into two parts - getting the data - and using the data. If you don't care how I scraped the site, feel free to scroll down to the next part.
Scraping the Data
I had hoped the site was using fancy Ajax Ninja stuff with cool JSON-based data sources, but I quickly discovered that it was not. It was pure HTML. Lots, and lots, and, oh my god, lots of HTML. I began by figuring out how the site was set up. The home page contains a list of all the parishes:
Clicking a triangle (but oddly, not the parish name) opens a list of places where you can get your car inspected.
This gives you the location name and address. But to get hours of operation you need to click for details.
All in all, this gave me two things to scrape. First was a list of the locations, which can only be found by first getting all the parishes. Then for each location we needed to get the detail page for the hours of operation. Finally, I could take all those addresses and do a geocode on them to get precise locations.
What follows is a set of ColdFusion scripts I wrote to perform this task. These files are ugly. The HTML used on these pages was messy as hell. The phone numbers were wrapped in multiple span/font tags, etc. It was a mess. I also took the opportunity to try some fancy ColdFusion 11 updates as well. All in all, this code is quite disgusting, but I'll share it so you can use it to scare away monsters.
First, open up all the parishes and save the location data.
<cfscript>
rootUrl = "http://www.dps.state.la.us/safetydirections.nsf/f3f91999370ccaed862574a20074b158?OpenView&Start=1&Count=1200";
//<cfset links = rematch("/safetydirections.nsf/.*?Expand=.*?""",cfhttp.fileContent)>
//<cfdump var="#links#">
///safetydirections.nsf/f3f91999370ccaed862574a20074b158?OpenView&Start=1&Count=1200&Expand=2#2" target="_self">
//number of parishes but I call it pages, because.
totalPages = 62;
//totalPages = 3;
locations = [];
for(i=1; i<= totalPages; i++) {
    theUrl = rootUrl & "&Expand=#i#";
    writeoutput(theUrl & "<br/><hr>");
    cfhttp(url=theUrl);
    //writeoutput("<pre>#htmlEditFormat(cfhttp.filecontent)#</pre>");
    //each match is one table row describing a location
    matches = reMatch("<font color=""##0000ff"">.*?</tr>",cfhttp.fileContent);
    matches.each(function(m) {
        var location = {};
        var linkre = reFind("<a href=""(.*?)"">", m, 1, true);
        location["link"] = m.mid(linkre.pos[2], linkre.len[2]);
        var namere = reFind("<a href="".*?"">(.*?)</a>", m, 1, true);
        location["name"] = m.mid(namere.pos[2], namere.len[2]);
        var tds = reMatch("<td>(.*?)</td>", m);
        var address = rereplace(tds[1], "<td><b><font color=""##0000ff"">(.*?)</font></b></td>", "\1");
        address = address.replace("<br>","");
        location["address"] = address;
        location["types"] = [];
        var typeList = rereplace(tds[3], "<td><b><font color=""##0000ff"">(.*?)</font></b></td>","\1");
        typeList = typeList.replace("<br>", ",", "all");
        typeList.each(function(t) {
            t = trim(t);
            location["types"].append(t);
        });
        //writedump(location);
        // writedump(m);
        locations.append(location);
    });
    // writedump(matches);
}
writedump(locations.len());
fileWrite(expandPath("./data1.json"), serializeJSON(locations));
</cfscript>
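If ColdFusion isn't your thing, the heart of that script is just a regex capture per table row. Here is a minimal JavaScript sketch of the same idea. Note that `parseRow` is a hypothetical helper and the sample row is made up for illustration; the real markup was a lot messier.

```javascript
// Hypothetical helper: pull the link and the name out of one table row.
function parseRow(rowHtml) {
  // Group 1 is the href, group 2 is the text inside the anchor.
  var m = /<a href="(.*?)">(.*?)<\/a>/.exec(rowHtml);
  if (!m) return null; // no anchor in this row
  return { link: m[1], name: m[2].trim() };
}

// Made-up sample row, loosely shaped like the ones on the site.
var sample = '<font color="#0000ff"><a href="/safetydirections.nsf/abc?OpenDocument">Bob\'s Garage</a></font></td></tr>';
console.log(parseRow(sample));
```

Returning null for rows without an anchor keeps one odd row from killing the whole run.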
Next, get the details. This includes the hours of operation I mentioned earlier, as well as the phone number.
<cfscript>
rootUrl = "http://www.dps.state.la.us/";
data = deserializeJSON(fileRead(expandPath("data1.json")));
//filter by items w/o a phone number
writeoutput("There are #data.len()# items.<br/>");
/*
filtered = data.filter(function(x) {
return !structKeyExists(x, "phoneNumber");
});
writeoutput("There are #data.len()# items to process.<br/>");
*/
counter=0;
data.each(function(l) {
    counter++;
    if(counter mod 100 is 0) {
        writeoutput("#counter#<br/>");
        cfflush();
    }
    //Only get if we don't have the data already.
    //We are inside a closure, not a loop, so return (not continue) skips this item.
    if(structKeyExists(l, "phoneNumber")) return;
    cfhttp(url="#rootUrl#/#l.link#");
    var content = cfhttp.fileContent;
    var found = reMatch('Area Code</font></b><b><font color="##0000FF" face="HandelGotDLig"> </font></b><b><font color="##ff0000" face="HandelGotDLig">.*?</font>', content);
    var areaCode = found[1].rereplace(".*>([0-9]{3})</font>", "\1");
    found = reMatch('Phone Number</font></b><b><font color="##FF0000" face="HandelGotDLig"> </font></b><b><font color="##ff0000" face="HandelGotDLig">.*?</td>', content);
    var phoneFirst = found[1].rereplace(".*>([0-9]{3})</font>.*", "\1");
    var phoneSecond = found[1].rereplace(".*>([0-9]{4})</font>.*", "\1");
    var phoneNumber = "(" & areaCode & ") " & phoneFirst & "-" & phoneSecond;
    // writeoutput("<b>#phoneNumber#</b><p>");
    found = content.reMatch('Hours of Operation.*?</tr>');
    var hoo = found[1].rereplace(".*?</td><td width=""536"">(.*?)</td></tr>", "\1");
    //strip tags, then collapse every run of whitespace
    hoo = hoo.rereplace("<.*?>", " ", "all");
    hoo = hoo.rereplace("[[:space:]]{2,}", " ", "all");
    // writedump(found);
    // writeOutput("<pre>"&htmlEditFormat(cfhttp.fileContent)&"</pre>");
    // abort;
    l["phoneNumber"] = phoneNumber;
    l["hours"] = hoo;
    //save after every item so an interrupted run can pick up where it left off
    fileWrite(expandPath("data1.json"), serializeJSON(data));
});
writeoutput("<p>Done!</p>");
</cfscript>
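The hours cleanup boils down to two steps: swap every tag for a space, then collapse the leftover whitespace. A minimal JavaScript sketch of that, where `cleanHours` is a hypothetical helper and the sample markup is invented:

```javascript
// Hypothetical helper: strip tags from a scraped cell and tidy the whitespace.
function cleanHours(html) {
  return html
    .replace(/<.*?>/g, " ")   // swap every tag for a space
    .replace(/\s{2,}/g, " ")  // collapse runs of whitespace into one space
    .trim();                  // extra step: drop leading/trailing spaces
}

// Invented sample, loosely shaped like the font-tag soup on the detail pages.
var sampleHours = '<b><font color="#ff0000">Mon-Fri</font></b><br> 8am - 5pm';
console.log(cleanHours(sampleHours));
```

The `g` flag matters here. Without it only the first match is replaced, which is an easy thing to miss when the first few test rows happen to be clean.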
Finally, do the geocoding.
<cfscript>
geo = new googlegeocoder3();
data = deserializeJSON(fileRead(expandPath("data1.json")));
writeoutput("There are #data.len()# items.<br/>");
counter=0;
data.each(function(l) {
    counter++;
    if(counter mod 100 is 0) {
        writeoutput("#counter#<br/>");
        cfflush();
    }
    //Only get if we don't have the data already.
    //We are inside a closure, not a loop, so return (not continue) skips this item.
    if(structKeyExists(l, "long")) return;
    var res = geo.googlegeocoder3(address = l.address);
    l["long"] = res.longitude[1];
    l["lat"] = res.latitude[1];
    //save after every item so an interrupted run can pick up where it left off
    fileWrite(expandPath("data1.json"), serializeJSON(data));
});
writeoutput("<p>Done!</p>");
</cfscript>
Note - I used one more script to remove the link property from my data file to make it a bit smaller. So at this point, I had a data1.json file containing every location in Louisiana where you can get your car inspected. I also had their phone numbers, hours of operation, and longitude and latitude. Woot! Now for the fun stuff - the front end!
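That cleanup script isn't shown here, but the idea is tiny. A minimal sketch in JavaScript, assuming the records carry the keys the scripts above produce; `stripLinks` is a hypothetical name and the sample record is made up:

```javascript
// Return copies of the records without the "link" key. The input is left alone.
function stripLinks(records) {
  return records.map(function (rec) {
    var copy = {};
    Object.keys(rec).forEach(function (k) {
      if (k !== "link") copy[k] = rec[k];
    });
    return copy;
  });
}

// Made-up record using the same keys as the scraped data.
var example = [{
  link: "/safetydirections.nsf/abc?OpenDocument",
  name: "Example Station",
  address: "123 Main St",
  types: ["MV"],
  phoneNumber: "(555) 555-0100",
  hours: "M-F 8-5",
  long: -91.1,
  lat: 30.4
}];
console.log(JSON.stringify(stripLinks(example)));
```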
Using the Data
For my front end, I decided to go simple. No bootstrap. No UI framework at all. Just a simple div to display dynamic data. I could make this pretty, but why bother?
<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <title></title>
        <meta name="description" content="">
        <meta name="viewport" content="width=device-width">
    </head>
    <body>
        <div id="status"></div>
        <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
        <script src="app.js"></script>
    </body>
</html>
The real fun happens in app.js. I'll share the entire file, then describe what each part does.
var $status;
var geoData;
var myLong;
var myLat;
$(document).ready(function() {
    $status = $("#status");
    //Do we have the data locally?
    geoData = localStorage["geocache"];
    if(!geoData) {
        $status.html("<i>Fetching initial data set. Please stand by. This data will be cached for future operations.</i>");
        $.getJSON("data1.json").done(function(res) {
            console.log("Done");
            localStorage["geocache"] = JSON.stringify(res);
            geoData = res;
            $status.html("");
            getLocation();
        });
    } else {
        geoData = JSON.parse(geoData);
        getLocation();
    }
});

function getLocation() {
    $status.html("<i>Getting your location.</i>");
    navigator.geolocation.getCurrentPosition(gotLocation, failedLocation);
}

function failedLocation() {
    $status.html("<b>Sorry, but we were unable to get your location.</b>");
}

function gotLocation(l) {
    myLong = l.coords.longitude;
    myLat = l.coords.latitude;
    appReady();
}
function appReady() {
    $status.html("<i>Now searching for nearby locations.</i>");
    //Figure out how far away each location is
    for(var i=0;i<geoData.length;i++) {
        var dist = getDistanceFromLatLonInKm(myLat, myLong, geoData[i].lat, geoData[i].long);
        geoData[i].dist = dist;
    }
    //Closest first
    geoData.sort(function(x,y) {
        if(x.dist > y.dist) return 1;
        if(x.dist < y.dist) return -1;
        return 0;
    });
    //Show the 10 closest locations
    var s = "<h2>Nearby Locations</h2>";
    for(var i=0;i<Math.min(10, geoData.length); i++) {
        s+= "<p><b>"+geoData[i].name+"</b><br/>";
        s+= geoData[i].address+" "+Math.round(geoData[i].dist)+" km away<br/>";
        s+= "<a href='tel:"+geoData[i].phoneNumber+"'>"+geoData[i].phoneNumber+"</a><br/>";
        s+= "Hours: "+geoData[i].hours+"<br/>";
        s+= "Types: "+geoData[i].types.join(", ")+"<br/>";
        s+= "</p>";
    }
    $status.html(s);
}
//Credit: http://stackoverflow.com/a/27943/52160
function getDistanceFromLatLonInKm(lat1,lon1,lat2,lon2) {
    var R = 6371; // Radius of the earth in km
    var dLat = deg2rad(lat2-lat1); // deg2rad below
    var dLon = deg2rad(lon2-lon1);
    var a =
        Math.sin(dLat/2) * Math.sin(dLat/2) +
        Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) *
        Math.sin(dLon/2) * Math.sin(dLon/2)
    ;
    var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
    var d = R * c; // Distance in km
    return d;
}

function deg2rad(deg) {
    return deg * (Math.PI/180);
}
So, the first thing I wondered was - how do I handle the data? It was 700K, which isn't too big, but isn't tiny either. I decided to simply store the data in localStorage. I could also store an "update date" key so I knew when to refresh the data, but for now, what I have is sufficient. Get it - store it - and carry on.
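For the curious, here is roughly what that "update date" idea could look like. This is a sketch, not what the demo does. The key name `geocacheUpdated`, the `maxAgeMs` parameter, and both helper names are assumptions. The storage object and the current time are passed in so the logic can be tried outside a browser.

```javascript
// Hypothetical cache helpers. "storage" is anything with getItem/setItem,
// such as window.localStorage. "now" is a timestamp in milliseconds.
function saveCache(storage, data, now) {
  storage.setItem("geocache", JSON.stringify(data));
  storage.setItem("geocacheUpdated", String(now)); // assumed key name
}

function loadCache(storage, maxAgeMs, now) {
  var raw = storage.getItem("geocache");
  var updated = Number(storage.getItem("geocacheUpdated"));
  if (!raw || !updated) return null;          // nothing cached yet
  if (now - updated > maxAgeMs) return null;  // cached copy is too old
  return JSON.parse(raw);
}

// In the browser this might be called as:
// var cached = loadCache(localStorage, 7 * 24 * 60 * 60 * 1000, Date.now());
```

If `loadCache` returns null you fetch the JSON file again and call `saveCache`. Otherwise you use the cached copy as is.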
Once we have the data file, we then simply detect where you are. This is boilerplate geolocation stuff so it isn't terribly fancy.
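If you want something friendlier than one generic failure message, the error object handed to the failure callback carries a numeric `code`: 1 for permission denied, 2 for position unavailable, 3 for timeout. A small sketch of using it follows. `describeGeoError` is a hypothetical helper and the message wording is invented.

```javascript
// Map a geolocation error to a message. The numeric codes come from the
// Geolocation API; the helper name and the wording are made up.
function describeGeoError(err) {
  switch (err && err.code) {
    case 1: return "Permission to use your location was denied.";
    case 2: return "Your location could not be determined.";
    case 3: return "Timed out while getting your location.";
    default: return "Sorry, but we were unable to get your location.";
  }
}

// In the browser this could replace the failure handler, for example:
// navigator.geolocation.getCurrentPosition(gotLocation, function(err) {
//     $status.html("<b>" + describeGeoError(err) + "</b>");
// }, { timeout: 10000 });
```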
Next - we need to determine the distance between you and each location. There were quite a few locations (1,916) so I was concerned about the timing, but this portion ran very quickly as well. Then it was simply a matter of a sort operation. I display the closest 10 locations and that's it. Of course, these numbers are a bit high as I'm in San Francisco. ;)
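The distance-then-sort step can also be written as one small function that leaves the cached array untouched. A minimal sketch under a few assumptions: `nearest` and `haversineKm` are hypothetical names, the formula is the same haversine one used above, and the sample coordinates are rough city centers used only as test data.

```javascript
function deg2rad(deg) {
  return deg * (Math.PI / 180);
}

// Haversine great-circle distance in km.
function haversineKm(lat1, lon1, lat2, lon2) {
  var R = 6371; // radius of the earth in km
  var dLat = deg2rad(lat2 - lat1);
  var dLon = deg2rad(lon2 - lon1);
  var a =
    Math.sin(dLat / 2) * Math.sin(dLat / 2) +
    Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) *
    Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return R * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// Hypothetical helper: the n closest records, each paired with its distance.
// map() builds a new array, so the input records are not modified.
function nearest(locations, lat, lon, n) {
  return locations
    .map(function (loc) {
      return { loc: loc, dist: haversineKm(lat, lon, loc.lat, loc.long) };
    })
    .sort(function (a, b) { return a.dist - b.dist; })
    .slice(0, n);
}

// Rough city-center coordinates, used only as sample data.
var sampleLocations = [
  { name: "Baton Rouge", lat: 30.4515, long: -91.1871 },
  { name: "New Orleans", lat: 29.9511, long: -90.0715 },
  { name: "Lafayette", lat: 30.2241, long: -92.0198 }
];
var top2 = nearest(sampleLocations, 29.95, -90.07, 2);
console.log(top2.map(function (r) { return r.loc.name; }));
```

With a couple thousand records this is still only a couple thousand distance calculations plus one sort, so there is no real need for anything fancier.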
If you want to try this yourself, just hit the demo link below. Enjoy!