John sent me an interesting topic:
Problem: User types in the wrong address. Your site generates a 404 error, and calls your custom ColdFusion 404 handler.
Solution: Perform a smarter suggestion for possible page matches. This will work very much like a full-text search engine would auto-suggest words. The custom handler would need to match "conatct" with "contact."
I'll bet we could dig into Java to do some sort of dictionary lookup somewhere!
I think this is an absolutely great idea, and it touches on something I've blogged about before. It's pretty trivial to write a 404 handler with Adobe's web application product. The following script will send any CFM request it can't handle to a 404 page:
component {
    this.name="missing";

    public boolean function onMissingTemplate(string targetpage) {
        location(url="404.cfm");
        return true;
    }
}
This by itself would be an improvement to most sites (shoot, even mine). But by itself you are missing out on a lot of opportunities to actually - you know - help the user find what they want. So for example, I could easily add a quick log:
component {
    this.name="missing";

    public boolean function onMissingTemplate(string targetpage) {
        writelog(file="404",text="#arguments.targetpage#?#cgi.query_string#");
        location(url="404.cfm");
        return true;
    }
}
And then periodically check the log file for common issues. Let's say we see cases of what John used as an example. We could easily handle it like so:
component {
    this.name="missing";

    public boolean function onMissingTemplate(string targetpage) {
        //handle some common ones...
        if(listLast(arguments.targetpage,"/") is "conatct.cfm") location(url="contact.cfm");
        writelog(file="404",text="#arguments.targetpage#?#cgi.query_string#");
        location(url="404.cfm");
        return true;
    }
}
Now - what you probably don't want is a giant set of IF statements, or even a switch statement. That can get messy pretty quickly. John suggested a dynamic approach. You could - in theory - keep a list of files and see if any are "close" to the request. (Perhaps using an edit-distance function such as levDistance.) But this is something you would want to cache heavily, since comparing every request against every file name is not free.
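Here is a rough sketch of what that dynamic approach could look like. This is not tested production code, and it rests on a few assumptions: the site is flat (all pages sit next to Application.cfc), the file list is cached once in the application scope, and 404.cfm knows how to read a `suggest` URL parameter. The names `editDistance`, `closestPage`, `application.knownPages`, and `suggest` are all made up for this example. Instead of relying on an outside levDistance function, it includes its own plain Levenshtein implementation.

```cfml
component {
    this.name = "missing";

    public boolean function onApplicationStart() {
        // Cache the list of .cfm file names once (hypothetical flat site layout).
        var root = getDirectoryFromPath(getCurrentTemplatePath());
        application.knownPages = directoryList(root, false, "name", "*.cfm");
        return true;
    }

    // Levenshtein distance: number of single-character edits to turn a into b.
    private numeric function editDistance(required string a, required string b) {
        var lenA = len(arguments.a);
        var lenB = len(arguments.b);
        var prev = [];
        var curr = [];
        var i = 0;
        var j = 0;
        var cost = 0;

        if (lenA == 0) return lenB;
        if (lenB == 0) return lenA;

        // Row 0: distance from an empty prefix of a to each prefix of b.
        for (j = 1; j <= lenB + 1; j++) prev[j] = j - 1;

        for (i = 1; i <= lenA; i++) {
            curr = [i];
            for (j = 1; j <= lenB; j++) {
                // CFML string comparison is case-insensitive, which suits file names here.
                cost = (mid(arguments.a, i, 1) == mid(arguments.b, j, 1)) ? 0 : 1;
                curr[j + 1] = min(min(curr[j] + 1, prev[j + 1] + 1), prev[j] + cost);
            }
            prev = curr;
        }
        return prev[lenB + 1];
    }

    // Returns the cached page name closest to the request, or "" if none is within maxDistance.
    private string function closestPage(required string requested, required array pages, numeric maxDistance = 2) {
        var best = "";
        var bestDistance = arguments.maxDistance + 1;
        var d = 0;
        var i = 0;
        for (i = 1; i <= arrayLen(arguments.pages); i++) {
            d = editDistance(arguments.requested, arguments.pages[i]);
            if (d < bestDistance) {
                bestDistance = d;
                best = arguments.pages[i];
            }
        }
        return best;
    }

    public boolean function onMissingTemplate(string targetpage) {
        var match = closestPage(listLast(arguments.targetpage, "/"), application.knownPages);
        writelog(file="404", text="#arguments.targetpage#?#cgi.query_string#");
        // Pass the guess along as a suggestion rather than silently redirecting to it.
        if (len(match)) location(url="404.cfm?suggest=#urlEncodedFormat(match)#", addtoken=false);
        location(url="404.cfm", addtoken=false);
        return true;
    }
}
```

With a threshold of 2, a request for conatct.cfm would pick up contact.cfm, since the two names differ by two single-character edits. Handing the match to 404.cfm as a suggestion, instead of redirecting straight to it, keeps a bad guess from sending people somewhere they didn't ask to go.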
To me the critical thing here is this: Do you have a good understanding of how people are using your site? What things are they requesting that are not being found? Did CNN link to your site and screw up? You'll probably have a lot more success handling it yourself than getting CNN to fix it. What are people searching for on your site? I just searched for xbox360 on Sony.com and the results were pitiful. Why not provide a link to a comparison between the PS3 and the Xbox? Why not show a list of PS3 exclusives? But most of all - is there someone who is making it their job to see what's being searched for and actually respond to those requests?
This isn't a code issue at all. (Although certainly code can help us generate and report metrics.) It's a basic "Site Awareness" that far too many of us are lacking in. (To be fair, in some companies you have to beg for basic QA!) As I said, this is something I've blogged about before, and it's something I think about when I can't sleep. I'd love to get some comments from folks who are dealing with this - or at least thinking about dealing with this today.
Archived Comments
This is along the same lines as I was doing with my 404 rewrite http://cf404rewrite.riaforg...
Ray, the only problem here is that you are assuming people are going to ask for a .cfm file, so the handler would not pick it up if it were a .htm or even a folder request like /xbox360/.
If you set up a custom 404 on your web server, then you could also possibly use Solr to search your content folder for possible results and display a list of suggested pages to the user.
I bet you could even write into each page its possible misspelled name variants, or metadata that it could be searched against, and then write a custom search that would parse each of the content pages and read through these meta tags for results. That way you could control the pages that you wanted to search against.
Anyway just some ideas there.
Tim
Tim - yeah - sorry - I shoulda mentioned the code above was CFM only, and it makes sense to handle it web server level too.
The idea of metadata in the file is pretty darn interesting.
I suspect this is where user interface experience comes in . . . might well be worth shelling out some $$$ to hire a UX consultant to take a look at your mission-critical site.
Another strategy would be to send them to the site map. Then they can see if what they were looking for exists on the site. I dislike half-baked AI schemes that try to guess what I want. Once you're logging the failed requests, I would use mod_rewrite to redirect the most common ones.
Make sure you add in your statusCode to the location function! You'll want to permanently redirect (301) to the correct page so crawlers etc. don't hang on to the old, incorrect URL.
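A minimal sketch of that, reusing the conatct.cfm check from the post and assuming it sits inside onMissingTemplate. The `statusCode` argument is what turns the default temporary redirect into a permanent one; `addtoken=false` is added here only to keep session tokens out of the URL:

```cfml
// Inside onMissingTemplate: send a known typo to the real page with a 301.
if (listLast(arguments.targetpage, "/") is "conatct.cfm") {
    location(url="contact.cfm", addtoken=false, statusCode=301);
}
```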
One interesting approach that I have been meaning to explore is to use the http referrer to customize the 404 error page based on the source of the bad link. This also allows you to fire off an email if the bad link is internal to the site. This article gives an overview of the different options:
http://www.alistapart.com/a...
Google has a widget that you can put on your 404 page that attempts to find the closest match to the requested page. I haven't used it and I'm not sure if it's really supported any more, but it's an interesting option:
http://googlewebmastercentr...
You can also just go with something fun!
http://mashable.com/2011/01...
I cache a site map CFC in my application scope that has a search() method in it. On the web server (I use IIS), I point my 404 errors to 404.cfm, which looks at CGI.query_string to see what was requested. I also call 404.cfm (using cfmodule) from the CF onMissingTemplate method, passing it the requested page as an attribute.
The fun part happens next... I tokenize the request and then use the tokens to search my site map for possible matches. This works pretty well when spelling is correct, but I didn't get any matches for "conatct." A good 404 handler would offer "contact" as a possible match for "conatct."
To solve this, I improved my 404 handler by first asking Google to expand my token list, by returning spelling suggestions for each token, and then doing a sitemap search. This works incredibly well and is fast.
Here's a code example that calls the Google Suggest API and uses jQuery and a CFC that I wrote to make calls into Google. It includes a download of all the source files and is very straightforward. Feel free to use it however you want.
http://code.redtopia.com/ex...
"...Adobe's web application product"
Just curious...any particular reason for this choice of words? :)
I seem to recall this happening earlier this year when people turned to their favorite search engine and asked, "What time is the Super Bowl?" The NFL foolishly did not put that info anywhere useful on their website, so they weren't even in the top ten hits.
Can you guess who got to answer the question at #1? Yep, Huffington Post.
Oops, I should mention that I got that info from a Search Engine Optimization article I read somewhere. Maybe it was the New York Times. It had very useful information about improving your site to attract the attention of Google and other search engines.
(And of course by writing this I'm improving this blog's SOE value by 110%!)
SEO! Dammit. An edit option would be useful. :(
Hey Ray,
Sorry for what might be a beginner comment on an older post, but where would I put this if I just wanted to log missing template errors for my application? As a cfscript in my missing template handler? or in my app.cfc?
NM,
Found I can just use onMissingTemplate in the app.cfc
Thanks.