Earlier today I saw Adobe blogger John Dowdell retweet a request from a user looking for a way to do translations of tweets. Now - I know that some of my readers are strongly against machine translation. I get why. But - when I saw this the first thing I thought of was Google's excellent API for doing translations. I've blogged about this before. It is fairly simple to take some text, tell Google to translate it, and then work with the result.
I began my exploration by setting up a quick search using the Twitter API. I searched for "vous OR robot", thinking "vous" would find French results and "robot" would find English results. Here is my initial CFML:
<cfset search = "vous OR robot">
<cfset results = []>
<cfhttp url="http://search.twitter.com/search.json?rpp=20&q=#urlEncodedFormat(search)#" result="result">
<cfif result.responseheader.status_code is "200">
<cfset content = result.fileContent.toString()>
<cfset data = deserializeJSON(content)>
<cfloop index="item" array="#data.results#">
<cfset arrayAppend(results, item)>
</cfloop>
<cfelse>
<cfdump var="#result#">
</cfif>
<cfloop index="r" array="#results#">
<cfdump var="#r#">
</cfloop>
I won't go into details about the Twitter API here (you can find more at the docs) but you can see that I'm hitting the API with my query and working with a JSON response. I noticed something right away in each Tweet - something I had not noticed before. There is a key called iso_language_code which describes the language of the Tweet itself. I'm guessing Twitter isn't actually parsing the text but is rather simply trusting the client itself. Either way - if it is right most of the time, it's good enough for me.
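To make the iso_language_code idea a bit more concrete, here is a minimal JavaScript sketch of picking out the results that would need a translation. The sample object is made up for illustration and only mimics the fields used in this post (from_user, text, iso_language_code); the function name needsTranslation is my own and not part of any API.

```javascript
// Hypothetical sample shaped like the search response described above.
// Only the fields the post uses are included; the values are made up.
var sample = {
  results: [
    { from_user: "alice", text: "Bonjour, comment allez-vous ?", iso_language_code: "fr" },
    { from_user: "bob", text: "I built a robot today", iso_language_code: "en" },
    { from_user: "carol", text: "No language reported" }
  ]
};

// Return only the results that report a language other than the target.
// Results with no iso_language_code are skipped rather than guessed at.
function needsTranslation(data, target) {
  return (data.results || []).filter(function (r) {
    return typeof r.iso_language_code === "string" &&
      r.iso_language_code.toLowerCase() !== target.toLowerCase();
  });
}

var pending = needsTranslation(sample, "en");
console.log(pending.map(function (r) { return r.from_user; }));
```

Skipping results with a missing code is a judgment call. If the language flag turns out to be unreliable, you could send those to the translator as well and let it detect the source language.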
I began my modifications by adding code to do some basic output and detect non-English results:
<cfloop index="r" array="#results#">
<cfoutput>
<p class="twitter_result">
From: #r.from_user#<br/>
Created: #r.created_at#<br/>
<span class="text">#r.text#</span>
<cfif r.iso_language_code neq "en">
<br/>
<span class="texttranslation lang_#r.iso_language_code#"></span>
</cfif>
</p>
</cfoutput>
</cfloop>
Notice that I'm adding a span after the output when the result isn't English. I use 2 classes within that span. One is used to "mark" it for later (texttranslation) and another is used to record the language. Now I set up some jQuery:
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>
<script src="http://www.google.com/jsapi" type="text/javascript" ></script>
<script>
google.load("language", "1");
google.setOnLoadCallback(initialize);
function initialize() {
$(".texttranslation").each(function() {
var translationarea = $(this)
var classes = translationarea.attr("class")
//classes is a list of classes applied, we just want the last one
var langportion = classes.split(" ")[1]
//now get the final code
var langcode = langportion.split("_")[1]
//now get my sibling text
var textSpan = $(this).siblings(".text")
var theText = $(textSpan).text()
google.language.translate(theText, langcode, "en", function(result) {
if (!result.error) {
translationarea.hide().html("Translation via Google: " + result.translation).fadeIn("slow");
}
})
})
}
</script>
I begin with my imports necessary for translation. I then ask jQuery to find all my texttranslation spans. For each one I examine the class attribute and use it as a simple text string. I can grab the language code from the second class which is always in the form of: lang_XX. Once I have the language, it is a simple matter then to fire off the translation request. I take the result, nicely attribute it to Google, and then place it under the Tweet. I added a fade in so I could look all sexy and web 2.0ish.
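One caveat with grabbing the second class by position: it breaks if another class is ever added in front of lang_XX. As a sketch only (langFromClassName is a hypothetical helper, not something from jQuery or Google), you could look for the lang_ prefix instead:

```javascript
// Pull the language code out of a class attribute such as
// "texttranslation lang_fr". Unlike split(" ")[1], this does not
// depend on the order or number of classes on the element.
function langFromClassName(className) {
  var classes = String(className || "").split(/\s+/);
  for (var i = 0; i < classes.length; i++) {
    var match = /^lang_(.+)$/.exec(classes[i]);
    if (match) {
      return match[1];
    }
  }
  return null; // no lang_XX class present
}

console.log(langFromClassName("texttranslation lang_fr"));
```

Inside the each() loop this would replace the two split() calls, with a check for null before calling the translator.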
You can see a demo of this here: http://www.coldfusionjedi.com/demos/feb2520102/test.cfm. Since I hard-coded the search string, you may or may not see a result with translations. I just confirmed it shows some now, but your results may vary. Here is the entire template - take a look at that before we move on to the real cool stuff:
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>
<script src="http://www.google.com/jsapi" type="text/javascript" ></script>
<script>
google.load("language", "1");
google.setOnLoadCallback(initialize);
function initialize() {
$(".texttranslation").each(function() {
var translationarea = $(this)
var classes = translationarea.attr("class")
//classes is a list of classes applied, we just want the last one
var langportion = classes.split(" ")[1]
//now get the final code
var langcode = langportion.split("_")[1]
//now get my sibling text
var textSpan = $(this).siblings(".text")
var theText = $(textSpan).text()
google.language.translate(theText, langcode, "en", function(result) {
if (!result.error) {
translationarea.hide().html("Translation via Google: " + result.translation).fadeIn("slow");
}
})
})
}
</script>
<style>
.texttranslation {
font-style: italic;
}
</style>
<cfset search = "vous OR robot">
<cfset results = []>
<cfhttp url="http://search.twitter.com/search.json?rpp=20&q=#urlEncodedFormat(search)#" result="result">
<cfif result.responseheader.status_code is "200">
<cfset content = result.fileContent.toString()>
<cfset data = deserializeJSON(content)>
<cfloop index="item" array="#data.results#">
<cfset arrayAppend(results, item)>
</cfloop>
<cfelse>
<cfdump var="#result#">
</cfif>
<cfloop index="r" array="#results#">
<cfoutput>
<p class="twitter_result">
From: #r.from_user#<br/>
Created: #r.created_at#<br/>
<span class="text">#r.text#</span>
<cfif r.iso_language_code neq "en">
<br/>
<span class="texttranslation lang_#r.iso_language_code#"></span>
</cfif>
</p>
</cfoutput>
</cfloop>
Ok, so that worked well, but I thought, I bet I could do this even cooler. I remembered that I had done a jQuery-based AIR application that performed simple searches. You can read about this here: Using Aptana Studio to build jQuery/AIR Applications (3). The blog entry describes how to use Aptana to build HTML-based AIR applications. My final example was a simple search form that hits the Twitter search API. How hard would it be to modify it to do translations?
Not hard at all. The only real issue I ran into was the HTML/AIR security stuff. That always bugs me. Sim Bateman helped me a bit - but then I discovered something cool. Apparently Google supports REST based calls for translations. (One day I need to read all the Google API docs.) This worked perfectly within my little AIR application. You can download the AIR file below.
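For the curious, a REST-style call mostly comes down to building a URL with the text and the language pair in the query string. The sketch below is an assumption on my part, not the code from the AIR application: the base URL is a placeholder, and the parameter names (v, q, langpair) are how I recall the service working, so check them against the Google docs before relying on them.

```javascript
// Build a translation request URL. The base URL passed in and the
// parameter names (v, q, langpair) are assumptions, not taken from
// the AIR application described in this post.
function buildTranslateUrl(baseUrl, text, from, to) {
  var params = [
    "v=1.0",
    "q=" + encodeURIComponent(text),
    "langpair=" + encodeURIComponent(from + "|" + to)
  ];
  return baseUrl + "?" + params.join("&");
}

// example.com is a placeholder, not the real endpoint.
var url = buildTranslateUrl("http://example.com/translate", "Comment allez-vous ?", "fr", "en");
console.log(url);
```

The resulting URL can then be requested with whatever HTTP call your environment allows, which is what makes this approach friendlier to the AIR security sandbox than loading a remote script.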
As a final FYI, did you know that ColdFusion Builder includes parts of Aptana? Did you know it includes all the bits you need to build your own HTML based AIR applications? Check it out today!