Yesterday in the IRC channel someone asked whether there was a way to count the number of times each unique word appears in a string. While it was obvious that this could be done manually (see below), no one knew of a more elegant solution. Can anyone think of one? Here is the solution I used; it definitely falls into the "manual" (and probably slow) category.
First I made my string:
<cfsavecontent variable="string">
This is a paragraph with some text in it. Certain words will be repeated, and other words
will not be repeated. The question is though, how much can I write before I begin to sound
like a complete and utter idiot. Let's call that the "Paris Point". At the Paris Point, any
further words sound like gibberish and are completely worthless.
</cfsavecontent>
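I then used a regex to get an array of words. The [[:word:]] class matches letters, digits, and underscores, so punctuation is dropped (and a contraction like "Let's" is split at the apostrophe):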
<cfset words = reMatch("[[:word:]]+", string)>
Next I created a structure:
<cfset wordCount = structNew()>
And then looped over the array and inserted the words into the structure:
<cfloop index="word" array="#words#">
    <!--- Increment the count for a word we have seen, otherwise start it at 1 --->
    <cfif structKeyExists(wordCount, word)>
        <cfset wordCount[word]++>
    <cfelse>
        <cfset wordCount[word] = 1>
    </cfif>
</cfloop>
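For anyone who wants to reuse this, the steps so far can be wrapped into a single function. This is only a rough sketch, not a tested drop-in: the name countWords and its text argument are my own invention for illustration, and it assumes a ColdFusion version that supports reMatch and the ++ operator, as the code above already does.

```cfml
<cfscript>
// Hypothetical helper (the name is made up): returns a struct mapping word -> count.
function countWords(text) {
    var counts = structNew();
    var words = reMatch("[[:word:]]+", text);
    var word = "";
    var i = 0;

    for (i = 1; i <= arrayLen(words); i++) {
        word = words[i];
        if (structKeyExists(counts, word)) {
            counts[word] = counts[word] + 1;
        } else {
            counts[word] = 1;
        }
    }
    return counts;
}
</cfscript>

<!--- Usage: same result as the tag-based loop above --->
<cfset wordCount = countWords(string)>
```

It is the same loop as before, so it should be no faster; the only gain is that the counting logic lives in one place.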
Note that the counts are inherently case-insensitive, because ColdFusion struct keys ignore case, so "The" and "the" are counted together. I think that is a good thing. At this point we are done, but I added some display code as well:
<cfset sorted = structSort(wordCount, "numeric", "desc")>
<table border="1" width="400">
    <tr>
        <th width="50%">Word</th>
        <th>Count</th>
    </tr>
    <!--- sorted holds the struct keys ordered by count, highest first --->
    <cfloop index="word" array="#sorted#">
        <cfoutput>
        <tr>
            <td>#word#</td>
            <td>#wordCount[word]#</td>
        </tr>
        </cfoutput>
    </cfloop>
</table>