Two nights ago I moved CFBloggers to my new host. Later that night I noticed a pretty severe performance issue at the database tier. I was reasonably confident I knew what the issue was, but I was also freaking tired as heck. Given that my body was going to overrule my brain, I decided to apply a quick bit of caching to the code in question.
Now - let me be clear. Caching over a performance problem is not a real fix. Certainly there are times when the slow part is out of your control; anything loaded via HTTP, for example, is a prime candidate for caching. But I want folks to be absolutely clear that what I did here was a practical, expedient patch, not a fix.
That being said, screw it, I was tired and it worked. 2 points for me.
So let's take a quick look at the method in question, getEntries.
<cffunction name="getEntries" access="public" returnType="struct" output="false">
    <cfargument name="start" type="numeric" required="false" default="1">
    <cfargument name="total" type="numeric" required="false" default="10">
    <cfargument name="search" type="string" required="false" default="">
    <cfargument name="log" type="boolean" required="false" default="false">
    <cfargument name="dateafter" type="date" required="false">

    <cfset var s = structNew()>
    <cfset var q = "">

    <cfquery name="q" datasource="#variables.dsn#">
        select SQL_CALC_FOUND_ROWS e.url, e.title, e.posted, e.created, e.content, e.id, b.name as blog, e.categories, b.description as blogdescription, b.url as blogurl
        from entries e, blogs b
        where e.blogidfk = b.id
        <cfif len(trim(arguments.search))>
            and (1=0
            <cfloop index="k" list="#arguments.search#">
                or
                (
                e.title like <cfqueryparam cfsqltype="cf_sql_varchar" value="%#k#%">
                or
                e.content like <cfqueryparam cfsqltype="cf_sql_varchar" value="%#k#%">
                or
                e.categories like <cfqueryparam cfsqltype="cf_sql_varchar" value="%#k#%">
                )
            </cfloop>
            )
        </cfif>
        <cfif structKeyExists(arguments,"dateafter")>
            and e.created > <cfqueryparam cfsqltype="cf_sql_timestamp" value="#arguments.dateafter#">
        </cfif>
        order by e.created desc
        limit #arguments.start-1#,#arguments.total#
    </cfquery>

    <cfset s.entries = q>

    <cfquery name="q" datasource="#variables.dsn#">
        select found_rows() as total
    </cfquery>

    <cfset s.total = q.total>

    <!--- log search --->
    <cfif len(trim(arguments.search)) and arguments.log>
        <cfset logSearch(arguments.search)>
    </cfif>

    <cfreturn s>
</cffunction>
This is the core function that loads entries for CFBloggers. It handles the basic data load, paging, searching, and date-based filtering. (Those of you who are OO nerds will note that the logSearch portion violates the Single Responsibility Principle. Mea culpa.) My patch took all of 2 minutes to write:
<cffunction name="getEntries" access="public" returnType="struct" output="false">
    <cfargument name="start" type="numeric" required="false" default="1">
    <cfargument name="total" type="numeric" required="false" default="10">
    <cfargument name="search" type="string" required="false" default="">
    <cfargument name="log" type="boolean" required="false" default="false">
    <cfargument name="dateafter" type="date" required="false">

    <cfset var s = structNew()>
    <cfset var q = "">
    <cfset var k = "">

    <cfset var key = "#arguments.start##arguments.total##arguments.search#">
    <cfif structKeyExists(arguments, "dateafter")>
        <cfset key &= "_#arguments.dateafter#">
    </cfif>

    <cfset var s = cacheGet(key)>

    <cfif isNull(s)>

        <cfquery name="q" datasource="#variables.dsn#">
            select SQL_CALC_FOUND_ROWS e.url, e.title, e.posted, e.created, e.content, e.id, b.name as blog, e.categories, b.description as blogdescription, b.url as blogurl
            from entries e, blogs b
            where e.blogidfk = b.id
            <cfif len(trim(arguments.search))>
                and (1=0
                <cfloop index="k" list="#arguments.search#">
                    or
                    (
                    e.title like <cfqueryparam cfsqltype="cf_sql_varchar" value="%#k#%">
                    or
                    e.content like <cfqueryparam cfsqltype="cf_sql_varchar" value="%#k#%">
                    or
                    e.categories like <cfqueryparam cfsqltype="cf_sql_varchar" value="%#k#%">
                    )
                </cfloop>
                )
            </cfif>
            <cfif structKeyExists(arguments,"dateafter")>
                and e.created > <cfqueryparam cfsqltype="cf_sql_timestamp" value="#arguments.dateafter#">
            </cfif>
            order by e.created desc
            limit #arguments.start-1#,#arguments.total#
        </cfquery>

        <cfset s.entries = q>

        <cfquery name="q" datasource="#variables.dsn#">
            select found_rows() as total
        </cfquery>

        <cfset s.total = q.total>

        <cfset cachePut(key, s)>

    </cfif>

    <!--- log search --->
    <cfif len(trim(arguments.search)) and arguments.log>
        <cfset logSearch(arguments.search)>
    </cfif>

    <cfreturn s>
</cffunction>
As you can see, I generate a cache key based on the relevant arguments (log doesn't count). I then try to fetch the result from the cache. If it isn't there, I run the queries and stuff the result in the cache.
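One aside worth noting: cachePut also accepts an optional timespan, so if I had wanted entries to age out on their own instead of relying only on the manual flush described next, the put could have looked like this (the one hour value is purely illustrative):

<!--- same cachePut, but with a one hour time-to-live (illustrative value) --->
<cfset cachePut(key, s, createTimeSpan(0, 1, 0, 0))>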
To handle updates, I simply modified my addEntryIfNew method. As you can guess, this handles adding new blog entries when they don't already exist. I won't post the entire method, but just note that I added this:
<cfset var ids = cacheGetAllIds()>
<cfset var id = "">
<cfloop index="id" array="#ids#">
<cfset cacheRemove(id, false)>
</cfloop>
Raise your hand if you are real happy that you can var scope anywhere now.
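To show where that snippet lands, here is a rough sketch of addEntryIfNew with the flush in place. Everything except the cache-clearing loop is made up for illustration (entryExists and insertEntry are hypothetical helpers); the real method does more, and does it differently:

<cffunction name="addEntryIfNew" access="public" returnType="void" output="false">
    <cfargument name="entry" type="struct" required="true">

    <!--- hypothetical existence check and insert; the real method differs --->
    <cfif not entryExists(arguments.entry.url)>
        <cfset insertEntry(arguments.entry)>

        <!--- the actual addition: new content invalidates every cached getEntries result --->
        <cfset var ids = cacheGetAllIds()>
        <cfset var id = "">
        <cfloop index="id" array="#ids#">
            <cfset cacheRemove(id, false)>
        </cfloop>
    </cfif>
</cffunction>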
So this went live two nights ago. I knew it would be a while before I could update the database (and thanks go to Gary Funk for helping me with that), so I was curious to see how much the cache impacted my server. I wrote up a very quick template that gathered cache data and then logged it:
<cfset ids = cacheGetAllIds()>
<cfset data = {}>
<cfset data.totalItems = arrayLen(ids)>
<cfset data.totalSize = 0>
<cfloop index="id" array="#ids#">
<cfset md = cacheGetMetadata(id)>
<cfset data.totalSize += md.size>
</cfloop>
<cfset data.ids = ids>
<cfoutput>
#data.totalItems# items<br/>
size = #data.totalSize#<br/>
#arrayToList(data.ids)#
</cfoutput>
<cflog file="cfbloggers_cache" text="#serializeJSON(data)#">
Nothing fancy here. I get the IDs, count them, and total up the sizes. I serialize the whole thing to JSON and then cflog it. I added a scheduled task to run this report every 10 minutes. Last night I wrote a quick log parser that reads the log back in and builds a query from the serialized data:
<cfset logFile = "/Users/ray/Downloads/cfbloggers_cache.log">
<cfset dataSet = 0>
<cfset cachedata = queryNew("items,size,time", "integer,bigint,timestamp")>

<cfloop index="line" file="#logFile#">
    <cfset dataSet++>
    <cfset line = listDeleteAt(line, 1)>
    <cfset line = listDeleteAt(line, 1)>
    <cfset date = listGetAt(line, 1)>
    <cfset date = mid(date, 2, len(date)-2)>
    <cfset line = listDeleteAt(line, 1)>
    <cfset time = listGetAt(line, 1)>
    <cfset time = mid(time, 2, len(time)-2)>
    <cfset line = listDeleteAt(line, 1)>
    <cfset line = listDeleteAt(line, 1)>
    <cfset line = mid(line, 2, len(line)-2)>
    <cfset line = replace(line, '""', '"', "all")>
    <cfset d = deserializeJSON(line)>
    <cfset queryAddRow(cachedata)>
    <cfset querySetCell(cachedata, "items", d.totalitems)>
    <cfset querySetCell(cachedata, "size", d.totalsize)>
    <cfset querySetCell(cachedata, "time", date & " " & time)>
</cfloop>
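For reference, each line that cflog writes looks roughly like the sample below (every value here is invented). That format is why the parser drops the leading columns, strips the surrounding quotes from the message, and un-doubles the embedded quotes before deserializing:

"Information","scheduler-0","01/08/10","10:20:00","CFBloggers","{""TOTALITEMS"":14,""TOTALSIZE"":523412,""IDS"":[""110"",""1110jquery""]}"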
I then took this data and applied it to a chart. Now here is where things get interesting. It is a rather large chart, so please click on it to view the original version.
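A cfchart over the cachedata query, along these lines, produces that kind of time-series view (the attribute values here are illustrative, not the exact chart settings):

<cfchart format="png" chartwidth="900" chartheight="400" xaxistitle="Time" yaxistitle="Cache size (bytes)">
    <cfchartseries type="line" query="cachedata" itemcolumn="time" valuecolumn="size" serieslabel="Cache size">
</cfchart>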
The chart begins at midnight on the 7th and goes until 11am or so today. The up and down movement is the cache growing until new content comes in and flushes it. What's odd is that there is steady growth on the morning of the 7th, then it "calms" down, then at night it begins to grow without falling, which must reflect fewer people blogging and adding new entries overnight. This morning it seems to be following the afternoon pattern from yesterday. The only change I'm aware of is the database fix, which I did at lunchtime yesterday. But while that sped things up, I didn't clear the cache myself, so I'm not sure why the drop would happen then.
Anyway, I thought folks might find this interesting. I'll run another report on Monday and share the results then.