August 20, 2009

Simple ColdFusion 9 ORM/Solr Example

coldfusion

Last night I decided to whip together a simple example of how to add Solr search indexing to an application. Luckily, for the most part, this is the exact same process we've been using for years now with Verity. I know many people avoided Verity due to the document size limits so with that in mind, I thought a simple ColdFusion 9 example would help introduce the feature. To start off with, let me show you a simple application that has no search capability at all. This will be the first draft application that I'll modify to add Solr support.

My application is a Press Release viewer. The public page consists of a list of press releases. You click on a press release to view the details. The admin folder (and for this proof of concept it won't have any security) allows for basic CRUD operations. I won't show most of the code as it's rather boring, but I'll demonstrate my Application.cfc and the model layer. First, the Application.cfc file:


component {
this.name = "pressreleases";
this.ormenabled = true;
this.datasource=this.name;
this.ormsettings = {
dialect="MySQL",
dbcreate="update",
eventhandling="true"
};
this.mappings["/model"] = getDirectoryFromPath(getCurrentTemplatePath()) & "model";
public boolean function onApplicationStart() {
application.prService = new model.prService();
return true;
}

public boolean function onRequestStart(string page) {
if(structKeyExists(url, "init")) {
ormReload();
applicationStop();
location('index.cfm?reloaded=true');
}
return true;
}
}

Nothing too fancy here - I've enabled ORM, allowed for easy restarts, and created a grand total of one CFC in the application scope, the prService. The prService is simply a component to abstract access to my press release model. The press release entity is just:


component persistent="true" {
property name="id" generator="native" sqltype="integer" fieldtype="id";
property name="title" ormtype="string";
property name="author" ormtype="string";
property name="body" ormtype="text";
property name="published" ormtype="date";

}

And the service provides an abstraction layer to it:


component {
public function deletePressRelease(id) {
entityDelete(getPressRelease(id));
ormFlush();
}
public function getPressRelease(id) {
if(id == "") return new pressrelease();
else return entityLoad("pressrelease", id, true);
}
public function getPressReleases() {
return entityLoad("pressrelease");
}
public function getReleasedPressReleases() {
return ormExecuteQuery("from pressrelease where published < ? order by published desc", [now()]);
}

public function savePressRelease(id,string title,string author,date published,string body) {
var pr = getPressRelease(id);
pr.setTitle(title);
pr.setAuthor(author);
pr.setPublished(published);
pr.setBody(body);
entitySave(pr);
}
}

I assume most of this makes sense. Note that I have both a getPressReleases function and a getReleasedPressReleases function. The latter handles the public view and only gets press releases where the published date is in the past. Notice that savePressRelease is kind of nice - it just plain works whether you have a new press release or an existing one. Also make note of deletePressRelease. In order to handle calling a delete operation followed by a list, I force a flush on the ORM session. If I didn't, the deleted item would still show in the list during the same request.
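To make the save and delete behavior concrete, here is a rough sketch of how an admin page might call the service. The titles, author name, and the id of 1 are made-up values for illustration, and the second and third calls assume a press release with that id already exists:

```cfml
<cfscript>
// Hypothetical values throughout; nothing here comes from the download.

// An empty id makes getPressRelease() return a new, unsaved entity,
// so this call creates a press release.
application.prService.savePressRelease("", "ColdFusion 9 Ships", "Jane Doe", now(), "Body of the release.");

// Passing an existing primary key (1 is an assumed id) loads that
// entity and updates it instead.
application.prService.savePressRelease(1, "ColdFusion 9 Ships Today", "Jane Doe", now(), "Revised body.");

// deletePressRelease() calls ormFlush(), so a list fetched later in
// the same request no longer includes the deleted release.
application.prService.deletePressRelease(1);
prs = application.prService.getPressReleases();
writeOutput("Press releases remaining: " & arrayLen(prs));
</cfscript>
```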

You can download all of this code at the bottom, and again, I don't want to waste too much time on basic list/edit forms. What I want to talk about instead is the process of enabling Solr searching support for this application.

When you work with Solr (and Verity as well), you work with an index of your data. This index, much like an index in a book, represents all the data that you want to be searchable. However, and this is the critical point, it is your responsibility to keep the index up to date. That means every time you add, edit, or delete content, you have to update the index. The maintenance aspect then is typically the most complex part of the process. Searching really just comes down to one tag.

I normally create a "Ground Zero" type script that handles creating my collection and index from scratch. (Think of the collection just as the folder or name of the index.) This is useful to run during testing or if you encounter a bug where your index gets out of date. I created the following script for that purpose:


<cfcollection action="list" name="collections" engine="solr">
<!--- collection check --->
<cfif not listFindNoCase(valueList(collections.name), application.collection)>
<cfoutput>
<p>
Need to create collection #application.collection#.
</p>
</cfoutput>
<cfcollection action="create" collection="#application.collection#" engine="solr" path="#application.collection#">
</cfif>
<!--- nuke old data --->
<cfindex collection="#application.collection#" action="purge">
<!--- get data --->
<cfset prs = application.prService.getPressReleases()>
<cfoutput><p>Total of #arraylen(prs)# press releases.</p></cfoutput>
<!--- convert to a query --->
<cfset data = entityToQuery(prs)>
<!--- add to collection --->
<cfindex collection="#application.collection#" action="update" body="body,title" custom1="author" title="title" key="id" query="data">

<p>Done.</p>

I begin by getting a list of collections. The ColdFusion 9 docs say that if you leave the engine attribute off the cfcollection tag it will return everything. I did not see that behavior, so I filed a bug on it. For now, I've just added the engine attribute. This returns a query of collections. If I don't find my collection in there (I created an application variable to store the name) then I create one. In theory, this will only happen one time.

Next I remove all data from the collection with the purge. Again, I'm thinking that this script would be useful both for a first time seeding of the index as well as a 'recovery' type action.

Once we have an empty index, I get all of my press releases and convert them to a query with the entityToQuery function.

Lastly, I simply pass that query to the cfindex tag. Now, here is an important part. When you pass data into the index, you get to decide what gets stored in the body and what, if anything, gets stored in the 4 custom fields. I decided that the body and title made sense for the searchable information. I repeated title again for the title attribute. This will let me get the title in search results. For the custom field I used the author. Again, this was totally up to me and what made sense for my application.
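If entityToQuery isn't an option (say the searchable text has to be assembled from several places), the query can be built by hand before it is passed to cfindex. This is only a minimal sketch: it assumes every property has a value (a null title, author, or body would need a guard before querySetCell), and the indexStatus name is my own. The status attribute is used here just to dump whatever cfindex reports back.

```cfml
<!--- Build the query by hand instead of calling entityToQuery() --->
<cfset prs = application.prService.getPressReleases()>
<cfset data = queryNew("id,title,author,body", "integer,varchar,varchar,varchar")>

<cfloop array="#prs#" index="pr">
	<!--- One query row per press release; assumes no null properties --->
	<cfset queryAddRow(data)>
	<cfset querySetCell(data, "id", pr.getId())>
	<cfset querySetCell(data, "title", pr.getTitle())>
	<cfset querySetCell(data, "author", pr.getAuthor())>
	<cfset querySetCell(data, "body", pr.getBody())>
</cfloop>

<!--- Attribute values are column names in the query, same as in the script above --->
<cfindex collection="#application.collection#" action="update" query="data" key="id" title="title" body="body,title" custom1="author" status="indexStatus">

<!--- indexStatus is a hypothetical variable name; dump it to see what was reported --->
<cfdump var="#indexStatus#">
```

Passing one query to a single cfindex call is generally preferable to calling cfindex once per row, which tends to be slow.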

Alright, so at this point we can run the script to create our collection and populate the index. I then switched gears and worked on the front end. I created a new search template to handle that:


<cfparam name="url.search" default="">
<cfparam name="form.search" default="#url.search#">
<cfset form.search = trim(form.search)>
<form action="search.cfm" method="post">
<cfoutput><input type="text" name="search" value="#form.search#"> <input type="submit" value="Search"></cfoutput>
</form>
<cfif len(form.search)>
<cfsearch collection="#application.collection#" criteria="#form.search#" name="results" status="r" suggestions="always" contextPassages="2">
<cfif results.recordCount>
<cfoutput>
<p>There were #results.recordCount# result(s).</p>
<cfloop query="results">
<p>
<a href="detail.cfm?id=#key#">#title#</a><br/>
#context#
</p>
</cfloop>
</cfoutput>
<cfelse>
<p>
Sorry, but there were no results.
<!--- trim is in relation to bug 79509 --->
<cfif len(trim(r.suggestedQuery))>
<cfoutput>Try a search for <a href="search.cfm?search=#urlEncodedFormat(r.suggestedQuery)#">#r.suggestedQuery#</a>.</cfoutput>
</cfif>
</p>
</cfif>

</cfif>

Going line by line, we begin with some simple parameterizing of a search variable, along with a basic form. If the user actually searched for something, we use cfsearch. As you can see, it works pretty simply. Pass in a criteria and a name for the results and you are done. The status attribute is not necessary but provides some cool functionality I'll describe in a bit.

If we have any results, I simply loop over them like any other query. The context is created by Solr based on your matches. So if you searched for enlightenment (don't we all), then the context will show you where it was found in the data.

The cool part is the else block. Solr (like Verity before it) provides a nice feature for searches called suggestions. Let's say a user wanted to search for Dharma but accidentally entered Dhrma. In some cases, the Solr engine can recognize the typo and will actually return a suggested query: Dharma. Pretty cool, right? Please note that the trim in there is due to another bug I found. In cases where Solr could not find a suggestion, it returned a single space character. I'm sure this will be fixed for the final release. If we do get a suggested query then we simply provide a link to allow the user to try that instead.

So far so good. Now let's talk about keeping the index up to date. If you remember, I had built a simple service component, prService, to handle all CRUD operations for my data. Because I did that, it was rather simple to handle the changes necessary for my index. First, my Application.cfc onApplicationStart was modified to support passing in the collection name:


public boolean function onApplicationStart() {
	application.collection = "pressreleases";
	application.prService = new model.prService(application.collection);
	return true;
}

And then prService was modified to support it. Unfortunately, there are no script based alternatives for Solr/Verity support. To be honest, it would probably be trivial to create such a component. (In case you didn't know, the ColdFusion 9 script based support for mail, and other things, was done this way.) I ended up simply rewriting my component into tags:


<cfcomponent output="false">
<cffunction name="init" output="false">
<cfargument name="collection">
<cfset variables.collection = arguments.collection>
</cffunction>
<cffunction name="deletePressRelease" output="false">
<cfargument name="id">
<cfset entityDelete(getPressRelease(id))>
<cfset ormFlush()>
<!--- update collection --->
<cfindex collection="#variables.collection#" action="delete" key="#id#" type="custom">
</cffunction>
<cffunction name="getPressRelease" output="false">
<cfargument name="id">
<cfif id is "">
<cfreturn new pressrelease()>
<cfelse>
<cfreturn entityLoad("pressrelease", id, true)>
</cfif>
</cffunction>
<cffunction name="getPressReleases" output="false">
<cfreturn entityLoad("pressrelease")>
</cffunction>
<cffunction name="getReleasedPressReleases" output="false">
<cfreturn ormExecuteQuery("from pressrelease where published < ? order by published desc", [now()])>
</cffunction>
<cffunction name="savePressRelease" output="false">
<cfargument name="id">
<cfargument name="title">
<cfargument name="author">
<cfargument name="published">
<cfargument name="body">
<cfset var pr = getPressRelease(id)>
<cfset pr.setTitle(title)>
<cfset pr.setAuthor(author)>
<cfset pr.setPublished(published)>
<cfset pr.setBody(body)>
<cfset entitySave(pr)>
<!--- update collection --->
<cfindex collection="#variables.collection#" action="update" key="#pr.getId()#" body="#pr.getBody()#,#pr.getTitle()#" title="#pr.getTitle()#" custom1="#pr.getAuthor()#" type="custom">
</cffunction>

</cfcomponent>

If we ignore the tags, the only changes are the cfindex tags in deletePressRelease and savePressRelease. In both cases it isn't too difficult. The key attribute refers to the primary key in the index. We used the database ID record so it's what we use when updating/deleting. The update action works for both additions and updates, so that is pretty simple as well.
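For what it's worth, the wrapper idea I mentioned earlier could look something like the following. This is a rough sketch and not part of the download: the searchIndex component and its updateDocument/deleteDocument methods are names I made up, and it only covers the attributes this application uses.

```cfml
<!--- searchIndex.cfc: hypothetical tag-based wrapper so script code can update the index --->
<cfcomponent output="false">

<cffunction name="init" output="false">
	<cfargument name="collection" required="true">
	<cfset variables.collection = arguments.collection>
	<cfreturn this>
</cffunction>

<!--- Adds the document if the key is new, otherwise updates it --->
<cffunction name="updateDocument" output="false" returntype="void">
	<cfargument name="key" required="true">
	<cfargument name="title" required="true">
	<cfargument name="body" required="true">
	<cfargument name="custom1" default="">
	<cfindex collection="#variables.collection#" action="update" type="custom" key="#arguments.key#" title="#arguments.title#" body="#arguments.body#" custom1="#arguments.custom1#">
</cffunction>

<!--- Removes a single document by key --->
<cffunction name="deleteDocument" output="false" returntype="void">
	<cfargument name="key" required="true">
	<cfindex collection="#variables.collection#" action="delete" type="custom" key="#arguments.key#">
</cffunction>

</cfcomponent>
```

With something like that in the model folder, the original script-based prService could have stayed in script and simply delegated the index work:

```cfml
<cfscript>
// Assumes searchIndex.cfc lives under the /model mapping and that
// pr is a pressrelease entity that has already been saved.
searchIndex = new model.searchIndex(application.collection);
searchIndex.updateDocument(pr.getId(), pr.getTitle(), pr.getBody() & "," & pr.getTitle(), pr.getAuthor());
</cfscript>
```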

Unfortunately, I ran into an issue with deletes. Delete operations are 100% broken in the current release of ColdFusion 9, at least on the Mac (and I bet it works ok in Verity). Keep this in mind as you play with the demo code. I've been told this is fixed already.

So what do folks think? Will you use this when you upgrade to ColdFusion 9? Also, have you noticed the slight logic bug with search? I won't say what it is - but I'll tackle it in the next post.

Download attached file.

Archived Comments

Comment 1 by Art posted on 8/20/2009 at 10:27 PM

Great example, thanks. How would you modify CF8WACK Vol-2 Pg-463 Listing 39.13 - "Combining Verity Searches with SQL Queries on the Fly", if you still need to simultaneously get additional data out of the model when you run your cfsearch?

Isn't a "slight logic bug" illogical?

Comment 2 by Shannon Hicks posted on 8/20/2009 at 10:59 PM

Now show an example of how to index more complex content. Say your pressreleases object did a many-to-many join to a contacts table, and a one-to-one join on author. How can you index that?

Comment 3 by Raymond Camden posted on 8/20/2009 at 11:05 PM

@Art: The key result from the search is the same as the PK in the database, so I could use the CF9 entity functions if I needed "more" than I could index. Don't forget we have 4 custom fields as well as categorization we can use (which could be - kinda - two more fields).

@Shannon: To be honest, that isn't too exciting. I was able to skip quite a bit with one call: entityToQuery, but if I couldn't do that, then I'd simply loop over and make a query by hand. I _could_ do N cfindex calls as well, but that tends to be slow. If folks do feel a more complex example would be warranted, then I can definitely consider it for the next post.

So does anyone see the security error with the search?

Comment 4 by Daniel Budde posted on 8/20/2009 at 11:26 PM

Personally I started avoiding using Verity because of the resource hog it became in CF7+. I tried separating out the verity server from the CF server, but I was never able to get it to function well. Do you know if Solr performs any better than Verity? I am currently looking for performance information, but if anyone knows anything, I would be glad to hear it.

Comment 5 by Raymond Camden posted on 8/20/2009 at 11:32 PM

I've not done any testing yet, sorry. I can say I did a "large index" test with Seeker, my Lucene wrapper for CF8. It was able to search a multimillion index pretty darn fast (I think it may have been 20 million even - but not sure).

Comment 6 by Rick Mason posted on 8/21/2009 at 1:07 AM

Thanks for this post. I think SOLR and ORM are the two things I'm most looking forward to in CF9. We already have plans to upgrade our servers.

Comment 7 by david buhler posted on 8/21/2009 at 7:25 PM

"Unfortunately, there are no script based alternatives for Solr/Verity support."

When I saw Ben Forta speak with Adam Lehman at NYCFUG, it was my understanding that every tag but 1 (I forget which one) would be available in CFScript syntax.

Comment 8 by Raymond Camden posted on 8/21/2009 at 7:29 PM

Unfortunately this is not true. Some new things are being added - for example, cfdirectory support was fixed post public beta - but as far as I know, it will NOT be 100%. I've asked Adobe to ensure they carefully document what can and cannot be done in CFS.

Comment 9 by david buhler posted on 8/21/2009 at 7:45 PM

Boo. Then again, I applaud their willingness to keep pushing in new features in smaller releases, even if it's not 100%.

Comment 10 by pat branely posted on 8/22/2009 at 7:09 AM

Hi Ray

how well does this work with verity/k2 ?

from my experience there is a massive performance hit when calling CFindex using verity and updating only 1 record. since moving from CF6 to cf7/8 we have had to re-structure our apps to CFINDEX via a schedule and pass a query with a large number of records to cfindex. ie. cfindex with 1 record = 10 seconds. cfindex passing 100 records = 10 seconds.

Does CF9 bring back the vdk style of updating your index on save ? from your example - It looks like it.

Comment 11 by Raymond Camden posted on 8/22/2009 at 6:07 PM

Wow, I've never seen Verity that slow. I've certainly seen it go slow if you did a lot of singular updates instead of a large query at once, but for atomic operations, it always went fast for me. To be fair, it has been a while since I used Verity. I did it last for CFCookbook, but I never saw slowness when editing content.

To your last question, I'm not sure what you mean by 'vdk style of updating' - but - as far as I know, this code should just plain "work" if you switched from Solr to Verity - in fact, I'm willing to bet the delete bug doesn't exist on the Verity side.

Comment 12 by pat branely posted on 8/23/2009 at 12:43 PM

VDK in 6.1 allowed atomic updates - ie save a record update the index with the changes of that record. it might take a little extra time but nothing noticeable on save

K2 in 7/8 would take seconds to update just 1 record in the index. it was so slow for us we had to defer indexing out of the save operation on records and into a schedule task.

anyways im excited to see this solr example in CF9

Comment 13 by AJ Mercer posted on 10/15/2009 at 11:23 AM

Hi Ray,

Have you tried to use custom1 in cfsearch with solr?
<cfsearch collection="#arguments.collection#" type="simple" criteria="CF_CUSTOM2 <matches> #newResId#" name="qTemp" />
Throws an error for me.

CF9 doc say custom1 .. 4 are for verity only
http://help.adobe.com/en_US...

Comment 14 by Raymond Camden posted on 10/15/2009 at 3:47 PM

I _thought_ that I had and that it worked. What error do you get?

Comment 15 by AJ Mercer posted on 10/15/2009 at 4:36 PM

Looks like the problem is actually a corrupt PDF.

However, one corrupt PDF threw an error,
but another one just hangs the systems???

Comment 16 by Raymond Camden posted on 10/15/2009 at 4:42 PM

So it sounds like you are saying 2 things here.

1) A bad PDF that gets indexed causes the entire collection to stop working. Right?

2) It also sounds like another bad PDF caused the server to hang. Right?

Please confirm as to me it sounds like 2 separate issues.

Comment 17 by AJ Mercer posted on 10/15/2009 at 4:56 PM

The process does a cfindex update and then a cfsearch. The collection is not getting corrupted.

There are (at least) two corrupt PDF files to be added to the collection. One throws an error (which can be handled with try/catch). The other is hanging CF - no error thrown, not even a timeout.

I took out cfsearch as I thought that was causing the error - but still hangs.

I added cfpdf getinfo (removing cfindex) and that hangs the system on the same file.

I will do more testing tomorrow when I am back in the office and let you know what I discover.

Comment 18 by Raymond Camden posted on 10/15/2009 at 5:49 PM

Where is the error - is it in the cfindex or the cfsearch? You said you took out the cfsearch but it still hanged - but is that for the "Hang" PDF or the "Error" PDF? (It seems like one pdf causes an error, one a hang.)

Comment 19 by AJ Mercer posted on 10/16/2009 at 7:56 AM

added a bug
http://cfbugs.adobe.com/cfb...

but not sure how to supply my test code and files

Comment 20 by Phillip Senn posted on 11/10/2009 at 6:55 AM

I'm starting to learn the new syntax.
So, instead of:

<cfset Application.prService = CreateObject("component","Model.prService")>
we now say:
this.mappings["/model"] = getDirectoryFromPath(getCurrentTemplatePath()) & "model";
application.prService = new model.prService();
?

Comment 21 by Raymond Camden posted on 11/10/2009 at 7:27 AM

Not exactly. Typically a "service" component isn't an entity. An entity normally represents one row of data, or one "instance", so one person. A service component typically works with data as a whole. So my userService, for example, is my main API to get users. The userService may return user entities.

Make sense?

To be clear, CF ORM doesn't really DEMAND you follow any particular type of way of coding. So don't take what I say as the One True Way.

Comment 22 by Henry Ho posted on 6/3/2010 at 6:13 AM

custom1, custom2, custom3, custom4 work with Solr?? The documentation said they're only for Verity <MATCHES> operator. How to use with customX with Solr?

Thank you!

Comment 23 by Raymond Camden posted on 6/3/2010 at 3:17 PM

You can use them to store additional data. This can be used when displaying the results.

Comment 24 by Raymond Camden posted on 6/3/2010 at 3:24 PM

You can search against custom fields using a colon operator:

custom1:NNNNN

Comment 25 by Henry Ho posted on 6/3/2010 at 8:58 PM

awesome, thanks!

Comment 26 by Fabio posted on 8/4/2010 at 1:04 PM

A little dummy question about coldfusion + solr search engine.
I am indexing hundreds of .htm docs with a 'question' in the <title> and the 'answer' in the <body>.
1) Will Solr prioritize the title in my cfsearch? I mean, is the title more important for the engine, isn't it?
2) And how can I add one or more categories/tags to my doc in the index, e.g. based on the argument? I mean, how can I read a html tag (ex. <h1>) and put its content into an index field? is it possible?

Comment 27 by Raymond Camden posted on 8/4/2010 at 4:11 PM

1) As far as I know, Solr does get some context into what it indexes, so it should treat TITLE as more important. To be honest, I don't have proof of this.

2) You can use categorization when you index data. If you are indexing files, it means you have to switch to a more manual process, but you can do it. The cfindex tag supports the category and categorytree arguments. You also have 4 custom fields.

Comment 28 by Fabio posted on 8/4/2010 at 4:39 PM

Thanks a lot for your ultra-fast answer Raymond.
I knew that categorization would let me achieve my goal but... no reference on how to do it actually. I mean how can I assign a document to the 'red' category and another to the 'blue' one?
To be honest, here's what I'm trying to realize:
my .htm doc contains <title>What color are your eyes?</title><body>They are blue.</body> . I want to search "are your eyes blue?" or "tell me your eyes' color" and get that as a result. Solr doesn't seem to get the more relevant word 'eyes'.. It highlights 'what color' or 'me your' .. wtf?!? Also, I am working in italian language (cfcollection cfindex cfsearch using language='italian').
Indeed, for categorizing, do I have to put some category tag (e.g. <h1>Blue</h1>) and tell solr CUSTOM1="h1" ?
Feel free to send me an e-mail if you can help me, please.
Thanks in advance!

Comment 29 by Raymond Camden posted on 8/4/2010 at 5:25 PM

The docs do discuss this:

http://help.adobe.com/en_US...

However, in terms of file based indexing the category/categoryTree you use is assigned to every thing you index. In order to apply a unique value to each file, you would need to a) decide on your business logic (ie, WHAT cat goes with what file) and b) index one file at a time.

Comment 30 by Fernando posted on 8/25/2010 at 9:21 PM

You mentioned in a previous post that Solr would "supposedly" treat TITLE as more important in an HTML doc collection. What about if I want to give preference to certain DB table fields? For example, I create a query to index with title, description, actors, and themes fields, but I want the title to be the most important (and then description, actors, and themes in that order of importance, if possible). Is this something that Solr does implicitly by me giving the fields the correct "mapping names" that Solr understands so as to be able to give preference? Or is there some manual way to do this via some configuration attributes or some xml file somewhere?

Thank you.

Comment 31 by Raymond Camden posted on 8/26/2010 at 3:45 PM

It looks like you can "boost"

http://wiki.apache.org/solr...

Comment 32 by Fernando posted on 8/30/2010 at 8:44 PM

That did the trick. Thank you! Next time I'll be a little more diligent and try to look for the actual product's docs as opposed to limiting my search to Adobe docs. :D
By using the "custom" attributes for my DB fields (custom1="title" custom2="description" etc), I was able to "boost" title in the criteria as follows:
criteria="custom1:#searchStr#^2 custom2:#searchStr#"...

Thank you once again!

Comment 33 by Raymond Camden posted on 8/31/2010 at 2:30 AM

No problem. Solr is something I really need to spend more time with. It is incredibly powerful and I'm overjoyed (can I say that?) with Adobe adding it to ColdFusion.

Comment 34 by geekatwork posted on 9/2/2010 at 5:18 AM

Has any one had any luck in converting <CONTAINS> code from Verity to Solr. Whenever I search for a substring in a string it returns all matching parts of the substring. e.g. Looking for "hello world" in "always use hello world the first time" returns matches for hello and for world.
I can get it to work by using +custom1:(+hello +world) but that seems wrong, I would have thought +custom1:"hello world" would work.

Comment 35 by geekatwork posted on 9/2/2010 at 6:02 AM

Re: converting <CONTAINS> code from Verity to Solr
Use custom1:"hello world"~1000000 , I nearly had it but I'd left the 1000000 (slop?) off of it when testing.

Comment 36 by Fernando posted on 9/15/2010 at 8:41 PM

Has anyone had any issues when modifying schema.xml (in either the collection folder or the actual solr template folder) and then restarting the solr service? When I do it, none of the collections show up and I get an error when trying to add new ones (The logs haven't proved to be very useful so far...). I then have to manually remove the custom collections folders and their respective entries in solr.xml, and replace the schema.xml with the original. Once I do this and I restart the solr service, I can then add new collections again.
This obviously will not work for a production environment since anytime the server restarts, searching will not work in any applications that have implemented it.

Comment 37 by Fernando posted on 9/28/2010 at 1:49 AM

Anybody have a similar issue?

Comment 38 by Fernando posted on 9/28/2010 at 8:24 PM

It seems one cannot have multiple tokenizers per fieldType in the schema.xml

Comment 39 by sean hogge posted on 10/15/2010 at 1:33 AM

I have a couple of strange things going on with Solr that I can't find a solution for anywhere. I'm convinced it's because I'm missing something obvious, so hopefully it's a simple matter to point out where I'm wrong.

ColdFusion 9.0.1 Standard on Linux - web root is /www, but web sites are hosted in /www/sitename.com.

When I index /www/sitename.com, and search the index, I get hits from /www/CFIDE. I have no symlink, only the mapping in the CF administrator.

My "common sense" tells me Solr shouldn't index anything not physically present in the recursed directories. Is there some known setting that I need to flip to prevent this behavior, or am I completely misunderstanding the issue here? I swear I'm not a complete idiot. Mostly.

Comment 40 by sean hogge posted on 10/15/2010 at 1:43 AM

Well, it looks like purging and re-indexing seems to have remedied that issue. And it appears to be non-reproducible.

But since I've gone and revived this old thread, I'd love to hear any recommendations for a good solution to search CFML content. Is there a site spider plugin that might integrate with Solr somehow?

Comment 41 by Mike posted on 10/15/2010 at 11:10 PM

Can you please show me the whole statement for the search?
I need to boost scoring for title and by using this snippet:
criteria="custom1:#searchStr#^2 custom2:#searchStr#".
(from above) the search doesn't return anything.

This is my statement:
<cfindex collection="myCol" action="update" body="title,description" custom1="description" title="title" key="ID" query="myQ">
<cfsearch collection="myCol" criteria="title:#searchStr#^10 custom1:#searchStr#^5" name="searchResult" >

Thanks
Mike

Comment 42 by Raymond Camden posted on 10/18/2010 at 3:19 PM

If you only boost title, does it work?

Comment 43 by Mike posted on 10/18/2010 at 5:12 PM

No it doesn't. Not sure what's going on but whenever I'm adding title: to the search criteria it seems that it's using the whole thing for the search and as a result it doesn't find anything.
I searched all over the place and can't find a reason for it.

Very strange.
Can somebody test and see if they can use title: in the search criteria to boost the score?

Thanks
Mike

Comment 44 by Eduardo Aben Athar posted on 10/19/2010 at 5:49 PM

Raymond Speaks!
Have you tested with the Solr search index with more than 10,000 accessions of contents and 5000 hits a day in it?

Comment 45 by Raymond Camden posted on 10/19/2010 at 5:58 PM

Nope, I do not have anything in production yet with it. I'm considering perhaps using it at CFLib as a good test.

I'm presenting on SOLR at RIAUnleashed, and will have a small sample app then. I can try running JMeter against the site and see how it holds up. But to be honest, 10K hits in a day isn't a whole heck of a lot.

Comment 46 by David Arnold posted on 10/19/2010 at 10:42 PM

I have a CF9 site that I am trying to use SOLR on to index PDF files. I am able to do cfm and txt files, but not PDF. Any reason as to why. I also built a Verity collection, which does find the PDFs.

Thanks,

Comment 47 by Raymond Camden posted on 10/20/2010 at 7:57 PM

David, there were bugs in PDF indexing fixed in 901/CHF. So ensure you have BOTH 901 installed AND the cumulative hot fix.

Comment 48 by David Arnold posted on 10/20/2010 at 8:26 PM

I applied the HF and reindexed. Still no PDFs.
Destroyed the collection and rebuilt. Reindexed. Still no PDFs.

Thanks for the help.
David

Comment 49 by Raymond Camden posted on 10/21/2010 at 4:19 PM

Can you say _how_ you did the index? Was it via the cf admin or was it via cfindex.

Comment 50 by David Arnold posted on 10/21/2010 at 6:39 PM

Sure.

The collection was created from the website via the CF ADMIN pages.

The index was created from a webpage using CFINDEX. I will post the code below, just in case...

I have contacted Adobe support in regards to this. Waiting to hear from them. The server is a new install of W2K3, CF9 with the update and CHF applied. No other applications are on the server, other than IIS.

Comment 51 by Raymond Camden posted on 10/21/2010 at 6:42 PM

Ok, so to be clear, you run this and the PDFs are not added. Add a status attribute to your cfindex tag. This returns a structure that tells you how much stuff was added, updated, and deleted. I'd also check your extensions value. Maybe specifically use *.pdf just to see if it makes a difference.

Comment 52 by David Arnold posted on 10/21/2010 at 8:07 PM

Okay.

Thanks to your suggestions, I thought it was very weird that it was not indexing the PDFs that Verity has previously indexed (different collection). I threw in the status attribute as you suggested. I was able to see the txt and cfm files that I had also added to that directory to be indexed. Those all indexed just fine, but none of the PDFs in same directory.

I then decided to put in some other documents (XLS, BMP, DOC, etc...) into the directory to see if the indexing was going to work. It did!

I then found some "other" PDFs, put them in the directory, and they indexed!

Background: The PDFs are created by a system called DCS, which we use for invoice printing for our ERP system. It is odd to me that Verity can index these PDFs while Solr cannot. I took a look at the PDF, it is compatible with 5.x and greater. There must be something with these PDFs that Solr does not like.

Comment 53 by Raymond Camden posted on 10/21/2010 at 8:11 PM

Odd -well - at least you got part of the way. :) I'd file a bug w/ Adobe on your DCS PDFs.

Also - you may want to try using CFPDF to read the text from them. If you can, you can index them manually.

Comment 54 by David Arnold posted on 10/21/2010 at 8:51 PM

I've been contacted by Adobe regarding this and am providing examples to them for further review. I will let you know how it progresses.

Comment 55 by David Arnold posted on 10/28/2010 at 10:10 PM

I have a bug logged with Adobe.

It seems that a PDF is not a PDF. Adobe called the PDF corrupt, even though it can be indexed by Verity, opened/modified by Acrobat, and modified by CFPDF.

We are going to create a PDF from a PDF (using the CFPDF tag) and then let Solr index that PDF. We tested the process and it works!

Thanks to all that helped in this.
David

Comment 56 by Jerry posted on 4/13/2011 at 7:18 PM

Could you address how to set up file collections when the target files reside on a server other than my webserver? I have installed SOLR service from the CF9 disk on the target FileServer and can see it running there but I cannot get files added to a SOLR collection in the Administrator on the webserver. I think I'm missing something really obvious here. I've looked for this answer everywhere, including your collaborative book(s) on CF9. Your thoughts would be most appreciated. CF9, W2003, IIS
thanks

Comment 57 by Shannon Hicks posted on 4/13/2011 at 7:27 PM

Jerry -

I'm sure the problem is you have it locked down so solr is only accessible to localhost. You need to open it up to allow access from your web server. You might need to do this both at the solr level, and at the machine or network firewall level, depending on your setup.

For solr, you basically have to do the opposite of this tech article, either allowing from any IP, or just from your CF server's IP: http://kb2.adobe.com/cps/80...

Comment 58 by Jerry posted on 4/14/2011 at 2:19 AM

Shannon, thanks for the idea. I determined that those files (on both servers) are in the default configuration, that is, not locked down. I tried mixing it up as best I could, as well. I think my issue is much more fundamental. I gotta start over from the beginning.
Thanks again

Comment 59 by conor posted on 5/23/2011 at 2:51 PM

Hi,

I am new to ColdFusion and I am reading up on how to use Solr. I am reading through your Solr example on http://www.coldfusionjedi.c... and I have downloaded the zip. However, when I launched it locally, I got the error: 'Datasource pressreleases could not be found.' Can you tell me what I must do about this?

Thanks for your help!
Conor

Comment 60 by Raymond Camden posted on 5/23/2011 at 3:52 PM

Create a datasource called pressreleases. Point it to a MySQL db (an empty one). Any dbtype should work actually.

Comment 61 by Mike posted on 8/30/2011 at 7:28 PM

Can somebody show me how to search in the custom fields?
Using
<cfsearch collection="mycollection" criteria="#searchString# custom2:#searchString#" name="q" status="r" suggestions="always" contextPassages="0">

I get a big fat error:

There was a problem while attempting to perform a search.
Error executing query : orgapachelucenequeryParserParseException_Cannot_parse_custom_Encountered_EOF_at_line_1_column_7__Was_expecting_one_of____________________QUOTED_______TERM_______PREFIXTERM_______WILDTERM_____________________NUMBER_______

Thanks
Mike

Comment 62 by Raymond Camden posted on 8/31/2011 at 12:25 AM

Do you get it for all values of searchString?

Comment 63 by Mike posted on 8/31/2011 at 3:01 AM

Yes, and basically the value I'm searching for exists in the query.
I thought I had the syntax wrong, but I didn't find a different way of doing it anywhere.

BTW I'm on CF 9.0.1.

Thanks Ray for your quick response

Comment 64 by Raymond Camden posted on 9/1/2011 at 12:49 AM

That's the right syntax for searching against a custom field so I'm not sure what to suggest. If you want to ping me off the blog I can maybe dig a bit deeper.

Comment 65 by Mike posted on 9/1/2011 at 5:45 PM

I kind of fixed it by adding the custom field to the body.

Not the ideal solution but it's working. I will try on a different server and let you know.

Eagerly waiting for CF10. I hope Solr will finally have all the features enabled and working.

Thanks again
Mike

Comment 66 by Raymond Camden posted on 9/1/2011 at 6:30 PM

Solr does have all the features. CFSEARCH does not. :) Remember you can hit Solr directly via HTTP and get the results back. CFSEARCH as a wrapper can't cover every feature.

I can say that in my CF/Solr preso next week I'm revealing one of the new things in CF10. You should attend. :)

Comment 67 by Mike posted on 9/1/2011 at 9:51 PM

You are right, I meant cfsearch, not Solr.

Is it on the cfmeetup or something else?
Please provide URL if it's available.

Thanks

Comment 68 by Raymond Camden posted on 9/1/2011 at 10:14 PM

Here are the details.

http://www.adobe.com/cfusio...

Comment 69 by Mike posted on 9/1/2011 at 10:32 PM

Ray, will there be a recording of this event? Because it's midday, I don't think I can attend, but I would very much like to see it.

Also, I really hope that some advanced features are shown, including the ability to access Solr directly as you mentioned in the previous message.

Thanks

Comment 70 by Raymond Camden posted on 9/1/2011 at 10:39 PM

There will be a recording, but I'm not going that deep. I just mention it's possible. Remember there is a web based interface to Solr. By default you can hit it here - http://localhost:8983/solr/

Basically you just need to make HTTP requests with the right url parameters. Check the Solr docs for information on that.

Comment 71 by sdtacoma posted on 9/2/2011 at 2:14 AM

Is there a way to cfdump or view the contents of a collection some way? I am trying to update a collection and want to see if my update is in the collection or not. Searching the collection isn't returning any hits.

Comment 72 by Raymond Camden posted on 9/2/2011 at 5:11 AM

If you search for nothing, it will return everything.

If you want to search for a specific item, remember you can search against the key field. I'm not having luck now using a file based key, but I'm pretty sure it _does_ work.

Comment 73 by Mike posted on 9/8/2011 at 10:08 PM

Ray,

Did you have a chance to see if ranking works in the CF implementation?
See my previous post (41).

Thx
Mike

Comment 74 by Raymond Camden posted on 9/8/2011 at 11:35 PM

It seems to work for me. I tried this search against a collection created on cfdocs:

cffeed title:cfthread

Which means: cffeed in the body or cfthread in the title.

I then did

cffeed title:cfthread^100

And the result with cfthread in the title popped to the top rank wise.

Comment 75 by Mike posted on 9/8/2011 at 11:50 PM

Can you please show the whole index and search statement?
Maybe I'm doing something wrong, although if I had the syntax wrong I would expect an error. Instead I get no results.
If I do a normal search without ranking I get results.

Thx for the quick reply
Mike

Comment 76 by Raymond Camden posted on 9/8/2011 at 11:58 PM

I did post the entire search statement above. The index is an index of the CFDocs that ship with ColdFusion. I created that in the admin.

Comment 77 by Mike posted on 11/26/2011 at 9:08 PM

Hi Ray,

Unfortunately it's me again. For the life of me I can't figure out how to boost certain fields so they show at the top.

CF version: 9,0,1,274733
Update Level chf9010002.jar

I'm using the following code:
<cfquery name="getBooks">
select bookID, title, url, datePublished, author, description
from books
</cfquery>

When I run this I get 0 results.

If I removed the title from the criteria:
<cfsearch collection="books" criteria="#searchCriteria#" name="results" status="r" suggestions="always"
ContextBytes="1000" ContextPassages="4">

I get all the hits, but entries without the search query in the title have a higher ranking than the ones with it.

Is this the correct syntax? Do I have to make changes to the solrconfig.xml file?

Based on all I have read it should work, but it doesn't (for me).

Please help.

Thanks
Mike

Comment 78 by Raymond Camden posted on 11/27/2011 at 12:23 AM

Odd. I tested it on an index of cfdocs, and it seems to work ok.

Comment 79 by Mike posted on 11/27/2011 at 5:23 AM

And that's what's killing me. It should work but it doesn't. I have tried on 3 different computers with a standard CF9 installation.

Is your installation standard?
Do I have to update the installation of Solr?
Can you please send me your Solr config XML file?

I have read that I can play with that.

Thanks Ray for the quick response on a Saturday.

Mike

Comment 80 by Raymond Camden posted on 11/27/2011 at 5:27 AM

9.0.1 + latest CHF.
My solr config wasn't modified. I just made an index of the cfdocs that ship with CF. If you make one it should be the same.

Comment 81 by Mike posted on 11/27/2011 at 8:22 AM

No luck. I tried indexing documents instead of a query and got the same result.

If it's not too much to ask, can you please (when you have some time) post the exact statements to create the collection, index, and search with boost?

Definitely I am missing something.

Thanks again
Mike

Comment 82 by Raymond Camden posted on 11/27/2011 at 9:33 AM

Given a file index of cfdocs, this is what I saw:

I searched for rss and number one was: Adobe ColdFusion * cffeed

I then did

rss OR title:RSS

and then an item with RSS in the title went to the top. When I boosted the score by 10, it stayed at number one, but oddly the score went down by 10. So... um... not sure.

Comment 83 by Mike posted on 11/27/2011 at 8:17 PM

Don't know what to say. It just doesn't work. By adding criteria="#searchCriteria# or title:#searchCriteria#" the order actually changes, but not for the better. The first 3 results have no mention of the searchCriteria in the title, while the last item does. I was hoping that by looking at your exact code I could see something that I'm missing... Maybe the boost has to be done when indexing the collection.

Thanks
and have a great week-end

Comment 84 by Raymond Camden posted on 11/28/2011 at 12:42 AM

Well, it makes sense that the first 3 may not have it in the title. You said, "X or title:X". Did you try "X OR title:X^10"?

Comment 85 by Mike posted on 11/28/2011 at 2:08 AM

Yes, I have tried lower and upper case, with and without boosting the title, and out of desperation even changed the order.

And while the order changed, there were more entries at the top without the search word in the title.

Thanks
Mike

Comment 86 by Raymond Camden posted on 11/28/2011 at 2:14 AM

Any chance you can share a zip of your data?
