CouchDB and Lucene

Quick notes to self on fulltext search and CouchDB. Note that links to CouchDB are local to my machine(s),and won't work unless you are me, or have a copy of the same database running on your machine). CouchDB and Lucene adds fulltext indexing to CouchDB. After a few false starts I now have this working. The documentation is a little misleading, you don't need to clone the github repository, nor use Maven to build couchdb-lucene (at least, I didn't). Instead I grabbed couchdb-lucene-0.5.6, unpacked it, used that as is.

To configure CouchDB I ended up editing the configuration using Futon (there's a link "Add a new section" down the bottom of the Configuration page), then I restarted CouchDB. The things to add are:


[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.
[external]
fti=/path/to/python /path/to/couchdb-lucene/tools/couchdb-external-hook.py
[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}


To start couchdb-lucene, just cd couchdb-lucene-0.5.6 and bin/run.

Then it's a case of adding a fulltext index. In Futon I start adding a regular design document, then edit the Javascript. For example, here is a simple index on document titles:


{
"_id": "_design/lucene",
"_rev": "2-96b333dfc77866a13c0de7f856d27b6c",
"language": "javascript",
"fulltext": {
"by_title": {
"index": "function(doc) {
var ret=new Document();
ret.add(doc.title);
return ret
}"
}
}
}


Once the indexing has been completed, you can search the CouchDB database using a URL like this: http://localhost:5984/col2010ref/_fti/_design/lucene/by_title?q=frog+new+species.

Lots more to do here, but with spatial queries and now fulltext search, it's time to start building something...