Jakarta Lucene is amazing - my blog is searchable!

After struggling to find an entry that I wrote on my blog, I decided that it was time to implement a search facility. Search engines aren't the simplest things in the world to build, so I looked at an open source implementation that had caught my eye before - Lucene.

Lucene

If you've not seen it, Lucene provides an incredibly simple API for both indexing and searching content. Also, it's very fast. Setting up Lucene is a breeze - just drop the JAR file into your classpath.

As far as using Lucene is concerned, you first need to write a bit of code that indexes your content. For me, this just involves running through all my blog entries and asking Lucene to index the title and body of each entry. For my entire blog this indexing process took about a second! With the index created, you simply use another couple of classes within the framework to execute a search query and get a collection of hits back. You can even incrementally index new/modified content. In just a couple of hours with Lucene you can have a full-featured search facility in your Java application. Nice.
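
To make the index-then-search flow a bit more concrete, here is a toy sketch in plain Java. It is not Lucene's API and not the code behind this blog; the class name TinyBlogIndex and its methods are made up purely for illustration. It only shows the general idea: build a map from terms to entries, then look terms up at query time. Lucene does far more (analysis, ranking, on-disk indexes).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy inverted index (hypothetical, NOT Lucene): illustrates index-then-search only. */
public class TinyBlogIndex {
    // term -> ids of the blog entries that contain that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Index (or re-index) one entry using its title and body. */
    public void index(String entryId, String title, String body) {
        remove(entryId); // drop stale terms so a modified entry can be re-indexed
        for (String term : tokenize(title + " " + body)) {
            postings.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(entryId);
        }
    }

    /** Remove an entry from every posting list. */
    public void remove(String entryId) {
        for (Set<String> ids : postings.values()) {
            ids.remove(entryId);
        }
    }

    /** Return the ids of entries that contain every term in the query. */
    public List<String> search(String query) {
        Set<String> hits = null;
        for (String term : tokenize(query)) {
            Set<String> ids = postings.getOrDefault(term, Collections.<String>emptySet());
            if (hits == null) {
                hits = new LinkedHashSet<>(ids);
            } else {
                hits.retainAll(ids);
            }
        }
        return hits == null ? new ArrayList<String>() : new ArrayList<>(hits);
    }

    // Lower-case the text and split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyBlogIndex index = new TinyBlogIndex();
        index.index("1", "Lucene is amazing", "My blog is searchable");
        index.index("2", "Holiday photos", "Nothing about search here");
        System.out.println(index.search("lucene blog")); // prints [1]
    }
}
```

Calling index again with the same entry id replaces the old terms, which is roughly the idea behind incrementally indexing new or modified entries.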

For anybody curious about Lucene, I would recommend "The Lucene search engine: Powerful, flexible, and free" (JavaWorld) and "Introduction to Text Indexing with Apache Jakarta Lucene" (OnJava). If you're interested in the code that I used (only a first cut at the moment), you can browse the CVS tree. Lucene is an amazing product and the price is right. Definitely recommended!

Update : Comments are now also included in the search indexes. :-)
