So, what is kosearch? It's a Search module for Kohana PHP. More specifically, it's an implementation of Zend (Lucene) Search, a file-based search/index solution. kosearch provides a simple way to index and search Models. It's perfect for a web site that might contain news, products etc. kosearch also exposes the underlying Zend libraries so that other things can be indexed - PDFs, web pages, Word docs etc.
The kosearch module has been written for, and tested against, Kohana 2.3.4.
Q. Why Use Zend Lucene Search? I can use MySQL Text Search.
A. True, but MySQL full-text search requires the table to use the MyISAM storage engine (at least in the MySQL versions this module was written against), and MyISAM tables don't support transactions. Plus, Zend search is more powerful than text search and doesn't hit the database. And of course, you might want to index non-database assets such as PDFs, Word docs, images etc.
To use the search module follow these steps:
Enable the module in your application's modules configuration (MODPATH.'search')

Your folder structure should be as follows:
application/searchindex
application/vendor/StandardAnalyzer
application/vendor/Zend/Exception.php
application/vendor/Zend/Loader
application/vendor/Zend/Loader.php
application/vendor/Zend/Search
Add some music to search against
By selecting the above link, the following will be added to the search index:
| Artist | Song Title | Media (Model class) |
|---|---|---|
| Ian Brown | My Star | MP3 |
| Rolling Stones | Brown Sugar | MP3 |
| Stone Roses | Sugar Spun Sister | CD |
| David Bowie | Starman | CD |
| Bob Dylan | Like a Rolling Stone | MP3 |
You should see some files in the searchindex folder. These are the index files.
Try the following searches: stone, star, star*, sugar, title:stone, artist:stone, type:cd and title:sugar
| Artist | Song Title | Relevance |
|---|---|---|
| Bob Dylan | Like a Rolling Stone | |
To add a Model to the search index, it must implement the Searchable interface. This interface is defined as follows:
interface Searchable {

    /**
     * @return array of Search_Field objects
     */
    function get_indexable_fields();

    /**
     * @return mixed identifier for this item
     * For ORM Models this would be the PK
     */
    function get_identifier();

    /**
     * @return String type of item
     * For ORM Models this would likely be the object name
     */
    function get_type();

    /**
     * @return mixed unique id of this item
     */
    function get_unique_identifier();
}
The search module includes an abstract ORM implementation of this interface, which implements all methods except get_indexable_fields.
The identifier would likely be the Primary Key for an ORM Model. This is important because a record retrieved from a search contains only the indexed data, not all attributes. So, to display all Model attributes, it might be necessary to fetch the record by its PK.
The unique identifier must be unique within the Lucene index. If you are indexing more than one Model, PKs alone will not be unique. The ORM implementation uses both the PK and the Model name to create a unique ID. The unique ID is required when updating or deleting an entry in the index.
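As a rough illustration of why the PK is combined with the Model name, the sketch below builds a composite ID. The function name `make_unique_identifier` and the `_` separator are assumptions made for this example; the module's ORM implementation may format its IDs differently.

```php
<?php
// Hypothetical helper, not part of kosearch.
// Combines the Model type and primary key so that IDs stay unique
// when several Models share one Lucene index.
function make_unique_identifier($type, $pk)
{
    return $type . '_' . $pk;
}

// Two different Models with the same PK no longer collide.
$cd_id  = make_unique_identifier('cd', 1);
$mp3_id = make_unique_identifier('mp3', 1);

echo $cd_id, "\n";   // cd_1
echo $mp3_id, "\n";  // mp3_1
```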
The type allows for search by Model type. See the example code in this distribution to see how this works. In the example above, the two media types - 'CD', and 'MP3' - are the class types, not attributes of the class.
The get_indexable_fields method is the only complicated part of this solution. It defines which fields to index and what type of index field to create for each. Essentially, there are five field types, which differ in whether the value is stored, indexed and tokenised. These field types are defined by Lucene and explained well in the Zend documentation.
The Searchable interface defines constants mapping to these field types, along with another set of constants relating to HTML Decoding.
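For quick reference, the five Zend_Search_Lucene field types can be summarised as data. The sketch below uses Zend's own type names as array keys; the `is_searchable` helper is hypothetical and only illustrates that a field must be indexed before a query can match it.

```php
<?php
// Properties of the five Zend_Search_Lucene field types.
// Keys are Zend's names; no Searchable constant names are assumed here.
$field_types = array(
    'Keyword'   => array('stored' => true,  'indexed' => true,  'tokenized' => false),
    'UnIndexed' => array('stored' => true,  'indexed' => false, 'tokenized' => false),
    'Binary'    => array('stored' => true,  'indexed' => false, 'tokenized' => false),
    'Text'      => array('stored' => true,  'indexed' => true,  'tokenized' => true),
    'UnStored'  => array('stored' => false, 'indexed' => true,  'tokenized' => true),
);

// Hypothetical helper: can a field of this type be matched by a query?
function is_searchable(array $field_types, $type)
{
    return $field_types[$type]['indexed'];
}

// Print a one-line summary per field type.
foreach ($field_types as $name => $props) {
    echo $name, ': ',
        ($props['stored'] ? 'stored' : 'not stored'), ', ',
        ($props['indexed'] ? 'indexed' : 'not indexed'), ', ',
        ($props['tokenized'] ? 'tokenised' : 'not tokenised'), "\n";
}
```

In short, stored fields come back with a search hit, indexed fields can be searched, and tokenised fields are split into individual terms first.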
For example, a blog Model might be defined as follows:
class Blog_Model extends Searchable_ORM {

    /**
     * Searchable interface implementation
     */
    public function get_indexable_fields() {
        $fields = array();
        $fields[] = new Search_Field('url', Searchable::UNINDEXED);
        $fields[] = new Search_Field('title', Searchable::TEXT, Searchable::DECODE_HTML);
        $fields[] = new Search_Field('content', Searchable::UNSTORED, Searchable::DECODE_HTML);
        $fields[] = new Search_Field('date', Searchable::UNINDEXED);
        return $fields;
    }
}
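Here, we are telling the index to:
- store the url and date with each entry, but not make them searchable (UNINDEXED)
- tokenise, index and store the title (TEXT)
- tokenise and index the content, but not store it (UNSTORED)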
We also tell the module to decode the HTML for the title and content prior to indexing. This allows HTML content to be indexed. If you have a CMS solution which allows HTML content to be stored this will be useful. The default is to not decode.
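To give a feel for what preparing HTML for indexing can involve, here is a minimal sketch that reduces an HTML fragment to plain text. The function name `plain_text_for_index` is hypothetical, and stripping tags before decoding entities is an assumption for illustration; the module's own DECODE_HTML handling may work differently.

```php
<?php
// Hypothetical helper, not part of kosearch.
// Reduces an HTML fragment to plain text suitable for tokenising.
function plain_text_for_index($html)
{
    // Remove markup first, then turn entities such as &amp; into characters.
    $text = strip_tags($html);
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

echo plain_text_for_index('<p>Fish &amp; chips</p>'), "\n";  // Fish & chips
```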
Once you have defined your Models to implement the Searchable interface, adding content is simple, using the Search class.
To add a Model to the index:
$search = new Search;
$search->add($model);
To build a new index, pass an array of Searchable Models. Any existing index is discarded and re-created from scratch:
$search = new Search;
$search->build_search_index($models);
To update a Model, it is removed, then re-added to the index:
$search = new Search;
$search->update($model);
To delete a Model:
$search = new Search;
$search->remove($model);
Searching the index is the really tricky bit ;-)
$search = new Search;
$search->find($query);
The example above gives a few ideas about how to query the index. I suggest reading the documentation about all the possible ways to search using the query language - terms, fields, wildcards, ranges, booleans etc.
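The list below sketches a few query strings in the Zend_Search_Lucene query language, using the title, artist and type fields from the music example. The `$queries` array is just a convenient way to present them; each string would be passed to the find method shown above. No results are listed because they depend on the contents of your index.

```php
<?php
// Example query strings, keyed by query, with a short description each.
$queries = array(
    'stone'                         => 'single term, no field prefix',
    'star*'                         => 'wildcard: terms starting with "star"',
    'title:stone'                   => 'term restricted to the title field',
    '"rolling stone"'               => 'phrase: both words, adjacent and in order',
    'title:sugar AND artist:stones' => 'boolean: both conditions must match',
);

foreach ($queries as $query => $description) {
    echo $query, '  =>  ', $description, "\n";
}
```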
Zend Search is capable of indexing web pages, PDFs, Word docs etc. The docs explain in detail how to do this. Below is an example that indexes the Kohana home page. Take a look at the source to see how it works:
Add Kohana home page to search against
Now try the following search: kohana
kosearch is developed and maintained by badlydrawntoy