BV Commerce 6 includes a completely new search engine. Previous versions used simple SQL “LIKE” statements to match parts of words, which made relevancy difficult to determine and handled plural words poorly. The new search engine works more like a web search engine such as Google: it indexes stemmed words and ranks results by a relevance score. The BVSoftware.Search project contains the main implementation of a generic search engine, and the BVSoftware.Commerce project contains management classes that are specific to commerce web sites.
Lexicon
The first part of the new search system is the Lexicon. This is a simple storage container that holds words and assigns each one a numeric Id. Every word that has ever been indexed in the search engine exists exactly once in the Lexicon. Although the Lexicon can accept any word, BV only inserts stemmed words, so the different forms of a word share a single entry.
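As a rough illustration of the idea, the sketch below keeps a lexicon in memory as a dictionary. The class and method names are hypothetical and are not the BVSoftware.Search API; the real Lexicon is a persistent store.

```python
class Lexicon:
    """Toy in-memory lexicon: maps each word to a unique integer id."""

    def __init__(self):
        self._ids = {}

    def add(self, word):
        """Return the id for a word, inserting the word if it is new."""
        if word not in self._ids:
            self._ids[word] = len(self._ids) + 1
        return self._ids[word]

    def find(self, word):
        """Return the id for a word, or None if it was never indexed."""
        return self._ids.get(word)


lexicon = Lexicon()
first_id = lexicon.add("flower")
second_id = lexicon.add("flower")  # same word, so the same id comes back
missing = lexicon.find("tulip")    # never indexed
```

Adding the same stemmed word twice returns the same Id, which is what keeps each word in the Lexicon exactly once.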
Stemmer
The stemmer is a tool used to convert plain English words into their roots. For example, “Flower” and “Flowers” stem to the same root. Irregular forms such as “Child” and “Children” are more difficult to classify with simple rules, and a suffix-stripping stemmer does not necessarily reduce them to the same root. The BV Commerce stemmer uses the Porter Stemmer Algorithm. The algorithm was published by Martin Porter in 1980 and has become the de facto standard for English language stemming. Keep in mind that if you are using BV Commerce in a non-English language you may need to replace the stemmer algorithm with one appropriate to your store’s language.
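To give a feel for how suffix stripping works, here is a sketch of only the first step (step 1a) of Porter's original algorithm, which handles plural endings. It is not the full stemmer and not the BV implementation; the function name is made up for this example.

```python
def porter_step_1a(word):
    """Apply step 1a of the original Porter algorithm (plural endings only)."""
    word = word.lower()
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # flowers -> flower
    return word


singular = porter_step_1a("Flower")
plural = porter_step_1a("Flowers")
```

Both calls produce the same root, so a search for either form would match the same Lexicon entry. The full algorithm applies several further steps to strip endings such as "-ing" and "-ation".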
Search Objects and Search Object Words
The Lexicon stores information about words, and SearchObjects are the items that contain words. A SearchObject is simply anything that can be indexed. It has an Id, a short summary of the item and links to the actual item. Each SearchObject also has an item type, which allows the search engine to return results for Products, Pages, or both. SearchObjectWords are the core of the search system. They relate SearchObjects to specific words with relevancy information: each one links a “SearchObjectId” to a “WordId” from the Lexicon and has a “Score” which tells a searcher how important that word is for the SearchObject.
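The relationship can be pictured with two small record types. This is a sketch only: the field names, Id formats, and link paths below are assumptions made for illustration and do not come from the BV source code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchObject:
    id: str          # hypothetical id format
    item_type: str   # for example "Product" or "Page"
    summary: str
    link: str        # hypothetical link to the actual item


@dataclass(frozen=True)
class SearchObjectWord:
    search_object_id: str
    word_id: int     # id of a stemmed word in the Lexicon
    score: int       # importance of the word for this object


rose = SearchObject("p1", "Product", "Red rose bouquet", "/products/p1")
about = SearchObject("c1", "Page", "About our flower shop", "/pages/c1")
catalog = {o.id: o for o in (rose, about)}

# Word id 2 stands for a stemmed word that both items contain.
object_words = [
    SearchObjectWord("p1", word_id=2, score=10),
    SearchObjectWord("c1", word_id=2, score=3),
]


def objects_for_word(word_id, item_type=None):
    """Return the objects containing a word, optionally limited to one type."""
    hits = [catalog[w.search_object_id] for w in object_words if w.word_id == word_id]
    return [o for o in hits if item_type is None or o.item_type == item_type]


pages_only = objects_for_word(2, item_type="Page")
```

Filtering on the item type is what lets one index serve Product searches, Page searches, or both.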
Indexers
An indexer takes information about words and scores for an object and inserts it into the catalog of searchable items. A simple text indexer scores words based on the frequency with which they appear, while a complex indexer accepts a list of pre-scored words for indexing. The complex indexer is used to index items in BV. The BVSoftware.Commerce.Catalog.SearchManager class pre-scores words for Products and other BV items, then sends that data to the complex indexer to be included in future searches. The SearchManager scores words based on their importance as BV Product information. For example, a word in the Title of a product will have a higher score than the same word in the description of the product. This helps return relevant search results even if other items have a higher count of the same word in their description.
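The difference between the two scoring approaches can be sketched as follows. The weights of 10 and 1 are invented for this example; the actual values used by the SearchManager are not given here, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical field weights, chosen only for illustration.
TITLE_WEIGHT = 10
DESCRIPTION_WEIGHT = 1


def frequency_scores(words):
    """Simple indexer idea: score each word by how often it appears."""
    return dict(Counter(words))


def prescore(title_words, description_words):
    """Complex indexer input: score words by the field they appear in."""
    scores = Counter()
    for word in title_words:
        scores[word] += TITLE_WEIGHT
    for word in description_words:
        scores[word] += DESCRIPTION_WEIGHT
    return dict(scores)


# Product A has "rose" once in its title; product B repeats it in its description.
product_a = prescore(["rose"], ["red", "bouquet"])
product_b = prescore(["bouquet"], ["rose", "rose", "rose"])
```

With these weights, product A scores 10 for "rose" and product B scores 3, so A ranks first even though B mentions the word more often.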
Text Parser
The Text Parser is responsible for taking strings of text and converting them into a list of usable search terms. This includes splitting the text into individual words, removing non-alphanumeric characters, stemming words and removing stop words. Stop words are real words that have little to no effect on searches. Common examples include “I”, “a”, “and”, “but”, “of”, “from”, etc. Removing these words makes searches more efficient and more relevant. The Text Parser is used both to clean up incoming queries and by the indexer, so that the same rules apply to text from objects as to text from search requests.
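A minimal sketch of such a parser is shown below. The stop word list contains only the examples named above, and naive_stem is a stand-in for a real stemmer that strips only a trailing "s"; neither reflects the actual BV word list or implementation.

```python
import re

# Only the stop words named in the text; a real list would be longer.
STOP_WORDS = {"i", "a", "and", "but", "of", "from"}


def naive_stem(word):
    """Placeholder for a real stemmer: strips a single trailing 's'."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def parse(text):
    """Turn raw text into a list of stemmed search terms."""
    # Splitting on runs of non-alphanumeric characters also removes them.
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    terms = []
    for token in tokens:
        if not token or token in STOP_WORDS:
            continue
        terms.append(naive_stem(token))
    return terms


terms = parse("Flowers and a Vase, from Holland!")
```

Here terms ends up as ["flower", "vase", "holland"]. Running indexed text and incoming queries through the same function is what guarantees that both sides are reduced to the same terms.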
Searcher
The searcher class is the glue that ties everything together to deliver search results. It takes a Lexicon of words and a store of SearchObjects and SearchObjectWords, and uses a Text Parser to convert a text query into Word Ids and then into the Ids of matching objects.
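The lookup and ranking part of a searcher might look like the sketch below. It assumes the query has already been parsed into stemmed terms, and it ranks objects by the sum of their scores across the query words. Summing is an assumption made for this example; the exact way BV combines scores is not described here.

```python
from collections import defaultdict


def search(query_terms, lexicon, object_words):
    """Rank object ids for already-stemmed query terms.

    lexicon:      dict mapping stemmed word -> word id
    object_words: list of (object_id, word_id, score) tuples
    """
    # Words missing from the lexicon were never indexed and cannot match.
    word_ids = {lexicon[term] for term in query_terms if term in lexicon}

    totals = defaultdict(int)
    for object_id, word_id, score in object_words:
        if word_id in word_ids:
            totals[object_id] += score

    # Highest total first; ties are broken by object id for a stable order.
    return sorted(totals, key=lambda object_id: (-totals[object_id], object_id))


lexicon = {"flower": 1, "vase": 2}
object_words = [(100, 1, 10), (100, 2, 1), (200, 1, 3), (300, 2, 10)]
results = search(["flower", "vase"], lexicon, object_words)
```

Object 100 totals 11, object 300 totals 10 and object 200 totals 3, so results is [100, 300, 200].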
Summary
When a customer types a search request on a BV Commerce 6 store, the following steps take place:
- The search query is sent to a text parser where it is split into words and stop words are removed
- The query words are stemmed to their roots
- A searcher looks up those words in the Lexicon to see if they exist yet and to get their Ids
- The searcher looks through all of the SearchObjectWords to find the search objects with the highest scores for those words and returns a list of those objects
- The BV search results page looks at the results and generates links to the items based on their Id and object type.
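The five steps above can be traced end to end in a few lines. Everything in this sketch is illustrative: the sample data, the trailing "s" stemming shortcut, the summed scores, and the URL patterns are all assumptions, not BV Commerce behavior.

```python
import re

STOP_WORDS = {"i", "a", "and", "but", "of", "from"}
LEXICON = {"red": 1, "flower": 2}               # stemmed word -> word id
OBJECTS = {"p1": "Product", "c1": "Page"}       # object id -> object type
OBJECT_WORDS = [("p1", 1, 10), ("p1", 2, 10), ("c1", 2, 3)]  # (object id, word id, score)
URL_PATTERNS = {"Product": "/products/{}", "Page": "/pages/{}"}  # hypothetical


def run_search(query):
    # Step 1: split the query into words and drop stop words.
    words = [w for w in re.split(r"[^a-z0-9]+", query.lower())
             if w and w not in STOP_WORDS]
    # Step 2: stem each word (placeholder: strip one trailing "s").
    stems = [w[:-1] if w.endswith("s") and not w.endswith("ss") else w
             for w in words]
    # Step 3: look up word ids; unknown words are skipped.
    word_ids = {LEXICON[s] for s in stems if s in LEXICON}
    # Step 4: total the scores per object and rank, highest first.
    totals = {}
    for object_id, word_id, score in OBJECT_WORDS:
        if word_id in word_ids:
            totals[object_id] = totals.get(object_id, 0) + score
    ranked = sorted(totals, key=lambda oid: (-totals[oid], oid))
    # Step 5: build a link for each result from its id and object type.
    return [URL_PATTERNS[OBJECTS[oid]].format(oid) for oid in ranked]


links = run_search("A bunch of red flowers")
```

For this query the product matches both "red" and "flower" and ranks ahead of the page, so links is ["/products/p1", "/pages/c1"]. The word "bunch" is not in the Lexicon and is ignored.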