Part one of a six-part series on Drupal’s search functionality.
This installment will discuss node module’s responsibilities in Drupal searching, show how to create contributed modules which take advantage of node module’s indexing methods, and discuss some pending patches for improving this flexibility.
Search Module and general search overview
Searching In General
In general, there are three fundamental components to any search platform: (1) creating a search index, (2) searching the search index, and (3) a user interface. Search module provides a framework for all three of these components (plus a fourth for administrative services): the searching framework itself, a basic user interface, and helper functions and data structures for calculating keyword relevancy. Most of the framework is customizable via hook_search.
The Drupal Search Module Framework
The search module provides a framework for indexing and searching a site, as well as the algorithm for calculating keyword relevancy for search results. Drupal modules (like user and node) use this engine to control what gets indexed, to define the rules for generating the text that search uses, and to supply the functions that actually index and search that text.
A search implementor module is any module that implements hook_search. The search module automatically adds each search implementor’s “name” as a tab on the search page and creates the basic search form for it. During cron, search module invokes each implementor’s update hook (hook_update_index) to let modules update their indexes. After the user enters search criteria, search module invokes the hook_search of the search implementor module to return the search results.
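The tab-and-dispatch flow described above can be sketched in a few lines. This is a minimal Python model, not Drupal’s PHP: the operation names 'name' and 'search' mirror hook_search, but the SearchFramework and UserSearch classes and all of their methods are hypothetical names made up for illustration.

```python
class UserSearch:
    """Hypothetical implementor: matches user names by substring."""

    def __init__(self, names):
        self.names = names

    def search(self, op, keys=None):
        # One entry point handles several operations, as hook_search does.
        if op == "name":
            return "Users"
        if op == "search":
            return [n for n in self.names if keys.lower() in n.lower()]
        return None


class SearchFramework:
    """Hypothetical stand-in for the role search module plays."""

    def __init__(self):
        self.implementors = {}

    def register(self, module_name, implementor):
        self.implementors[module_name] = implementor

    def tabs(self):
        # Ask every implementor for its tab label.
        return {m: impl.search("name") for m, impl in self.implementors.items()}

    def run_search(self, module_name, keys):
        # Hand the user's keywords to the implementor that owns the tab.
        return self.implementors[module_name].search("search", keys)
```

The framework never needs to know how an implementor finds its results. It only asks for a label and, later, for the results matching a set of keywords.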
The relationship between search module and search implementors could be a little cleaner. For example, many of the node search specifics live in search module, and they might be better understood if they were in node module. While that has not been proposed, it should be discussed.
User Module – a simple Search Implementor
The user module provides a simple search implementation for finding users using search module’s ranking methods.
Node module – a complex Search Implementor
Node module interacts with search in a number of ways. It limits the number of nodes indexed during cron, which helps keep server loads down. It keeps track of which nodes need indexing. Node module generates the node text, but uses a search module helper function (search_index) for calculating keyword relevancy. It is this helper function that does most of the work in node indexing and stores the results in the search index. Node module also implements a framework of its own (through hook_nodeapi) so that other modules can add HTML text to the rendered node body before it is indexed. Taxonomy and comment modules both make use of this to add data to the node’s search index without affecting the original node content.
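The indexing pass described above can be sketched as follows. This is a minimal Python model under stated assumptions, not node module’s code: it assumes pending nodes are identified by a "changed" timestamp newer than the last indexed one, and the function name, the dictionary keys, and the extra_text_hooks and index_text parameters are all hypothetical.

```python
def update_index(nodes, last_run, limit, extra_text_hooks, index_text):
    """Index at most `limit` pending nodes and return the new high-water mark."""
    # Assumption: a node needs indexing when it changed after the last run.
    pending = sorted(
        (n for n in nodes if n["changed"] > last_run),
        key=lambda n: n["changed"],
    )
    for node in pending[:limit]:
        # Start from the rendered body, then let other modules append text.
        text = node["body"]
        for hook in extra_text_hooks:
            text += " " + hook(node)
        # The stored node is never modified; only the indexed text grows.
        index_text(node["nid"], text)
        last_run = node["changed"]
    return last_run
```

Capping the batch with limit keeps each cron run cheap, and the returned timestamp lets the next run pick up where this one stopped. Nodes that share an identical timestamp could be skipped by this simplified bookkeeping.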
To give credit where credit is due, Steven Wittens is the original author of search and is still pretty much the only person who really understands how search works.
The position (or ranking) of a node in the search results is determined by three separate factors: keyword relevancy, age (recently posted), and number of comments. These are configurable on the search settings page (admin/settings/search), allowing an administrator to alter the importance of each factor. Setting a factor’s weight to 0 removes it completely from influencing the search results. This is useful if, for example, comments are not relevant to search on a particular site.
The default settings give equal weight to all three. On Drupal.org, these three values are set to 10, 5, and 1, respectively, giving the highest weight to keyword relevancy, moderate weight to age, and little (but some) weight to the number of comments. These are probably more reasonable defaults for most sites. Sites with small or no user communities should consider completely removing comment relevancy by setting the factor’s weight to 0.
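To see how the weights shift the balance, here is a small Python sketch. It assumes each factor has already been normalized to a value between 0 and 1 and combines them as a weighted average; that formula is an assumption chosen for illustration, not the expression search module actually uses, and the function and key names are hypothetical.

```python
def combined_score(factors, weights):
    """Weighted average of normalized ranking factors (illustrative only)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        # Every factor disabled: nothing left to rank by.
        return 0.0
    return sum(weights[name] * factors[name] for name in weights) / total_weight


# Hypothetical node: strong keyword match, moderately recent, few comments.
node_factors = {"keyword": 0.8, "recent": 0.5, "comments": 0.2}

# The Drupal.org weights quoted above.
drupal_org = {"keyword": 10, "recent": 5, "comments": 1}

# Same weights with comment relevancy removed.
no_comments = {"keyword": 10, "recent": 5, "comments": 0}
```

With the Drupal.org weights this node scores about 0.67, and with the comment weight at 0 it scores 0.7: the weak comment factor no longer drags the result down.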
There is a patch in the Drupal queue for making the factors extensible by contrib modules so that, for example, a voting module could influence a node’s search relevancy.
Node Word Rank
The module generates HTML for the node using the node view. Additional text, such as comments and taxonomy terms, is added to the HTML using hook_nodeapi ‘update index’. Finally, this HTML is sent to the search helper function.
The search module then assigns a numeric value to each word in this generated HTML and stores it in a table. Remember that this is only one of the three ranking factors. This value is the node’s word rank score, which is used to show the highest ranked nodes for a word.
As part of the node word rank methods, search module also inserts the word rank scores found in the node into any nodes that the node links to. This improves the search relevancy of the linked-to page.
However, it has a couple of flaws. First, the link checker only recognizes links to node/$nid. Second, these links greatly inflate the size of the search index.
There’s a patch in the Drupal issue queue for solving this. It uses another table to store the link relationships, forces re-indexing of the linked-to node, and adds the linked-from words during the re-index. This improves performance by shrinking the search_index drastically and makes possible the SQL enhancements discussed in the next articles.
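The idea of keeping link relationships in their own table can be sketched as follows. This is a minimal Python model and not the patch itself: the regular expression, the LinkTable class, and its method names are hypothetical, and the “table” is just a pair of in-memory dictionaries.

```python
import re
from collections import defaultdict

# Like the link checker described above, only node/NID paths are recognized.
NODE_LINK = re.compile(r'href="[^"]*?\bnode/(\d+)"')


def extract_node_links(html):
    """Return the node ids that the given HTML links to, in document order."""
    return [int(nid) for nid in NODE_LINK.findall(html)]


class LinkTable:
    """Hypothetical stand-in for a separate table of node-to-node links."""

    def __init__(self):
        self.links_from = defaultdict(set)  # nid -> nids it links to
        self.links_to = defaultdict(set)    # nid -> nids that link to it

    def index_node(self, nid, html):
        # Drop this node's old outgoing links before recording the new ones.
        for target in self.links_from.pop(nid, set()):
            self.links_to[target].discard(nid)
        for target in extract_node_links(html):
            self.links_from[nid].add(target)
            self.links_to[target].add(nid)
        # The linked-to nodes are the ones that would need re-indexing.
        return sorted(self.links_from[nid])
```

Because each relationship is stored once, in both directions, listing the nodes that link to or from a given node becomes a simple lookup.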
Views_fastsearch implements a views filter that takes advantage of this patch, so that lists of nodes that link to or from a node can be made.
Because of the search node link behavior, it is possible to improve the relevancy of a node by linking to it frequently. One technique for automating this is to create an input filter that auto-creates these links. For example, if the site is a store with part numbers, and the parts refer to other part numbers, a filter could be created to convert references to part numbers into node-based links to those parts, thus improving the relevancy of the linked-to parts. Another example might be a music networking site with well-known artist names and albums, where people might refer to the artist but not necessarily link to their node by node number.
See hook_filter for the particulars of how to write a filter.
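The text transformation at the heart of the part-number example can be sketched like this. It is written in Python to show the idea only; a real filter would be PHP code behind hook_filter. The PN-1234 part number format, the function name, and the part-to-node lookup dictionary are all assumptions made up for illustration.

```python
import re

# Assumed part number format: "PN-" followed by exactly four digits.
PART_NUMBER = re.compile(r"\bPN-(\d{4})\b")


def link_part_numbers(text, part_to_nid):
    """Wrap each known part number in a link to its node."""

    def replace(match):
        part = match.group(0)
        nid = part_to_nid.get(part)
        if nid is None:
            # Unknown part: leave the text exactly as written.
            return part
        return '<a href="/node/%d">%s</a>' % (nid, part)

    return PART_NUMBER.sub(replace, text)
```

Because the generated links use the node/NID form, they are the kind the link checker recognizes. This simplified version does not check whether a part number already sits inside a link, so a production filter would need to guard against double linking.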