Searching MySQL Tables

You could use the LIKE operator to search a set of fields for a string, but MySQL offers built-in indexing features that will simplify your task.

Let’s say you have a MySQL table with the following structure:

id            integer
title         varchar
article       text

You now want to add a search box to your website which will search the ‘articles’ table. The first thing that may come to your mind is a query like the following:

SELECT id, title FROM articles WHERE title LIKE '%searchterm%' OR article LIKE '%searchterm%';

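Sure, this would work for probably 90% of all searches, but it has two disadvantages:

  • It won’t work for the other 10% of searches
  • It is slow, because ‘title’ and ‘article’ are not indexed (and a pattern that starts with % could not use an ordinary index anyway, so MySQL has to scan every row)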

“Why won’t it work for the other 10% of searches?”, you may ask. Assume that you have a record in your table with the title ‘MySQL – How to search tables’, and that a user searches for ‘MySQL search’. LIKE ‘%searchterm%’ looks for the whole search term as one unbroken string, so it won’t find this record: both words appear in the title, but they are separated by other words.
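
To see the problem without touching the database, here is a minimal PHP sketch. It assumes that LIKE ‘%searchterm%’ behaves like a case-insensitive substring test (which is how it behaves under MySQL’s usual case-insensitive collations); likeContains() is a made-up helper name, not a MySQL or PHP function.

```php
<?php
// Hypothetical helper: mimics LIKE '%term%' as a case-insensitive substring test.
function likeContains(string $haystack, string $term): bool
{
    return stripos($haystack, $term) !== false;
}

$title = 'MySQL – How to search tables';

var_dump(likeContains($title, 'MySQL search')); // bool(false): the phrase never appears in one piece
var_dump(likeContains($title, 'MySQL'));        // bool(true)
var_dump(likeContains($title, 'search'));       // bool(true)
```

Each word is there on its own, but the two-word phrase is not, and the phrase is what LIKE is asked to find.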

Don’t worry, MySQL identified this problem many years ago and has since added the FULLTEXT index. You can add a FULLTEXT index to your fields like so:

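ALTER TABLE articles ADD FULLTEXT (title, article);

Note that the index has to cover exactly the columns you plan to list in MATCH(), so you need a single index on both fields here rather than one index per field.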

After you’ve done this, you open up those fields to a whole new world of queries.  You’ll be most interested in the following:

SELECT id, title, MATCH (title, article) AGAINST ('searchterm') AS score
FROM articles WHERE MATCH (title, article) AGAINST ('searchterm');

In a nutshell, MySQL parses your search term and breaks it down into individual words, then matches those words against the FULLTEXT index. So now a search for ‘MySQL search’, as in the previous example, will find the record.

By adding MATCH (title, article) AGAINST ('searchterm') AS score to the SELECT statement, you can retrieve the relevance score that MySQL has assigned to each result. Unfortunately, the relevance score is very seldom a usable number, i.e. it is rarely a number between 0 and 1, so on its own it tells you nothing about a result except that it ranks higher or lower than another.

So, now that you have these seemingly meaningless relevance scores, what can you do to make them more meaningful? I suppose you could first query for the maximum relevance of the search term, then query again to return the search results. With a little division you can work out a human-readable relevance. But that involves two MySQL queries for each search.
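
For comparison, the two-query idea might look roughly like this in PHP. This is only a sketch under a few assumptions: you already have a PDO connection to MySQL, the ‘articles’ table has a FULLTEXT index covering (title, article), and searchWithPercentages() is a name made up for this example.

```php
<?php
// Sketch only: assumes an existing PDO connection to MySQL and a FULLTEXT
// index covering (title, article) on the articles table.
// searchWithPercentages() is a hypothetical helper name.
function searchWithPercentages(PDO $pdo, string $term): array
{
    // Query 1: the highest relevance score for this search term.
    $stmt = $pdo->prepare(
        'SELECT MAX(MATCH (title, article) AGAINST (?)) FROM articles'
    );
    $stmt->execute([$term]);
    $max = (float) $stmt->fetchColumn();

    if ($max <= 0) {
        return []; // nothing matched
    }

    // Query 2: the actual results with their raw scores.
    $stmt = $pdo->prepare(
        'SELECT id, title, MATCH (title, article) AGAINST (?) AS score
         FROM articles WHERE MATCH (title, article) AGAINST (?)'
    );
    $stmt->execute([$term, $term]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    // The "little division": each score as a percentage of the maximum.
    foreach ($rows as &$row) {
        $row['percent'] = (int) round($row['score'] / $max * 100);
    }
    unset($row);

    return $rows;
}
```

It works, but every search costs two round trips to the database.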

Another method takes advantage of your scripting language of choice. Let’s use PHP for our example:

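<ol id="results">
<?php
// getdata() stands in for your database library's row-fetching function.
// Rows arrive sorted by relevance, so the first row holds the highest score.
$max = 0;
while ($row = getdata($res)) {
    if ($max == 0) $max = $row['score'];
    echo '<li>';
    echo '<a href="/page/&id=' . $row['id'] . '">' . htmlspecialchars($row['title']) . '</a> ';
    echo round($row['score'] / $max * 100) . '%';
    echo '</li>';
}
?>
</ol>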

Try it out. It’s simple but effective, and it avoids making two calls to MySQL.
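
If you would rather keep the arithmetic out of your markup, the same normalisation can live in a small function that returns data instead of echoing it. The sketch below is one possible shape, not the only one: addRelevancePercent() is a made-up name, the scores are invented for illustration, and it assumes each row carries a numeric ‘score’ key. It takes the maximum with max() rather than relying on the first row, which helps if you ever sort the results by something other than relevance.

```php
<?php
// Hypothetical helper: adds a 'percent' key to each row, relative to the
// highest score in the set. Assumes each row has a numeric 'score' key.
function addRelevancePercent(array $rows): array
{
    if ($rows === []) {
        return [];
    }
    $max = max(array_column($rows, 'score'));
    foreach ($rows as &$row) {
        $row['percent'] = $max > 0 ? (int) round($row['score'] / $max * 100) : 0;
    }
    unset($row);
    return $rows;
}

// Made-up scores, purely for illustration.
$rows = addRelevancePercent([
    ['id' => 1, 'title' => 'First',  'score' => 4.0],
    ['id' => 2, 'title' => 'Second', 'score' => 1.0],
]);
echo $rows[0]['percent'] . '%, ' . $rows[1]['percent'] . '%'; // 100%, 25%
```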