How we tackled the search engine relevance problem

Back to all blogs

When we started Algolia, we wanted to build an offline search SDK for mobile applications. Unlike other search engines on the market, we couldn’t rely on pre-existing search engine software such as Lucene because they weren’t available on mobile.

We had to build everything from the ground up within the limitations imposed by smartphones such as low memory and low processing power. Ultimately, we spent a considerable amount of time reinventing the search engine from scratch.

When we eventually decided to pivot to a SaaS solution, we had built a completely new ranking algorithm. And in a typical “constraints foster creativity” situation, we had made two huge improvements: Speed and Relevance.

Algolia is known for its blazing-fast speed. It’s usually the first thing people notice when they start using it and also what makes it so exhilarating to use. But what I consider to be the most impressive and important achievement of Algolia is not actually speed but Relevance.

Algolia reinvented how search engines define relevance.

Before diving into the specifics behind Algolia’s ranking formula, let’s briefly go over how other search engines work.

A little history of search engines

In the late 90s, the Text Retrieval Conference organized a series of annual workshops financed in part by the US Department of Defense. The goal of these workshops was to determine the best ranking algorithm for searching a dataset consisting of long unstructured documents. The conference resulted in several enhancements to search engine ranking algorithms, the most defining being improvements to TF-IDF, what was then the main component of the most powerful ranking formulas.

TF-IDF is a statistical weighting scheme based on two elements: TF or Term Frequency (the number of occurrences of a word in a document, which is calculated on a per document basis) and IDF or Inverse Document Frequency (the importance of a query word within the whole corpus of documents).

Basically, if a query word is repeated multiple times within a document, that document will rank higher in the results than a document that only contains the query word once. IDF comes into play to reduce the importance of words that occur a lot in the corpus such as “the”, “of”, “a”, etc. The more times a query word recurs within the whole corpus, the lower a document containing that query word will be ranked. The length of the document is also taken into account; a short document with query word matches will rank higher than a long document with the same match.

TF-IDF (along with similar algorithms like BM25) became a central tool to score/rank documents given a user query, and it was used to search any type of data. Unfortunately, it was not actually designed for that! TD-IDF was built as an answer to TREC’s workshops and was designed specifically to work within long unstructured documents such as PDFs or enterprise documents.

The issue with this kind of statistical approach is that it surprisingly only makes sense for a very small number of use-cases, like enterprise document search, where:

You don’t really know what type of data you’re searching into
The content is drastically different from one document to another
The documents don’t have any popularity metrics, and you need to rank them solely based on their “textual relevance”

What Algolia was built for

Algolia wasn’t built to search long unstructured documents. It was built to power the search of websites and mobile applications. And in 99% of cases, this data is very different.

It has a defined schema with meaningful attributes that you know beforehand (title, name, list of tags, description…)
You know the underlying content, what matters and what doesn’t (e.g. is the title more important than the description?)
There are often one or more popularity metrics associated with each object (number of sales/likes/views)
As the owner, you sometimes have a strategy that impacts the ranking (give a bonus to featured objects)

Let’s look at an example

Say that you’re on IMDB searching for an actor, and you type “Brad”. You don’t care about the Term Frequency (how many times the document contains the word Brad), or the Inverse Document Frequency (how many times the word Brad is present in the list of all actors).

Fortunately, there are other signals that matter:

Is the word Brad in the “name” attribute or in the description?
Is it the first word or the last word in this attribute?
Is it spelled exactly B-r-a-d? Does it contain a typo? A suffix?
How many followers/likes/views does Brad have?

As you can guess, these questions give us a much better chance at finding Brad Pitt than a statistical approach like TF-IDF.

This is the first pillar of Algolia’s revolutionary improvements—the rules taken into account in the Ranking algorithm.

The rules in Algolia’s ranking formula

Algolia doesn’t rely on any variation of TF-IDF. Instead, it uses six default rules to evaluate the textual relevance of an object for a specific query:

1. Typo: Will rank higher a word that doesn’t contain a typing mistake.
2. Words: Will rank higher an object matching more of the words typed by the user if the query contains more than one word.
3. Proximity: Will rank higher words close to each other (George Clooney is better than George word Clooney).
4. Attribute: Will rank higher a match in a more important attribute (Title, Description).
5. Words position: Will rank higher words that are at the beginning of an attribute.
6. Exact: Will rank higher words that match exactly without any suffix (Algolia natively handles prefix search).

Most search engines have somewhat similar implementations of the Words and Attribute rules (some also have an implementation of Proximity). The rest of these rules are unique to Algolia. You can of course disable or customize the rules to fit your specific use-case, and you can also use non-textual criteria such as geo-location.

These rules are a great improvement over what existed before, but the biggest and most important change Algolia brought to the table is something else — how these rules are used to rank the objects.

How is the ranking done?

The biggest issue with the relevance of most search engines is not TF-IDF (which can actually be disabled in most search engines using it). It’s the way the ranking criteria are used. Other search engines compute a big floating-point score that mixes everything. They have a very complex mathematical formula with a lot of coefficients/boosts that merges every rule into one score for each document. Then, they use this unique score to rank the search results.

The first problem with this approach is that it mixes apples and oranges. For example, if your ranking formula takes into account TF-IDF and the number of likes of the documents:

What happens if a document has a very high number of likes? It will override all the other criteria like textual relevance, and this document will always show up in the results, even when the textual relevance score is very bad.
Similarly, if an object is the most relevant to the query but doesn’t have any likes, it won’t show up in the results.
The second issue with this approach is its complexity. Nobody is able to understand what’s really happening. Even the most skilled search experts need to spend a lot of time crafting the best mathematical formula. Via a trial-and-error process, they’ll find a formula that works well for a list of the 100 most popular queries, but the formula cannot be optimized for the whole catalogue. And as soon as the catalogue (or any metrics used in the formula) changes, they need to go back to the drawing board.

This exception-driven approach is not maintainable.

Algolia’s ranking formula is designed to fix this. And it does so with a different ranking approach via the Tie-Breaking algorithm.

The tie-breaking algorithm

Instead of calculating one big score and then sorting the results once, a tie-breaking algorithm calculates N scores and applies N successive sorting. Here’s how it works:

All the matching records are sorted according to the first criterion.
If any records are tied, those records are then sorted according the second criterion.
If there are still records that are tied, those are then sorted according to the third criterion.
And so on, until each record in the search results has a distinct position.

The order of the criteria determines their importance. It’s like in a children’s game—if you sort by color and then by shape, you don’t get the same results as when you sort by shape then color.

Configuring the relevance of Algolia for a specific use-case is as simple as figuring out the set of rules that need to be applied and ordering them by importance.

Oh, and everything is customizable—from the order of the attributes to which attributes are used. So there’s virtually no ranking strategy that you can’t build with this ranking formula.

Let’s consider an example with the following dataset:

[
 {
  "name": "Jon Black",
  "featured": true,
  "number_of_likes": 4
 },
 {
  "name": "John Jackson",
  "featured": false,
  "number_of_likes": 17
 },
 {
  "name": "John Paul",
  "featured": false,
  "number_of_likes": 3
 },
 {
  "name": "Jon White",
  "featured": false,
  "number_of_likes": 9
 },
 {
  "name": "John Thompson",
  "featured": true,
  "number_of_likes": 8
 }
]

And this index configuration:

To simplify our example, let’s ignore some of the rules and focus only on Typo, Featured and Number of Likes. For the query “John,” we’ll get the following results:

1. John Thompson (0 typos; featured; 8 likes)
2. John Jackson (0 typos; not-featured; 17 likes)
3. John Paul (0 typos; not-featured; 3 likes)
4. Jon Black (1 typo; featured; 4 likes)
5. Jon White (1 typo; not-featured; 9 likes)

In this example, we:

Decided that Typo was the most important rule. If an object has a typo, it should be ranked lower, no matter what the other rules are.
Gave a bonus to records marked “featured”.
Decided to position our main popularity metric, Number of likes, at the bottom of the formula so it doesn’t override the textual relevance criteria.

A few things to note:

We didn’t need to use coefficients or boosts of any kind. Nor did we need to craft some complex mathematical formula. But we still managed to take into account business metrics and merge them with our textual relevance.
This formula works for every query. It is not designed to work for the top 100 most popular searches, it is designed to work for the whole catalogue.
What happens if 50% of the catalogue changes tomorrow? Nothing. We don’t need to change the formula as long as the structure of the data remains the same.
It is very simple to understand why one result is ranked higher than another. And if you can understand the results, fine-tuning the formula will be much easier.

Where do we go from here?

As of writing this, Algolia is live on thousands of websites and mobile applications, and this ranking strategy has consistently proven successful. We have spent a long time fine-tuning every detail and are pleased with the level of performance. We continue to make relevance a core focus, in particular to introduce new criteria to better enable the personalization of results per user.

We’re always happy to help you figure out how you could use this algorithm to achieve the ranking strategy that best fits your specific use-case. And we’re eager to know what you think, so feel free to leave us a comment.

How Algolia tackled the relevance problem of search engines