Search by Algolia
What is B2B ecommerce? Everything you need to know
e-commerce

What is B2B ecommerce? Everything you need to know

When you think of “customer experience,” what comes to mind? People, right? Specifically, consumers. Retail customers. That’s easy; the ...

Vincent Caruana

Sr. SEO Web Digital Marketing Manager

What is ecommerce merchandising? Key components and best practices
e-commerce

What is ecommerce merchandising? Key components and best practices

A potential customer is about to land on the home page of your ecommerce platform, curious to see what cool ...

Catherine Dee

Search and Discovery writer

AI-powered search: From keywords to conversations
ai

AI-powered search: From keywords to conversations

By now, everyone’s had the opportunity to experiment with AI tools like ChatGPT or Midjourney and ponder their inner ...

Chris Stevenson

Director, Product Marketing

Vector vs Keyword Search: Why You Should Care
ai

Vector vs Keyword Search: Why You Should Care

Search has been around for a while, to the point that it is now considered a standard requirement in many ...

Nicolas Fiorini

Senior Machine Learning Engineer

What is AI-powered site search?
ai

What is AI-powered site search?

With the advent of artificial intelligence (AI) technologies enabling services such as Alexa, Google search, and self-driving cars, the ...

John Stewart

VP Corporate Marketing

What is a B2B marketplace?
e-commerce

What is a B2B marketplace?

It’s no secret that B2B (business-to-business) transactions have largely migrated online. According to Gartner, by 2025, 80 ...

Vincent Caruana

Sr. SEO Web Digital Marketing Manager

3 strategies for B2B ecommerce growth: key takeaways from B2B Online - Chicago
e-commerce

3 strategies for B2B ecommerce growth: key takeaways from B2B Online - Chicago

Twice a year, B2B Online brings together industry leaders to discuss the trends affecting the B2B ecommerce industry. At the ...

Elena Moravec

Director of Product Marketing & Strategy

Deconstructing smart digital merchandising
e-commerce

Deconstructing smart digital merchandising

This is Part 2 of a series that dives into the transformational journey made by digital merchandising to drive positive ...

Benoit Reulier
Reshma Iyer

Benoit Reulier &

Reshma Iyer

The death of traditional shopping: How AI-powered conversational commerce changes everything
ai

The death of traditional shopping: How AI-powered conversational commerce changes everything

Get ready for the ride: online shopping is about to be completely upended by AI. Over the past few years ...

Aayush Iyer

Director, User Experience & UI Platform

What is B2C ecommerce? Models, examples, and definitions
e-commerce

What is B2C ecommerce? Models, examples, and definitions

Remember life before online shopping? When you had to actually leave the house for a brick-and-mortar store to ...

Catherine Dee

Search and Discovery writer

What are marketplace platforms and software? Why are they important?
e-commerce

What are marketplace platforms and software? Why are they important?

If you imagine pushing a virtual shopping cart down the aisles of an online store, or browsing items in an ...

Vincent Caruana

Sr. SEO Web Digital Marketing Manager

What is an online marketplace?
e-commerce

What is an online marketplace?

Remember the world before the convenience of online commerce? Before the pandemic, before the proliferation of ecommerce sites, when the ...

Catherine Dee

Search and Discovery writer

10 ways AI is transforming ecommerce
e-commerce

10 ways AI is transforming ecommerce

Artificial intelligence (AI) is no longer just the stuff of scary futuristic movies; it’s recently burst into the headlines ...

Catherine Dee

Search and Discovery writer

AI as a Service (AIaaS) in the era of "buy not build"
ai

AI as a Service (AIaaS) in the era of "buy not build"

Imagine you are the CTO of a company that has just undergone a massive decade long digital transformation. You’ve ...

Sean Mullaney

CTO @Algolia

By the numbers: the ROI of keyword and AI site search for digital commerce
product

By the numbers: the ROI of keyword and AI site search for digital commerce

Did you know that the tiny search bar at the top of many ecommerce sites can offer an outsized return ...

Jon Silvers

Director, Digital Marketing

Using pre-trained AI algorithms to solve the cold start problem
ai

Using pre-trained AI algorithms to solve the cold start problem

Artificial intelligence (AI) has quickly moved from hot topic to everyday life. Now, ecommerce businesses are beginning to clearly see ...

Etienne Martin

VP of Product

Introducing Algolia NeuralSearch
product

Introducing Algolia NeuralSearch

We couldn’t be more excited to announce the availability of our breakthrough product, Algolia NeuralSearch. The world has stepped ...

Bernadette Nixon

Chief Executive Officer and Board Member at Algolia

AI is eating ecommerce
ai

AI is eating ecommerce

The ecommerce industry has experienced steady and reliable growth over the last 20 years (albeit interrupted briefly by a global ...

Sean Mullaney

CTO @Algolia

Looking for something?

facebookfacebooklinkedinlinkedintwittertwittermailmail

When we started Algolia, we wanted to build an offline search SDK for mobile applications. Unlike other search engines on the market, we couldn’t rely on pre-existing search engine software such as Lucene because they weren’t available on mobile.

We had to build everything from the ground up within the limitations imposed by smartphones such as low memory and low processing power. Ultimately, we spent a considerable amount of time reinventing the search engine from scratch.

When we eventually decided to pivot to a SaaS solution, we had built a completely new ranking algorithm. And in a typical “constraints foster creativity” situation, we had made two huge improvements: Speed and Relevance.

Algolia is known for its blazing-fast speed. It’s usually the first thing people notice when they start using it and also what makes it so exhilarating to use. But what I consider to be the most impressive and important achievement of Algolia is not actually speed but Relevance.

Algolia reinvented how search engines define relevance.

Before diving into the specifics behind Algolia’s ranking formula, let’s briefly go over how other search engines work.

A little history of search engines

In the late 90s, the Text Retrieval Conference organized a series of annual workshops financed in part by the US Department of Defense. The goal of these workshops was to determine the best ranking algorithm for searching a dataset consisting of long unstructured documents. The conference resulted in several enhancements to search engine ranking algorithms, the most defining being improvements to TF-IDF, what was then the main component of the most powerful ranking formulas.

TF-IDF is a statistical weighting scheme based on two elements: TF or Term Frequency (the number of occurrences of a word in a document, which is calculated on a per document basis) and IDF or Inverse Document Frequency (the importance of a query word within the whole corpus of documents).

Basically, if a query word is repeated multiple times within a document, that document will rank higher in the results than a document that only contains the query word once. IDF comes into play to reduce the importance of words that occur a lot in the corpus such as “the”, “of”, “a”, etc. The more times a query word recurs within the whole corpus, the lower a document containing that query word will be ranked. The length of the document is also taken into account; a short document with query word matches will rank higher than a long document with the same match.

TF-IDF (along with similar algorithms like BM25) became a central tool to score/rank documents given a user query, and it was used to search any type of data. Unfortunately, it was not actually designed for that! TD-IDF was built as an answer to TREC’s workshops and was designed specifically to work within long unstructured documents such as PDFs or enterprise documents.

The issue with this kind of statistical approach is that it surprisingly only makes sense for a very small number of use-cases, like enterprise document search, where:

  • You don’t really know what type of data you’re searching into
  • The content is drastically different from one document to another
  • The documents don’t have any popularity metrics, and you need to rank them solely based on their “textual relevance”

What Algolia was built for

Algolia wasn’t built to search long unstructured documents. It was built to power the search of websites and mobile applications. And in 99% of cases, this data is very different.

  • It has a defined schema with meaningful attributes that you know beforehand (title, name, list of tags, description…)
  • You know the underlying content, what matters and what doesn’t (e.g. is the title more important than the description?)
  • There are often one or more popularity metrics associated with each object (number of sales/likes/views)
  • As the owner, you sometimes have a strategy that impacts the ranking (give a bonus to featured objects)

Let’s look at an example

Say that you’re on IMDB searching for an actor, and you type “Brad”. You don’t care about the Term Frequency (how many times the document contains the word Brad), or the Inverse Document Frequency (how many times the word Brad is present in the list of all actors).

Fortunately, there are other signals that matter:

  • Is the word Brad in the “name” attribute or in the description?
  • Is it the first word or the last word in this attribute?
  • Is it spelled exactly B-r-a-d? Does it contain a typo? A suffix?
  • How many followers/likes/views does Brad have?

As you can guess, these questions give us a much better chance at finding Brad Pitt than a statistical approach like TF-IDF.

This is the first pillar of Algolia’s revolutionary improvements—the rules taken into account in the Ranking algorithm.

The rules in Algolia’s ranking formula

Algolia doesn’t rely on any variation of TF-IDF. Instead, it uses six default rules to evaluate the textual relevance of an object for a specific query:

  1. 1. Typo: Will rank higher a word that doesn’t contain a typing mistake.
  2. 2. Words: Will rank higher an object matching more of the words typed by the user if the query contains more than one word.
  3. 3. Proximity: Will rank higher words close to each other (George Clooney is better than George word Clooney).
  4. 4. Attribute: Will rank higher a match in a more important attribute (Title, Description).
  5. 5. Words position: Will rank higher words that are at the beginning of an attribute.
  6. 6. Exact: Will rank higher words that match exactly without any suffix (Algolia natively handles prefix search).

Most search engines have somewhat similar implementations of the Words and Attribute rules (some also have an implementation of Proximity). The rest of these rules are unique to Algolia. You can of course disable or customize the rules to fit your specific use-case, and you can also use non-textual criteria such as geo-location.

These rules are a great improvement over what existed before, but the biggest and most important change Algolia brought to the table is something else — how these rules are used to rank the objects.

How is the ranking done?

The biggest issue with the relevance of most search engines is not TF-IDF (which can actually be disabled in most search engines using it). It’s the way the ranking criteria are used. Other search engines compute a big floating-point score that mixes everything. They have a very complex mathematical formula with a lot of coefficients/boosts that merges every rule into one score for each document. Then, they use this unique score to rank the search results.

The first problem with this approach is that it mixes apples and oranges. For example, if your ranking formula takes into account TF-IDF and the number of likes of the documents:

  • What happens if a document has a very high number of likes? It will override all the other criteria like textual relevance, and this document will always show up in the results, even when the textual relevance score is very bad.
  • Similarly, if an object is the most relevant to the query but doesn’t have any likes, it won’t show up in the results.
  • The second issue with this approach is its complexity. Nobody is able to understand what’s really happening. Even the most skilled search experts need to spend a lot of time crafting the best mathematical formula. Via a trial-and-error process, they’ll find a formula that works well for a list of the 100 most popular queries, but the formula cannot be optimized for the whole catalogue. And as soon as the catalogue (or any metrics used in the formula) changes, they need to go back to the drawing board.

This exception-driven approach is not maintainable.

Algolia’s ranking formula is designed to fix this. And it does so with a different ranking approach via the Tie-Breaking algorithm.

The tie-breaking algorithm

Instead of calculating one big score and then sorting the results once, a tie-breaking algorithm calculates N scores and applies N successive sorting. Here’s how it works:

  • All the matching records are sorted according to the first criterion.
  • If any records are tied, those records are then sorted according the second criterion.
  • If there are still records that are tied, those are then sorted according to the third criterion.
  • And so on, until each record in the search results has a distinct position.

The order of the criteria determines their importance. It’s like in a children’s game—if you sort by color and then by shape, you don’t get the same results as when you sort by shape then color.

Configuring the relevance of Algolia for a specific use-case is as simple as figuring out the set of rules that need to be applied and ordering them by importance.

Oh, and everything is customizable—from the order of the attributes to which attributes are used. So there’s virtually no ranking strategy that you can’t build with this ranking formula.

Let’s consider an example with the following dataset:

[
 {
  "name": "Jon Black",
  "featured": true,
  "number_of_likes": 4
 },
 {
  "name": "John Jackson",
  "featured": false,
  "number_of_likes": 17
 },
 {
  "name": "John Paul",
  "featured": false,
  "number_of_likes": 3
 },
 {
  "name": "Jon White",
  "featured": false,
  "number_of_likes": 9
 },
 {
  "name": "John Thompson",
  "featured": true,
  "number_of_likes": 8
 }
]

And this index configuration:

Index Our ranking formula on Algolia's dashboard

To simplify our example, let’s ignore some of the rules and focus only on Typo, Featured and Number of Likes. For the query “John,” we’ll get the following results:

  1. 1. John Thompson (0 typos; featured; 8 likes)
  2. 2. John Jackson (0 typos; not-featured; 17 likes)
  3. 3. John Paul (0 typos; not-featured; 3 likes)
  4. 4. Jon Black (1 typo; featured; 4 likes)
  5. 5. Jon White (1 typo; not-featured; 9 likes)

In this example, we:

  • Decided that Typo was the most important rule. If an object has a typo, it should be ranked lower, no matter what the other rules are.
  • Gave a bonus to records marked “featured”.
  • Decided to position our main popularity metric, Number of likes, at the bottom of the formula so it doesn’t override the textual relevance criteria.

A few things to note:

  • We didn’t need to use coefficients or boosts of any kind. Nor did we need to craft some complex mathematical formula. But we still managed to take into account business metrics and merge them with our textual relevance.
  • This formula works for every query. It is not designed to work for the top 100 most popular searches, it is designed to work for the whole catalogue.
  • What happens if 50% of the catalogue changes tomorrow? Nothing. We don’t need to change the formula as long as the structure of the data remains the same.
  • It is very simple to understand why one result is ranked higher than another. And if you can understand the results, fine-tuning the formula will be much easier.

Where do we go from here?

As of writing this, Algolia is live on thousands of websites and mobile applications, and this ranking strategy has consistently proven successful. We have spent a long time fine-tuning every detail and are pleased with the level of performance. We continue to make relevance a core focus, in particular to introduce new criteria to better enable the personalization of results per user.

We’re always happy to help you figure out how you could use this algorithm to achieve the ranking strategy that best fits your specific use-case. And we’re eager to know what you think, so feel free to leave us a comment.

About the author
Nicolas Baissas

14-day free trial

Create a full-featured search experience in no time.

Get started
14-day free trial

Recommended Articles

Powered byAlgolia Algolia Recommend

Comparing Algolia and Elasticsearch For Consumer-Grade Search Part 2: Relevance Isn’t Luck
engineering

Josh Dzielak

Algolia's top 10 tips to achieve highly relevant search results
product

Julien Lemoine

Co-founder & former CTO at Algolia

What is a search query and how is it processed by a search engine?
product

Catherine Dee

Search and Discovery writer