What is B2B ecommerce? Everything you need to know
When you think of “customer experience,” what comes to mind? People, right? Specifically, consumers. Retail customers. That’s easy; the ...
Sr. SEO Web Digital Marketing Manager
When you think of “customer experience,” what comes to mind? People, right? Specifically, consumers. Retail customers. That’s easy; the ...
Sr. SEO Web Digital Marketing Manager
A potential customer is about to land on the home page of your ecommerce platform, curious to see what cool ...
Search and Discovery writer
By now, everyone’s had the opportunity to experiment with AI tools like ChatGPT or Midjourney and ponder their inner ...
Director, Product Marketing
Search has been around for a while, to the point that it is now considered a standard requirement in many ...
Senior Machine Learning Engineer
With the advent of artificial intelligence (AI) technologies enabling services such as Alexa, Google search, and self-driving cars, the ...
VP Corporate Marketing
It’s no secret that B2B (business-to-business) transactions have largely migrated online. According to Gartner, by 2025, 80 ...
Sr. SEO Web Digital Marketing Manager
Twice a year, B2B Online brings together industry leaders to discuss the trends affecting the B2B ecommerce industry. At the ...
Director of Product Marketing & Strategy
This is Part 2 of a series that dives into the transformational journey made by digital merchandising to drive positive ...
Benoit Reulier &
Reshma Iyer
Get ready for the ride: online shopping is about to be completely upended by AI. Over the past few years ...
Director, User Experience & UI Platform
Remember life before online shopping? When you had to actually leave the house for a brick-and-mortar store to ...
Search and Discovery writer
If you imagine pushing a virtual shopping cart down the aisles of an online store, or browsing items in an ...
Sr. SEO Web Digital Marketing Manager
Remember the world before the convenience of online commerce? Before the pandemic, before the proliferation of ecommerce sites, when the ...
Search and Discovery writer
Artificial intelligence (AI) is no longer just the stuff of scary futuristic movies; it’s recently burst into the headlines ...
Search and Discovery writer
Imagine you are the CTO of a company that has just undergone a massive decade long digital transformation. You’ve ...
CTO @Algolia
Did you know that the tiny search bar at the top of many ecommerce sites can offer an outsized return ...
Director, Digital Marketing
Artificial intelligence (AI) has quickly moved from hot topic to everyday life. Now, ecommerce businesses are beginning to clearly see ...
VP of Product
We couldn’t be more excited to announce the availability of our breakthrough product, Algolia NeuralSearch. The world has stepped ...
Chief Executive Officer and Board Member at Algolia
The ecommerce industry has experienced steady and reliable growth over the last 20 years (albeit interrupted briefly by a global ...
CTO @Algolia
Mar 10th 2016 engineering
When we started Algolia, we wanted to build an offline search SDK for mobile applications. Unlike other search engines on the market, we couldn’t rely on pre-existing search engine software such as Lucene because they weren’t available on mobile.
We had to build everything from the ground up within the limitations imposed by smartphones such as low memory and low processing power. Ultimately, we spent a considerable amount of time reinventing the search engine from scratch.
When we eventually decided to pivot to a SaaS solution, we had built a completely new ranking algorithm. And in a typical “constraints foster creativity” situation, we had made two huge improvements: Speed and Relevance.
Algolia is known for its blazing-fast speed. It’s usually the first thing people notice when they start using it and also what makes it so exhilarating to use. But what I consider to be the most impressive and important achievement of Algolia is not actually speed but Relevance.
Algolia reinvented how search engines define relevance.
Before diving into the specifics behind Algolia’s ranking formula, let’s briefly go over how other search engines work.
In the late 90s, the Text Retrieval Conference organized a series of annual workshops financed in part by the US Department of Defense. The goal of these workshops was to determine the best ranking algorithm for searching a dataset consisting of long unstructured documents. The conference resulted in several enhancements to search engine ranking algorithms, the most defining being improvements to TF-IDF, what was then the main component of the most powerful ranking formulas.
TF-IDF is a statistical weighting scheme based on two elements: TF or Term Frequency (the number of occurrences of a word in a document, which is calculated on a per document basis) and IDF or Inverse Document Frequency (the importance of a query word within the whole corpus of documents).
Basically, if a query word is repeated multiple times within a document, that document will rank higher in the results than a document that only contains the query word once. IDF comes into play to reduce the importance of words that occur a lot in the corpus such as “the”, “of”, “a”, etc. The more times a query word recurs within the whole corpus, the lower a document containing that query word will be ranked. The length of the document is also taken into account; a short document with query word matches will rank higher than a long document with the same match.
TF-IDF (along with similar algorithms like BM25) became a central tool to score/rank documents given a user query, and it was used to search any type of data. Unfortunately, it was not actually designed for that! TD-IDF was built as an answer to TREC’s workshops and was designed specifically to work within long unstructured documents such as PDFs or enterprise documents.
The issue with this kind of statistical approach is that it surprisingly only makes sense for a very small number of use-cases, like enterprise document search, where:
Algolia wasn’t built to search long unstructured documents. It was built to power the search of websites and mobile applications. And in 99% of cases, this data is very different.
Say that you’re on IMDB searching for an actor, and you type “Brad”. You don’t care about the Term Frequency (how many times the document contains the word Brad), or the Inverse Document Frequency (how many times the word Brad is present in the list of all actors).
Fortunately, there are other signals that matter:
As you can guess, these questions give us a much better chance at finding Brad Pitt than a statistical approach like TF-IDF.
This is the first pillar of Algolia’s revolutionary improvements—the rules taken into account in the Ranking algorithm.
Algolia doesn’t rely on any variation of TF-IDF. Instead, it uses six default rules to evaluate the textual relevance of an object for a specific query:
Most search engines have somewhat similar implementations of the Words and Attribute rules (some also have an implementation of Proximity). The rest of these rules are unique to Algolia. You can of course disable or customize the rules to fit your specific use-case, and you can also use non-textual criteria such as geo-location.
These rules are a great improvement over what existed before, but the biggest and most important change Algolia brought to the table is something else — how these rules are used to rank the objects.
The biggest issue with the relevance of most search engines is not TF-IDF (which can actually be disabled in most search engines using it). It’s the way the ranking criteria are used. Other search engines compute a big floating-point score that mixes everything. They have a very complex mathematical formula with a lot of coefficients/boosts that merges every rule into one score for each document. Then, they use this unique score to rank the search results.
The first problem with this approach is that it mixes apples and oranges. For example, if your ranking formula takes into account TF-IDF and the number of likes of the documents:
This exception-driven approach is not maintainable.
Algolia’s ranking formula is designed to fix this. And it does so with a different ranking approach via the Tie-Breaking algorithm.
Instead of calculating one big score and then sorting the results once, a tie-breaking algorithm calculates N scores and applies N successive sorting. Here’s how it works:
The order of the criteria determines their importance. It’s like in a children’s game—if you sort by color and then by shape, you don’t get the same results as when you sort by shape then color.
Configuring the relevance of Algolia for a specific use-case is as simple as figuring out the set of rules that need to be applied and ordering them by importance.
Oh, and everything is customizable—from the order of the attributes to which attributes are used. So there’s virtually no ranking strategy that you can’t build with this ranking formula.
Let’s consider an example with the following dataset:
[ { "name": "Jon Black", "featured": true, "number_of_likes": 4 }, { "name": "John Jackson", "featured": false, "number_of_likes": 17 }, { "name": "John Paul", "featured": false, "number_of_likes": 3 }, { "name": "Jon White", "featured": false, "number_of_likes": 9 }, { "name": "John Thompson", "featured": true, "number_of_likes": 8 } ]
And this index configuration:
To simplify our example, let’s ignore some of the rules and focus only on Typo, Featured and Number of Likes. For the query “John,” we’ll get the following results:
In this example, we:
A few things to note:
As of writing this, Algolia is live on thousands of websites and mobile applications, and this ranking strategy has consistently proven successful. We have spent a long time fine-tuning every detail and are pleased with the level of performance. We continue to make relevance a core focus, in particular to introduce new criteria to better enable the personalization of results per user.
We’re always happy to help you figure out how you could use this algorithm to achieve the ranking strategy that best fits your specific use-case. And we’re eager to know what you think, so feel free to leave us a comment.
Create a full-featured search experience in no time.
Powered by Algolia Recommend