A deep dive into learning to rank for AI-powered search

We talk a lot about relevance here at Algolia. Once you’ve heard that word so many times, it’s tough for it not to sound like one of those silly, meaningless corporate descriptions like “synergy” or “holistic”. So let’s get serious — what does relevance look like as a real, meaningful quantity? And if it is a real metric, how do you calculate it?

Relevance is typically defined as how well our results match the search query. And since we’ve already done all this work analyzing the contents of the records that ended up in the search results, that part is already as objective as possible. The definitely-not-objective part is the query itself. Why do we say that? Well, if the user is looking for something specific, but the search engine returns results that technically match the query but weren’t what the user was looking for, would they call the results relevant? Probably not. Or what about when the user searches for a generic term and gets thousands of equally-connected keyword matches? Surely we can’t call those relevant either.

In reality, the blurry part of this picture is the user’s intent. In other words, quantifying relevance means figuring out if the user actually liked the results. In most use cases, that’s a more meaningful metric than what vector databases use to match records (how similar the vectors representing their words are), since users will only buy the products from the search results that match their intent.

Our conclusion: search engines can only provide “relevant” results if they take into account more than just semantic similarity and keyword matches. They need to be able to incorporate other pieces of analytical data that actually correlate better with how spot-on the search results are, like whether users clicked on or even bought items from search results. The ideal scenario would be to define a function that takes all those signals into account and returns exactly how relevant a particular result is for a particular query, and then use that function to train an AI-powered search engine. The good thing is, you and I are not the first to have this idea. This is called a Learning to Rank (LTR) algorithm, and these principles are the foundation of our work here.

Approaches to Learning to Rank

So you have this function that takes a query and some input and outputs how relevant that input is to the query. But this leaves some room for creativity in the implementation. For example, how many inputs are we ranking in one function call? The different answers to this question lead to the three main types of Learning to Rank training functions, which define the ground truth that the LTR algorithm will seek to mimic:

  1. Scoring one document at a time — this is called a pointwise approach. This method has us evaluate each document independently. Our function might look something like this (assuming Record is a previously defined type):

    def pointwiseLTR(query: str, document: Record) -> float:
        """Return the precise relevance score of a given document in response to a given query."""
        ...
        return relevanceScore
    

    The ... might be filled in with code calling some other classification or regression AI model, which computes the relevanceScore using the AI model outputs (like the model’s confidence that the query matches the document). By using an AI model, we don’t have to maintain a database that precisely maps every new entry to our search index to a bunch of queries that should surface it. The LTR AI we’re training will try to extrapolate the entire list’s rankings based on each individual score, and then adjust itself over and over using a process called backpropagation until our Learning to Rank AI’s predictions match the output of our training function well enough.

  2. Comparing two documents at a time, which is called a pairwise approach. This method involves returning an integer, usually -1 or 1, depending on which of the two input documents matches the query better.

    def pairwiseLTR(query: str, document1: Record, document2: Record) -> int:
        """Return -1 if document1 matches the query better, and 1 if document2 matches the query better."""
        ...
        return matchScore
    

    If you’re familiar with JavaScript, you might recognize this pattern from the Array.sort function. RankNet — one of the most famous Learning to Rank algorithms — pioneered this approach. It extrapolates the entire list’s rankings by moving individual results up and down the list based on these comparisons, and then trains itself using backpropagation.

  3. Sorting all the documents at once, which is called a listwise approach. Unlike the other approaches, where our “ground truth” function might incorporate some classification or regression AI, this method takes the entire list of documents and simply returns it in the order recorded in some existing dataset. This works well when you already happen to have a dataset that should correlate almost exactly with relevance, so you can just sort it and use that as training data. After training, the LTR algorithm should end up extra precise because there is no intermediate AI introducing variability into the training data.
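
To make the three shapes concrete, here is a minimal sketch of what the “ground truth” functions could look like if the only signal we had was a click log. Everything in it is an assumption for illustration: the `ClickLog` structure, the function names, and the choice of click-through rate as the relevance signal are hypothetical. The last function shows the kind of pairwise loss that RankNet-style training minimizes with backpropagation.

```python
import math

# Hypothetical click log: (query, object_id) -> (clicks, impressions).
ClickLog = dict[tuple[str, str], tuple[int, int]]


def pointwise_label(log: ClickLog, query: str, object_id: str) -> float:
    """Score one document on its own: here, its click-through rate."""
    clicks, impressions = log.get((query, object_id), (0, 0))
    return clicks / impressions if impressions else 0.0


def pairwise_label(log: ClickLog, query: str, id1: str, id2: str) -> int:
    """Return -1 if the first document matches better, 1 if the second does."""
    first = pointwise_label(log, query, id1)
    second = pointwise_label(log, query, id2)
    return -1 if first >= second else 1


def listwise_label(log: ClickLog, query: str, object_ids: list[str]) -> list[str]:
    """Return the whole list in its ideal order, best match first."""
    return sorted(
        object_ids,
        key=lambda oid: pointwise_label(log, query, oid),
        reverse=True,
    )


def ranknet_pair_loss(score1: float, score2: float, label: int) -> float:
    """RankNet-style loss for one pair of model scores and a pairwise label."""
    # Margin by which the model favors the document the label prefers.
    margin = score1 - score2 if label == -1 else score2 - score1
    # Equals -log(sigmoid(margin)): small when the model agrees with the label.
    return math.log1p(math.exp(-margin))
```

The loss shrinks as the model scores the preferred document further above the other one, which is what pushes the model’s ordering toward the labels during training.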

Why Learning to Rank gives search superpowers

Learning to Rank is the essential step that makes modern, meaningful search possible. Why? Because it can incorporate more sophisticated signals into the definition of relevance. Consider just a few:

The role of user behavior

Search rarely exists in isolation. Almost always, it’s possible to track how users interact with search results and use that data to further train our Learning to Rank AI. This is difficult to build yourself, but we’ve put a lot of effort into making it easy to implement if you’re already using Algolia’s search engine. Sending an “event”, as we call it, to Algolia when a user clicks on a search result is as simple as pasting a few lines of code into your frontend. In addition to improving the search engine itself, this also enables other features like recommendations and personalization.

Let’s take this a step further though: after a user clicks on a search result, can we tell how much they liked what they saw? Remember our earlier conclusion: quantifying relevance means figuring out if the user actually liked the results. If the frontend of an ecommerce site keeps track of which search results the user has clicked on, it can also send events when they add that product to their cart or actually buy it. Surely, if somebody bought something from a search result, they’re giving strong testimony that this particular search result was exceptionally relevant.
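
One way to picture this is as a grading scheme, where stronger signs of intent earn a higher relevance grade. The sketch below is only an illustration under assumed names: the event dictionaries, the `EVENT_GRADES` values, and `grade_results` are hypothetical and are not the format of Algolia’s events.

```python
from collections import defaultdict

# Hypothetical grades: stronger intent earns a higher relevance label.
EVENT_GRADES = {"click": 1, "add_to_cart": 2, "purchase": 3}


def grade_results(events: list[dict]) -> dict[tuple[str, str], int]:
    """Map each (query, object_id) pair to the strongest event seen for it."""
    grades: dict[tuple[str, str], int] = defaultdict(int)
    for event in events:
        key = (event["query"], event["object_id"])
        # Unknown event types count as grade 0, so they never raise a label.
        grades[key] = max(grades[key], EVENT_GRADES.get(event["type"], 0))
    return dict(grades)
```

Graded labels like these could then serve as the output of a pointwise “ground truth” function.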

Those events need to be specifically submitted by the frontend, since Algolia would otherwise have no way to know what products those customers are ordering. However, other user behavior can be determined just from search requests. For example, what if a user runs the same search several times in a row? Or what if the search results are paginated and the user has flipped through to page three? This might indicate that the results we’re showing aren’t all that relevant to the query. Perhaps if it happens only for a few queries, this might just mean there are gaps in the product catalog. But if this happens regularly for various queries, we could include in our Learning to Rank “ground truth” function some math to demote results that seem to be routinely passed over by users.
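
As a rough sketch of what “some math to demote results” might mean, the function below turns how often a result is shown without being clicked into a score multiplier. The search log format, the thresholds, and the function name are all assumptions made for this example. It also assumes every clicked result appears in that search’s list of shown results.

```python
from collections import Counter


def passed_over_multipliers(
    searches: list[dict],
    min_impressions: int = 20,
    max_penalty: float = 0.5,
) -> dict[str, float]:
    """Return a score multiplier per object_id; 1.0 means no demotion."""
    shown: Counter = Counter()
    clicked: Counter = Counter()
    for search in searches:
        shown.update(search["shown"])
        clicked.update(search["clicked"])

    multipliers = {}
    for object_id, impressions in shown.items():
        if impressions < min_impressions:
            # Too little data to judge, so leave the result alone.
            multipliers[object_id] = 1.0
            continue
        skip_rate = 1 - clicked[object_id] / impressions
        # A result that is always skipped loses max_penalty of its score.
        multipliers[object_id] = 1.0 - max_penalty * skip_rate
    return multipliers
```

The `min_impressions` guard reflects the caveat above: a handful of skipped searches may just mean there is a gap in the catalog.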

Keeping track of user behavior also enables a Learning to Rank algorithm to prioritize trending results. At Algolia, we call this Multi-Signal Ranking. What was relevant yesterday might not be relevant tomorrow. Famously, the query “face masks” would have been expected to return skincare products… until about early 2020. In our “ground truth” function, we can pick out trends in user behavior and boost results that match those trends. When feedback from user interactions is folded back into the ranking algorithm, users feel more positively about the results they’re being shown, which makes those searches accomplish their purpose better (especially if the purpose is to generate revenue, as on ecommerce sites).
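
A common way to pick out trends is to weight recent interactions more heavily than old ones. Here is a minimal sketch using exponential decay. The half-life value and the function name are assumptions for illustration, and this is not a description of how Multi-Signal Ranking is implemented.

```python
def trending_score(event_ages_days: list[float], half_life_days: float = 7.0) -> float:
    """Sum the events, halving each one's weight every half_life_days."""
    return sum(0.5 ** (age / half_life_days) for age in event_ages_days)
```

With a seven-day half-life, three clicks from today outweigh ten clicks from two months ago, so a result that has just started trending can overtake one that used to be popular.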

The role of context

General trends in user behavior count as context, for sure. But in an ideal world, we wouldn’t only customize search rankings to trends among your existing user base. Why? Well, you’ve already got those customers! The rest of your target audience (the part that doesn’t shop on your site yet) is less likely to put up with friction when they search for trendy products on your site. So it might make sense for ecommerce sites to include, as an additional feature, some analysis of trends on the wider Internet. With Algolia, you can include this data in the product records themselves in your search index and use the Custom Ranking feature to include it in the ranking algorithm. You can also make the algorithm consider bits of data that are especially relevant to that particular style of site. For example, the number of comments, likes, and shares might be especially relevant on a blog site, since that could correlate with high engagement and therefore higher user satisfaction. But on an ecommerce site, you have data points that blogs would never care about, like brand partnerships, rating numbers, and sales.
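
As one illustration of site-specific signals, you could precompute a single engagement number per record before indexing it. The weights and field names below are hypothetical, and this weighted sum is only a sketch. It does not describe how Custom Ranking itself combines attributes.

```python
import math

# Hypothetical weights for a blog: a comment signals more engagement than a like.
BLOG_WEIGHTS = {"comments": 3.0, "shares": 2.0, "likes": 1.0}


def engagement_score(record: dict, weights: dict[str, float]) -> float:
    """Combine a record's engagement counts into one sortable number."""
    # log1p dampens huge counts so one viral post doesn't dwarf everything else.
    return sum(
        weight * math.log1p(record.get(field, 0))
        for field, weight in weights.items()
    )
```

An ecommerce site would swap in its own fields, such as ratings and sales, with weights that fit its business.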

In many applications, records are tied to geographical data. For example, a search index of local businesses would naturally include coordinates describing their storefronts. That context is invaluable to the ranking algorithm: users clearly want results close by, so a good Learning to Rank algorithm will specifically take geolocation into account. Algolia actually has that built in.
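
To show what the distance signal could look like, here is a minimal sketch that computes great-circle distance with the haversine formula and sorts businesses by proximity. The record fields `lat` and `lng` and the function names are assumptions for this example, and it is not a description of Algolia’s built-in geo ranking.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_by_distance(user: tuple[float, float], businesses: list[dict]) -> list[dict]:
    """Sort businesses so the ones closest to the user come first."""
    return sorted(
        businesses,
        key=lambda b: haversine_km(user[0], user[1], b["lat"], b["lng"]),
    )
```

In a real ranking function, distance would be one signal among several rather than the only sort key.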

Another useful piece of context is the topic we’re searching in. For example, an online electronics storefront might have Apple products. Naturally, users might include the word “apple” in their search queries, and they should get iPhones and MacBooks as results. However, imagine that product image captions are a searchable attribute in that search index, and an unrelated television advertises its picture quality with an image of the TV displaying the stereotypical basket of fruit, so its caption includes the word “apple”. Should that TV show up in the search results for “apple tv”? No, the user is probably looking for Apple’s own Apple TV. A great implementation of Learning to Rank should also consider what context we’re searching in. Narrowing the scope further, it could predict not only the site’s general topic, but also what specific category of products we should be searching in, given a predefined hierarchy of product categories. Then, it could boost the search results that match the predicted category, or even filter out the non-matches altogether, depending on its confidence. We call this Query Categorization.
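
The boost-or-filter decision can be sketched in a few lines. In this example the category paths, thresholds, boost factor, and function name are all hypothetical, and the predicted category and its confidence are assumed to come from some separate classifier. It is an illustration of the idea, not Query Categorization’s actual logic.

```python
def apply_category_prediction(
    results: list[dict],
    category: str,
    confidence: float,
    filter_threshold: float = 0.9,
    boost_threshold: float = 0.5,
    boost: float = 2.0,
) -> list[dict]:
    """Filter or boost results by predicted category, then sort by score."""

    def matches(record: dict) -> bool:
        # A record matches the predicted category or any of its subcategories.
        path = record["category"]
        return path == category or path.startswith(category + " > ")

    if confidence >= filter_threshold:
        # Very confident: drop everything outside the predicted category.
        kept = [r for r in results if matches(r)]
    elif confidence >= boost_threshold:
        # Fairly confident: keep everything, but boost the matches.
        kept = [{**r, "score": r["score"] * boost} if matches(r) else r for r in results]
    else:
        # Not confident: leave the scores untouched.
        kept = list(results)
    return sorted(kept, key=lambda r: r["score"], reverse=True)
```

With a moderately confident prediction of “Electronics > Streaming devices” for the query “apple tv”, the streaming device would overtake the television with the fruit basket caption. With a very confident prediction, the television would be dropped from the results entirely.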

Learning to Rank — how modern search is made

Learning to Rank is a shift in how we think about search: not as a static retrieval task, but as a dynamic process that adapts to users, context, and behavior. In addition to matching keywords and vector similarity scores, LTR incorporates more sophisticated signals like clicks, conversions, and trends — signals that actually highlight what users want to see. By learning from actual user interactions and adapting to what people find useful, Learning to Rank enables search systems to feel more intuitive, responsive, and ultimately, more human.

With Algolia, you don’t need to be a machine learning expert to start reaping the benefits. We’ve wrapped all these features into a suite of tools headed by AI Ranking. If you’d like to learn why Algolia is the industry’s best all-around solution yet, check out this article on our blog.
