
AI ranking refers to a variety of machine learning algorithms used to optimize the order of search results. 

With any search query, there could be many relevant results. That’s where search result ranking comes into play. Increasingly, ranking is powered by artificial intelligence (AI), which includes a variety of machine learning algorithms. 

In this article, I will describe some of the different approaches to AI ranking and briefly discuss some of the challenges for delivering better results. 

Precision vs recall

To improve search result ranking, you need to be able to measure it. Two simple measures of relevance are precision and recall:

  • Precision is the percentage of retrieved documents that are relevant. Ideally, search results contain only relevant items, but this is not always the case. For example, a search for “iphone” might return iPhone cases when the customer is actually looking for a phone.
  • Recall is the percentage of all relevant documents that are retrieved. Ideally, every record relevant to the query appears in the results, but some relevant documents may be missed. For example, a record containing “NYC” might not be returned for the query “New York”.

These are useful metrics for helping us to determine if the results are any good. Ideally, recall and precision would both score 100%, but in practice that’s very difficult. Moreover, relevance can be subjective! 

To illustrate, let’s say it’s time to buy a new TV, so you search your favorite seller’s site for “big tv.” You get 119 results. Among the results, some items are quite relevant, but some are not (for example, the TV stands and cabinet below); that’s a precision problem. Meanwhile, many other relevant products on the site were not included in the results at all; that’s a recall problem.

search recall example

To get better precision and recall, you might refine the search query. You might search “TVs over 40,” and this time (see below) you get 262 results! It looks like there are more TV sets in the new results, but only some of them actually satisfy the query parameter of over 40 inches.

search precision example

In fact, precision and recall have a yin-yang relationship: improving one may have a detrimental effect on the other.

In the example above, there can be many relevant TVs. However, the best result is the one that customers are most interested in, which can be determined by clicks, purchases, ratings, lowest returns, etc. The top ranked results get the highest engagement; conversely, when people don’t easily find the best results, they’re likely to abandon your site or find another way to get their questions answered, such as opening a support ticket. 

Precision and recall help us begin to wrap our heads around AI ranking conceptually. When a query is performed, a search engine must determine which documents are relevant, and then rank them from best to worst.

Traditionally, a variety of statistical methods were used to rank results based on term frequency within a set of documents. Keyword search engines look for the words of the query (and their alternatives), then rank the matching results from most to least relevant. These methods are very efficient and fast, but they require many additional heuristics. Capabilities commonly added to improve purely statistical ranking models include:

  • Synonym libraries with more alternatives for the initial query terms (e.g., “tv” or “television” or “tv set”)
  • Rules to handle certain types of queries (e.g., if the query contains “above NUMBER,” transform it into a filter on size, such as “TVs above 27 inches”)
  • Query processing techniques such as typo tolerance, categorization, stemming, and more
  • Engagement metrics (such as clicks, popularity, and conversions)
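The query-rule idea above can be sketched in a few lines of Python. The rule below, which rewrites “above NUMBER” (or “over NUMBER”) into a numeric size filter, is a hypothetical illustration, not a real engine’s rule syntax:

```python
import re

# Hypothetical rule: turn "above NUMBER" / "over NUMBER" in a query
# into a numeric size filter, keeping the remaining words as the text query.
RULE = re.compile(r"\b(?:above|over)\s+(\d+)\b", re.IGNORECASE)

def apply_size_rule(query):
    filters = {}
    match = RULE.search(query)
    if match:
        filters["size_gte"] = int(match.group(1))
        # Remove the matched phrase and normalize whitespace.
        query = " ".join(RULE.sub(" ", query).split())
    return query, filters

# apply_size_rule("TVs above 27 inches") -> ("TVs inches", {"size_gte": 27})
# apply_size_rule("big tv")              -> ("big tv", {})
```

A production engine would map the filter onto a structured attribute (such as screen size in inches) rather than plain text.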

These “signals” can provide a nice feedback loop with results. As I’ll show, higher engagement means results can be re-ranked accordingly, which in turn can generate more engagement. There are a variety of machine learning methods that work well, which I’ll describe below. 

Learning-to-rank 

Learning-to-rank (LTR) is a family of machine learning techniques that improves ranking and assists with precision. It includes supervised, unsupervised, and reinforcement learning, as well as variations like semi-supervised learning. Each of these approaches offers AI ranking capabilities that deliver better results than simpler statistical methods.

Supervised learning, for example, is a kind of machine learning that uses labeled datasets to classify and predict search results. In the “big tv” example, you would need to label a subset of results as “big tv” so the algorithm knows what people are searching for. The problem is that this suffers from cognitive bias: what constitutes a big TV? Everyone who trains this kind of algorithm will have a different opinion, and the answer will change over time with new products. What if the TV doesn’t have a screen, such as a projection TV? How would you label it?

Unsupervised learning is designed to partially remove the human bias of supervised learning and instead let machine learning do the optimization at an individual query level. Typically this means computing the results with a static human-configured approach and then re-ranking the top X results with a machine learning model.

This approach is pretty smart and has been adopted by some of the major open source search technologies to produce great outcomes. However, there are downsides.

  • It still requires a lot of engineering work to get up and running, and 
  • The query-result pair scores are fixed, or at best seldom updated, which means the algorithm’s inputs remain static. It’s a step forward, but we can do even better by using reinforcement learning and dynamic re-ranking (more on this below).

In both cases, you can adjust rankings manually, but as a result these rankings typically work well for a portion of queries and fail for others. Fixing the failures without causing problems elsewhere is very hard (precision and recall!). A good analogy is pulling a bunch of levers: the issue at the top of your mind improves, but it’s not immediately clear how the change impacts everything else. Search is not a simple problem, and people can’t balance it manually across thousands or even millions of different queries.
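The two-phase approach described above (a static first-phase ranking, then model-based re-ranking of the top X results) can be sketched in Python. The toy documents, the keyword scorer, and the stand-in “model” (re-ranking by observed click-through rate) are all illustrative assumptions, not Algolia’s implementation:

```python
def keyword_score(query, doc):
    # Toy first-phase score: how many query terms appear in the document text.
    terms = query.lower().split()
    text = doc["text"].lower()
    return sum(term in text for term in terms)

def two_phase_rank(query, index, rerank_score, top_x=100):
    """Phase 1: cheap static scoring over the full index.
    Phase 2: a learned model re-scores only the top X candidates."""
    candidates = sorted(index, key=lambda d: keyword_score(query, d),
                        reverse=True)[:top_x]
    return sorted(candidates, key=lambda d: rerank_score(query, d),
                  reverse=True)

docs = [
    {"id": 1, "text": "big tv 55 inch", "ctr": 0.12},
    {"id": 2, "text": "tv stand", "ctr": 0.02},
    {"id": 3, "text": "big tv 65 inch", "ctr": 0.30},
]
# Stand-in "model": re-rank the top 2 candidates by click-through rate.
ranked = two_phase_rank("big tv", docs, lambda q, d: d["ctr"], top_x=2)
# ranked ids: [3, 1] -- both match "big tv", but id 3 gets more clicks
```

Because phase 2 only touches the top X candidates, the expensive model never has to score the full index.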

Reinforcement learning is another type of learning-to-rank that uses feedback to rank results based on positive outcomes. It makes frequent, incremental improvements based on signals such as clicks, sales, conversions, signups, and other positive feedback, which can be looped back into the system.

In the search world, a positive rating can mean different things: a search result was clicked, or it led to a later event such as a sale. At Algolia, we mostly focus on clicks, as there is significantly more data available (so it’s faster to reach high confidence), but we also use later events when there is sufficient data.

When using clicks to determine ratings you need to correct for position bias, short clicks (dissatisfaction with the clicked result), and various other factors. The positive ratings then roughly correlate to results clicked more frequently and negative ratings to those less frequently clicked. The confidence interval helps to correct for the sample size by calculating a probability distribution.
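One common way to build such a confidence-aware rating from clicks is the lower bound of the Wilson score interval. This is a generic statistical sketch, not necessarily the formula Algolia uses:

```python
import math

def wilson_lower_bound(clicks, impressions, z=1.96):
    """Lower bound of the Wilson score interval for a click-through rate.

    With few impressions the bound stays low, so a sparsely-seen result
    isn't boosted on a handful of lucky clicks. z=1.96 gives ~95% confidence.
    """
    if impressions == 0:
        return 0.0
    p = clicks / impressions
    denom = 1 + z * z / impressions
    center = p + z * z / (2 * impressions)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * impressions))
                           / impressions)
    return (center - margin) / denom

# Same observed CTR (0.30), very different confidence:
small = wilson_lower_bound(3, 10)      # few impressions -> low bound
large = wilson_lower_bound(300, 1000)  # many impressions -> higher bound
# small < large, even though both observed CTRs are 0.30
```

Ranking by this lower bound instead of the raw click rate naturally favors results whose engagement is backed by enough data.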

Many learning-to-rank algorithms have been developed. At Algolia, we offer Dynamic Re-Ranking, a type of reinforcement learning. Dynamic Re-Ranking kicks in after the engine has computed your results’ textual relevance and applied your custom ranking, and it finds trends in your users’ behavior. It can not only boost better results but also demote results that aren’t relevant or converting. For example, two websites can have the exact same catalog or search index yet rank search results differently based on how customers respond to the content.

Personalization, merchandising, and more

There are many other ranking factors I haven’t even touched on, such as personalization. Using geolocation, past search history, past purchase history, and other signals, you can improve search result precision. Personalization is one of the most powerful tools for improving customer experience when it’s done right. When it’s not done well (and it’s very complex to do), it can ruin the experience!

There’s often a need to “override” organic search ranking based on short-term priorities. For example, you may want to prioritize certain results for a new product launch, product promotion, or news announcement. When you add new products or pages to your site, they won’t have any clicks or conversion events yet, which is exactly what reinforcement learning algorithms need for dynamic ranking. This is why search engines offer additional features, such as custom ranking, merchandising, rules, and “pinning,” to give you control over the results at any given time.

Dirty data and poor ranking

Machine learning is terrific for improving information retrieval and ranking. However, as the adage goes: bad data in, bad results out. Machine learning models are only as good as the data they have to work with. As an article in HBR succinctly stated, “Poor data quality is enemy number one to the widespread, profitable use of machine learning.”

Reinforcement learning only works when you have enough data to learn from. The AI ranking will not work well for brand new websites, products, or pages, or sites with poor data quality. You need data — and good data — for ranking. 

Bad data can be a quality issue — misspellings, poor descriptions, etc. — and it can also be a bad signal to noise ratio. For instance, SEO tactics can wreak havoc with Google search results and ranking. SEO generates a lot of “noise” as site owners compete for higher rankings. Websites can have the same problem with their site search. 

bad data example
Bad data in, bad results out. The meme illustrates how even great AI algorithms can lead to poor decisions when they have bad data. Image via Reddit.

Your site also might have competing owners, poor metadata, and document description issues. Marketplaces are a good example of how complex it can be. Marketplaces have product data that they create themselves or source from sellers, plus additional user-submitted data for reviews, FAQs, ratings, and more. Additionally, product catalogs are constantly updated and changed. Results can get worse day-to-day for the same query.

Marketplaces are an extreme example, but they illustrate the difficulty of dealing with product ranking day in and day out. You don’t have to have a marketplace to experience these problems. Improving ranking can include updating your HTML schema, adding Open Graph metadata, augmenting your search index, and more.

The smart search revolution

Learning-to-rank, particularly reinforcement learning, is a powerful solution for improved ranking. It has its challenges, such as being able to deliver better results with little historical performance data, or explaining probabilistic optimizations, but nonetheless offers many advantages over traditional ranking algorithms. 

However, it’s not entirely black and white. Two-phase learning-to-rank solutions leverage the existing ranking of the search engine (which can come from simpler statistical methods like TF-IDF or BM25, or a more complex approach like Algolia Tie-Breaking or vector similarity); then one or more machine learning algorithms re-rank the top K results (also known as top-k document retrieval).

AI ranking can also be combined with hybrid search (vector search and keyword search technologies) to deliver a potent 1-2 punch: incredible relevance and ranking. There’s a subtle interplay with hybrid search and AI ranking. Better initial hybrid search results will impact clicks, conversions, and other signals, which then affect dynamic re-ranking results. It’s a virtuous cycle of improved results.
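One well-known way to merge keyword and vector result lists in hybrid search is reciprocal rank fusion (RRF). The sketch below is a generic illustration of the technique, not a description of Algolia’s hybrid pipeline:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists (e.g., keyword and vector results) by summing
    1 / (k + rank) for each document across the lists. k=60 is the
    constant from the original RRF paper; items here are document IDs."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["a", "b", "c"]  # ranked output of the keyword engine
vector_hits = ["b", "d", "a"]   # ranked output of the vector engine
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# "a" and "b" appear in both lists, so they rank above "c" and "d"
```

The fused list can then feed the engagement-driven re-ranking described above, closing the loop between hybrid retrieval and AI ranking.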

These techniques are rare today, but they will become the standard in years to come as machine learning further influences data and the underlying storage structures.

About the author
Bharat Guruprakash

Chief Product Officer

