AI ranking refers to a variety of machine learning algorithms used to optimize the order of search results. 

With any search query, there could be many relevant results. That’s where search result ranking comes into play. Increasingly, ranking is powered by artificial intelligence (AI), which includes a variety of machine learning algorithms. 

In this article, I will describe some of the different approaches to AI ranking and briefly discuss some of the challenges for delivering better results. 

Precision vs recall

To improve search result ranking, you need to be able to measure it. Two simple measures of relevance are precision and recall.

  • Precision is the percentage of retrieved documents that are relevant. Ideally, search results contain only relevant items, but this is not always the case. For example, a search for “iphone” might return iPhone cases, when in fact the customer is looking for a phone. 
  • Recall is the percentage of all relevant documents that are retrieved. Ideally, every record relevant to the query appears in the results, but some relevant documents may not be found. For example, a record containing “NYC” may be missed for the query “New York.”

These are useful metrics for helping us to determine if the results are any good. Ideally, recall and precision would both score 100%, but in practice that’s very difficult. Moreover, relevance can be subjective! 
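As a minimal sketch of the two definitions above, the following computes precision and recall for a single query. The document IDs and the relevance judgments are made up for illustration; in practice the "relevant" set has to come from human labels or engagement data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document IDs the engine returned
    relevant:  iterable of document IDs judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    # Precision: share of what was returned that is relevant.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of what is relevant that was returned.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical "iphone" query: five items come back, but two are cases,
# and one relevant phone ("iphone-se") was never retrieved.
retrieved = ["iphone-15", "iphone-14", "iphone-13", "case-red", "case-blue"]
relevant = ["iphone-15", "iphone-14", "iphone-13", "iphone-se"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the five returned items are relevant (precision 0.60), and three of the four relevant items were found (recall 0.75).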

To illustrate, let’s say it’s time to buy a new TV, so you search your favorite seller’s site for “big tv.” You get 119 results. Among the results, some items are quite relevant, but some are not (for example, the TV stands and cabinet below). That is a precision problem. Meanwhile, many other relevant products on the site were not included in the results. That is a recall problem.

search recall example

To get better precision and recall, you might refine the search query. You might search for “TVs over 40"”, and this time (see below) you get 262 results! It looks like there are more TV sets in the new results, but only some of them actually satisfy the query parameter of over 40 inches.

search precision example

In fact, there is often a trade-off between precision and recall: improving one may have a detrimental effect on the other.

In the example above, there can be many relevant TVs. However, the best result is the one that customers are most interested in, which can be determined by clicks, purchases, ratings, low return rates, etc. The top-ranked results get the highest engagement; conversely, when people don’t easily find the best results, they’re likely to abandon your site or find another way to get their questions answered, such as opening a support ticket.

Precision and recall can help us conceptually begin to wrap our heads around AI ranking. When a query is performed, a search engine must determine which results are relevant, and then rank them from best to worst.

Traditionally, a variety of statistical methods were used to rank results based on term frequency within a set of documents. Keyword search engines look for the words of the query and their alternatives, and then a statistical algorithm ranks those results from most to least relevant. These methods are very efficient and fast, but require many additional heuristics. Some of the added capabilities to improve purely statistical ranking models include:

  • Synonym libraries with more alternatives for the initial query terms (e.g., “tv” or “television” or “tv set”)
  • Rules to handle certain types of queries (e.g., if the query contains “above NUMBER”, transform it into a filter on the size, such as “TVs above 27 inches”)
  • Query processing techniques such as typo tolerance, categorization, stemming, and more
  • Engagement metrics (such as clicks, popularity, and conversions)

These “signals” can provide a nice feedback loop with results. As I’ll show, higher engagement means results can be re-ranked accordingly, which in turn can generate more engagement. There are a variety of machine learning methods that work well, which I’ll describe below. 
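To make the synonym and rule heuristics listed above concrete, here is a rough sketch of a query rewriter. The regular expression, the synonym table, and the `size_inches` attribute name are all assumptions invented for this example; a real engine would expose its own configuration for synonyms and rules.

```python
import re

# Hypothetical rule: "<terms> above|over <number>", optionally followed by a unit.
SIZE_RULE = re.compile(
    r"^(?P<terms>.+?)\s+(?:above|over)\s+(?P<number>\d+(?:\.\d+)?)"
    r"(?:\s*(?:inches|inch|in))?$"
)

# Hypothetical synonym library mapping variants to one canonical term.
SYNONYMS = {"television": "tv", "televisions": "tv", "tvs": "tv", "tv set": "tv"}


def rewrite_query(query):
    """Split a raw query into canonical search text plus an optional size filter."""
    text = query.strip().lower()
    size_filter = None
    match = SIZE_RULE.match(text)
    if match:
        # The numeric part becomes a structured filter instead of a keyword.
        text = match.group("terms")
        size_filter = {
            "attribute": "size_inches",
            "op": ">",
            "value": float(match.group("number")),
        }
    # Map the remaining text to its canonical synonym when one is known.
    text = SYNONYMS.get(text, text)
    return {"text": text, "filter": size_filter}


print(rewrite_query("TVs above 27 inches"))
print(rewrite_query("big tv"))
```

The first query is rewritten to the text “tv” plus a filter for sizes above 27, while “big tv” passes through unchanged because no rule matches it. That gap is exactly what hand-written heuristics tend to leave behind.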

Learning-to-rank 

Learning-to-rank (LTR) is a type of machine learning that improves ranking and assists with precision. It includes supervised, unsupervised, and reinforcement learning. There are also variations like semi-supervised learning. Each of these solutions offers AI ranking capabilities to deliver improved results over simpler statistical methods.

Supervised learning, for example, is a kind of machine learning that uses labeled datasets to classify and predict search results. In the “big tv” example, you would need to label a subset of results as “big tv” for the algorithm to know what people are searching for. The problem is that labeling suffers from cognitive bias: what constitutes a big TV? Everyone who trains this kind of algorithm will have a different opinion, and the answer will change over time with new products. What if the product doesn’t have a screen at all, such as a projection TV? How would you label it?

Unsupervised learning is designed to partially remove the human bias of supervised learning and instead let machine learning do the optimization at an individual query level. Typically this means computing the results with a static human-configured approach and then re-ranking the top X results with a machine learning model.

This approach is pretty smart and has been adopted by some of the major open source search technologies to produce great outcomes. However, there are downsides.

  • It still requires a lot of engineering work to get up and running.
  • The query-result pair index scores are fixed or, at best, seldom updated. This means the algorithm inputs remain static. It’s a step forward, but we can do even better by using reinforcement learning and dynamic re-ranking (more on this below).

In both cases, you can adjust rankings manually, but the resulting rankings typically work well for a portion of queries and fail for others. Fixing the failures without causing problems elsewhere is very hard (precision and recall!). A good analogy is pulling a bunch of levers: the issue at top of mind improves, but it’s not immediately clear how the change affects everything else. Search is not a simple problem, and people can’t balance these trade-offs across thousands or even millions of different queries.

Reinforcement learning is another type of learning-to-rank that uses feedback to rank results based on positive outcomes. It makes frequent, incremental improvements based on signals such as clicks, sales, conversions, signups, and other positive feedback, which are looped back into the system.

In the search world, a positive rating can mean different things: a search result was clicked, or it led to a later event such as a sale. At Algolia, we mostly focus on clicks because there is significantly more data available (so we reach higher confidence faster), but we also use later events if there is sufficient data.

When using clicks to determine ratings, you need to correct for position bias, short clicks (dissatisfaction with the clicked result), and various other factors. The positive ratings then roughly correlate to results clicked more frequently and negative ratings to those clicked less frequently. A confidence interval helps to correct for the sample size by calculating a probability distribution.

Many learning-to-rank algorithms have been developed. At Algolia we offer a Dynamic Re-Ranking solution, a type of reinforcement learning. Dynamic Re-Ranking kicks in after the engine has computed your results’ textual relevance and has applied your custom ranking. It can be used to find trends in your users’ behavior. It can not only boost better results, but also demote results that aren’t relevant or converting. For example, two websites can have the exact same catalog or search index, but rank search results differently based on how customers are responding to the content.


Personalization, merchandising, and more

There are many other factors for ranking I haven’t even touched on. For example, personalization. Using geolocation, past search history, past purchase history, and other factors, you can improve search result precision. Personalization is one of the most powerful tools for improving customer experience when it’s done right. When it’s not done well (and it’s very complex to do), it can ruin the experience!

There’s often a need to “override” organic search ranking based on short-term priorities. For example, you may want to prioritize certain results for a new product launch, product promotion, or news announcement. When you add new products or pages to your site, they won’t have any clicks or conversion events yet, which is something reinforcement learning algorithms need for dynamic ranking. This is why search engines offer additional features, such as custom ranking, merchandising, rules, and “pinning” to give customers control over the results for any given time.
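As a rough illustration of “pinning”, the following sketch places chosen items at fixed positions on top of whatever order the engine produced. The `apply_pins` helper and the product IDs are hypothetical, not any engine's actual API.

```python
def apply_pins(organic_ids, pins):
    """Place pinned items at fixed 1-based positions; keep the rest in organic order.

    organic_ids: document IDs in the order the engine ranked them
    pins:        dict mapping document ID -> desired 1-based position
    """
    # Start from the organic order without the items that are about to be pinned.
    merged = [doc for doc in organic_ids if doc not in pins]
    # Insert pins from the top down so earlier pins are never displaced by later ones.
    for doc, position in sorted(pins.items(), key=lambda item: item[1]):
        index = min(max(position - 1, 0), len(merged))
        merged.insert(index, doc)
    return merged


# A newly launched product has no clicks yet, so it is absent from the organic
# ranking; pinning puts it first, and an existing item is promoted to second.
organic = ["tv-a", "tv-b", "tv-c", "tv-d"]
print(apply_pins(organic, {"tv-new": 1, "tv-c": 2}))
# ['tv-new', 'tv-c', 'tv-a', 'tv-b', 'tv-d']
```

Because the pins are applied after ranking, they can be removed once the promotion ends or once the new product has gathered enough engagement data to rank on its own.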

Dirty data and poor ranking

Machine learning is terrific for improving information retrieval and ranking. However, as the adage goes, bad data in, bad data out. Machine learning models are only as good as the data they have to work with. As an article on HBR succinctly stated, “Poor data quality is enemy number one to the widespread, profitable use of machine learning.”

Reinforcement learning only works when you have enough data to learn from. The AI ranking will not work well for brand new websites, products, or pages, or sites with poor data quality. You need data — and good data — for ranking. 

Bad data can be a quality issue (misspellings, poor descriptions, etc.), and it can also be a poor signal-to-noise ratio. For instance, SEO tactics can wreak havoc with Google search results and ranking. SEO generates a lot of “noise” as site owners compete for higher rankings. Websites can have the same problem with their site search.

Bad data in, bad results out. The meme illustrates how even great AI algorithms can lead to poor decisions when they have bad data. Image via Reddit.

Your site also might have competing owners, poor metadata, and document description issues. Marketplaces are a good example of how complex it can be. Marketplaces have product data that they create themselves or source from sellers, plus additional user-submitted data for reviews, FAQs, ratings, and more. Additionally, product catalogs are constantly updated and changed. Results can get worse day-to-day for the same query.

Marketplaces are an extreme example, but they illustrate the difficulty of dealing with product ranking day in and day out. You don’t have to run a marketplace to experience these problems. Improving your data can include updating your HTML schema, adding Open Graph metadata, augmenting your search index, and more.

The smart search revolution

Learning-to-rank, particularly reinforcement learning, is a powerful solution for improved ranking. It has its challenges, such as being able to deliver better results with little historical performance data, or explaining probabilistic optimizations, but nonetheless offers many advantages over traditional ranking algorithms. 

However, it’s not entirely black and white. Two-phase learning-to-rank solutions leverage the existing ranking of the search engine (which can be a simpler statistical method like TF-IDF or BM25, or a more complex one like Algolia Tie-Breaking or vector similarity). Then one or more machine learning algorithms are employed to re-rank the top K results from that first phase (retrieving them is also known as top-k document retrieval).
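A minimal sketch of the two-phase idea follows, under stated assumptions: the first phase is a plain textbook BM25 over whitespace-split tokens, and the second phase is a stand-in scoring function fed by made-up click ratings where a trained model would normally sit. The catalog, the `two_phase_rank` helper, and the `click_rating` values are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a plain BM25 formula."""
    if not docs:
        return {}
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def two_phase_rank(query, docs, rerank_score, top_k=3):
    """Phase 1: order by BM25. Phase 2: reorder only the top_k with rerank_score."""
    first = bm25_scores(query, docs)
    ordered = sorted(first, key=first.get, reverse=True)
    head, tail = ordered[:top_k], ordered[top_k:]
    head = sorted(head, key=lambda doc_id: rerank_score(doc_id, first[doc_id]), reverse=True)
    return head + tail


# Hypothetical catalog and engagement-based ratings.
docs = {
    "tv-55": "55 inch 4k smart tv",
    "tv-65": "65 inch oled smart tv",
    "stand": "tv stand for big tv",
    "cable": "hdmi cable",
}
click_rating = {"tv-65": 0.9, "tv-55": 0.4, "stand": 0.05}


def rerank(doc_id, first_phase_score):
    # Stand-in for a learned model; it ignores the first-phase score here.
    return click_rating.get(doc_id, 0.0)


print(two_phase_rank("tv", docs, rerank))
# ['tv-65', 'tv-55', 'stand', 'cable']
```

On keyword statistics alone, the TV stand ranks first because it mentions “tv” twice. The second phase only touches the top three candidates and moves the two televisions ahead of it, which keeps the more expensive model off the long tail of results.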

AI ranking can also be combined with hybrid search (vector search and keyword search technologies) to deliver a potent 1-2 punch: incredible relevance and ranking. There’s a subtle interplay with hybrid search and AI ranking. Better initial hybrid search results will impact clicks, conversions, and other signals, which then affect dynamic re-ranking results. It’s a virtuous cycle of improved results.

These techniques are rare today, but they will become the standard in years to come as machine learning further influences data and the underlying storage structures.

About the author
Bharat Guruprakash

Chief Product Officer
