Search by Algolia
How personalization boosts customer engagement
e-commerce

How personalization boosts customer engagement

You land on your favorite retailer’s website, where everything seems to be attractively arranged just for you. Your favorite ...

Jon Silvers

Director, Digital Marketing

What is retail analytics and how can it inform your data-driven ecommerce merchandising strategy?
e-commerce

What is retail analytics and how can it inform your data-driven ecommerce merchandising strategy?

There is such tremendous activity both on and off of retailer websites today that it would be impossible to make ...

Catherine Dee

Search and Discovery writer

8 ways to use merchandising data to boost your online store ROI
e-commerce

8 ways to use merchandising data to boost your online store ROI

New year, new goals. Sounds positive, but looking at your sales data, your revenue and profit aren’t so hot ...

John Stewart

VP, Corporate Communications and Brand

Algolia DocSearch + Astro Starlight
engineering

Algolia DocSearch + Astro Starlight

What is Astro Starlight? If you're building a documentation site, your content needs to be easy to write and ...

Jaden Baptista

Technical Writer

What role does AI play in recommendation systems and engines?
ai

What role does AI play in recommendation systems and engines?

You put that in your cart. How about this cool thing to go with it? You liked that? Here are ...

Catherine Dee

Search and Discovery writer

How AI can help improve your user experience
ux

How AI can help improve your user experience

They say you get one chance to make a great first impression. With visual design on ecommerce web pages, this ...

Jon Silvers

Director, Digital Marketing

Keeping your Algolia search index up to date
product

Keeping your Algolia search index up to date

When creating your initial Algolia index, you may seed the index with an initial set of data. This is convenient ...

Jaden Baptista

Technical Writer

Merchandising in the AI era
e-commerce

Merchandising in the AI era

For merchandisers, every website visit is an opportunity to promote products to potential buyers. In the era of AI, incorporating ...

Tariq Khan

Director of Content Marketing

Debunking the most common AI myths
ai

Debunking the most common AI myths

ARTIFICIAL INTELLIGENCE CAN’T BE TRUSTED, shouts the headline on your social media newsfeed. Is that really true, or is ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

How AI can benefit the retail industry
ai

How AI can benefit the retail industry

Artificial intelligence is on a roll. It’s strengthening healthcare diagnostics, taking on office grunt work, helping banks combat fraud ...

Catherine Dee

Search and Discovery writer

How ecommerce AI is reshaping business
e-commerce

How ecommerce AI is reshaping business

Like other modern phenomena such as social media, artificial intelligence has landed on the ecommerce industry scene with a giant ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

AI-driven smart merchandising: what it is and why your ecommerce store needs it
ai

AI-driven smart merchandising: what it is and why your ecommerce store needs it

Do you dream of having your own personal online shopper? Someone familiar and fun who pops up every time you ...

Catherine Dee

Search and Discovery writer

NRF 2024: A cocktail of inspiration and innovation
e-commerce

NRF 2024: A cocktail of inspiration and innovation

Retail’s big show, NRF 2024, once again brought together a wide spectrum of practitioners focused on innovation and transformation ...

Reshma Iyer

Director of Product Marketing, Ecommerce

How AI-powered personalization is transforming the user and customer experience
ai

How AI-powered personalization is transforming the user and customer experience

In a world of so many overwhelming choices for consumers, how can you best engage with the shoppers who visit ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

Unveiling the future: Algolia’s AI revolution at NRF Retail Big Show
algolia

Unveiling the future: Algolia’s AI revolution at NRF Retail Big Show

Get ready for an exhilarating journey into the future of retail as Algolia takes center stage at the NRF Retail ...

John Stewart

VP Corporate Marketing

How to master personalization with AI
ai

How to master personalization with AI

Picture ecommerce in its early days: businesses were just beginning to discover the power of personalized marketing. They’d divide ...

Ciprian Borodescu

AI Product Manager | On a mission to help people succeed through the use of AI

5 best practices for nailing the ecommerce virtual assistant user experience
ai

5 best practices for nailing the ecommerce virtual assistant user experience

“Hello there, how can I help you today?”, asks the virtual shopping assistant in the lower right-hand corner ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

Add InstantSearch and Autocomplete to your search experience in just 5 minutes
product

Add InstantSearch and Autocomplete to your search experience in just 5 minutes

A good starting point for building a comprehensive search experience is a straightforward app template. When crafting your application’s ...

Imogen Lovera

Senior Product Manager

Looking for something?

facebookfacebooklinkedinlinkedintwittertwittermailmail

Search is a very complex problem

Search is a complex problem that is hard to customize to a particular use case — even search experts need a lot of iteration to configure a search engine.

Search is composed of several steps that all require different configurations. It is usually split into three steps:

  • Step 1: Retrieve all hits relevant potentially to the query
  • Step 2: Rank all those potential results to make sure the best results are first
  • Step 3: Reranking or dynamic ranking where external signals are used. Typically we apply a learning-to-rank approach.

In most modern search engines, the first two steps are executed concurrently for performance reasons, but each has distinct benefits — first to expand the search for the largest possible result set (ie, optimize for recall) and then order results from most to least relevant (ie, optimize for precision).

Independently of the technology we use, both the retrieval and the ranking processes are difficult and contain a lot of parameters.

In this article, we explain where weights (also called boosts) are used as a solution to these challenges in search. Weights can feel like an intuitive solution, which is why they’ve been used for a long time. However, they can actually be dangerous and often counterproductive.

Challenges in the retrieval phase

The main challenge associated with the retrieval phase is to ensure that all the potentially relevant records are found. This challenge takes a different approach depending on the technology that is used.

  • In a keyword-based search engine, the retrieve phase challenge is to ensure that all the potential hits are scanned. The engine needs to identify all the approximations that may be relevant. In particular, the engine needs to identify all the synonyms, and all the typos including some complex ones like a concatenation or a split of two keywords, explore partial matches (only a subset of the query words are found), etc.
  • In a typical vector-based search engine using LLMs, the retrieval phase challenge is to keep an acceptable performance as all hits have a similarity score that is not zero with the query. To make it even more difficult, the scores for relevant vs irrelevant items can vary greatly per query! Most engines use a graph representation (HNSW being the most popular) to scan only the results that have a vector close to the query. Most engines are finding that bi-encoders produce only rough approximations. And, there is no one threshold score that works for relevance. Thus it is now commonplace to include a re-ranker as a merge step between keyword and vector hits, such as a cross encoder. Cross encoders retain more of the network attention relationships that make transformers work and can thus make much better decisions on relevance even when the hits are derived from keyword retrieval. However they need to process the query and hits together, so they are not efficient for retrieval (like bi-encoders).

Today, more and more search engines rely both on keyword search and semantic search, meaning that the two challenges need to be addressed.

Challenges in the ranking phase

In the ranking phase, the challenge is to merge all the signals together to have one final way to order the results. In particular, there are three categories of signals that are merged together in the ranking

  • Textual signals: the information coming from the search engine technology when comparing the query and the retrieved record. There are often parameters This can be the output of a textual score in a keyword engine like a BM25F score or the similarity score between the query vector and the text.
  • Business signals: the information that is business-related to help sort the result independently of the textual relevance between the query and the record. In particular, it can use a notion of the popularity of the item, availability of the item, items promoted for advertisement, contribution to the business objective, etc.
  • User signals: the information that is specific to the customer. It contains some notion of availability (what are the records that the user can access), notion of preference, geography, etc.

Mixing those criteria is very complex and the reflex is often to use weights set by the business to merge all those signals.

Weights inside the Textual Score

In the retrieve phase, the query information can be found inside multiple attributes (also called fields) of a record. Not all attributes have the same importance and this is why most engines ask the customer to set a score (or a boost) to every attribute to translate the idea of the business importance of an item. Setting those scores is not an easy task and often leads to relevance problems as there is an infinite number of configurations that are possible, a lot of these configurations producing very similar results.

From the business point of view, there is a difference between the value of each attribute. For example, in an ecommerce store, it is not the same to match the query “laptop 13” in the “category” attribute than in the “description” attribute (you can imagine this is not relevant to match a backpack that has a pocket for laptop on the “laptop” query).

The problem with setting the weights manually is that you never know if you have the best configuration or not, and the best configuration actually varies for different query types. So there is no one solution. You are limited by the number of different configurations you can test, each of them requiring a test period. Even worse, it is common to iterate on the weights based on the observation of a small set of queries without checking in advance the global impact of this change.

Instead of setting those weights manually, it is better to give a “hint” to an AI algorithm that will optimize the weight automatically and constantly (because the data are also changing over time). Such a hint is usually a list of attributes ordered from the most important to the least important with potentially some equality. For example, (“Category”, “Brand”, “Color”) > “Name” > “Description” when the business knows that “Category”, “Brand” are clean data reviewed by the business and Name/Description are more generic.

A weight to merge the Textual signals and the Business signals

Another common usage of weight is to merge the textual signals and the business signals (score = 𝝈 x TextualScore + (1 – 𝝈 ) * BusinessScore). For example, a weight of 0.5 will give the same importance to the textual score and to the business score and compute a final ordering. This seems very intuitive but creates a lot of relevancy issues. To illustrate the problem, let’s take an extreme example where you want to sort by a business signal. In this case, you totally ignore the textual signals to sort by one business criteria (sorting by increasing price for example). Because the search engine is designed to consider all potential hits, even the ones that are far away from the query, you will end up with results that are very far from the query because they are cheap. This is a problem you find on a lot of websites, often forcing the customer to filter the query to have something a bit more relevant.

The fundamental problem is that the business score is merged in the same way for all results, while some of them are very far from the query and should be eliminated. For the sort, one classical solution is to filter the result set and only keep the good textual scores before applying the sort. In practice, most marketplaces use such an approach. (you can detect such an approach if you’re on a site where the number of results for the “sort by” is lower than the initial search).

For the general problem of merging the Textual Signals and the Business Signals, we have exactly the same problem as in the extreme case of a “sort by”. To be appropriate, it requires to have several buckets of textual relevance and to then apply the sort inside each bucket.

Similar to the problem of weights inside the Textual Score, the best approach is to define the business score and let an algorithm optimize the merging for you depending on the user behavior.

Weights between different signals

Ranking is a complex merge problem between a lot of signals. It is tempting to merge all of them with weights, at least at a high level between the Textual Signals, Business Signals, and User Signals. In practice setting those weights manually is one order of magnitude more complex than the previous two examples and is a setup for failure.

Depending on the context (the query, the section of the website, etc.), the merge can be different to provide relevant results to the user. The search engine can potentially use weights and a machine learning algorithm to set them, this is fine. But setting those weights manually is never something that delivers the best results.

Weights are dangerous

Weights are more often part of the problem than part of the solution when they are set manually. The next time you will be asked to manually configure a weight, you should think about the danger of such a setting and if there is a way to automatically configure them. Noting that LTR (Learning To Rank) algorithms often use boosted trees and other non-linear ways of finding dependencies and relationships between the ranking signals. Thus the optimal way of setting the weights ends up different for basically every query. For the sort example above, this could learn that the relevance cut off should increase when a hard sort — like the price attribute — is applied.

If you want to learn how Algolia is tackling this challenge, watch this space to learn more.

About the author
Julien Lemoine

Co-founder & former CTO at Algolia

githublinkedintwitter

Recommended Articles

Powered byAlgolia Algolia Recommend

Algolia's top 10 tips to achieve highly relevant search results
product

Julien Lemoine

Co-founder & former CTO at Algolia

What is a search query and how is it processed by a search engine?
product

Catherine Dee

Search and Discovery writer

Comparing Algolia and Elasticsearch For Consumer-Grade Search Part 2: Relevance Isn’t Luck
engineering

Josh Dzielak