Search by Algolia
What is retail analytics and how can it inform your data-driven ecommerce merchandising strategy?
e-commerce

What is retail analytics and how can it inform your data-driven ecommerce merchandising strategy?

There is such tremendous activity both on and off of retailer websites today that it would be impossible to make ...

Catherine Dee

Search and Discovery writer

8 ways to use merchandising data to boost your online store ROI
e-commerce

8 ways to use merchandising data to boost your online store ROI

New year, new goals. Sounds positive, but looking at your sales data, your revenue and profit aren’t so hot ...

John Stewart

VP, Corporate Communications and Brand

Algolia DocSearch + Astro Starlight
engineering

Algolia DocSearch + Astro Starlight

What is Astro Starlight? If you're building a documentation site, your content needs to be easy to write and ...

Jaden Baptista

Technical Writer

What role does AI play in recommendation systems and engines?
ai

What role does AI play in recommendation systems and engines?

You put that in your cart. How about this cool thing to go with it? You liked that? Here are ...

Catherine Dee

Search and Discovery writer

How AI can help improve your user experience
ux

How AI can help improve your user experience

They say you get one chance to make a great first impression. With visual design on ecommerce web pages, this ...

Jon Silvers

Director, Digital Marketing

Keeping your Algolia search index up to date
product

Keeping your Algolia search index up to date

When creating your initial Algolia index, you may seed the index with an initial set of data. This is convenient ...

Jaden Baptista

Technical Writer

Merchandising in the AI era
e-commerce

Merchandising in the AI era

For merchandisers, every website visit is an opportunity to promote products to potential buyers. In the era of AI, incorporating ...

Tariq Khan

Director of Content Marketing

Debunking the most common AI myths
ai

Debunking the most common AI myths

ARTIFICIAL INTELLIGENCE CAN’T BE TRUSTED, shouts the headline on your social media newsfeed. Is that really true, or is ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

How AI can benefit the retail industry
ai

How AI can benefit the retail industry

Artificial intelligence is on a roll. It’s strengthening healthcare diagnostics, taking on office grunt work, helping banks combat fraud ...

Catherine Dee

Search and Discovery writer

How ecommerce AI is reshaping business
e-commerce

How ecommerce AI is reshaping business

Like other modern phenomena such as social media, artificial intelligence has landed on the ecommerce industry scene with a giant ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

AI-driven smart merchandising: what it is and why your ecommerce store needs it
ai

AI-driven smart merchandising: what it is and why your ecommerce store needs it

Do you dream of having your own personal online shopper? Someone familiar and fun who pops up every time you ...

Catherine Dee

Search and Discovery writer

NRF 2024: A cocktail of inspiration and innovation
e-commerce

NRF 2024: A cocktail of inspiration and innovation

Retail’s big show, NRF 2024, once again brought together a wide spectrum of practitioners focused on innovation and transformation ...

Reshma Iyer

Director of Product Marketing, Ecommerce

How AI-powered personalization is transforming the user and customer experience
ai

How AI-powered personalization is transforming the user and customer experience

In a world of so many overwhelming choices for consumers, how can you best engage with the shoppers who visit ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

Unveiling the future: Algolia’s AI revolution at NRF Retail Big Show
algolia

Unveiling the future: Algolia’s AI revolution at NRF Retail Big Show

Get ready for an exhilarating journey into the future of retail as Algolia takes center stage at the NRF Retail ...

John Stewart

VP Corporate Marketing

How to master personalization with AI
ai

How to master personalization with AI

Picture ecommerce in its early days: businesses were just beginning to discover the power of personalized marketing. They’d divide ...

Ciprian Borodescu

AI Product Manager | On a mission to help people succeed through the use of AI

5 best practices for nailing the ecommerce virtual assistant user experience
ai

5 best practices for nailing the ecommerce virtual assistant user experience

“Hello there, how can I help you today?”, asks the virtual shopping assistant in the lower right-hand corner ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

Add InstantSearch and Autocomplete to your search experience in just 5 minutes
product

Add InstantSearch and Autocomplete to your search experience in just 5 minutes

A good starting point for building a comprehensive search experience is a straightforward app template. When crafting your application’s ...

Imogen Lovera

Senior Product Manager

Best practices of conversion-focused ecommerce website design
e-commerce

Best practices of conversion-focused ecommerce website design

The inviting ecommerce website template that balances bright colors with plenty of white space. The stylized fonts for the headers ...

Catherine Dee

Search and Discovery writer

Looking for something?

facebookfacebooklinkedinlinkedintwittertwittermailmail

Search can feel both simple and complicated at the same time. Searching on Google is simple, and the results are pretty relevant (although that can be argued in recent years with Google pushing a lot of sponsored results). Although it is not search, ChatGPT has made that experience even more magical. However, anyone who has implemented search knows that information retrieval is a highly complex topic.

To simplify, the life-cycle from a search query to results, search can be divided in three distinct processes: query understanding, retrieval, and ranking.

  1. Query understanding: Natural language processing (NLP) techniques prepare and structure the query for the search engine to analyze; 
  2. Retrieval: The search engine will then retrieve the most relevant results and rank them from most to least relevant;
  3. Ranking: Finally, there’s a re-ranking process to push the best results to the top (based on clicks, conversions, etc.) and apply a customer’s rules, personalization, and more. 

search pyramid

Machine learning AI has been applied to query processing and ranking for some time now, and it greatly improves both. The missing piece was retrieval, and retrieval is even more vital for improving the overall quality of results.

We can measure retrieval quality using precision and recall. Precision is the percentage of retrieved documents that are relevant, and recall is the percentage of all relevant documents that are retrieved. Both metrics can help us to determine if search results are any good. 

To illustrate, let’s say it’s time to upgrade your kitchenware, so you search your favorite seller’s site for “fry pan”. Amongst the results, some items are quite relevant. But, some are not — like the cookware sets with sauce pans (screenshot below). This is precision. However, there were many other relevant products on the site that were not included in the results. We call this recall.

precision vs recall

Now, let’s say you decide to refine the search query. You might search “non-stick frying pan” and this time (see below) (1) there is a different quantity of results for this query because we have introduced slightly different keywords and (2) there are more of the results that you were expecting (frying pans, not cookware sets). 

recall vs precision

In fact, there can be a yin-yang between precision and recall; improving the precision (accuracy) may impact the recall, and improving the recall (completeness) may hurt precision. The holy grail is improving both, and this is exactly what AI retrieval can do. 

Retrieval was the last piece of the AI search puzzle, and it was also the hardest for several reasons:

  • Managing AI retrieval scale and performance were cost prohibitive. The storage, CPUs, and algorithms all needed to be specialized. 
  • AI retrieval models were “brittle” — the search index was updated with new or changed content, the models would need to be updated.
  • There was a tradeoff between precise matching and broad concept matching.

In this post, I’ll explain how we’ve set out to solve this last challenge for AI retrieval. In future posts, I’ll speak more about the other parts of the search pyramid. 

AI information retrieval 

Search retrieval requires technologies to determine relevance for any particular query. For years, it was powered by keyword search engines. That’s changing. With the introduction of vector search, which goes beyond keyword search, concepts can be understood. 

Vector search is a machine learning technology for AI search. Vectors are a way to represent words mathematically. Vectors are plotted and clustered in multiple dimensions (also called n-dimensional space). Vector search compares the similarity of multiple objects to a search query or subject item via their vector representation. In order to find similar matches, the query (or the subject) is converted into vectors using the same model that is used to convert objects (i.e. data or content) into vectors. Vectors that are similar to one another are returned from the database, finding the closest matches, providing accurate results, while eliminating irrelevant results that traditional search technology might have returned.

n-dimensional vector space
An example of what vectors in an n-dimensional space might look like for the term “coffee”, visualized via Tensorflow image projector

Techniques such as HNSW (Hierarchical Navigable Small World), IVF (Inverted File), or PQ (Product Quantization, a technique to reduce the number of dimensions of a vector) are some of the most popular Approximate Nearest Neighbor (ANN) methods to find similarity between vectors. Each technique focuses on improving a particular performance property, such as memory reduction with PQ or fast but accurate search times with HNSW and IVF. It is common practice to mix several components to produce a ‘composite’ index to achieve optimal performance for a given use case.

There can be thousands of dimensions. The proximity and angle between each vector helps the search engine determine similarity between terms and meaning. Type in “espresso with milk thingy” and a vector search engine will look for similarity to return espresso makers with steam wands. However, type in one word, like “Delonghi” (an espresso maker brand), and a vector search engine is just as likely to return other brands and different machines — Nespresso, Keurig, KitchenAid, etc. That’s because vectors only understand the concept of a Delonghi. 

espresso with milk thingy

Unlike newer vector engines, traditional keyword search engines are fast and precise. Thus, a search for a “Delonghi Magnifica” will give exactly that. However, keyword search engines can struggle when the query doesn’t match the content in your search index. To address the problem, companies can add rules, synonyms, keyword tagging, or other workarounds, but it’s impossible to cover every use case. 

For example, you could write a rule that a query containing the keywords “coffee or espresso”, “machine”, and “milk” all mean the same thing as “espresso machine with steam wand.” The problem is that you won’t be able to cover every edge case for every possible long tail query. Take another example, a search for the word “java”. Java is used synonymously with coffee and espresso, but unless there’s a synonym or rule in the search engine, the query will fail. 

Designing and building an efficient index of vectors that can scale is a complex and expensive task. Similarly, building an efficient keyword search engine that works for long tail queries is equally daunting. Separately, vector and keyword search technologies are quite good. However, together, they’re terrific. 

This is called hybrid search, and it works well for exact matches, ambiguous short queries, and long tail queries. 

An AI search performance breakthrough

So, you just slap together some vectors and keyword technologies and you’re done, right? Hybrid search engine solved!

If only it were so easy. As you might imagine, there are a few challenges. The biggest issue is managing vector search scaling and costs. Vectors are basically floating point numbers. Computers struggle immensely with floating point numbers. This is why specialized computers like GPUs are being used for AI and vector manipulation. Not only do you need specialized databases for running vector search, you’ll need full-time development resources to continually manage production. With AI models, it is critical that the data that is fed to the models remains fresh, relevant, and optimized. For ecommerce and enterprise businesses where data is constantly being updated and speed matters, AI search is too computationally expensive to run in production. 

Some companies have attempted to go around the problem by running an AI query only if the keyword query fails. This helps to minimize processing costs, but fails to provide the best results for customers. 

The bottom line is that most companies want to spend their time and money focused on their business, not attending to search engine infrastructure. The solution is an approach we’ve pioneered. It’s called neural hashing. Hashing is a technique that allows us to compress vectors without losing information. We can turn complex 2000-decimal long numbers into a simple static length expression, which makes computing them very fast and cheap. Hashing is not a new concept within AI as applied to vectors. 

Locality-Sensitive Hashing (LSH) is a well-known algorithmic technique that hashes similar input items into the same “buckets” with high probability. Typically there are tradeoffs — higher or lower similarity — in how “buckets” are determined. With our neural hashing technique, we have eliminated the need for tradeoffs. As a result, we can compress, or hash, vectors with neural networks (thus the name neural hashing) to 1/10th their normal size, while still retaining up to 99% of the information. They can be stored and managed on standard hardware and databases. In fact, we can process hashed vectors, or binary vectors, up to 500 times faster than standard vector similarity, making it as fast to deliver as keyword search. And we can do it on regular CPUs.

Here’s an example of a long tail query for “non-teflon non-stick frypan” running on a keyword-only engine versus a hashing / keyword engine.

There are several important takeaways to glean from the screenshots above:

  • The hybrid engine offers both higher precision and higher recall. 
  • We are running both hashes and keywords in a single query, and the combined hybrid results are scored and ranked by relevance. 
  • This is running on commodity hardware and results are nearly instantaneous; the hybrid results aren’t any slower than keyword-only.

Note too that “frypan” is written as one word instead of two — it could have also been written “fry pan”, “frying pan”, or “skillet” — but the search engine doesn’t skip a beat. This touches on another important feature of AI-retrieval: it greatly reduces the manual workload associated with improving search relevance. Gone are the days of writing synonym libraries for common terms, or writing rules for certain types of queries. It also opens up entirely new possibilities such as offering Q&A search.

This new feature is available with Algolia NeuralSearch™. It is in private beta now, but you can sign up here to be notified when it’s available. 

Search is more than retrieval, of course. In our end-to-end AI search pyramid, retrieval (neural hashing) is in the middle. On each end of the AI pyramid is query understanding and ranking. In future blogs, I will touch on these other two capabilities.

Next steps

Neural hashing represents a breakthrough for putting AI retrieval into production for a huge variety of use cases. Combined with AI-powered query processing and re-ranking, it promises to unleash the full power of AI on-site search.

And now that we’ve released these new end-to-end AI capabilities with Algolia NeuralSearch, you can be among the first to try them out! Contact our team to find out how. 

About the author
Bharat Guruprakash

Chief Product Officer

linkedin

Recommended Articles

Powered byAlgolia Algolia Recommend

What is end-to-end AI search?
ai

Abhijit Mehta

Director of Product Management

A simple guide to AI search
ai

Jon Silvers

Director, Digital Marketing

What is vector search?
ai

Dustin Coates

Product and GTM Manager