Search by Algolia

Search can feel both simple and complicated at the same time. Searching on Google is simple, and the results are usually relevant (although that has become debatable in recent years as Google pushes more sponsored results). Although it is not search, ChatGPT has made that experience feel even more magical. However, anyone who has implemented search knows that information retrieval is a highly complex topic.

To simplify, the life cycle from search query to results can be divided into three distinct processes: query understanding, retrieval, and ranking.

  1. Query understanding: Natural language processing (NLP) techniques prepare and structure the query so the search engine can analyze it.
  2. Retrieval: The search engine then retrieves the most relevant results and orders them from most to least relevant.
  3. Ranking: Finally, a re-ranking step pushes the best results to the top (based on signals such as clicks and conversions) and applies the customer’s rules, personalization, and more.
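To make the three stages concrete, here is a minimal toy sketch in Python. It is not how Algolia implements them: the catalog, the stopword list, the `click_weight` parameter, and the function names (`understand_query`, `retrieve`, `rank`, `search`) are all hypothetical, and each stage is reduced to its simplest possible form (tokenization, term overlap, and a popularity boost).

```python
import re

# Hypothetical catalog: each product has a title and a click count used as a ranking signal.
CATALOG = [
    {"id": 1, "title": "Non-stick frying pan 28cm", "clicks": 120},
    {"id": 2, "title": "Cast iron frying pan", "clicks": 300},
    {"id": 3, "title": "Cookware set with sauce pans", "clicks": 50},
    {"id": 4, "title": "Stainless steel stock pot", "clicks": 80},
]

STOPWORDS = {"a", "an", "the", "for", "with"}  # hypothetical, deliberately tiny


def understand_query(query: str) -> list[str]:
    """Query understanding (toy version): lowercase, tokenize, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOPWORDS]


def retrieve(tokens: list[str], catalog: list[dict]) -> list[tuple[dict, int]]:
    """Retrieval (toy version): keep products sharing at least one token, scored by overlap."""
    results = []
    for product in catalog:
        title_tokens = set(re.findall(r"[a-z0-9]+", product["title"].lower()))
        overlap = len(title_tokens & set(tokens))
        if overlap > 0:
            results.append((product, overlap))
    # Order from most to least textually relevant.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


def rank(candidates: list[tuple[dict, int]], click_weight: float = 0.01) -> list[dict]:
    """Ranking (toy version): blend the textual score with a popularity signal and re-sort."""
    rescored = [(p, score + click_weight * p["clicks"]) for p, score in candidates]
    return [p for p, _ in sorted(rescored, key=lambda pair: pair[1], reverse=True)]


def search(query: str) -> list[int]:
    return [p["id"] for p in rank(retrieve(understand_query(query), CATALOG))]


print(search("the frying pan"))  # [2, 1]
```

Both frying pans match the text equally well, so the click signal decides the final order. Note that the cookware set with “sauce pans” is never retrieved, because “pans” does not literally match “pan”: a small preview of the keyword-recall problem discussed below.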

[Figure: the search pyramid]

Machine learning has been applied to query processing and ranking for some time now, and it greatly improves both. The missing piece was retrieval, which matters even more for the overall quality of results.

We can measure retrieval quality using precision and recall. Precision is the percentage of retrieved documents that are relevant, and recall is the percentage of all relevant documents that are retrieved. Both metrics can help us to determine if search results are any good. 
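As a quick worked example of these two definitions, the sketch below computes precision and recall for a hypothetical “fry pan” search. The product IDs and counts are invented for illustration.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = |retrieved ∩ relevant| / |retrieved|; recall = |retrieved ∩ relevant| / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical "fry pan" search: 5 results returned, 3 of them are actual frying pans,
# while the catalog contains 6 frying pans in total.
retrieved = {"pan_a", "pan_b", "pan_c", "cookware_set_1", "cookware_set_2"}
relevant = {"pan_a", "pan_b", "pan_c", "pan_d", "pan_e", "pan_f"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.50
```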

To illustrate, let’s say it’s time to upgrade your kitchenware, so you search your favorite seller’s site for “fry pan”. Some of the results are quite relevant, but some are not, like the cookware sets with sauce pans (screenshot below). The share of returned results that are actually relevant is the precision. However, many other relevant products on the site were not included in the results; the share of all relevant products that the search actually returned is the recall.

[Screenshot: search results for “fry pan”, illustrating precision vs. recall]

Now, let’s say you refine the query to “non-stick frying pan”. This time (see below), (1) the number of results changes because the query uses slightly different keywords, and (2) more of the results are what you were expecting (frying pans, not cookware sets).

[Screenshot: search results for “non-stick frying pan”, illustrating recall vs. precision]

In fact, precision and recall often pull against each other: improving precision (accuracy) can reduce recall, and improving recall (completeness) can hurt precision. The holy grail is improving both, and this is exactly what AI retrieval can do.
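One simple way to see this tradeoff is to vary a relevance cutoff over a scored result list. In this hypothetical example (the scores and relevance labels are made up), raising the threshold shows fewer, more accurate results, while lowering it finds every relevant item at the cost of more noise.

```python
# Hypothetical scored results: (product, relevance score from the engine, truly relevant?)
SCORED = [
    ("pan_a", 0.95, True),
    ("pan_b", 0.90, True),
    ("cookware_set_1", 0.80, False),
    ("pan_c", 0.70, True),
    ("cookware_set_2", 0.60, False),
    ("pan_d", 0.40, True),
    ("kettle", 0.30, False),
]
TOTAL_RELEVANT = sum(1 for _, _, rel in SCORED if rel)


def metrics_at(threshold: float) -> tuple[float, float]:
    """Return (precision, recall) when only results scoring >= threshold are shown."""
    shown = [rel for _, score, rel in SCORED if score >= threshold]
    hits = sum(shown)  # True counts as 1
    precision = hits / len(shown) if shown else 0.0
    recall = hits / TOTAL_RELEVANT
    return precision, recall


for t in (0.85, 0.65, 0.25):
    p, r = metrics_at(t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
# threshold=0.85 precision=1.00 recall=0.50
# threshold=0.65 precision=0.75 recall=0.75
# threshold=0.25 precision=0.57 recall=1.00
```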

Retrieval was the last piece of the AI search puzzle, and it was also the hardest for several reasons:

  • Managing the scale and performance of AI retrieval was cost-prohibitive. The storage, CPUs, and algorithms all needed to be specialized.
  • AI retrieval models were “brittle”: whenever the search index was updated with new or changed content, the models needed to be updated too.
  • There was a tradeoff between precise matching and broad concept matching.

In this post, I’ll explain how we’ve set out to solve this last challenge for AI retrieval. In future posts, I’ll speak more about the other parts of the search pyramid. 

AI information retrieval 

Search retrieval requires technologies that determine relevance for any particular query. For years, retrieval was powered by keyword search engines. That’s changing. Vector search goes beyond matching keywords: it lets the engine understand concepts.

Vector search is a machine learning technology for AI search. Vectors are a way to represent words (and, more broadly, queries and content) mathematically, as lists of numbers. Vectors are plotted and clustered in multiple dimensions (also called n-dimensional space). Vector search compares the similarity of multiple objects to a search query or subject item via their vector representations. To find similar matches, the query (or the subject) is converted into a vector using the same model that was used to convert the objects (i.e. data or content) into vectors. The database then returns the vectors closest to the query vector, which surfaces accurate matches while eliminating irrelevant results that traditional search technology might have returned.

[Figure: An example of what vectors in an n-dimensional space might look like for the term “coffee”, visualized with the TensorFlow Embedding Projector]
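The core of vector search can be sketched in a few lines. This is a minimal, exact (brute-force) nearest-neighbor search using cosine similarity, assuming the query and the items have already been embedded by the same model. The three-dimensional vectors and the `INDEX` contents are hand-made, hypothetical stand-ins for real embeddings, which typically have hundreds or thousands of dimensions.

```python
import math

# Hypothetical, hand-made 3-dimensional "embeddings". Real models produce far larger
# vectors; these toy values only illustrate the mechanics of similarity search.
INDEX = {
    "espresso machine with steam wand": [0.9, 0.8, 0.1],
    "drip coffee maker": [0.8, 0.2, 0.1],
    "milk frother": [0.3, 0.9, 0.0],
    "cast iron skillet": [0.0, 0.1, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vector: list[float], k: int = 2) -> list[str]:
    """Exact (brute-force) nearest-neighbor search: score every item, keep the top k."""
    scored = sorted(
        INDEX.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Pretend the same embedding model mapped "espresso with milk thingy" to this vector.
print(vector_search([0.85, 0.75, 0.05]))
# ['espresso machine with steam wand', 'drip coffee maker']
```

Scoring every stored vector like this is exact, but its cost grows linearly with the catalog size, which is why production systems turn to approximate methods.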

Techniques such as HNSW (Hierarchical Navigable Small World), IVF (Inverted File), or PQ (Product Quantization, a technique that compresses vectors by splitting each one into sub-vectors and encoding every sub-vector as a short code) are some of the most popular Approximate Nearest Neighbor (ANN) methods for finding similar vectors quickly. Each technique focuses on improving a particular performance property, such as memory reduction with PQ or fast but accurate search with HNSW and IVF. It is common practice to combine several components into a ‘composite’ index (for example, IVF with PQ) to achieve optimal performance for a given use case.
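To illustrate how an ANN method trades exactness for speed, here is a minimal sketch of the IVF idea in plain Python: a tiny k-means learns coarse centroids, each vector is stored in the inverted list of its nearest centroid, and a query only scans the lists of its `nprobe` closest centroids. The class, function, and parameter names are hypothetical, and libraries such as FAISS add many optimizations this sketch omits.

```python
Vector = list[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[Vector], k: int, iterations: int = 10) -> list[Vector]:
    """Tiny k-means with deterministic init (first k vectors) to learn coarse centroids."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters: list[list[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


class IVFIndex:
    def __init__(self, vectors: list[Vector], k: int):
        self.vectors = vectors
        self.centroids = kmeans(vectors, k)
        # Inverted lists: centroid id -> ids of the vectors assigned to it.
        self.lists: dict[int, list[int]] = {c: [] for c in range(k)}
        for i, v in enumerate(vectors):
            nearest = min(range(k), key=lambda c: sq_dist(v, self.centroids[c]))
            self.lists[nearest].append(i)

    def search(self, query: Vector, top_k: int = 1, nprobe: int = 1) -> list[int]:
        # Probe only the nprobe closest buckets; this is what makes the search approximate.
        probe = sorted(
            range(len(self.centroids)), key=lambda c: sq_dist(query, self.centroids[c])
        )[:nprobe]
        candidates = [i for c in probe for i in self.lists[c]]
        return sorted(candidates, key=lambda i: sq_dist(query, self.vectors[i]))[:top_k]


# Hypothetical 2-D data forming two obvious clusters.
vectors = [[0.0, 0.0], [10.0, 10.0], [0.1, 0.2], [9.8, 10.1], [0.2, 0.1], [10.2, 9.9]]
index = IVFIndex(vectors, k=2)
print(index.search([0.1, 0.15], top_k=2))  # [2, 4]
```

Only half of the vectors are scanned for each query here. With `nprobe=1`, a relevant vector sitting in a neighboring bucket would be missed; raising `nprobe` recovers recall at the cost of scanning more vectors.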

There can be thousands of dimensions. The proximity and angle between vectors help the search engine determine similarity in meaning between terms. Type in “espresso with milk thingy” and a vector search engine will look for similar concepts and return espresso makers with steam wands. However, type in a single word like “Delonghi” (an espresso maker brand), and a vector search engine is just as likely to return other brands and different machines: Nespresso, Keurig, KitchenAid, etc. That’s because the vector captures the concept of a Delonghi (an espresso machine), not the exact brand name.

[Screenshot: vector search results for “espresso with milk thingy”]

Unlike newer vector engines, traditional keyword search engines are fast and precise. Thus, a search for “Delonghi Magnifica” returns exactly that. However, keyword search engines can struggle when the words in the query don’t match the content in your search index. To address the problem, companies can add rules, synonyms, keyword tagging, or other workarounds, but it’s impossible to cover every use case.

For example, you could write a rule that a query containing the keywords “coffee” or “espresso”, “machine”, and “milk” means the same thing as “espresso machine with steam wand.” The problem is that you won’t be able to cover every edge case for every possible long tail query. Or take a search for the word “java”: Java is used synonymously with coffee and espresso, but unless there’s a synonym or rule for it in the search engine, the query will fail.
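The synonym problem can be shown with a tiny keyword matcher. In this sketch, the product titles, the `synonyms` mapping, and the all-terms-must-match rule are illustrative assumptions, not how any particular engine works. “java” finds nothing until someone adds a synonym by hand.

```python
import re
from typing import Optional

# Hypothetical product titles.
PRODUCTS = {
    1: "Delonghi Magnifica espresso machine with steam wand",
    2: "Organic espresso coffee beans",
    3: "Non-stick frying pan",
}


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query: str, synonyms: Optional[dict[str, set[str]]] = None) -> list[int]:
    """Return ids of products whose title contains every query term (or one of its synonyms)."""
    synonyms = synonyms or {}
    results = []
    for pid, title in PRODUCTS.items():
        title_tokens = tokenize(title)
        # Each query term matches if the term itself, or any hand-written synonym, is in the title.
        if all(({term} | synonyms.get(term, set())) & title_tokens for term in tokenize(query)):
            results.append(pid)
    return results


print(keyword_search("Delonghi Magnifica"))  # [1]
print(keyword_search("java"))  # []
print(keyword_search("java", synonyms={"java": {"coffee", "espresso"}}))  # [1, 2]
```

Every such synonym entry has to be anticipated and maintained manually, which is exactly the long-tail burden described above.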

Designing and building an efficient index of vectors that can scale is a complex and expensive task. Similarly, building an efficient keyword search engine that works for long tail queries is equally daunting. Separately, vector and keyword search technologies are quite good. However, together, they’re terrific. 

This is called hybrid search, and it works well for exact matches, ambiguous short queries, and long tail queries. 
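There are several ways to merge keyword and vector results. One widely used, simple technique is Reciprocal Rank Fusion (RRF), sketched below; it is shown only as an example of hybrid merging, not as the method Algolia uses. The result lists are hypothetical, and `k = 60` is the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists; each list adds 1 / (k + rank) to a document's score (rank starts at 1)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical result lists for the query "Delonghi Magnifica" from each engine.
keyword_results = ["delonghi_magnifica", "magnifica_descaler"]
vector_results = ["delonghi_dedica", "delonghi_magnifica", "nespresso_vertuo"]

print(reciprocal_rank_fusion([keyword_results, vector_results]))
# ['delonghi_magnifica', 'delonghi_dedica', 'magnifica_descaler', 'nespresso_vertuo']
```

Items that both engines rank highly rise to the top, while items found by only one engine still appear lower down, which is the behavior hybrid search aims for.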

An AI search performance breakthrough

So, you just slap together some vectors and keyword technologies and you’re done, right? Hybrid search engine solved!

If only it were so easy. As you might imagine, there are a few challenges. The biggest issue is managing vector search scaling and costs. Vectors are long arrays of floating-point numbers, and comparing a query against millions of them requires an enormous number of floating-point operations and a lot of memory. This is why specialized hardware like GPUs is being used for AI and vector manipulation. Not only do you need specialized databases for running vector search, you’ll also need full-time development resources to continually manage production. With AI models, it is critical that the data fed to the models remains fresh, relevant, and optimized. For ecommerce and enterprise businesses, where data is constantly being updated and speed matters, AI search is too computationally expensive to run in production.
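A back-of-the-envelope calculation shows why exhaustive vector search gets expensive. The catalog size (10 million items) and dimensionality (768, a common size for text embedding models) below are hypothetical inputs, and the function only counts raw float32 storage and the multiply-adds of one brute-force query; real systems add index overhead and use ANN methods to avoid scanning everything.

```python
def vector_index_cost(num_vectors: int, dims: int, bytes_per_value: int = 4) -> tuple[float, int]:
    """Return (GB of raw float vectors, multiply-adds for one exhaustive dot-product query)."""
    memory_gb = num_vectors * dims * bytes_per_value / 1e9
    multiply_adds = num_vectors * dims  # one multiply-add per dimension per stored vector
    return memory_gb, multiply_adds


# Hypothetical catalog: 10 million products embedded as 768-dimensional float32 vectors.
mem, ops = vector_index_cost(10_000_000, 768)
print(f"{mem:.2f} GB of raw vectors, {ops:,} multiply-adds per exhaustive query")
# 30.72 GB of raw vectors, 7,680,000,000 multiply-adds per exhaustive query
```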

Some companies have attempted to go around the problem by running an AI query only if the keyword query fails. This helps to minimize processing costs, but fails to provide the best results for customers. 

The bottom line is that most companies want to spend their time and money focused on their business, not attending to search engine infrastructure. The solution is an approach we’ve pioneered, called neural hashing. Hashing is a technique that allows us to compress vectors while preserving most of their information. We can turn long vectors of high-precision decimal numbers into short, fixed-length codes, which makes comparing them very fast and cheap. Hashing is not a new concept within AI as applied to vectors.

Locality-Sensitive Hashing (LSH) is a well-known algorithmic technique that hashes similar input items into the same “buckets” with high probability. Typically, there are tradeoffs (higher or lower similarity) in how the buckets are determined. With our neural hashing technique, we have eliminated the need for these tradeoffs. As a result, we can compress, or hash, vectors with neural networks (hence the name neural hashing) to 1/10th of their normal size while still retaining up to 99% of the information. The hashed vectors can be stored and managed on standard hardware and databases. In fact, we can process hashed vectors, or binary vectors, up to 500 times faster than standard vector similarity computations, making retrieval as fast as keyword search. And we can do it on regular CPUs.
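For reference, here is a minimal sketch of classic random-hyperplane LSH (often called SimHash): each random hyperplane contributes one bit, set according to which side of the plane the vector falls on, so vectors separated by a small angle tend to share most bits. This is the textbook technique, not Algolia’s neural hashing, which the post describes as hashing vectors with neural networks; the function names, the seed, and the 16-bit code length are illustrative choices.

```python
import random


def make_hyperplanes(dims: int, n_bits: int, seed: int = 42) -> list[list[float]]:
    """Random hyperplanes (Gaussian normal vectors); each one contributes one bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(n_bits)]


def lsh_hash(vector: list[float], hyperplanes: list[list[float]]) -> int:
    """Pack the sign of the dot product with each hyperplane into an integer bit code."""
    code = 0
    for bit, plane in enumerate(hyperplanes):
        if sum(v * p for v, p in zip(vector, plane)) >= 0:
            code |= 1 << bit
    return code


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits: an XOR plus a bit count, far cheaper than a float dot product."""
    return bin(a ^ b).count("1")


# Hypothetical 4-dimensional vectors.
planes = make_hyperplanes(dims=4, n_bits=16)
query = [0.9, 0.8, 0.1, 0.0]
similar = [0.85, 0.75, 0.15, 0.05]
unrelated = [-0.1, 0.0, 0.9, 0.8]
print(hamming_distance(lsh_hash(query, planes), lsh_hash(similar, planes)))
print(hamming_distance(lsh_hash(query, planes), lsh_hash(unrelated, planes)))
```

Because the hyperplanes are random, the exact distances printed depend on the seed. On average, the fraction of differing bits approximates the angle between the two vectors divided by π, so the nearly parallel `similar` vector is expected to differ in far fewer bits than the orthogonal `unrelated` one.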

Here’s an example of a long tail query for “non-teflon non-stick frypan” running on a keyword-only engine versus a hashing / keyword engine.

There are several important takeaways from this comparison:

  • The hybrid engine offers both higher precision and higher recall. 
  • We are running both hashes and keywords in a single query, and the combined hybrid results are scored and ranked by relevance. 
  • This is running on commodity hardware and results are nearly instantaneous; the hybrid results aren’t any slower than keyword-only.

Note, too, that “frypan” is written as one word instead of two. It could also have been written “fry pan”, “frying pan”, or “skillet”, and the search engine wouldn’t skip a beat. This touches on another important feature of AI retrieval: it greatly reduces the manual workload associated with improving search relevance. Gone are the days of writing synonym libraries for common terms or rules for certain types of queries. It also opens up entirely new possibilities, such as Q&A search.

This new feature is available with Algolia NeuralSearch™. It is in private beta now, but you can sign up here to be notified when it’s available. 

Search is more than retrieval, of course. In our end-to-end AI search pyramid, retrieval (neural hashing) sits in the middle, with query understanding and ranking on either end. In future posts, I will touch on these other two capabilities.

Next steps

Neural hashing represents a breakthrough for putting AI retrieval into production for a huge variety of use cases. Combined with AI-powered query processing and re-ranking, it promises to unleash the full power of AI for on-site search.

And now that we’ve released these new end-to-end AI capabilities with Algolia NeuralSearch, you can be among the first to try them out! Contact our team to find out how. 

About the author
Bharat Guruprakash

Chief Product Officer
