Search can feel both simple and complicated at the same time. Searching on Google is simple, and the results are pretty relevant (although that can be argued in recent years with Google pushing a lot of sponsored results). Although it is not search, ChatGPT has made that experience even more magical. However, anyone who has implemented search knows that information retrieval is a highly complex topic.
To simplify, the life-cycle from a search query to results, search can be divided in three distinct processes: query understanding, retrieval, and ranking.
Query understanding: Natural language processing (NLP) techniques prepare and structure the query for the search engine to analyze;
Retrieval: The search engine will then retrieve the most relevant results and rank them from most to least relevant;
Ranking: Finally, there’s a re-ranking process to push the best results to the top (based on clicks, conversions, etc.) and apply a customer’s rules, personalization, and more.
Machine learning AI has been applied to query processing and ranking for some time now, and it greatly improves both. The missing piece was retrieval, and retrieval is even more vital for improving the overall quality of results.
We can measure retrieval quality using precision and recall. Precision is the percentage of retrieved documents that are relevant, and recall is the percentage of all relevant documents that are retrieved. Both metrics can help us to determine if search results are any good.
To illustrate, let’s say it’s time to upgrade your kitchenware, so you search your favorite seller’s site for “fry pan”. Amongst the results, some items are quite relevant. But, some are not — like the cookware sets with sauce pans (screenshot below). This is precision. However, there were many other relevant products on the site that were not included in the results. We call this recall.
Now, let’s say you decide to refine the search query. You might search “non-stick frying pan” and this time (see below) (1) there is a different quantity of results for this query because we have introduced slightly different keywords and (2) there are more of the results that you were expecting (frying pans, not cookware sets).
In fact, there can be a yin-yang between precision and recall; improving the precision (accuracy) may impact the recall, and improving the recall (completeness) may hurt precision. The holy grail is improving both, and this is exactly what AI retrieval can do.
Retrieval was the last piece of the AI search puzzle, and it was also the hardest for several reasons:
Managing AI retrieval scale and performance were cost prohibitive. The storage, CPUs, and algorithms all needed to be specialized.
AI retrieval models were “brittle” — the search index was updated with new or changed content, the models would need to be updated.
There was a tradeoff between precise matching and broad concept matching.
In this post, I’ll explain how we’ve set out to solve this last challenge for AI retrieval. In future posts, I’ll speak more about the other parts of the search pyramid.
AI information retrieval
Search retrieval requires technologies to determine relevance for any particular query. For years, it was powered by keyword search engines. That’s changing. With the introduction of vector search, which goes beyond keyword search, concepts can be understood.
Vector search is a machine learning technology for AI search. Vectors are a way to represent words mathematically. Vectors are plotted and clustered in multiple dimensions (also called n-dimensional space). Vector search compares the similarity of multiple objects to a search query or subject item via their vector representation. In order to find similar matches, the query (or the subject) is converted into vectors using the same model that is used to convert objects (i.e. data or content) into vectors. Vectors that are similar to one another are returned from the database, finding the closest matches, providing accurate results, while eliminating irrelevant results that traditional search technology might have returned.
Techniques such as HNSW (Hierarchical Navigable Small World), IVF (Inverted File), or PQ (Product Quantization, a technique to reduce the number of dimensions of a vector) are some of the most popular Approximate Nearest Neighbor (ANN) methods to find similarity between vectors. Each technique focuses on improving a particular performance property, such as memory reduction with PQ or fast but accurate search times with HNSW and IVF. It is common practice to mix several components to produce a ‘composite’ index to achieve optimal performance for a given use case.
There can be thousands of dimensions. The proximity and angle between each vector helps the search engine determine similarity between terms and meaning. Type in “espresso with milk thingy” and a vector search engine will look for similarity to return espresso makers with steam wands. However, type in one word, like “Delonghi” (an espresso maker brand), and a vector search engine is just as likely to return other brands and different machines — Nespresso, Keurig, KitchenAid, etc. That’s because vectors only understand the concept of a Delonghi.
Unlike newer vector engines, traditional keyword search engines are fast and precise. Thus, a search for a “Delonghi Magnifica” will give exactly that. However, keyword search engines can struggle when the query doesn’t match the content in your search index. To address the problem, companies can add rules, synonyms, keyword tagging, or other workarounds, but it’s impossible to cover every use case.
For example, you could write a rule that a query containing the keywords “coffee or espresso”, “machine”, and “milk” all mean the same thing as “espresso machine with steam wand.” The problem is that you won’t be able to cover every edge case for every possible long tail query. Take another example, a search for the word “java”. Java is used synonymously with coffee and espresso, but unless there’s a synonym or rule in the search engine, the query will fail.
Designing and building an efficient index of vectors that can scale is a complex and expensive task. Similarly, building an efficient keyword search engine that works for long tail queries is equally daunting. Separately, vector and keyword search technologies are quite good. However, together, they’re terrific.
This is called hybrid search, and it works well for exact matches, ambiguous short queries, and long tail queries.
An AI search performance breakthrough
So, you just slap together some vectors and keyword technologies and you’re done, right? Hybrid search engine solved!
If only it were so easy. As you might imagine, there are a few challenges. The biggest issue is managing vector search scaling and costs. Vectors are basically floating point numbers. Computers struggle immensely with floating point numbers. This is why specialized computers like GPUs are being used for AI and vector manipulation. Not only do you need specialized databases for running vector search, you’ll need full-time development resources to continually manage production. With AI models, it is critical that the data that is fed to the models remains fresh, relevant, and optimized. For ecommerce and enterprise businesses where data is constantly being updated and speed matters, AI search is too computationally expensive to run in production.
Some companies have attempted to go around the problem by running an AI query only if the keyword query fails. This helps to minimize processing costs, but fails to provide the best results for customers.
The bottom line is that most companies want to spend their time and money focused on their business, not attending to search engine infrastructure. The solution is an approach we’ve pioneered. It’s called neural hashing. Hashing is a technique that allows us to compress vectors without losing information. We can turn complex 2000-decimal long numbers into a simple static length expression, which makes computing them very fast and cheap. Hashing is not a new concept within AI as applied to vectors.
Locality-Sensitive Hashing (LSH) is a well-known algorithmic technique that hashes similar input items into the same “buckets” with high probability. Typically there are tradeoffs — higher or lower similarity — in how “buckets” are determined. With our neural hashing technique, we have eliminated the need for tradeoffs. As a result, we can compress, or hash, vectors with neural networks (thus the name neural hashing) to 1/10th their normal size, while still retaining up to 99% of the information. They can be stored and managed on standard hardware and databases. In fact, we can process hashed vectors, or binary vectors, up to 500 times faster than standard vector similarity, making it as fast to deliver as keyword search. And we can do it on regular CPUs.
Here’s an example of a long tail query for “non-teflon non-stick frypan” running on a keyword-only engine versus a hashing / keyword engine.
There are several important takeaways to glean from the screenshots above:
The hybrid engine offers both higher precision and higher recall.
We are running both hashes and keywords in a single query, and the combined hybrid results are scored and ranked by relevance.
This is running on commodity hardware and results are nearly instantaneous; the hybrid results aren’t any slower than keyword-only.
Note too that “frypan” is written as one word instead of two — it could have also been written “fry pan”, “frying pan”, or “skillet” — but the search engine doesn’t skip a beat. This touches on another important feature of AI-retrieval: it greatly reduces the manual workload associated with improving search relevance. Gone are the days of writing synonym libraries for common terms, or writing rules for certain types of queries. It also opens up entirely new possibilities such as offering Q&A search.
This new feature is available with Algolia NeuralSearch™. It is in private beta now, but you can sign up here to be notified when it’s available.
Search is more than retrieval, of course. In our end-to-end AI search pyramid, retrieval (neural hashing) is in the middle. On each end of the AI pyramid is query understanding and ranking. In future blogs, I will touch on these other two capabilities.
Neural hashing represents a breakthrough for putting AI retrieval into production for a huge variety of use cases. Combined with AI-powered query processing and re-ranking, it promises to unleash the full power of AI on-site search.