Site search is about to make a quantum leap. New AI-powered solutions have become so effective that you may wonder if search engines can read your mind. These search engines can decipher complex queries, identify relevant results, combine custom inputs, and rank results in an ideal order that maximizes clicks, on-site conversions, and customer happiness.
Smarter search can produce incredible returns, too. On average, 40% of online buyers use the site search bar, though the figure runs higher or lower depending on the ecommerce business. Visitors have low tolerance for slow sites, and there’s tremendous position bias toward the top result. Buyers who can’t quickly find what they’re looking for will bounce. AI search better understands user intent, making content easier to discover.
Many companies have claimed AI capabilities for years, but only now is true end-to-end AI possible. This is due to a new technology called neural hashing, which enables search to scale for any use case without the high costs and production overhead normally associated with rolling out AI-centered features and platforms.
As an example, ChatGPT costs thousands of dollars per month to run, yet its response times and level of functionality would be unacceptable for powering search and discovery.
Studies show that even 100 millisecond lags have massive repercussions on user experience and conversion rates, so fast search is essential.
Neural hashing has been the missing ingredient for enabling companies to productize AI search at scale without being cost prohibitive.
While AI search can buoy many industries and use cases — from enterprise search to government websites — this ebook will focus exclusively on ecommerce with Algolia CTO Sean Mullaney and other Algolia subject matter experts sharing where these capabilities are coming from, how AI search works, and what it means for retailers.
Keyword search has been around for a long time and works much like the index at the back of a book. A keyword search engine creates an index of all words across all documents and delivers results based on simple matching algorithms.
To improve search relevance and result ranking, search engines introduced word statistics, like TF-IDF and BM25. Statistical search weighs how often a word appears in a document (term frequency, or TF) against how many documents in the corpus contain it (inverse document frequency, or IDF) to determine its importance. For example, stop words like “the,” “and,” and “or” show up frequently everywhere, whereas words like “toothbrush” or “water” show up less frequently — that is, they are more uncommon. A document that uses a rare query term often is therefore treated as more relevant to that term.
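For instance, here’s a minimal TF-IDF sketch in Python (a toy illustration with made-up documents, not the exact formula any particular engine uses):

import math
from collections import Counter

# A toy corpus: each document is a list of tokens.
docs = [
    ["the", "red", "toothbrush"],
    ["the", "blue", "water", "bottle"],
    ["the", "toothbrush", "and", "the", "water"],
]

def tf_idf(term, doc, docs):
    # Term frequency: how often the term appears in this document.
    tf = Counter(doc)[term] / len(doc)
    # Inverse document frequency: rare terms across the corpus score higher.
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / (1 + df)) + 1
    return tf * idf

# "the" appears everywhere, so it scores lower than "toothbrush".
print(tf_idf("the", docs[0], docs))
print(tf_idf("toothbrush", docs[0], docs))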
Frequency-based statistics were rudimentary and relied on exact matches. Keyword search algorithms built with Lucene APIs still rely on these statistical formulas today across a wide range of applications — they’re extremely simple to implement and fast. However, to improve accuracy, customers must create synonym libraries, add rules, use additional metadata or keywords, or resort to other workarounds.
Statistical ranking was useful, but not enough; there were too many use cases where the words did not precisely match the query. For example, singular vs plural terms, verb inflections (present vs past tense, present participle, etc.), agglutinative or compound languages, and so forth. This led to the development of Natural Language Processing (NLP) functions to help manage the complexity of languages. Some of these processes include:
Stemming
Stemming is the process of converting words into their base forms by removing prefixes and suffixes. This reduces resource usage and speeds up matching. For example, “change” and “changing” are both converted to the root form “chang”.
Lemmatization
Similar to stemming, lemmatization brings words into their base (or dictionary) form. It does so by considering the context and morphological basis of each word. For example, “changed” is converted to “change” and “is” to “be”. Note that because both stemming and lemmatization reduce words to a base form, most projects use one or the other.
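To make the difference concrete, here’s a small sketch using the open-source NLTK library (one option among many; not necessarily what any given search engine uses internally):

from nltk.stem import PorterStemmer, WordNetLemmatizer
# Assumes the WordNet corpus has been downloaded, e.g.:
#   import nltk; nltk.download("wordnet")

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Stemming chops affixes and can produce non-words ("chang").
print(stemmer.stem("changing"))                  # chang
# Lemmatization maps to a dictionary form using the part of speech.
print(lemmatizer.lemmatize("changed", pos="v"))  # change
print(lemmatizer.lemmatize("is", pos="v"))       # be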
Word segmentation
In English and many Latin-based languages, the space is a good approximation of a word divider (or word delimiter), although this concept has limits because of the variability in how each language combines and separates word parts. For example, many English compound nouns are variably written (ice box = ice-box = icebox). However, the space is not found in all written scripts, and, without it, word segmentation becomes a difficult problem. Languages that do not have a trivial word segmentation process include Chinese and Japanese, where sentences but not words are delimited; Thai and Lao, where phrases and sentences but not words are delimited; and Vietnamese, where syllables but not words are delimited.
Speech tagging
Speech tagging, also called parts of speech (PoS) tagging, is a way of classifying lists of words as nouns, verbs, adjectives, etc., to more accurately process a query. It looks at the relationship between words in a sentence to improve accuracy by more clearly “identifying the meaning” of the sentence.
Entity extraction
Entity extraction is another technique for NLP that has become particularly important for voice search. As the name might suggest, entity extraction is a way to identify different elements of a query — people, places, dates, frequencies, quantities, etc. — to help a machine “understand” the information it contains. Entity extraction is a very good solution for overcoming simple keyword search limitations, but, like ontologies and knowledge graphs discussed below, it only works on specific domains and queries.
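As an illustration, here’s what entity extraction looks like with the open-source spaCy library and its small English model (the sample query and extracted labels are hypothetical examples, not output from any particular search product):

import spacy

# Assumes the small English model has been installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Show me flights from Paris to Tokyo next Friday under 500 dollars")
for ent in doc.ents:
    # e.g. Paris GPE, Tokyo GPE, next Friday DATE, 500 dollars MONEY
    print(ent.text, ent.label_)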
Another method for developing a better, semantic understanding of a query was the use of ontologies and knowledge graphs. Knowledge graphs represent a relationship between different elements — concepts, objects, events. An ontology defines each of the elements and their properties.
Together, this semantic approach attempted to represent different concepts and the connections between them. Google, for example, used a knowledge graph to not only match the words in the search query, but also look for entities that the query described. It was a way to get around the limitations of keyword search.
However, in practice, a knowledge graph and ontology approach are very hard to scale or port to different subjects, and subjects get out of date quickly — sports teams, world leaders, or even product attributes. The knowledge graph and ontology you build for one domain won’t easily transfer to the next domain. While you can build highly robust solutions for one subject, it may fail completely for a different subject where a different area of expertise is needed. Only a few big companies including Google were able to develop a knowledge graph automatically. Most other companies had to build them manually.
Autocomplete is a very useful semantic search tool for helping customers find results faster. The most popular example is Google, which released autocomplete at the end of 2004.
Autocomplete is an approach that attempts to anticipate search terms to help customers complete their query. It also offers contextual suggestions, helps users avoid typos, and filters content based on the user’s location or preferences. The suggestions are generated by a combination of machine learning and natural language processing models, starting with a simple prefix string used to identify, match, and predict the completion of the unfinished query.
For autocomplete to work effectively, a search engine must have a lot of data to work with across all sessions and, additionally, it must also be able to anticipate search terms for each user based on their behavior, previous searches, geolocation, and other attributes.
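A toy sketch of the simplest building block — prefix matching over popular past queries — might look like this (the query counts are made up; production autocomplete layers ML models and personalization on top):

# Rank past queries that share the typed prefix by how often they
# were searched. Real systems also fold in the user's own history,
# location, and ML-based prediction.
popular_queries = {
    "running shoes": 5400,
    "running shorts": 2100,
    "rug cleaner": 900,
    "red dress": 3200,
}

def suggest(prefix, queries, limit=3):
    matches = [q for q in queries if q.startswith(prefix.lower())]
    return sorted(matches, key=lambda q: queries[q], reverse=True)[:limit]

print(suggest("ru", popular_queries))
# ['running shoes', 'running shorts', 'rug cleaner']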
Predictive autocomplete has now become an expected feature for any modern, competitive search engine.
Early keyword probability models like BM25 built relevance using term frequency, as discussed above. AI ranking took a big step forward by incorporating user feedback to further identify relevance. One example of this is reinforcement learning. The basic idea of reinforcement learning is quite simple: use feedback to reinforce (strengthen) positive outcomes. Instead of making large changes infrequently, reinforcement learning makes frequent incremental changes.
There are many upsides to this, such as continuously improving results and faster surfacing of other potential results. Additionally, poorly performing results tend to fall away quickly through rolling experimentation.
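As a rough sketch of the incremental-update idea (not any vendor’s actual algorithm), imagine nudging each result’s score toward the observed click feedback:

# Each result keeps a score that is nudged up on a click and decays
# otherwise, so well-performing results rise and poor ones fall away.
scores = {"item_a": 0.5, "item_b": 0.5, "item_c": 0.5}
LEARNING_RATE = 0.1

def update(result_id, clicked):
    reward = 1.0 if clicked else 0.0
    # Move the score a small step toward the observed reward.
    scores[result_id] += LEARNING_RATE * (reward - scores[result_id])

# Simulate feedback from a few sessions.
for clicked_id in ["item_b", "item_b", "item_a"]:
    for rid in scores:
        update(rid, clicked=(rid == clicked_id))

ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # item_b ranks first after receiving the most clicks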
Like autocomplete, reinforcement learning needs a lot of data to return meaningful results; it’s a poor solution unless coupled with significant historical performance data. Furthermore, reinforcement learning tends to be very good for ranking search results, but it does not help identify records — it still relies on keywords and linguistic resources to find matching records.
This is where vectors come into play.
Vector representation of text is very old. Its theoretical roots go back to the 1950s, and there were several key advances over the decades. We’ve also seen great innovation starting in 2013: new models based on neural networks leveraging large training sets (in particular BERT in 2018 by Google) have set the standard for the industry.
At its simplest, vector search is a way to find related objects that share similar characteristics. Matching is accomplished by machine learning models that detect semantic relationships between objects in an index. Vectors can have thousands of dimensions, but to simplify, we can visualize them in three dimensions. Vector search captures relationships between words, and similar vectors cluster together: words like “king,” “queen,” and “royalty” will cluster together, as will words like “run,” “trot,” and “canter.”
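For illustration, here’s how similarity between toy vectors can be measured with cosine similarity (the three-dimensional values are hand-picked for readability; real embeddings are learned and far larger):

import numpy as np

# Toy 3-dimensional embeddings; real models use hundreds or thousands
# of dimensions with learned, not hand-picked, values.
vectors = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.82, 0.12]),
    "trot":  np.array([0.10, 0.20, 0.90]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["king"], vectors["queen"]))  # close to 1: similar
print(cosine(vectors["king"], vectors["trot"]))   # much lower: unrelated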
Almost any object can be embedded and vectorized — text, images, video, music, etc. Early vector models used words as dimensions: every distinct word was a dimension and the value was that word’s count, which was overly simple. That changed with the advent of latent semantic analysis (LSA) and latent semantic indexing (LSI), which analyzed the relationship between documents and the terms they contain by reducing the number of dimensions.
Today, newer AI models powered by vector engines are able to quickly retrieve the information in a high dimensional space.
This has been a game changer. Newer vector-based solutions can now know that “snow,” “cold,” and “skiing” are related ideas. This advance has made some of the other technologies mentioned above — such as entity extraction, ontologies, knowledge graph, etc. — obsolete.
Online, consumers expect instant search results (Amazon and Google have both published studies on the negative effects of even a 100-millisecond lag on consumer behavior). You can speed up and scale vector delivery, but it’s expensive and will never match the speed of keyword search.
Vectors also don’t provide the same quality of relevance as keyword search for some queries. Keyword search still works better than vectors on single word queries and exact brand match queries. Vectors tend to work better on multi-word queries, concept searches, questions, and other more complex query types.
For example, when you query for “Adidas” on a keyword engine, by default you will only see the Adidas brand. The default behavior in a vector engine would be to have all shoe brands for the “Adidas” query (e.g., Nike, Puma, Adidas, etc.) because they are all in the same conceptual space. Keyword search still provides better — and more explainable (and tunable) — results.
Hybrid search is a new method to combine a full-text keyword search engine and a vector search engine into a single API.
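One generic way to merge the two result sets — shown here purely as an illustration, not as Algolia’s fusion method — is reciprocal rank fusion, where each engine’s ranked list contributes a score based on rank position:

# Reciprocal rank fusion: each list contributes 1 / (k + rank) for
# every document it returns, and the combined scores decide the order.
def reciprocal_rank_fusion(keyword_results, vector_results, k=60):
    scores = {}
    for results in (keyword_results, vector_results):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["adidas_runner", "adidas_samba", "adidas_tee"]
vector_hits = ["adidas_runner", "nike_pegasus", "puma_velocity"]
print(reciprocal_rank_fusion(keyword_hits, vector_hits))
# 'adidas_runner' wins because both engines agree on it.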
There is tremendous complexity in running both keyword and vector engines at the same time for the same query. Some companies have opted to sidestep the complexity by running these processes sequentially: they run a keyword search and then, if a certain relevance threshold isn’t met, run a vector search. This approach involves poor tradeoffs in speed and accuracy, and it limits the ability to train each model.
True hybrid search is different. By combining full-text keyword search and vector search into a single query, customers can get more accurate results fast. Of course, for vector search to work as fast as keyword search, it requires the search engine to scale in terms of performance without adding massive costs. For most vector engines today, this is not possible.
This is why neural hashing, a new technology that compresses vectors to 1/10 their size, offers a viable path forward.
We will cover how hashing works in Chapter 3 — but we’ve already seen some incredible results.
With that brief introduction, we will now provide more in-depth detail on each stage of AI search – query understanding, retrieval, and ranking.
A search engine needs to “process” the language in a search bar before it can execute a query. The process could be as simple as comparing the query exactly as written to the content in the index. But classic keyword search is more advanced than that, because it involves tokenizing and normalizing the query into smaller pieces – i.e., words and keywords.
This process can be easy (where the words are separated by spaces) or more complex (like Asian languages, which do not use spaces, so the machine needs to recognize the words).
Once the query is broken down into smaller pieces, the search engine can correct misspellings and typos, apply synonyms, reduce the words further into roots, manage multiple languages, and more – all of which enable the user to type a more “natural” query.
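A bare-bones sketch of this kind of query normalization (lowercasing, accent folding, and tokenizing; real engines do far more) could look like this:

import re
import unicodedata

def normalize_query(query):
    # Lowercase, strip accents, and split on non-alphanumeric characters —
    # a tiny subset of what a production tokenizer does.
    query = unicodedata.normalize("NFKD", query.lower())
    query = "".join(c for c in query if not unicodedata.combining(c))
    return [token for token in re.split(r"[^a-z0-9]+", query) if token]

print(normalize_query("Noise-Cancelling HEADPHONES, café edition"))
# ['noise', 'cancelling', 'headphones', 'cafe', 'edition']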
On average, people type single-word or short-phrase queries to describe the items they are searching for. That is, they use keywords, not whole sentences or questions. (Though that’s changing, due to voice technology and the success of Google’s question and answer results.)
This kind of keyword search, in both its simple and more advanced versions, has been around since the beginning of search. The more natural the queries become, the more advanced the processing techniques need to be. Search engines need to structure incoming queries before they can look up results in the search index. This pre-processing technology falls under what we call Natural Language Processing, or NLP, an umbrella term for any technology that enables computers to understand human language, whether written or spoken.
Natural language processing (“NLP”) takes text and transforms it into pieces that are easier for computers to use. Some common NLP tasks are removing stop words, segmenting words, or splitting compound words. NLP can also identify parts of speech, or important entities within text.
We’ve written quite a lot about natural language processing (NLP) here at Algolia. We’ve defined NLP, compared NLP vs NLU, and described some popular NLP/NLU applications. Additionally, our engineers have explained how our engine processes language and handles multilingual search. In this article, we’ll look at how NLP drives keyword search, which is an essential piece of our AI search solution that also includes AI/ML-based vector embeddings and hashing. To understand the nexus between keywords and NLP, it’s important to start off by diving deep into keyword search.
At its most basic, a keyword search engine compares the text of a query to the text of each record in a search index. Every record that matches (whether exact or similar) is returned by the search engine. Matching, as suggested, can be simple or advanced.
We use keywords to describe clothing, movies, toys, cars, and other objects. Most keyword search engines rely on structured data, where the objects in the index are clearly described with single words or simple phrases.
For example, a flower can be structured using tags, or “keys”, to form key-value pairs. The values (a large, red, summer, flower, with four petals) can be paired with their keys (size, color, season, type of object, and number of petals). The flower can also sell at a “price” of “4.99”.
We can represent this structure of keys and values as follows:
{
  "name": "Meadow Beauty",
  "size": "large",
  "color": "red",
  "season": "summer",
  "type of object": "flower",
  "number of petals": "4",
  "price": "4.99",
  "description": "Coming from the Rhexia family, the Meadow Beauty is a wildflower."
}
Search can feel both simple and complicated at the same time. Searching on Google is simple, and the results are pretty relevant (although that can be argued in recent years with Google pushing a lot of sponsored results). Although it is not a search solution, ChatGPT has made that experience even more magical. However, anyone who has implemented search knows that information retrieval is a highly complex topic.
We can measure retrieval quality using precision and recall. Precision is the percentage of retrieved documents that are relevant, and recall is the percentage of all relevant documents that are retrieved. Both metrics can help us to determine if search results are any good.
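In code, the two metrics are straightforward to compute (a toy example with hypothetical item IDs):

def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# 3 of the 4 returned items are relevant, but 2 relevant items were missed.
print(precision_recall(
    retrieved=["tv_55", "tv_65", "tv_stand", "tv_75"],
    relevant=["tv_55", "tv_65", "tv_75", "tv_85", "tv_43"],
))
# (0.75, 0.6)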

In fact, there can be a yin-yang between precision and recall; improving the precision (accuracy) may impact the recall, and improving the recall (completeness) may hurt precision.
The holy grail is improving both, and this is exactly what AI retrieval can do.
Retrieval was the last piece of the AI search puzzle, and it was also the hardest piece to solve.
In this chapter, I’ll explain how we’ve set out to solve this last challenge for AI retrieval. In future posts, I’ll speak more about the other parts of the search pyramid.
Search retrieval requires technologies that determine relevance for any particular query. For years, it was powered by keyword search engines. That’s changing: with the introduction of vector search, engines can go beyond keywords and understand concepts.
Vector search is a machine learning technology for AI search. Vectors are a way to represent words (or other objects) mathematically: they are plotted and clustered in multiple dimensions (also called n-dimensional space). Vector search compares the similarity of multiple objects to a search query or subject item via their vector representations. To find similar matches, the query (or subject) is converted into vectors using the same model that converts objects (i.e., data or content) into vectors. The vectors closest to the query are returned from the database, providing accurate matches while eliminating irrelevant results that traditional search technology might have returned.
Techniques such as HNSW (Hierarchical Navigable Small World), IVF (Inverted File), or PQ (Product Quantization, a technique to reduce the number of dimensions of a vector) are some of the most popular Approximate Nearest Neighbor (ANN) methods to find similarity between vectors.
Each technique focuses on improving a particular performance property, such as memory reduction with PQ or fast but accurate search times with HNSW and IVF. It is common practice to mix several components to produce a ‘composite’ index to achieve optimal performance for a given use case.
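As an illustration, building and querying an HNSW index with the open-source hnswlib library looks roughly like this (the data is random and the parameters are examples, not recommendations for any specific engine):

import numpy as np
import hnswlib  # one popular open-source HNSW implementation

dim, num_items = 128, 10_000
data = np.random.rand(num_items, dim).astype(np.float32)

# Build an HNSW index for approximate nearest-neighbor search.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_items, ef_construction=200, M=16)
index.add_items(data, np.arange(num_items))
index.set_ef(50)  # trade accuracy for speed at query time

query = np.random.rand(dim).astype(np.float32)
labels, distances = index.knn_query(query, k=5)  # 5 closest vectors
print(labels, distances)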
There can be thousands of dimensions. The proximity and angle between each vector helps the search engine determine similarity between terms and meaning. Type in “espresso with milk thingy” and a vector search engine will look for similarity to return espresso makers with steam wands. However, type in one word, like “Delonghi” (an espresso maker brand), and a vector search engine is just as likely to return other brands and different machines — Nespresso, Keurig, KitchenAid, etc. That’s because vectors only understand the concept of a Delonghi.

Unlike newer vector engines, traditional keyword search engines are fast and precise. Thus, a search for a “Delonghi Magnifica” will give exactly that. However, keyword search engines can struggle when the query doesn’t match the content in your search index. To address the problem, companies can add rules, synonyms, keyword tagging, or other workarounds, but it’s impossible to cover every use case.
For example, you could write a rule that a query containing the keywords “coffee or espresso”, “machine”, and “milk” all mean the same thing as “espresso machine with steam wand.” The problem is that you won’t be able to cover every edge case for every possible long tail query. Take another example, a search for the word “java”. Java is used synonymously with coffee and espresso, but unless there’s a synonym or rule in the search engine, the query will fail.
Designing and building an efficient index of vectors that can scale is a complex and expensive task. Similarly, building an efficient keyword search engine that works for long tail queries is equally daunting. Separately, vector and keyword search technologies are quite good. However, together, they’re terrific.
This is called hybrid search, and it works well for exact matches, ambiguous short queries, and long tail queries.
So, you just slap together some vectors and keyword technologies and you’re done, right? Hybrid search engine solved! If only it were so easy. As you might imagine, there are a few challenges…
The biggest issue is managing vector search scaling and costs. Vectors are essentially long arrays of floating point numbers, and comparing millions of them is computationally expensive. This is why specialized hardware like GPUs is used for AI and vector manipulation. Not only do you need specialized databases for running vector search, you’ll also need full-time development resources to continually manage production.
With AI models, it is critical that the data that is fed to the models remains fresh, relevant, and optimized.
For ecommerce and enterprise businesses where data is constantly being updated and speed matters, vector-based AI search has been too computationally expensive to run in production.
Some companies have attempted to go around the problem by running an AI query only if the keyword query fails. This helps to minimize processing costs, but fails to provide the best results for customers.
Locality-Sensitive Hashing (LSH) is a well-known algorithmic technique that hashes similar input items into the same “buckets” with high probability. Typically there are tradeoffs — higher or lower similarity — in how “buckets” are determined.
With our neural hashing technique, we have eliminated the need for tradeoffs. As a result, we can compress, or hash, vectors with neural networks (thus the name neural hashing) to 1/10th their normal size, while still retaining up to 99% of the information.
Hashed vectors can be stored and managed on standard hardware and databases. In fact, we can process hashed vectors, or binary vectors, up to 500 times faster than standard vector similarity, making retrieval as fast as keyword search. And we can do it on regular CPUs.
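To give a feel for why binary vectors are so much cheaper to compare, here’s a classic random-hyperplane LSH sketch. Note that this is a generic illustration: Algolia’s neural hashing learns its hash functions with neural networks rather than using random planes.

import numpy as np

# Project a float vector onto random hyperplanes and keep only the
# sign bits, then compare hashes with cheap Hamming distance.
rng = np.random.default_rng(42)
dim, n_bits = 768, 256
planes = rng.normal(size=(n_bits, dim))

def hash_vector(v):
    return (planes @ v > 0).astype(np.uint8)  # 256 bits instead of 768 floats

def hamming_distance(a, b):
    return int(np.count_nonzero(a != b))

v1 = rng.normal(size=dim)
v2 = v1 + rng.normal(scale=0.1, size=dim)   # a near-duplicate vector
v3 = rng.normal(size=dim)                   # an unrelated vector

print(hamming_distance(hash_vector(v1), hash_vector(v2)))  # small
print(hamming_distance(hash_vector(v1), hash_vector(v3)))  # much larger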
Here’s an example of a long tail query for “non-teflon non-stick frypan” running on a keyword-only engine versus a hashing/keyword engine.
There are several important takeaways to glean from a comparison like this.
Note too that “frypan” is written as one word instead of two — it could have also been written “fry pan”, “frying pan”, or “skillet” — but the search engine doesn’t skip a beat. This touches on another important feature of AI-retrieval: it greatly reduces the manual workload associated with improving search relevance.
Gone are the days of writing synonym libraries for common terms, or writing rules for certain types of queries. It also opens up entirely new possibilities such as offering Q&A search.
Search is more than retrieval, of course. In our end-to-end AI search pyramid, retrieval (neural hashing) is in the middle. On each end of the AI pyramid is query understanding and ranking. In future blogs, I will touch on these other two capabilities.
Neural hashing represents a breakthrough for putting AI retrieval into production for a huge variety of use cases. Combined with AI-powered query processing and re-ranking, it promises to unleash the full power of AI on-site search. We’re excited to release these new end-to-end AI capabilities soon! Sign up today to be the first to try the all-new Algolia NeuralSearch platform when it’s available.
AI ranking refers to a variety of machine learning algorithms used to optimize the order of search results.
With any search query, there could be many relevant results. That’s where search result ranking comes into play.
Increasingly, ranking is powered by artificial intelligence (AI), which includes a variety of machine learning algorithms.
In this article, I will describe some of the different approaches to AI ranking and briefly discuss some of the challenges for delivering better results.
To improve search result ranking, you need to be able to measure it. Two simple measures of relevance are precision and recall.
These are useful metrics for helping us to determine if the results are any good. Ideally, recall and precision would both score 100%, but in practice that’s very difficult. Moreover, relevance can be subjective!
To illustrate, let’s say it’s time to buy a new TV, so you search your favorite seller’s site for “big tv.” You get 119 results. Among the results, some items are quite relevant, but some are not (for example, TV stands and cabinets); that’s a question of precision. Meanwhile, many other relevant products on the site were not included in the results; we call that recall.
To get better precision and recall, you might refine the search query. You might search “TVs over 40 inches”, and this time you get 262 results! It looks like there are more TV sets in the new results, but only some of them actually satisfy the query parameter of over 40 inches.
In fact, there can be a yin-yang between precision and recall; improving one may have a detrimental effect on the other.
There can be many relevant TVs. However, the best result is the one that customers are most interested in, which can be determined by clicks, purchases, ratings, lowest returns, etc. The top ranked results get the highest engagement; conversely, when people don’t easily find the best results, they’re likely to abandon your site or find another way to get their questions answered, such as opening a support ticket.
Precision and recall help us conceptually begin to wrap our heads around AI ranking. When a query is performed, search engines must determine which items are relevant, and then rank them in order from best to worst.
Traditionally, a variety of statistical methods were used to rank results based on term frequency within a set of documents. Keyword search engines look for the words of the query and their alternatives, and then the algorithm ranks those results from most to least relevant. These methods are very efficient and fast, but require many additional heuristics. Among the capabilities added to improve purely statistical ranking models are behavioral signals such as clicks, conversions, and other engagement data.
These “signals” can provide a nice feedback loop with results. As I’ll show, higher engagement means results can be re-ranked accordingly, which in turn can generate more engagement.
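As a toy illustration of that loop (the numbers and weighting are hypothetical), a ranker might blend the engine’s relevance score with an observed click-through rate:

# Blend the engine's relevance score with observed click-through rate,
# then re-rank. Real AI ranking uses far richer signals (conversions,
# add-to-carts, revenue, etc.).
results = [
    {"id": "tv_55", "relevance": 0.91, "impressions": 1000, "clicks": 40},
    {"id": "tv_65", "relevance": 0.88, "impressions": 1000, "clicks": 120},
    {"id": "tv_stand", "relevance": 0.86, "impressions": 1000, "clicks": 5},
]

def blended_score(r, weight=0.5):
    ctr = r["clicks"] / r["impressions"] if r["impressions"] else 0.0
    return (1 - weight) * r["relevance"] + weight * ctr

for r in sorted(results, key=blended_score, reverse=True):
    print(r["id"], round(blended_score(r), 3))
# tv_65 moves above tv_55 because users click it far more often.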