Semantic Search: How It Works & Who It’s For

Is semantic search applicable in your business and marketing plans, and how can you use it to your advantage?

For simple user queries, a search engine can reliably find the correct content using keyword matching alone. A “red toaster” query pulls up all of the products with toaster in the title or description, and red in the color attribute. Add synonyms like maroon for red and you can match even more toasters. But things start to become more difficult quickly: you have to add these synonyms yourself, and your search will also bring up toaster ovens. This is often where semantic search comes in.
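To make the toaster example concrete, here is a minimal sketch of keyword matching with a hand-maintained synonym list. The catalog, the field names, and the `keyword_search` function are all made up for illustration; a real engine is far more sophisticated, but the limitation is the same.

```python
# Toy catalog; every record and field name here is hypothetical.
PRODUCTS = [
    {"title": "Classic Toaster", "color": "red"},
    {"title": "Two-Slice Toaster", "color": "maroon"},
    {"title": "Toaster Oven", "color": "red"},
    {"title": "Electric Kettle", "color": "red"},
]

# Synonyms that someone has to write and maintain by hand.
SYNONYMS = {"red": {"red", "maroon"}}


def keyword_search(query, products, synonyms=None):
    """Return titles of products where every query word (or a synonym) appears."""
    synonyms = synonyms or {}
    hits = []
    for product in products:
        words = set(product["title"].lower().split()) | {product["color"]}
        terms = query.lower().split()
        if all(synonyms.get(term, {term}) & words for term in terms):
            hits.append(product["title"])
    return hits


print(keyword_search("red toaster", PRODUCTS))
# ['Classic Toaster', 'Toaster Oven']
print(keyword_search("red toaster", PRODUCTS, SYNONYMS))
# ['Classic Toaster', 'Two-Slice Toaster', 'Toaster Oven']
```

The synonym brings in the maroon toaster, but the toaster oven matches in both cases, because the words alone cannot tell the engine that it is a different kind of product.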

Semantic search attempts to apply user intent and the meaning (or semantics) of words and phrases to find the right content. It goes beyond keyword matching by using information that might not be immediately present in the text (the keywords themselves) but is closely tied to what the searcher wants. For example, finding a sweater with the query sweater or even sweeter is no problem for keyword search, while the queries warm clothing or how can I keep my body warm in the winter? are better served by semantic search.

As you can imagine, going beyond the surface-level information embedded in the text is a complex endeavor, one that many have attempted and that incorporates many different components. Additionally, as with anything that shows great promise, semantic search is a term that is sometimes used for search that doesn’t truly live up to the name. In fact, the term is so overused in marketing that it has even been used to describe very simple systems, like a packaged synonym dictionary! So here we want to outline what semantic search is, the latest technology behind it, what goes into it, and what it solves (and doesn’t).

What is semantic search?

Semantic search applies user intent, context, and conceptual meanings to match a user query to the corresponding content. It uses vector search and machine learning to return results that aim to match a user’s query, even when there are no word matches. To find and rank the best results, semantic search compares hundreds or even thousands of characteristics learned through machine learning and produces a number that specifies how similar a record is to a query. The formula behind that number is, again, derived through machine learning. It’s complex, yes, and powerful.

This is different from pure keyword search, and it can have a strong impact. A pure keyword search works roughly by matching query words to words in documents. Semantic search, by contrast, will often return results that share no words with the query, even after NLP techniques such as synonyms or plural handling are applied, yet the content still “plainly” matches what the user seeks.

Semantic search can do this because semantic search engines work differently. They don’t match on text, but on meaning. This might evoke thoughts of someone teaching the software each word’s definition, and that is not far from what actually happens. Just as you would use context clues to understand a new word (“The shoe was stuck on my foot, but I reckerdly pulled on it until it came off.”), a semantic search engine does the same, but across millions of examples. By looking at this massive dataset, it doesn’t come up with formal definitions of words, but it does learn what words mean based on their context or usage, and what other words can be used in the same or similar contexts.

Context

The context in which a search happens is important. Context can be as simple as the locale (an American searching for football wants something different than a Brit searching for the same thing) or much more complex.

An intelligent search engine will use context on both a personal level and a group level. Influencing results on a personal level is called, appropriately enough, personalization. It uses the individual searcher’s affinities, previous searches, and previous interactions to return the content that is best suited to the current query. Personalization is applicable to all kinds of searching, but semantic search can go even further. On a group level, a search engine can rerank results using information about how all searchers interact with search results, such as which results are clicked on most often, or even the seasonality of when certain results are more popular than others. Again, this shows how semantic search can bring intelligence to search, in this case intelligence via user behavior.
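As a rough illustration of group-level reranking, the sketch below blends each result's text relevance score with its share of clicks across all searchers. The `rerank_by_clicks` function, the `weight` parameter, and the sample numbers are assumptions made for this example, not a description of how any particular engine does it.

```python
def rerank_by_clicks(results, clicks, weight=0.5):
    """Reorder (record_id, text_score) pairs using aggregate click counts.

    weight=0 keeps the pure text ranking; weight=1 ranks by clicks alone.
    """
    total = sum(clicks.get(record_id, 0) for record_id, _ in results) or 1
    blended = [
        (record_id, (1 - weight) * score + weight * clicks.get(record_id, 0) / total)
        for record_id, score in results
    ]
    blended.sort(key=lambda pair: pair[1], reverse=True)
    return [record_id for record_id, _ in blended]


# Hypothetical text scores and click counts for three records.
results = [("a", 0.9), ("b", 0.8), ("c", 0.7)]
clicks = {"a": 5, "b": 80, "c": 15}

print(rerank_by_clicks(results, clicks, weight=0.0))  # ['a', 'b', 'c']
print(rerank_by_clicks(results, clicks, weight=0.5))  # ['b', 'a', 'c']
```

Record b scores slightly lower on text alone, but searchers overwhelmingly click it, so it moves to the top once behavior is taken into account.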

Semantic search can also leverage the context within text. We’ve already discussed that synonyms are useful in all kinds of search, and can improve keyword search by expanding the matches for queries to related content. But we know as well that synonyms are not universal: sometimes two words are equivalent in one context, and not in another. When someone searches for football players, what are the right results? The answer will be different in Kent, Ohio than in Kent, United Kingdom. A query like tampa bay football players, however, probably doesn’t need to know where the searcher is located. A blanket synonym that made football and soccer equivalent would lead to a poor experience when that searcher sees the Tampa Bay Rowdies soccer club next to Tom Brady. (Of course, if we know that the searcher would prefer to see the Tampa Bay Rowdies, the search engine can take that into account!) This is an example of query understanding via semantic search.
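The football example can be sketched as a tiny rule-based resolver. This is a deliberately simplistic stand-in: a semantic search engine would learn these associations rather than rely on hard-coded tables, and the names `US_PLACE_HINTS`, `SPORT_BY_COUNTRY`, and `resolve_football` are hypothetical.

```python
# Hypothetical place names that signal American football on their own.
US_PLACE_HINTS = {"tampa bay"}

# Which sport "football" most likely means, keyed by the searcher's country code.
SPORT_BY_COUNTRY = {"US": "american football", "GB": "association football"}


def resolve_football(query, searcher_country):
    """Guess which sport a 'football' query refers to, or None if not applicable."""
    text = query.lower()
    if "football" not in text:
        return None
    # Context inside the query wins over the searcher's location.
    if any(place in text for place in US_PLACE_HINTS):
        return "american football"
    return SPORT_BY_COUNTRY.get(searcher_country, "association football")


print(resolve_football("football players", "US"))            # american football
print(resolve_football("football players", "GB"))            # association football
print(resolve_football("tampa bay football players", "GB"))  # american football
```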

User intent

The ultimate goal of any search engine is to help the user be successful in completing a task, whether it’s to read news articles, buy clothing, or find a document. The search engine needs to figure out what the user wants to do, or what the user intent is.

We can see this when searching on an eCommerce website. As the user types the query jordans, the search automatically filters on the category Shoes. This anticipates that the user intent is to find shoes, and not jordan almonds (which would be in the Food & Snacks category). By getting ahead of the user intent, the search engine can return the most relevant results, and not distract the user with items that match the text but are not relevant. This matters all the more when a sort is applied on top of the search, like price from lowest to highest. This is an example of query categorization. Categorizing the query and limiting the result set helps ensure that only relevant results appear.
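Here is a minimal sketch of query categorization, including why it matters once a price sort is applied. The catalog, the `QUERY_CATEGORIES` mapping, and the `search` function are invented for illustration; in practice the category prediction would come from a trained model or from behavioral data rather than a hard-coded dictionary.

```python
# Hypothetical catalog records.
CATALOG = [
    {"name": "Air Jordan 1 Sneakers", "category": "Shoes", "price": 120.0},
    {"name": "Jordan Almonds", "category": "Food & Snacks", "price": 6.0},
    {"name": "Jordan Slide Sandals", "category": "Shoes", "price": 35.0},
]

# Hypothetical predicted category per query (hard-coded here for simplicity).
QUERY_CATEGORIES = {"jordans": "Shoes"}


def search(query, catalog, categorize=True):
    """Textual match, optional category filter, then sort by price ascending."""
    text = query.lower()
    stem = text.rstrip("s")  # crude plural handling, good enough for the demo
    hits = [item for item in catalog if stem in item["name"].lower()]
    category = QUERY_CATEGORIES.get(text) if categorize else None
    if category:
        hits = [item for item in hits if item["category"] == category]
    return [item["name"] for item in sorted(hits, key=lambda item: item["price"])]


print(search("jordans", CATALOG, categorize=False))
# ['Jordan Almonds', 'Jordan Slide Sandals', 'Air Jordan 1 Sneakers']
print(search("jordans", CATALOG))
# ['Jordan Slide Sandals', 'Air Jordan 1 Sneakers']
```

Without the category filter, the cheapest textual match (the almonds) lands in first position as soon as results are sorted by price.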

Difference between keyword and semantic search

We have already seen ways in which semantic search is intelligent, but it’s worth looking more at how it is different from keyword search. Keyword search engines also bring in natural language processing to improve word-to-word matching, through methods such as using synonyms, removing stop words, and ignoring plurals, but that processing still relies on matching words to words. Semantic search, on the other hand, can return results where there is no matching text, yet anyone with knowledge of the domain can see that they are plainly good matches.

This ties into the big difference between keyword search and semantic search, which is how matching between query and records occurs. To simplify things some, keyword search occurs by matching on text. Soap will always match soap or soapy, because the text overlaps. More specifically, there are enough matching letters (or characters) to tell the engine that a user searching for one will want the other. That same matching will also tell the engine that the query soap is a more likely match for the word soup than the word detergent, unless the owner of the search engine has told the engine ahead of time that soap and detergent are equivalent, in which case the search engine will “pretend” that detergent is actually soap when it is determining similarity.
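One common way to measure this kind of character overlap is edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one word into another. The sketch below uses Levenshtein distance as an assumed stand-in for whatever character-level measure a given engine applies, and the `SYNONYMS` table and `distance_with_synonyms` helper are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


# Hypothetical owner-defined synonym: treat "detergent" as if it were "soap".
SYNONYMS = {"detergent": "soap"}


def distance_with_synonyms(query, word):
    """Replace known synonyms with their canonical form before comparing."""
    return edit_distance(SYNONYMS.get(query, query), SYNONYMS.get(word, word))


print(edit_distance("soap", "soapy"))               # 1
print(edit_distance("soap", "soup"))                # 1
print(edit_distance("soap", "detergent"))           # 9
print(distance_with_synonyms("soap", "detergent"))  # 0
```

By characters alone, soup is as close to soap as soapy is, while detergent could hardly be further away. Only the hand-written synonym closes that gap.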

Keyword-based search engines can also use tools like synonyms, alternatives, or query word removal–all types of query expansion and relaxation–to help with this information retrieval task, as well as NLP and NLU tools like typo tolerance, tokenization, and normalization.
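For a sense of what normalization and tokenization involve, here is a minimal sketch using only the Python standard library. The stop-word list and the naive plural rule are assumptions chosen to keep the example short; production engines use language-aware rules.

```python
import re
import unicodedata

# A tiny, hypothetical stop-word list.
STOP_WORDS = {"a", "an", "the", "for", "of"}


def normalize(text):
    """Lowercase the text and strip accents (for example, é becomes e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.lower()


def tokenize(text):
    """Split normalized text into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", normalize(text))


def index_terms(text):
    """Tokenize, drop stop words, and strip a trailing plural 's' (naively)."""
    terms = []
    for token in tokenize(text):
        if token in STOP_WORDS:
            continue
        if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        terms.append(token)
    return terms


print(index_terms("Red Toasters for the Café"))  # ['red', 'toaster', 'cafe']
print(index_terms("red toaster"))                # ['red', 'toaster']
```

After this processing, the query terms red and toaster match the record's terms exactly, but the match is still word to word.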

Because semantic search is matching on concepts, the search engine can no longer determine whether records are relevant based on how many characters two words share. Again, think about soap versus soup versus detergent. Or more complex queries, like laundry cleaner, remove stains clothing, or how do I get grass stains out of denim? You can even include things like image searching! 

A real-world analogy of this would be a customer asking an employee where a “toilet unclogger” is located. An employee with only a pure keyword-esque understanding of the request would be unable to help unless the store explicitly refers to its plungers, drain cleaners, and toilet augers as “toilet uncloggers.” But, we would hope, the employee is wise enough to make the connection between the various terms and direct the customer to the right aisle (perhaps by knowing the different terms, or synonyms, a customer can use for any given product).

To summarize succinctly, semantic search brings increased intelligence to match on concepts more than words, through the use of vector search. With this intelligence, semantic search can perform in a more human-like manner, like a searcher finding dresses and suits when searching fancy, with not a pair of jeans in sight.

What is semantic search not?

By now, it should be clear that semantic search is a powerful method for improving search quality. As such, you should not be surprised to learn that the term semantic search has been applied more and more broadly, to search experiences that don’t always warrant the name. And while there is no official definition of semantic search, we can say that it is search that goes beyond traditional keyword-based search by incorporating real-world knowledge to derive user intent based on the meaning of queries and content.

This leads to the conclusion that semantic search is not simply about applying NLP and adding synonyms to an index. While tokenization does require some real-world knowledge about language construction, and synonyms apply understanding of conceptual matches, they lack, in most cases, the artificial intelligence that is required for search to rise to the level of semantic search.

Implementation of semantic search

It is this last bit that makes semantic search both powerful and difficult. Generally, with the term semantic search, there is an implicit understanding that there is some level of machine learning involved. Almost as often, this also involves vector search.

Vector search works by encoding details about an item into vectors and then comparing vectors to determine which are most similar. Again, even a simple example can help.

Take two phrases: “Toyota Prius” and “steak”. And now let’s compare those to “hybrid”. Which of the first two is more similar to it? Neither would match textually, but you probably would say that “Toyota Prius” is the more similar of the two. You can say this because you know that a Prius is a type of hybrid vehicle, because you have seen “Toyota Prius” in a similar context as the word hybrid, such as “Toyota Prius is a hybrid worth considering,” or “hybrid vehicles like the Toyota Prius.” You’re pretty sure, however, you’ve never seen “steak” and “hybrid” in such close quarters.

This is generally how vector search works as well. A machine learning model takes thousands or millions of examples from the web, books, or other sources and uses this information to then make predictions on similarity. Of course, it is not feasible for the model to go through comparisons one-by-one (“Are Toyota Prius and hybrid seen together often? How about hybrid and steak?”) and so what happens instead is that the model encodes “patterns” that it notices about the different phrases. It’s similar to how you might look at a phrase and say, “this one is positive” or “that one includes a color,” except the language model doesn’t work so transparently (which is also why language models can be difficult to debug). These encodings are stored in a vector, or a long list of numeric values, and vector search uses math to calculate how similar different vectors are.
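To see how usage can be turned into numbers, the sketch below builds a crude vector for each word by counting the other words that appear in the same sentence, then compares vectors with cosine similarity. This count-based approach is a much-simplified stand-in for the learned encodings described above, and the five-sentence corpus, the stop-word list, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

# A tiny, hypothetical corpus; a real model sees millions of examples.
SENTENCES = [
    "Toyota Prius is a hybrid worth considering",
    "hybrid vehicles like the Toyota Prius",
    "a hybrid car saves fuel",
    "grill the steak with salt and pepper",
    "the steak was served with pepper sauce",
]
STOP_WORDS = {"a", "the", "is", "with", "and", "was", "like"}


def context_vector(word, sentences):
    """Count the words that share a sentence with `word` (a sparse vector)."""
    counts = Counter()
    for sentence in sentences:
        tokens = [t for t in sentence.lower().split() if t not in STOP_WORDS]
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


hybrid = context_vector("hybrid", SENTENCES)
prius = context_vector("prius", SENTENCES)
steak = context_vector("steak", SENTENCES)

print(cosine(prius, hybrid) > cosine(steak, hybrid))  # True
print(cosine(steak, hybrid))                          # 0.0
```

Prius and hybrid end up similar because they keep the same company (toyota, vehicles, worth, considering), while steak shares no context with hybrid in this corpus.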

Another way to think about the similarity measurements that vector search makes is to imagine the vectors plotted out. This is mind-blowingly difficult if you try to think of a vector plotted into hundreds of dimensions, but if you instead imagine a vector plotted into three dimensions, the principle is the same. Each vector forms a line when plotted, and the question is: which of these lines are closest to each other? The lines for steak and beef will be closer than the lines for steak and car, so steak and beef are more similar. This principle is called vector similarity.
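The three-dimensional picture can be made concrete by measuring the angle between lines. In the sketch below, the three vectors are hand-picked numbers chosen purely for illustration (a real model would learn them, and in far more dimensions), and the angle is derived from cosine similarity, one widely used measure of vector similarity.

```python
import math

# Hand-picked 3-dimensional vectors; the values are illustrative assumptions.
VECTORS = {
    "steak": (0.9, 0.8, 0.1),
    "beef": (0.8, 0.9, 0.2),
    "car": (0.1, 0.2, 0.9),
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angle_degrees(u, v):
    """Angle between the two lines; smaller means more similar."""
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))  # guard against rounding
    return math.degrees(math.acos(cos))


print(round(angle_degrees(VECTORS["steak"], VECTORS["beef"]), 1))
print(round(angle_degrees(VECTORS["steak"], VECTORS["car"]), 1))
```

The steak and beef lines are only a few degrees apart, while steak and car point in very different directions.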

Vector similarity has a lot of applications. It can make recommendations based on previously purchased products, find the most similar image, and determine which items best match a user’s query semantically.
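As one example of these applications, here is a minimal sketch of recommendations based on previous purchases: average the vectors of what the shopper bought, then rank the remaining items by similarity to that average. The item vectors are made-up two-dimensional values, and `recommend` is a hypothetical helper, not a specific product's API.

```python
import math

# Made-up 2-dimensional item vectors; a real system would use learned embeddings.
ITEM_VECTORS = {
    "espresso machine": (0.9, 0.1),
    "coffee grinder": (0.8, 0.2),
    "milk frother": (0.7, 0.3),
    "yoga mat": (0.1, 0.9),
    "running shoes": (0.2, 0.8),
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def recommend(purchased, vectors, k=2):
    """Return the k items most similar to the average of the purchased items."""
    dims = len(next(iter(vectors.values())))
    profile = [sum(vectors[name][d] for name in purchased) / len(purchased)
               for d in range(dims)]
    candidates = [name for name in vectors if name not in purchased]
    candidates.sort(key=lambda name: cosine(profile, vectors[name]), reverse=True)
    return candidates[:k]


print(recommend(["espresso machine"], ITEM_VECTORS))
# ['coffee grinder', 'milk frother']
```

The same ranking step works for a query: encode the query into a vector and sort records by their similarity to it.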

Conclusion

Semantic search is a powerful tool for search applications that has come more to the forefront with the rise of powerful deep learning models and the hardware to support them. While we’ve touched on a number of different common applications here, there are even more that use vector search and AI. Even image search or extracting metadata from images can be classified as semantic search. We’re in exciting times!

And yet its application is still in the early stages, and its well-known power lends itself to misappropriation of the term. There are many components in a semantic search pipeline, and getting each one correct is important. When done correctly, semantic search draws on real-world knowledge, especially through machine learning and vector similarity, and applies user intent, context, and conceptual meanings to match a user query to the corresponding content.

About the author

Dustin Coates

Product and GTM Manager
