Time is our most precious nonrenewable resource. If you’re like most people, you probably find it deeply annoying to waste your time and energy having to read irrelevant ads that are unhelpful and sometimes downright misleading.
I remember turning 25 and being instantly subjected to ads for wedding services. As soon as I turned 30, they became ads for diapers. While this simplistic age-based targeting strategy sometimes works, it can also be stereotypical and limiting.
Increasingly, simplistic ad targeting also makes for an annoying user experience. It’s why people start using ad blockers, and it’s a surefire way for a company to throw marketing dollars out the (browser) window.
There’s a better way to optimize your marketing budget than throwing spaghetti at the wall and hoping something sticks. We know this because our customers use the method I’m about to describe to achieve an increase of up to 15% on their click-through rates (CTR) from ad campaigns.
It’s true that evolving technology has helped make targeting easier and, honestly, creepier (we’re looking at you, Facebook!). But we know for a fact that there’s a more effective way to get users to click and get what they need, and it has to do with AI.
Let’s explore how user search intent works to boost ad campaign performance.
SEO experts and marketers used to optimize campaigns and content for different types of devices. Remember when you had to use a desktop PC to get on the Internet? Yup, simpler times.
As analytics became more powerful and the technology behind them improved, the criteria used to segment and target audiences became more varied. Then, as more people went online, location became essential for marketing and sales performance. Then came social media: companies started amassing volumes of specific data about people, which led to even deeper, more thorough targeting.
So why is it that, in spite of all this data and technology, millions of marketing dollars still go to waste?
Technology is not the answer unless you really care about the customer.
Enter user search intent. As the phrase suggests, user search intent reveals what someone wants when they Google something. (That term applies to all search engines, but who are we kidding? There’s no way you’re going to Bing something.)
User search intent has become the sweetheart of the optimization world — whether for SEO, conversion rate optimization (CRO), or other disciplines — because it provides very specific insight without being stalkerish. The user provides the intent, and all companies have to do is pay attention and correctly interpret it.
And that’s where things get tricky. Google knows this better than anyone. That’s why it changes its search algorithm 500–600 times (!) per year.
By processing trillions of searches each year, Google’s algorithm has come to understand the intent behind each query. Its frequent updates are changing the face of search engine results pages (SERPs) to accommodate searcher intent.
For example, when you search for “harry potter,” the algorithm knows you may want to find out about books and movies, so it combines the two likely intents and makes relevant results readily available.
Even though your company is not Google, you can leverage user search intent by using AI. You can then feed the resulting knowledge into your marketing machine, and do it at scale. Here is how.
When you know why customers want something, you can understand how to deliver it.
If you know what someone is looking for, you can deliver the right information at the right time to help them find what they need.
So how do you know what consumers want?
We accepted the challenge of finding that out for a customer in the advertising technology space. They were dealing with about 5 billion search queries spread across 17 languages. For this project, we focused on English.
Unlocking your growth potential with user search intent starts with building a keyword list. The more keywords you can collect, the better (think five to six figures). Just make sure quantity doesn’t get in the way of quality.
The next step entails data mining so that the keywords can be categorized and selected for the subsequent stage. No matter how much you try to automate this process, you’ll still need to manually review your keyword categories to ensure that they are relevant for your objectives.
The same workflow applies to identifying intent in keywords, both in terms of triggers that indicate intent and the intent type attached to each keyword (informational, transactional, navigational, or consideration; we’ll talk about these below).
When you’ve done all this work — which can take hours if you’ve never experimented with it — it’s time to use a dedicated tool to clean up the data and establish relationships between the data sources.
You’ll then build a dashboard so you can uncover effective insights for the marketing team to use in creating campaigns and optimizing budgets.
The shortcomings of manually identifying intents for your keyword list are obvious: the work is slow, it depends on manual review, and it doesn’t scale to large keyword lists.
For these reasons, we focused on finding a more scalable solution for identifying user search intent, which is becoming pervasive in large companies as their marketing departments continue to refine both their approaches and tactics.
Our challenge was to predict user search intent based on queries. A user’s search intent is naturally indicated by their search-engine activity (the queries they use and the links they click) and activity on target websites, but we didn’t have access to those types of data.
Our goal was to extract insights that were as rich and actionable as possible from the queries. The question we sought to answer was this: how can a machine-learning model understand the intent behind a query?
For a human being, the answer is obvious: just read the words, which convey meaning. But in an ML model, a word is just a group of characters without meaning attached.
When a romantic partner asks “What’s wrong, babe?” and gets “Nothing” in response, most people know that’s not what the person means. An ML model has no way of knowing, though. In addition, user intent can be ambiguous even for humans. “Nothing” can mean any number of things.
Here’s another example: if someone searches for “iPhone 8,” what do they want to do? Are they looking for product specifications or reviews? Do they want to see photos of the phone? We don’t know for sure, but we can make an assumption based on several intent categories.
For this use case, we split user intent into three categories, with a fourth (consideration) added later. These types of intent correspond to the layers in the marketing funnel:
Informational or Awareness: Related to finding information about a topic. Examples: “New York city population 2013,” “how tall is the Eiffel Tower.”
Transactional: Related to accomplishing a goal or engaging in an activity. Examples: “buy Avengers DVD,” “iPhone price.”
Navigational: Also called “Visit in person.” Related to finding a nearby place or other types of local information. Examples: “Chinese restaurant nearby,” “bus schedule.”
Consideration: In between informational and transactional intent. Examples: “iPhone reviews,” “Samsung iPhone comparison.”
What’s challenging for AI is the ambiguity of queries. Inherently, some of them have multiple intents. For example, if someone searches for “hotels,” the intent depends on the context. It can be either navigational (finding a nearby hotel) or consideration (making an online reservation). It could also be transactional, although this generic search term suggests that the user may not be ready to make a reservation.
A model can’t make sense of words. If we tried to use the raw data, it would be like teaching a dog to obey commands by showing it pictures of other dogs. Complete gibberish. Models operate on algorithms, so in order to “speak their language,” we have to transform the queries into their mathematical equivalents. This problem falls in the natural language processing (NLP) category.
The transformation from words into an understandable format can be achieved with the help of ML models such as GloVe or FastText. These tools convert each word into a set of numbers (a vector) while simultaneously maintaining the relationship between the words. This means two words that are related (such as “buy” and “shopping”) will be seen as more closely related than two unrelated words (such as “buy” and “parrot”).
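To make the idea of “closely related” vectors concrete, here is a minimal sketch. The three-dimensional vectors are made up for illustration (real GloVe or FastText vectors are learned from text and typically have 100 to 300 dimensions), and cosine similarity is just one common way to measure relatedness.

```python
import math

# Toy vectors invented for this example; they are NOT real GloVe/FastText values.
TOY_VECTORS = {
    "buy": [0.9, 0.8, 0.1],
    "shopping": [0.8, 0.9, 0.2],
    "parrot": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means closely related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


related = cosine_similarity(TOY_VECTORS["buy"], TOY_VECTORS["shopping"])
unrelated = cosine_similarity(TOY_VECTORS["buy"], TOY_VECTORS["parrot"])

print(f"buy vs. shopping: {related:.2f}")
print(f"buy vs. parrot:   {unrelated:.2f}")
```

With these toy values, “buy” scores much higher against “shopping” than against “parrot,” which is the property the rest of the pipeline relies on.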
So you have word representations and intent categories. Now what?
The next step is to annotate each query with its intent and train the model using this data set. A trained model will then learn to predict user intent for new queries. This concept is similar to that of object recognition models: provided with enough photos with cats in them, a model will learn to recognize a cat in a photo that it has never seen before.
Going back to annotating queries, there are two options: manual annotation and automatic labeling.
Manual annotation is a fancy way of saying “Look at the query and write the intent next to it.”
The main advantage of manually annotating data is high-quality results, since humans do the work. However, that is a very slow process (it takes approximately two hours to annotate 1,000 queries), so you won’t get far on a data set containing a few million queries.
However, the small resulting data set can be used for validation, meaning you can compare it with the results generated by the model and see how different they are from each other.
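As a rough sketch of that validation step, you could measure how often the model’s label matches the human label. The function name and the four sample labels below are hypothetical; they only illustrate the comparison.

```python
def agreement_rate(manual_labels, predicted_labels):
    """Fraction of queries where the model's label matches the human label."""
    if len(manual_labels) != len(predicted_labels):
        raise ValueError("Both label lists must cover the same queries")
    if not manual_labels:
        raise ValueError("Need at least one labeled query")
    matches = sum(m == p for m, p in zip(manual_labels, predicted_labels))
    return matches / len(manual_labels)


# Hypothetical labels for four queries (made up for illustration).
manual = ["transactional", "informational", "navigational", "transactional"]
predicted = ["transactional", "informational", "transactional", "transactional"]

rate = agreement_rate(manual, predicted)
print(f"Agreement: {rate:.0%}")
```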
The automatic labeling process entails creating a script that uses several rules for attaching intent categories to each query. A naive approach goes like this: assuming that the word “buy” indicates transactional intent, a script annotates all queries containing this word as transactional. This method is precise, but it limits the number of labeled queries because not all transactional queries will include this word.
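The naive rule could look something like this. It is a sketch, not our production script: the trigger table contains only the “buy” example from above, and the function name is hypothetical.

```python
# Hypothetical trigger table; only "buy" comes from the example in the text.
TRIGGERS = {"buy": "transactional"}


def label_query(query):
    """Return the intent of the first trigger word found, or None if unlabeled."""
    for word in query.lower().split():
        if word in TRIGGERS:
            return TRIGGERS[word]
    return None


queries = ["buy Avengers DVD", "iPhone price", "how tall is the Eiffel Tower"]
labels = {query: label_query(query) for query in queries}

for query, intent in labels.items():
    print(f"{query!r}: {intent}")
```

Notice that “iPhone price” stays unlabeled even though it is transactional, which is exactly the coverage problem described above.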
For the more evolved approach we created, we used the word representations (vectors) described earlier and calculated the distance between words. If a query included “shopping,” a word closely related to “buy,” the script labeled the query as transactional.
The main advantage of this approach is that it can process large volumes of data, although it does have limitations: it doesn’t recognize words that contain typos or that are not in a dictionary (for example, specific smartphone models).
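Here is a minimal sketch of the vector-based rule and its blind spot. Everything in it is an assumption made for illustration: the toy vectors, the use of cosine similarity as the distance measure, the single seed word, and the 0.8 threshold.

```python
import math

# Toy vectors invented for this example; a real script would load GloVe or FastText.
TOY_VECTORS = {
    "buy": [0.9, 0.8, 0.1],
    "shopping": [0.8, 0.9, 0.2],
    "parrot": [0.1, 0.2, 0.9],
}
SEED_WORD = "buy"  # word assumed to signal transactional intent
THRESHOLD = 0.8    # assumed cutoff for "closely related"


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def label_query(query):
    """Label a query transactional if any known word is close to the seed word."""
    seed = TOY_VECTORS[SEED_WORD]
    for word in query.lower().split():
        vector = TOY_VECTORS.get(word)
        if vector is None:
            continue  # typos and out-of-dictionary words have no vector
        if cosine_similarity(seed, vector) >= THRESHOLD:
            return "transactional"
    return None


print(label_query("shopping for shoes"))  # matched through "shopping"
print(label_query("shoping for shoes"))   # typo: no vector, so no label
print(label_query("parrot"))              # known word, but unrelated to "buy"
```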
We now had a dataset with labeled data. The next step was choosing the model type, training the model, evaluating the results, and iterating. Because the data was labeled, we had a problem of multi-class classification. Here are the results for a random list of queries:
Note that intent is expressed as a probability using a value between 0 and 1. For example, the query “cats for sale near me” expresses both a transactional and a navigational intent. We can determine the most probable intent by looking at the highest prediction value. Working on this machine-learning problem was highly iterative.
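Reading the predictions can be sketched as follows. The probabilities for “cats for sale near me” are invented for illustration (they are not actual model output), and the 0.3 threshold for a secondary intent is an assumption.

```python
def top_intent(probabilities):
    """Intent with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


def likely_intents(probabilities, threshold=0.3):
    """All intents at or above the threshold, most probable first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [intent for intent, p in ranked if p >= threshold]


# Hypothetical prediction for the query "cats for sale near me".
prediction = {"informational": 0.05, "transactional": 0.55, "navigational": 0.40}

best = top_intent(prediction)
candidates = likely_intents(prediction)
print(best)
print(candidates)
```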
In programming, there are always multiple ways to achieve a goal (such as by implementing a feature), but if we follow the steps, we are guaranteed to achieve a result.
Machine learning is different in the sense that it involves a great deal of trial and error. We don’t have a clear path to the solution, so we may try many approaches and measure results for each of them. After going through this process, we can then keep the best solution and discard the rest. Ignoring sunk costs is our secret weapon.
When working with utterances or queries, as is the case here, it is natural to structure the model in a recurrent way rather than treat the words independently, because the words come in a temporal order. Thus, a linear recurrent neural network model (RNN-1) was the first thing we used to model the intent prediction problem.
Then we increased the complexity of the model to see whether adding non-linearities (RNN-2), using more complicated recurrent layers, or stacking multiple recurrences (RNN-3) would help solve the problem. In the end, we had three recurrent models, each more complex than the previous one.
We also built a convolutional neural network with no recurrence (CNN-1), in which each word in the query is treated independently of the previous ones, using max-pooling over time. Conceptually, each word is filtered by a convolutional layer; then, for each feature index, we take the maximum value across all the words, which results in a single feature vector for the entire query.
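The pooling step on its own can be sketched in a few lines. The convolutional filtering is skipped here, and the per-word feature values are made up, so treat this as an illustration of max-pooling over time rather than of CNN-1 itself.

```python
def max_pool_over_time(word_features):
    """Collapse per-word feature vectors into one vector for the whole query.

    For each feature index, keep the largest value seen across all words.
    """
    if not word_features:
        raise ValueError("The query must contain at least one word")
    return [max(column) for column in zip(*word_features)]


# Hypothetical filtered features for a three-word query, one row per word.
features = [
    [0.1, 0.7, 0.2],
    [0.4, 0.3, 0.9],
    [0.6, 0.1, 0.5],
]

pooled = max_pool_over_time(features)
print(pooled)
```

Because the maximum is taken per feature, the result has the same length no matter how many words the query contains.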
The input to each neural network is a word embedding (GloVe, FastText, or one-hot). The output, produced by a softmax activation function, is a probability for each of the three intents. At test time, we either pick the intent with the highest probability (single-intent prediction) or keep the probabilities themselves (multi-intent prediction).
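For readers who haven’t met softmax before, here is a small sketch of how it turns raw network scores into probabilities. The three scores are hypothetical, and the intent order is an assumption made for the example.

```python
import math

INTENTS = ["informational", "transactional", "navigational"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.5, 2.0, 1.0]  # hypothetical raw outputs, one per intent
probabilities = dict(zip(INTENTS, softmax(scores)))

for intent, p in probabilities.items():
    print(f"{intent}: {p:.2f}")
```

The largest score always gets the largest probability, so the ranking of intents is preserved.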
We constructed eight model versions in total, starting from a very basic set of words and then improving each method iteratively, based on both results and common sense. We also used an external annotation tool (Ext-1). Below we provide a compiled table which highlights the similarities and differences between each of them.
In our findings, we observed that while varying the model architecture brings a very small improvement, using a richer word representation improves the results by a few percent every time. Looking at GloVe-100 vs GloVe-300, we see a consistent improvement. Choosing FastText or GloVe doesn’t influence the final result: both yielded competitive results.
However, perhaps the most surprising result was that using a non-pretrained embedding (one-hot) brought a very big improvement compared to all the other representations. Only the recurrent models could be trained using this representation; the convolutional model diverged during training. This may be because of the large number of parameters required, bad initialization, or a lack of proper hyperparameter tuning.
Either way, both RNN-1 and RNN-2 trained with one-hot encoding on the 1M dictionary yield very good results: 75.61% for multi-intent (2 agreements) and 79.01% for single-intent (3 agreements).
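If one-hot encoding is new to you, this sketch shows the idea on a tiny vocabulary. The two sample queries and the helper names are hypothetical; the only thing that matters is that each word gets a vector with a single 1 in it.

```python
def build_vocabulary(queries):
    """Assign each distinct word an index, in order of first appearance."""
    vocabulary = {}
    for query in queries:
        for word in query.lower().split():
            vocabulary.setdefault(word, len(vocabulary))
    return vocabulary


def one_hot(word, vocabulary):
    """Vector of zeros with a single 1 at the word's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[word]] = 1
    return vector


vocabulary = build_vocabulary(["buy avengers dvd", "buy iphone"])
encoded = one_hot("dvd", vocabulary)
print(vocabulary)
print(encoded)
```

Each vector is as long as the dictionary, so with a 1M-word dictionary every word becomes a vector with a million entries. That gives a sense of why this representation needs so many parameters.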
Before building your own user search intent model, keep in mind that the automatic-labeling method produces the best results when all search queries belong to the same category (e.g., fashion, retail, auto, or lifestyle). No matter how good the model becomes, user search intent still retains a level of ambiguity because not even humans can agree on the same way to label a specific data set.
Our customers and partners have been using our user search intent prediction model in different scenarios with exciting results: an increase of up to 15% on their CTR! We’re now at about 85% accuracy for English data sets and we’re working to further improve this performance.
Let us know if you’re using a similar approach to predict user search intent! If you bump into any issues, we’d love to hear about your experience building NLP algorithms to produce more accurate results.
Ciprian Borodescu
AI Product Manager | On a mission to help people succeed through the use of AI