Time is our most precious nonrenewable resource. If you’re like most people, you probably find it deeply annoying to waste your time and energy having to read irrelevant ads that are unhelpful and sometimes downright misleading.

I remember turning 25 and being instantly subjected to ads for wedding services. As soon as I turned 30, they became ads for diapers. While this simplistic age-based targeting strategy sometimes works, it can also be stereotypical and limiting.

Increasingly, simplistic ad targeting also makes for an annoying user experience. It’s why people start using ad blockers, and it’s a surefire way for a company to throw marketing dollars out the (browser) window.

There’s a better way to optimize your marketing budget than throwing spaghetti at the wall and hoping something sticks. We know this because our customers use the method I’m about to describe to achieve increases of up to 15% in their click-through rates (CTR) from ad campaigns.

It’s true that evolving technology has helped make targeting easier and, honestly, creepier (we’re looking at you, Facebook!). But we know for a fact that there’s a more effective way to get users to click and get what they need, and it has to do with AI.

Let’s explore how user search intent works to boost ad campaign performance.

Why user search intent is all the rage

SEO experts and marketers used to optimize campaigns and content for different types of devices. Remember when you had to use a desktop PC to get on the Internet? Yup, simpler times.

As analytics became more powerful and the technology behind them improved, the criteria used to segment and target audiences became more varied. Then, as more people went online, location became essential for marketing and sales performance. Then came social media: companies started amassing volumes of specific data about people, which led to even deeper, more thorough targeting.

So why is it that in spite of all this data and technology, millions of marketing dollars still go to waste?

Technology is not the answer unless you really care about the customer.

Enter user search intent. As the phrase suggests, user search intent reveals what someone wants when they Google something. (That term applies to all search engines, but who are we kidding? There’s no way you’re going to Bing something.)

User search intent has become the sweetheart of the optimization world — whether for SEO, conversion rate optimization (CRO), or other disciplines — because it provides very specific insight without being stalkerish. The user provides the intent, and all companies have to do is pay attention and correctly interpret it.

And that’s where things get tricky. Google knows this better than anyone. That’s why it changes its search algorithm 500–600 times (!) per year.

By processing trillions of searches each year, Google’s algorithm has come to understand the intent behind each query. Its frequent updates are changing the face of search engine results pages (SERPs) to accommodate searcher intent.

For example, when you search for “harry potter,” the algorithm knows you may want to find out about books and movies, so it combines the two likely intents and makes relevant results readily available.

Even though your company is not Google, you can leverage user search intent by using AI. You can then feed the resulting knowledge into your marketing machine, and do it at scale. Here’s how.

How leveraging user search intent helps your business

When you know why customers want something, you can understand how to deliver it.

If you know what someone is looking for, you can deliver the right information at the right time to help them find what they need, as well as:

  • Reduce noise
  • Reduce your customer acquisition cost (CAC)
  • Boost your CTR
  • Increase customer retention
  • Strengthen your positioning
  • Power the referral flywheel
  • Optimize the resources you invest in marketing

So how do you know what consumers want?

We accepted the challenge of finding that out for a customer in the advertising technology space. They were dealing with about 5 billion search queries spread across 17 languages. For this project, we focused on English.

Nailing user search intent the old-fashioned way

Unlocking your growth potential with user search intent starts with building a keyword list. The more keywords you can collect, the better (think five to six figures). Just make sure quantity doesn’t get in the way of quality.

The next step entails data mining so that the keywords can be categorized and selected for the subsequent stage. No matter how much you try to automate this process, you’ll still need to manually review your keyword categories to ensure that they are relevant for your objectives.

The same workflow applies to identifying intent in keywords, both in terms of triggers that indicate intent and the intent type attached to each keyword (informational, transactional, navigational, or consideration; we’ll talk about these below).

When you’ve done all this work — which can take hours if you’ve never experimented with it — it’s time to use a dedicated tool to clean up the data and establish relationships between the data sources.

You’ll then build a dashboard so you can surface actionable insights for the marketing team to use in creating campaigns and optimizing budgets.

The shortcomings of manually identifying intents for your keyword list are obvious:

  • Time and resource intensive
  • Difficult to scale effectively
  • Requires specific know-how that your marketing team may lack
  • Doesn’t provide insights via an API that can be fed into a third-party platform to automate future optimization

For these reasons, we focused on finding a more scalable solution for identifying user search intent, a practice that is becoming pervasive in large companies as their marketing departments continue to refine both their approaches and tactics.

Extracting meaning with machine-learning models

Our challenge was to predict user search intent based on queries. A user’s search intent is naturally indicated by their search-engine activity (the queries they use and the links they click) and activity on target websites, but we didn’t have access to those types of data. 

Our goal was to extract insights that were as rich and actionable as possible from the queries. The question we sought to answer was: how can a machine-learning model understand the intent behind a query?

For a human being, the answer is obvious: just read the words, which convey meaning. But to an ML model, a word is just a group of characters with no meaning attached.

When a romantic partner asks “What’s wrong, babe?” and gets “Nothing” in response, most people know that’s not what the person means. An ML model has no way of knowing, though. In addition, user intent can be ambiguous even for humans. “Nothing” can mean any number of things.

Here’s another example: if someone searches for “iPhone 8,” what do they want to do? Are they looking for product specifications or reviews? Do they want to see photos of the phone? We don’t know for sure, but we can make an assumption based on several intent categories.

How humans define intent vs. how AI does it

For this use case, we split user intent into three categories, with a fourth (consideration) added later. These types of intent correspond to the layers in the marketing funnel:

Informational or Awareness

Related to finding information about a topic. Examples: 

“New York city population 2013”

“how tall is the Eiffel Tower”

Transactional

Related to accomplishing a goal or engaging in an activity. Examples:

“buy Avengers DVD”

“iPhone price”

Navigational

Also called “Visit in person.” Related to finding a nearby place or other types of local information. Examples: 

“Chinese restaurant nearby”

“bus schedule”

Consideration

These are in between informational and transactional intent. Examples:

“iPhone reviews”

“Samsung iPhone comparison”

 

What’s challenging for AI is the ambiguity of queries. Inherently, some of them have multiple intents. For example, if someone searches for “hotels,” the intent depends on the context. It can be either navigational (finding a nearby hotel) or consideration (making an online reservation). It could also be transactional, although this generic search term suggests that the user may not be ready to make a reservation.

Turning words into numbers that can be fed into the ML model

A model can’t make sense of words. If we tried to use the raw data, it would be like teaching a dog to obey commands by showing it pictures of other dogs. Complete gibberish. Models operate on numbers, so in order to “speak their language,” we have to transform the queries into their mathematical equivalents. This problem falls in the natural language processing (NLP) category.

The transformation from words into an understandable format can be achieved with the help of ML models such as GloVe or fastText. These tools convert each word into a set of numbers (a vector) while preserving the relationships between words. This means two related words (such as “buy” and “shopping”) will end up with vectors that are closer together than those of two unrelated words (such as “buy” and “parrot”).
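
As a rough illustration (not our actual pipeline), here’s how you might check that property in Python using gensim’s pretrained GloVe vectors; the model name and example words are only for demonstration:

```python
# Minimal sketch: compare word similarities with pretrained GloVe vectors.
# Assumes gensim is installed; the vectors are downloaded on first use.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

# Related words end up with similar vectors...
print(vectors.similarity("buy", "shopping"))  # relatively high cosine similarity
# ...while unrelated words end up farther apart.
print(vectors.similarity("buy", "parrot"))    # noticeably lower similarity
```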

Manual vs. auto-labeling intent — how to scale it

So you have word representations and intent categories. Now what?

The next step is to annotate the queries with the intent for each one and train the model on this data set. A trained model will then learn to predict user intent for new queries. The concept is similar to that of object recognition models: provided with enough photos of cats, a model will learn to recognize a cat in a photo it has never seen before.

Going back to annotating queries, there are two options:

  • Manually labeling the data

This is a fancy way of saying “Look at the query and write the intent next to it.”

The main advantage of manually annotating data is high-quality results, since humans do the work. However, that is a very slow process (it takes approximately two hours to annotate 1,000 queries), so you won’t get far on a data set containing a few million queries.

Still, the small resulting data set can be used for validation: you can compare it with the results generated by the model and see how much they differ.

  • Automatically labeling the data

The automatic labeling process entails creating a script that uses several rules for attaching intent categories to each query. A naive approach goes like this: assuming that the word “buy” indicates transactional intent, a script annotates all queries containing this word as transactional. This method is precise, but it limits the number of labeled queries because not all transactional queries will include this word.

For the more evolved approach we created, we used the word representations (vectors) described earlier and calculated the distance between words. If a query included “shopping,” a word closely related to “buy,” the script labeled the query as transactional.

The main advantage of this approach is that it can process large volumes of data, although it does have limitations: it doesn’t recognize words that contain typos or that are not in a dictionary (for example, specific smartphone models).
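
Here’s a deliberately simplified sketch of that rule-plus-similarity idea, again assuming gensim and pretrained GloVe vectors; the seed words and threshold are illustrative, not the ones from our production script:

```python
# Sketch of automatic labeling: assign an intent if any query word is
# close enough (in embedding space) to a seed word for that intent.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

SEEDS = {
    "transactional": ["buy", "price", "order"],
    "navigational": ["nearby", "near", "directions"],
    "informational": ["how", "what", "who"],
}
THRESHOLD = 0.6  # minimum similarity to a seed word to accept a label


def auto_label(query):
    """Return an intent label, or None if nothing is close enough."""
    best_label, best_score = None, 0.0
    for word in query.lower().split():
        if word not in vectors:
            continue  # skip typos and out-of-vocabulary tokens
        for label, seeds in SEEDS.items():
            score = max(vectors.similarity(word, seed) for seed in seeds)
            if score > best_score:
                best_label, best_score = label, score
    return best_label if best_score >= THRESHOLD else None


print(auto_label("shopping for winter boots"))  # might print "transactional"
print(auto_label("xyzzy qwrt"))                 # None: nothing recognizable
```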

Training the model to recognize search intent

We now had a dataset with labeled data. The next step was choosing the model type, training the model, evaluating the results, and iterating. Because each query was labeled with one of several intent categories, this was a multi-class classification problem. Here are the results for a random list of queries:

Note that intent is expressed as a probability using a value between 0 and 1. For example, the query “cats for sale near me” expresses both a transactional and a navigational intent. We can determine the most probable intent by looking at the highest prediction value. Working on this machine-learning problem was highly iterative.

In traditional programming, there are always multiple ways to achieve a goal (such as implementing a feature), but if we follow the steps, we’re guaranteed to get a result.

Machine learning is different in the sense that it involves a great deal of trial and error. We don’t have a clear path to the solution, so we may try many approaches and measure results for each of them. After going through this process, we can then keep the best solution and discard the rest. Ignoring sunk costs is our secret weapon.

 

Machine learning models

When working with utterances or queries, as is the case here, it is natural to structure the model in a recurrent way rather than treating the words independently, because of their temporal order. Thus, a linear recurrent neural network model (RNN-1) was the first thing we used to model the intent prediction problem.

Then we increased the complexity of the model to see whether adding non-linearities (RNN-2), more complicated recurrent layers, or stacking multiple recurrences (RNN-3) would help solve the problem. In the end, we had three recurrent models, each more complex than the previous one.
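
For a sense of what such a stacked recurrent classifier might look like, here is a minimal Keras sketch; the vocabulary size, layer sizes, and optimizer are placeholder choices rather than the settings from our experiments:

```python
# Sketch of a stacked recurrent intent classifier (in the spirit of RNN-3).
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM, N_INTENTS = 50_000, 100, 3  # placeholder sizes

rnn = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),       # word vectors, learned or pretrained
    layers.SimpleRNN(64, return_sequences=True),   # first recurrent layer
    layers.SimpleRNN(64),                          # stacked recurrence
    layers.Dense(N_INTENTS, activation="softmax"), # one probability per intent
])
rnn.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```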

We also built a convolutional neural network with no recurrence (CNN-1), where each word in the query is independent of the previous ones, using max-pooling over time. Conceptually, each word is filtered by a convolutional layer; then, for each index of the resulting word features, we take the maximum value, ending up with a single feature vector for the entire query.
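
A comparable sketch of the convolutional variant, again with placeholder dimensions, could look like this:

```python
# Sketch of a convolutional intent classifier with max-pooling over time.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM, N_INTENTS = 50_000, 100, 3  # placeholder sizes

cnn = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Conv1D(128, kernel_size=3, padding="same", activation="relu"),  # filter word windows
    layers.GlobalMaxPooling1D(),                   # max over time: one feature vector per query
    layers.Dense(N_INTENTS, activation="softmax"),
])
```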

The input to each neural network is a word embedding (GloVe, fastText, or one-hot), and the output is a probability for each of the three intents, produced by a softmax activation function. At test time, the chosen intent is either the one with the highest probability (single-intent prediction) or the full set of probabilities (multi-intent prediction).
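
As a toy illustration of how those two modes of prediction differ, here is how a probability vector for a query like “cats for sale near me” might be read; the values below are invented, not taken from our results:

```python
import numpy as np

INTENTS = ["informational", "transactional", "navigational"]

# Hypothetical softmax output for "cats for sale near me" (illustrative numbers).
probs = np.array([0.06, 0.51, 0.43])

single_intent = INTENTS[int(np.argmax(probs))]                  # "transactional"
multi_intent = [i for i, p in zip(INTENTS, probs) if p > 0.3]   # both strong intents
print(single_intent, multi_intent)
```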

We constructed eight model versions in total, starting from a very basic set of words and then improving each method iteratively, based on both results and common sense. We also used an external annotation tool (Ext-1). The table below highlights the similarities and differences between them.

In our findings, we observed that while varying the model architecture brings only a very small improvement, using a richer word representation improves the results by a few percent every time. Comparing GloVe-100 with GloVe-300, we see a consistent improvement. The choice between fastText and GloVe doesn’t influence the final result: both yielded competitive results.

However, perhaps the most surprising result was that using a non-pretrained embedding (one-hot) brings a very large improvement over all the other representations. Only the recurrent models could be trained with this representation; the convolutional model diverged during training, possibly because of the large number of parameters required, bad initialization, or a lack of proper hyperparameter tuning.

Either way, both RNN-1 and RNN-2 trained with one-hot encoding on the 1M dictionary yielded very good results: 75.61% for multi-intent (2 agreements) and 79.01% for single-intent (3 agreements).

 

Do it yourself

Before building your own user search intent model, keep in mind that the automatic-labeling method produces the best results when all search queries belong to the same category (e.g., fashion, retail, auto, or lifestyle). No matter how good the model becomes, user search intent still retains a level of ambiguity because not even humans can agree on the same way to label a specific data set.

Our customers and partners have been using our user search intent prediction model in different scenarios with exciting results: an increase of up to 15% in their CTR! We’re now at about 85% accuracy for English data sets, and we’re working to further improve this performance.

Let us know if you’re using a similar approach to predict user search intent! If you bump into any issues, we’d love to hear about your experience building NLP algorithms to produce more accurate results.

About the author
Ciprian Borodescu

AI Product Manager | On a mission to help people succeed through the use of AI
