
You looked at this scarf twice; need matching mittens? How about an expensive down vest?

You watched this goofy flick four times? Try something else. I know what will hook you next.

That search term you’ve entered appears in a zillion spots in the company’s data silos. Based on my bag-of-words model, check out these similar documents I found that contain common words.

If a recommendation-engine algorithm could “think” as a human does, you might catch it making these kinds of private observations. Of course, websites don’t phrase their document-similarity, related-content, or product suggestions this way. All you get is a “You might also like” heading or a list of items with no indication of why they were selected. Regardless, you’re served recommendations for similar items or content that could very easily pique your interest, as though the algorithm has been taking notes.

Big Data is listening

Similarity is a key differentiator in ranking search engine results and recommending content. When we humans like something, a single helping is rarely enough; we want more, more, more.

Ecommerce retailers are especially happy to indulge us with more-specialized information retrieval; they’re understandably passionate about meeting our similarity-seeking needs. User-based recommendations for similar items are everywhere on the Web, from Amazon to Netflix to large retailers’ sites. Most people are intrigued by their personalized “You loved that, you could love this” ideas, their “Buy it with” add-on suggestions, their ability to check out “Customers also considered” products.

Thanks to data scientists’ creation of reliable similar-content functionality, this experience is an everyday occurrence. But how do companies use data science to so uncannily figure out what else we might like? What’s involved in a website identifying the right similar items in a sea of options? How does a recommender system use artificial intelligence and tap a dataset to figure out what similar movie title, product, or blog post a user would want to see next?

The secret boils down to a time-tested measure of similarity between two number sequences: cosine similarity (Wikipedia definition).

What is cosine similarity?

In terms of language, cosine similarity measures how close in meaning two words (or two longer pieces of text) are. It’s how a search or recommendation engine knows, for example, that math is similar to statistics, which is similar to machine learning models, which is similar to cosines. None of these are similar to scarf and mittens.

This metric is commonly used to compute item-to-item similarity. It scores how similar two items are once each one is represented as a vector in a multidimensional inner product space. That representation comes from vectorization, which converts words into vectors (lists of numbers) so that their meaning can be encoded and processed mathematically. The similarity score is then the cosine of the angle between the two vectors in that space.
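For reference, here is the standard formula for two n-dimensional vectors A and B (symbols introduced here for illustration). It divides their dot product by the product of their lengths. Cosine distance is commonly defined as one minus that value:

```latex
\text{similarity}(\mathbf{A}, \mathbf{B}) = \cos\theta
  = \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert \mathbf{A} \rVert \, \lVert \mathbf{B} \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}},
\qquad
\text{distance}(\mathbf{A}, \mathbf{B}) = 1 - \cos\theta
```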

Here’s an example:

[Diagram: words such as king, queen, man, and woman plotted as vectors]

This diagram shows that woman and man are somewhat similar (as Mars and Venus would be), yet king and queen aren’t related, but king is related to man.

How does this measurement reveal the similarity between items? It follows from how cosines behave. As the angle between two vectors widens, its cosine shrinks, so as cosine distance increases, the similarity of the data points decreases.

To measure the similarity of two items based on their attributes, cosine similarity is computed on a matrix of items and their attribute values. In general, the output ranges from -1 to 1. When every attribute value is non-negative (counts or frequencies, for example), it stays between 0 and 1.

The cosine computation yields values anchored by these reference points:

  • -1 (exact opposites)
  • 0 (no relation)
  • 1 (100% related)

But the most telling values are the decimals between the extremes, which indicate varying degrees of similarity. For example, suppose items 1 and 2 have a cosine similarity of 0.8 with each other, and item 3 has a similarity of only 0.2 with each of them. Then items 1 and 2 are far more similar to each other than either is to item 3.

Here’s a mini tutorial with more details on how to compute cosine similarity.

The upshot: if two item vectors have many common attributes, the items are very similar.
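Here is a minimal Python sketch of that computation, using only the standard library. The function name and the small attribute vectors are hypothetical. They were chosen so that the reference points above come out exactly, up to floating-point rounding, along with one in-between value.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical attribute vectors, invented for illustration.
same = cosine_similarity([1, 2, 3], [2, 4, 6])     # same direction: ~1
unrelated = cosine_similarity([1, 0], [0, 1])      # orthogonal: 0
opposite = cosine_similarity([1, 2], [-1, -2])     # opposite direction: ~-1
partial = cosine_similarity([1, 1, 0], [1, 0, 0])  # one shared attribute: ~0.71

print(round(same, 2), round(unrelated, 2), round(opposite, 2), round(partial, 2))
# prints: 1.0 0.0 -1.0 0.71
```

Note that [2, 4, 6] scores a perfect match with [1, 2, 3] even though its values are twice as large. Only direction matters, not magnitude.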

Why cosine similarity?

In data analysis for recommendation systems, various similarity metrics are used for evaluating data points, including Euclidean distance, Jaccard similarity, and Manhattan distance. Among these options, cosine similarity is one of the most common and is often the preferred choice.

Cosine similarity is a trusted form of measurement for a variety of reasons. For instance, even if two similar data objects are far apart in terms of Euclidean distance because of their size, they could still have a relatively small angle between them. And the smaller the angle, the stronger the similarity.

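In addition, the cosine similarity formula handles variable-length data, such as whole sentences or documents rather than single words. Because it depends only on the angle between vectors, not on their magnitudes, a short text and a long one can still be compared fairly.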
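Here is a small sketch that makes the size argument concrete, assuming plain word counts as the vectors. A short hypothetical text and the same text repeated ten times end up far apart in Euclidean distance. Their cosine similarity is still 1, because both vectors point in the same direction. The texts and helper names are invented for illustration.

```python
import math
from collections import Counter


def term_counts(text, vocabulary):
    """Count how often each vocabulary term appears in a whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


short_text = "wool scarf warm scarf"
long_text = " ".join([short_text] * 10)  # same wording, ten times as long

vocabulary = sorted(set(short_text.split()))     # ['scarf', 'warm', 'wool']
short_vec = term_counts(short_text, vocabulary)  # [2, 1, 1]
long_vec = term_counts(long_text, vocabulary)    # [20, 10, 10]

print(f"Euclidean distance: {euclidean_distance(short_vec, long_vec):.2f}")  # 22.05
print(f"Cosine similarity:  {cosine_similarity(short_vec, long_vec):.2f}")   # 1.00
```

Euclidean distance grows with the difference in length, while the cosine ignores magnitude. That is why a sentence can be meaningfully compared with a much longer document.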

Attesting to its popularity, cosine similarity is available in many software libraries and tools, such as TensorFlow and, for Python, scikit-learn (imported as sklearn).

Cosine similarity and machine learning

Machine-learning algorithms are commonly applied to datasets in order to offer website users and shoppers the most on-point customized recommendations. This practice has taken off: deep-learning-generated recommendations for shoppers and media-site subscribers have become an integral part of the website search and discovery experience.

With similarity assessment, getting the semantics right is key, so natural language processing (NLP) plays a substantial role.

Consider the types of terms in the diagram: king, queen, ruler, monarchy, royalty. Once represented as vectors, they can be clustered together in n-dimensional space, where computers can make sense of them. Each term is located by coordinates (x, y, z in a three-dimensional picture, with many more dimensions in practice), and similarity can be calculated using distances and angles.

Machine learning models can then surmise that words that are near each other in vector space — such as king and queen — are related, and words that are even closer, such as queen and ruler, could be synonyms. 

Vectors can also be added, subtracted, and multiplied to establish meaning and relationships, and thereby provide more-accurate recommendations. One often-cited example of such addition and subtraction is king - man + woman ≈ queen: the resulting vector lands closest to queen. Machines can use this type of vector arithmetic to capture relationships such as gender.
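The king/queen arithmetic can be reproduced with toy vectors. In this sketch, the three dimensions (loosely: royalty, masculinity, femininity) and every value are invented for illustration. Real word embeddings are learned from large text corpora and have far more dimensions. The answer is the word whose vector has the highest cosine similarity to king - man + woman, excluding the three input words.

```python
import math

# Toy 3-dimensional vectors, invented for illustration:
# dimensions loosely stand for (royalty, masculinity, femininity).
vectors = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "ruler": [1.0, 0.5, 0.5],
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c, vectors):
    """Return the word closest (by cosine) to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (word for word in vectors if word not in {a, b, c})
    return max(candidates, key=lambda word: cosine_similarity(target, vectors[word]))


print(analogy("king", "man", "woman", vectors))  # queen
```

Here, king - man + woman works out to [1.0, 0.0, 1.0], which matches queen exactly. Ruler scores about 0.87. With learned embeddings, the match is approximate rather than exact.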

Applying the algorithm

At Algolia, our recommendations rely in part on supervised machine-learning models. Data is collected in a matrix in which columns are userTokens and rows are objectIDs. Each cell represents the number of interactions (clicks and/or conversions) between a userToken and an objectID.

Then we apply a collaborative filtering algorithm that, for each item, finds other items that share similar buying patterns across customers. Items are similar if the same set of users has interacted with them.
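Here is a small sketch of that item-to-item step, assuming a handful of hypothetical click and conversion events. It builds the matrix described above, with one row per objectID and one column per userToken. It then scores every other item's row against a target row with cosine similarity. This is not Algolia's implementation; the events, item names, and function names are invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical (userToken, objectID) events; each one is a click or a conversion.
events = [
    ("user-1", "scarf"), ("user-1", "mittens"), ("user-1", "mittens"),
    ("user-2", "scarf"), ("user-2", "mittens"),
    ("user-3", "scarf"), ("user-3", "down-vest"),
    ("user-4", "textbook"),
]


def build_matrix(events):
    """Rows are objectIDs, columns are userTokens, cells are interaction counts."""
    users = sorted({user for user, _ in events})
    column = {user: i for i, user in enumerate(users)}
    rows = defaultdict(lambda: [0] * len(users))
    for user, object_id in events:
        rows[object_id][column[user]] += 1
    return dict(rows)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similar_items(target, matrix):
    """Rank every other item by how closely its row of user interactions matches the target's."""
    scores = {
        other: cosine_similarity(matrix[target], row)
        for other, row in matrix.items()
        if other != target
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


matrix = build_matrix(events)
for object_id, score in similar_items("scarf", matrix):
    print(f"{object_id}: {score:.2f}")
# mittens: 0.77
# down-vest: 0.58
# textbook: 0.00
```

Mittens rank first because the users who interacted with the scarf also interacted with the mittens. The textbook shares no users with the scarf, so it scores 0.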

One challenge: the resulting item-to-item similarity matrix is computationally heavy (dense), and many of its similarity values are small. This introduces noise into the data, which can lower the quality of the recommendations provided.

To get around this roadblock, the k-nearest neighbors (KNN) algorithm comes in handy, with cosine similarity determining which items are nearest. With k set to a suitable number of neighbors, higher-similarity items are treated as nearest and lower-similarity ones are discarded. Only the k most-similar pairs are kept for each item. The result: high-quality suggestions.
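The pruning step can be sketched in a few lines, assuming the item-to-item cosine similarities have already been computed. The scores below are hypothetical, and keep_top_k is an illustrative name, not an Algolia API. For each item, it keeps the k highest-scoring neighbors and discards the rest.

```python
import heapq

# Hypothetical precomputed item-to-item cosine similarities.
similarities = {
    "scarf": {"mittens": 0.77, "down-vest": 0.58, "sun-hat": 0.12, "textbook": 0.01},
    "mittens": {"scarf": 0.77, "down-vest": 0.35, "sun-hat": 0.05, "textbook": 0.02},
}


def keep_top_k(similarities, k):
    """For each item, retain only its k most similar neighbors; low scores are treated as noise."""
    return {
        item: dict(heapq.nlargest(k, neighbors.items(), key=lambda pair: pair[1]))
        for item, neighbors in similarities.items()
    }


print(keep_top_k(similarities, k=2))
# {'scarf': {'mittens': 0.77, 'down-vest': 0.58}, 'mittens': {'scarf': 0.77, 'down-vest': 0.35}}
```

Besides cutting noise, the pruned table is smaller. Each item stores at most k neighbors instead of a score for every other item in the catalog.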

Cosine similarity in a recommendation system

With movie recommendation systems, among other types of content-based recommendation systems, it’s all about the algorithms. 

What do similar users watch (or read or listen to)? Cosine similarity can compare one user’s profile with every other user’s profile, one pair at a time, to find the viewers whose tastes are closest.
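Here is a minimal user-based sketch, assuming hypothetical viewing counts per user over the same list of titles. It first finds the profile whose vector is most similar to the target user's. It then suggests titles that this neighbor watched and the target user hasn't. All names and numbers are invented for illustration.

```python
import math

# Hypothetical viewing counts: each profile lists how often a user watched each title.
titles = ["space-opera", "rom-com", "heist", "documentary"]
profiles = {
    "ana": [4, 0, 3, 0],
    "ben": [5, 0, 4, 1],
    "chloe": [0, 5, 0, 2],
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar_user(target, profiles):
    """Compare the target profile with every other profile and return the closest one."""
    others = (user for user in profiles if user != target)
    return max(others, key=lambda user: cosine_similarity(profiles[target], profiles[user]))


def suggest(target, profiles, titles):
    """Suggest titles the nearest neighbor watched but the target user hasn't."""
    neighbor = most_similar_user(target, profiles)
    return [
        title
        for title, mine, theirs in zip(titles, profiles[target], profiles[neighbor])
        if mine == 0 and theirs > 0
    ]


print(most_similar_user("ana", profiles))  # ben
print(suggest("ana", profiles, titles))    # ['documentary']
```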

What else do people who view or buy this item buy? In the recommendation-generating process, item descriptions and attributes are leveraged in order to calculate item similarity. Using cosine similarity, the degree of sameness between what the person has selected or viewed compared with other items in the catalog is assessed. The other items with the highest similarity values are presented as the most promising recommendations.
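For the content-based case, a sketch might turn each item's attributes into a binary vector: 1 if the item has the attribute, 0 otherwise. It then ranks the rest of the catalog against the selected item. The catalog, tags, and function names are hypothetical.

```python
import math

# Hypothetical catalog: each item is described by a set of attribute tags.
catalog = {
    "wool-scarf": {"winter", "wool", "accessory", "red"},
    "wool-mittens": {"winter", "wool", "accessory"},
    "down-vest": {"winter", "outerwear", "red"},
    "sun-hat": {"summer", "accessory"},
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def to_vector(tags, vocabulary):
    """One binary dimension per attribute: 1 if the item has it, 0 otherwise."""
    return [1 if term in tags else 0 for term in vocabulary]


def recommend(selected, catalog, n=2):
    """Return the n catalog items whose attribute vectors are closest to the selected item's."""
    vocabulary = sorted(set().union(*catalog.values()))
    target = to_vector(catalog[selected], vocabulary)
    scores = {
        item: cosine_similarity(target, to_vector(tags, vocabulary))
        for item, tags in catalog.items()
        if item != selected
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


print(recommend("wool-scarf", catalog))  # ['wool-mittens', 'down-vest']
```

The mittens share three of the scarf's four attributes and rank first. The vest shares two, and the sun hat shares only one.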

Cosine similarity is instrumental in recommending the right text documents, too. For instance, it can help answer questions like:

For text similarity, frequently occurring terms are key. The terms in each document are vectorized, and for recommendations, the terms with higher frequencies are treated as the strongest signals of what a document is about.
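Text recommendations can be sketched the same way, assuming raw term frequencies as the weights. Production systems often use refinements such as TF-IDF, which down-weights terms that are common to every document. The mini corpus and function names below are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical mini corpus.
documents = {
    "doc-a": "cosine similarity compares vectors by the angle between them",
    "doc-b": "cosine similarity for text recommendations using vectors",
    "doc-c": "how to knit a warm wool scarf and matching mittens",
}


def term_frequencies(text):
    """Vectorize a text as a sparse mapping of term -> occurrence count."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    return dot / (norm_a * norm_b)


def rank_similar(doc_id, documents):
    """Rank the other documents by cosine similarity to doc_id."""
    target = term_frequencies(documents[doc_id])
    scores = {
        other: cosine_similarity(target, term_frequencies(text))
        for other, text in documents.items()
        if other != doc_id
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


for doc_id, score in rank_similar("doc-a", documents):
    print(f"{doc_id}: {score:.2f}")
```

Doc-b shares three terms with doc-a (cosine, similarity, vectors), so it ranks first. Doc-c shares none and scores 0.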


If you like this post, you may like Algolia

Want to offer your search-engine users or customers the best algorithmically calculated personalized suggestions for similar items? Check out Algolia Recommend.

Regardless of your use case, your developers can take advantage of our API to build the recommendations experiences best suited to your needs. Our recommendation algorithm applies content-based filtering to enhance your user engagement and inspire visitors to come back. That’s good news for conversion and your bottom line.

Get a customized demo, try us free, or chat with us soon about high-quality similar-content suggestions that are bound to resonate with your customer base. We’re looking forward to hearing from you!

About the author
Vincent Caruana

Senior Digital Marketing Manager, SEO
