
Cosine similarity: what is it and how does it enable effective (and profitable) recommendations?

Feb 22nd 2023 | ai


You looked at this scarf twice; need matching mittens? How about an expensive down vest?

You watched this goofy flick four times? Try something else. I know what will hook you next.

That search term you’ve entered appears in a zillion spots in the company’s data silos. Based on my bag-of-words model, check out these similar documents I found that contain common words.

If a recommendation-engine algorithm could “think” as a human does, you might catch it making these kinds of private observations. Of course, these aren’t the ways observations about document similarity and related content or product suggestions are phrased on websites. All you get is a “You might also like” or a list of items with no indication as to why they were selected. Regardless, you’re served great recommendations for similar items or content that could very easily pique your interest, as though the algorithm has been taking notes. 

Big Data is listening

Similarity is a key differentiator in ranking search engine results and recommending content. If we as humans like something, it’s probably not enough; we want more, more, more. 

Ecommerce retailers are especially happy to indulge us with more-specialized information retrieval; they’re understandably passionate about meeting our similarity-seeking needs. User-based recommendations for similar items are everywhere on the Web, from Amazon to Netflix to large retailers’ sites. Most people are intrigued by their personalized “You loved that, you could love this” ideas, their “Buy it with” add-on suggestions, their ability to check out “Customers also considered” products.

Thanks to data scientists’ creation of reliable similar-content functionality, this experience is an everyday occurrence. But how do companies use data science to so uncannily figure out what else we might like? What’s involved in a website identifying the right similar items in a sea of options? How does a recommender system use artificial intelligence and tap a dataset to figure out what similar movie title, product, or blog post a user would want to see next?

The secret boils down to a time-tested measure of similarity between two number sequences: cosine similarity (Wikipedia definition).

What is cosine similarity?

In terms of language, cosine similarity determines the closeness in meaning between two or more words. It’s the way a search or recommendation engine knows, for example, that the word math is similar to statistics, is similar to machine learning models, is similar to cosines, all of which are not similar to scarf and mittens.

This distance-evaluation metric, also known as item-to-item similarity, calculates a similarity score between two item vectors residing in a multidimensional inner product space. This is made possible by vectorization, which converts words into vectors (numbers), allowing their meaning to be encoded and processed mathematically. The cosine of the angle between the two item vectors, as projected in that multidimensional space, can then be determined.

Here’s an example:

 

This diagram shows that woman and man are somewhat similar (as Mars and Venus would be), yet king and queen aren’t related, but king is related to man.

How does this measurement go about revealing the similarity between items? It works based on the principles of cosines: when cosine distance increases, the similarity of the data points decreases.

To measure the similarity of two items based on their attributes, cosine similarity is computed on a matrix of item attribute vectors. The output value ranges from -1 to 1.

The cosine computation across all of these values will produce the following possible outputs:

  • -1 (an opposite)
  •  0 (no relation)
  • 1 (100% related)

But the most telling values are the decimals between the extremes, which indicate varying degrees of similarity. For example, if items 1 and 2 score a cosine similarity of .8 with each other but only .2 with item 3, then items 1 and 2 are far more similar to each other than either is to item 3.

Here’s a mini tutorial with more details on how to compute cosine similarity.

The upshot: if two item vectors have many common attributes, the items are very similar.
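Under the hood, the math is just the dot product of the two vectors divided by the product of their lengths: cos(θ) = (A · B) / (‖A‖ × ‖B‖). Here's a minimal Python sketch, using NumPy and made-up attribute vectors, to make that concrete:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: (a . b) / (||a|| * ||b||)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical attribute vectors for three catalog items
item_1 = np.array([1.0, 0.9, 0.1])
item_2 = np.array([0.9, 1.0, 0.2])
item_3 = np.array([0.1, 0.0, 1.0])

print(cosine_similarity(item_1, item_2))  # close to 1: items 1 and 2 are very similar
print(cosine_similarity(item_1, item_3))  # close to 0: items 1 and 3 are barely related
```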

Why cosine similarity?

In data analysis for recommendation systems, various similarity metrics, including Euclidean distance, Jaccard similarity, and Manhattan distance, are used for evaluating data points. But among these options, cosine similarity is the most common choice, and for good reason.

Cosine similarity is a trusted form of measurement for a variety of reasons. For instance, even if two similar data objects are far apart in terms of Euclidean distance because of their size, they could still have a relatively small angle between them. And the smaller the angle, the stronger the similarity.

In addition, the cosine similarity formula is a winner because it can handle variable-length data, such as sentences, not just words.

Attesting to its popularity, cosine similarity is implemented in many online libraries and tools, such as TensorFlow and scikit-learn (sklearn) for Python.

Cosine similarity and machine learning

Machine-learning algorithms are commonly applied to datasets in order to offer website users and shoppers the most on-point customized recommendations. This practice has taken off: deep-learning-generated recommendations for shoppers and media-site subscribers have become an integral part of the website search and discovery experience.

With similarity assessment, getting the semantics right is key, so natural language processing (NLP) plays a substantial role.

Consider the types of terms in the diagram — king, queen, ruler, monarchy, royalty. With vectors, computers can make sense of them by clustering them together in n-dimensional space. They can each be located with coordinates (x, y, z), and similarity can be calculated using distance and angles.

Machine learning models can then surmise that words that are near each other in vector space — such as king and queen — are related, and words that are even closer, such as queen and ruler, could be synonyms. 

Vectors can also be added, subtracted, and multiplied to establish meaning and relationships, and thereby provide more-accurate recommendations. One often-cited example of such addition and subtraction: king – man + woman = queen. Machines can use this kind of vector arithmetic to capture relationships such as gender.
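As a toy illustration of this (the three-dimensional "embeddings" below are invented for the example; real models learn vectors with hundreds of dimensions from large text corpora), vector arithmetic plus cosine similarity can recover these relationships:

```python
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up word vectors, for illustration only
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.8]),
    "man":   np.array([0.5, 0.9, 0.1]),
    "woman": np.array([0.5, 0.3, 0.8]),
}

# king - man + woman should land closest to queen
target = vectors["king"] - vectors["man"] + vectors["woman"]
ranked = sorted(vectors, key=lambda word: cos_sim(target, vectors[word]), reverse=True)
print(ranked[0])  # "queen" (with these toy vectors)
```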

Applying the algorithm

At Algolia, our recommendations rely in part on supervised machine-learning models. Data is collected in an interaction matrix in which columns are userTokens and rows are objectIDs. Each cell represents the number of interactions (clicks and/or conversions) between a userToken and an objectID.

Then we apply a collaborative filtering algorithm that, for each item, finds other items that share similar buying patterns across customers. Items are similar if the same user set has interacted with them.

One challenge: the resulting item-to-item similarity matrix is dense and computationally heavy, and many of the similarity values are small, introducing noise that can degrade the quality of the recommendations.

To get around this roadblock, the k-nearest neighbors (k-NN) algorithm comes in handy. Cosine similarity determines the nearest neighbors: data points with higher similarity are considered nearest, and those with lower similarity are dropped. You retain only the k most similar pairs of items. The result: high-quality suggestions.
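Here's a simplified sketch of that idea (not Algolia's actual implementation): build a small, hypothetical interaction matrix, compute item-to-item cosine similarities with scikit-learn, and keep only each item's k nearest neighbors:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical interaction matrix: rows are objectIDs (items), columns are userTokens,
# and each cell counts the clicks/conversions between that item and that user.
interactions = np.array([
    [3, 0, 1, 0],  # item 0
    [2, 0, 2, 0],  # item 1 -- interacted with by the same users as item 0
    [0, 4, 0, 1],  # item 2
])

# Item-to-item cosine similarities (a 3 x 3 matrix)
item_similarity = cosine_similarity(interactions)

k = 1  # keep only each item's k most similar neighbors
for item, row in enumerate(item_similarity):
    row = row.copy()
    row[item] = -1.0                       # ignore self-similarity
    neighbors = np.argsort(row)[::-1][:k]  # indices of the k nearest neighbors
    print(f"item {item}: nearest neighbor(s) {neighbors}, similarity {row[neighbors]}")
```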

Cosine similarity in a recommendation system

With movie recommendation systems, among other types of content-based recommendation systems, it’s all about the algorithms. 

What do similar users watch (or read or listen to)? Cosine similarity measures the similarity between viewers, comparing one user profile against all the others.

What else do people who view or buy this item buy? In the recommendation-generating process, item descriptions and attributes are leveraged to calculate item similarity. Using cosine similarity, the system assesses how closely the items a person has selected or viewed match the other items in the catalog. The items with the highest similarity values are presented as the most promising recommendations.

Cosine similarity is instrumental in recommending the right text documents, too.

For text similarity, frequently occurring terms are key. The terms are vectorized, and for recommendations, those with the higher frequencies are considered the strongest.
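As a small sketch of that approach (with made-up documents and scikit-learn's CountVectorizer standing in for the bag-of-words step), the pipeline looks like this:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical documents; in practice these would come from your content catalog
docs = [
    "cozy wool scarf and matching wool mittens for winter",
    "warm winter mittens knitted from wool",
    "lightweight down vest for spring hiking",
]

# Bag-of-words term-frequency vectors, one row per document
term_vectors = CountVectorizer().fit_transform(docs)

# Pairwise cosine similarity between documents
similarity = cosine_similarity(term_vectors)
print(similarity[0])  # document 0 is much closer to document 1 than to document 2
```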

If you like this post, you may like Algolia

Want to offer your search-engine users or customers the best algorithmically calculated personalized suggestions for similar items? Check out Algolia Recommend.

Regardless of your use case, your developers can take advantage of our API to build the recommendation experiences best suited to your needs. Our recommendation algorithm applies content-based filtering to enhance user engagement and inspire visitors to come back. That's good news for conversion and your bottom line.

Get a customized demo, try us free, or chat with us soon about high-quality similar-content suggestions that are bound to resonate with your customer base. We’re looking forward to hearing from you!

About the author
Vincent Caruana

Sr. SEO Web Digital Marketing Manager

Recommended Articles

Powered by Algolia Recommend

The anatomy of high-performance recommender systems – Part IV
ai

Ciprian Borodescu

AI Product Manager | On a mission to help people succeed through the use of AI

The anatomy of high-performance recommender systems - Part 1
ai

Ciprian Borodescu

AI Product Manager | On a mission to help people succeed through the use of AI

What is vector search?
ai

Dustin Coates

Product and GTM Manager