AI

The anatomy of high-performance recommender systems – Part IV
facebooklinkedintwittermail

Popular models and techniques for recommender systems

In the first part of this series on recommendations, we talked about the key components of a high-performance recommender system: (1) Data Sources, (2) Feature Engineering and Feature Store, (3) Machine Learning Models, (4 & 5) Predictions & Actions, (6) Results, (7) Evaluation, and (8) AI Ethics. The focus here is on 3 — Machine Learning as applied to Recommendations Models.

In our previous article on features recommendations, we learned that the most popular techniques used for recommender systems rely on three data source types: (1) user metadata — gender, age, occupation, location, etc., (2) attribute information about items — a merchant selling books may have descriptions of the books and keywords describing the content, title, and author, and (3) user-item interactions — click, views, transactions, reviews, etc. 

In this article, we’ll be building the recommendation models themselves.

We categorize common recommendation approaches based on the way we handle these raw data-sources: 

  • Content-Based — content plays a primary role in the recommendation process. As we’ll see, item descriptions and attributes are leveraged in order to calculate item similarity.
  • Collaborative Filtering — these models use the collaborative power of ratings provided by multiple users to make recommendations. The basic idea is that unspecified ratings can be computed because observed ratings are often highly correlated across various users and items. There are two main types of collaborative filtering:
    • User-Based — the main idea behind user-based collaborative filtering is that people with similar characteristics share similar tastes.
    • Item-Based — as opposed to user-based, item-to-item collaborative filtering is based on the similarity between items calculated using people’s ratings of those items.

Let’s dive deeper into each one of these techniques. 

Content-based recommendations models

In content-based filtering we recommend items based on their attributes. Thus, the input to a content-based recommender system consists of all the attributes described in the items content matrix. 

The main assumption in content-based recommenders is that a user who expressed a preference for an item is likely to prefer similar items. For example, if I happen to enjoy Marvel movies and especially Ryan Reynolds in Green Lantern back in 2011, chances are I would enjoy Deadpool (2016 and 2018) … not to mention Deadpool 3 (rumour has it they’re starting production in 2022, yey!) because they share the same actor and they’re the same movie genre.

The Item Content Metrics (ICM) best represents the attributes of a pool of items, as seen below:

recommendations-user-rating-matrix
“1” if the item has the attribute “A”, “0” otherwise.

Now, the key question is: how do we measure the similarity of two items based on their attributes? For that, we compute cosine similarity on a matrix such as the one below:

Attribute 1 Attribute 2 Attribute N
Item 1 1 0 1
Item 2 0 1 0
1 0 1
Item N 0 1 0

The intuition is that if two item vectors have many attributes in common, we can assume that the two items are very similar. Mathematically, that can be expressed using the dot product operation:

recommendations-math

And geometrically, it can be expressed as the cosine between the two vectors:

soft-cosine

As can be seen, a smaller angle (larger cosine) means that the items have a higher degree of similarity. There are two other important concepts that will help us build the similarity matrix: 

  • Support — the number of non-zero elements in the vector. 
  • Shrink term C — used to adjust the similarity by taking into account only the most similar with large support items. 

With this, the similarity matrix is computed using the formula:

recommmendations-matrix

Below, you can see how we compute recommendations based on the well known MovieLens dataset that consists of movie titles and user-provided tags:

First, we compute the cosine similarity between all the samples in the matrix and we save it in a new data frame that will be later used for returning top k similar movies:

# creating the tf-idf Vectorizer to analyze, at word level, unigrams and bigrams
tf = TfidfVectorizer(analyzer='word', ngram_range=(1,2)) 
 
# applying the vectorizer on the 'tags' column
tfidf_matrix = tf.fit_transform(movies_df['tags'])
 
# compute the cosine similarity between all the samples in the matrix
cosine_sim = cosine_similarity(tfidf_matrix) 
 
# saving the values in a DataFrame for better visualisation
cosine_sim_df = pd.DataFrame(cosine_sim, index=movies_df['title'], columns=movies_df['title']) 
def get_recommendations(movie_id, similarity_df, movies_df, k=10):
 
# partitions the matrix such that the indices are in the position they would be in a sorted array
ix = similarity_df.loc[:,movie_id].to_numpy().argpartition(range(-1,-k,-1))
 
# gets the corresponding movie titles for k+1 sorted indices    
closest = similarity_df.columns[ix[-1:-(k+2):-1]] 
 
# removes the queried title from the results
closest = closest.drop(movie_id, errors='ignore')    
 
# returns top k most similar movies
return pd.DataFrame(closest).merge(movies_df).head(k) 

And finally, getting recommendations is as simple as:

x = 'Superman (1978)'

# get recommendations for the given x title
y = get_recommendations(x, cosine_sim_df, movies_df[['title', 'tags']])

This basic content-based recommendation algorithm will produce the following results:

recommendations-results

The issue with the similarity matrix is that it’s very dense (computational heavy) and the similarity values are very small, introducing a lot of noise in the data, hence lowering the quality of recommendations. To solve this problem, we can use the K-Nearest Neighbors algorithm (KNN) — a method that consists in keeping only the K-most similar couples of items in the similarity matrix. 

from sklearn.neighbors import NearestNeighbors
 
# Create the unsupervised learner with k=10 and train it on our data
knn = NearestNeighbors(n_neighbors=10).fit(cosine_sim_df)
 
def get_recommendations(movie_title, similarity_df, movies_df):
 
# Get the indices of the neighbors of the queried movie
ix = knn.kneighbors(similarity_df[movie_title].to_numpy().reshape(1, -1), return_distance=False)
 
# Get the titles of the corresponding indices
closest = similarity_df.columns[ix].flatten()
 
return pd.DataFrame(closest,columns=['title']).merge(movies_df)

The same query, using the same code as above, will produce the following results:

recommendations-results

In the basic example above, we’ve taken into consideration only 1 attribute, namely tags. The content-based algorithm can be greatly improved by including more non-binary attributes such as title, actors, genre, etc. 

Collaborative filtering for recommendations models

The main idea of the Collaborative Filtering method is to infer the ratings a user has not explicitly given to the items by observing ratings from other highly correlated users and items where ratings have been given. They do not require any item attribute to work, but rely on the opinions of a community of users.

The input to a Collaborative Filtering recommender system is the User-Rating Matrix (URM), which contains all the user ratings on the items:

the rating user U gave to item I, “0” if no rating was given

Interactions in the user rating matrix can be, for instance, explicit ratings. Explicit rating means that the user explicitly stated his or her opinion with an item. An example of an explicit rating can be the number of stars a user gave to an item. If there is no rating from a user for an item, the corresponding value in the URM is 0. 

Item 1 Item 2 Item N
User 1 1 3 1
User 2 2 4 4
User N 3 5 2

Another type of interaction described in the URM can be implicit ratings. Implicit ratings simply state whether the user interacted with an item and therefore has only two values: 0 or 1. One means the user interacted with the item, while zero means no information. 

In principle, it’s possible to design an algorithm that can use both explicit and implicit ratings. The simplest algorithms are designed to work only with one of the two types: implicit or explicit ratings. 

Although there are different types of collaborative filtering techniques, below we’ll cover user-based and item-based techniques.

User-Based Collaborative Filtering

When we want to make recommendations to a user, user-based collaborative filtering can be used to search for other users with similar tastes and recommend the items those users like the most. 

Thus, the first problem is to find a way to measure the similarity between users. How can we measure this similarity?

We start by looking at the ratings of the users and compare user tastes based on those ratings. If two users have given similar ratings to several items, we can assume that those two users have the same opinion on those items; and if they have the same opinions on a lot of items, we can assume that the two users are similar.

We compute the similarity between users based on their ratings, and we create a similarity matrix, where the value of row i and column j is the similarity score between user i and user j.

To compute the similarity between two users, we use the cosine similarity. For computational purposes we will compute the similarities between a user and all the other users at the same time:

 

# create URM
 
URM = np.zeros((num_users, num_items), dtype=int)
 
for _, row in data.iterrows():
   URM[row['user_id'] - 1, row['item_id'] - 1 ] = row['rating']
# find the most N similar users to the current one (we picked N to be 3)
 
def similar_users(user_id, URM, N = 3):
   number_of_users = URM.shape[0]
 
# create the list of other users (take all users ids and then remove the id for the current one)
   other_users_ids = np.array(range(number_of_users), dtype=int)
   other_users_ids = np.delete(other_users_ids, user_id).tolist()
 
# take the ratings made by the current user
   user = URM[user_id].tolist()
 
# iterate through all the other users ids and compute the similarity to the current user
   other_users_similarity = []
 
   for other_user_id in other_users_ids:
     other_user = URM[other_user_id].tolist()
     other_users_similarity.append(cosine_similarity_pair_of_users(user, other_user))
 
# create a dictionary with the user indexes and their similarity
   index_similarities = dict(zip(other_users_ids, other_users_similarity))
 
# sort by the similarity
   index_similarity_sorted = sorted(index_similarities.items(), key=lambda x: x[1], reverse=True)
 
# get the first N users and their similarity
   top_user_similarity = index_similarity_sorted[:N]
 
# get the ids for the top N users
   top_user_ids = [usr[0] for usr in top_user_similarity]
 
   return top_user_ids


At this point, identifying the similar user for a given user_id is straightforward:

# for example we take the user with the id 48
example_user_id = 48
top_similar_users = similar_users(example_user_id, URM)

The next step is to actually define the top K items to be recommended to our user. Here’s the code to do exactly that, once we have the top similar users identified:

def recommend_topK_items_for_user(user_id, similar_users_ids, URM, topK = 5):
 
   # get the similar users ratings
   similar_users = URM[similar_users_ids]
 
   # compute the average rating for every item rated by the similar users
   average_ratings = similar_users.mean(axis=0)
  
   # get the current user
   user = URM[user_id]
 
   # transpose it
   user_transposed = user.transpose()
 
   # get the items ids where the rating is 0 (the user didn't rate the item)
   items = np.where(user_transposed == 0)[0]
 
   # get the similar users ratings for the unrated items by the current user
   similar_users_ratings = similar_users[:, items]
   average_ratings = average_ratings[items]
 
   # order the items by the average of their ratings
   item_indexes = np.argsort(average_ratings)[::-1]
 
   # get the top k items indexes
   top_items_index = item_indexes[:topK]
 
   return top_items_index.tolist()

Final step is generating the user-based recommendations and comparing them with the rated movies:

recommended_items = recommend_topK_items_for_user(example_user_id, top_similar_users, URM)
 
# items rated by the user
user_ratings = URM[example_user_id]
items_rated = np.where(user_ratings != 0)[0].tolist()
 
items_info.loc[items_info['movie_id'].isin(items_rated)]
 
# recommended items for user
items_info.loc[items_info['movie_id'].isin(recommended_items)]

Item-Based Collaborative Filtering

In an item-based collaborative filtering approach, in order to predict a user’s ratings for a target item, we have to determine the set of items that are most similar to the target item. The idea is to calculate the similarity between each pair of items according to how many users have rated them both. Then we use ratings specified by the user in that item to predict if he or she will like the target items.

recommendations-item-based

Thus, the first step is to calculate the similarity between items as shown below:

def similar_items(item_id, URM, topK = 5):
   # get the current item
   item = URM[:, item_id].tolist()
 
   # get all the other items
   number_of_items = URM.shape[1]
   other_items_ids = np.array(range(number_of_items), dtype=int)
   other_items_ids = np.delete(other_items_ids, item_id).tolist()
 
   # compute the similarity between the current item and the other items
   other_items_similarity = []
 
   for other_item_id in other_items_ids:
     other_item = URM[:, other_item_id].tolist()
     other_items_similarity.append(cosine_similarity_pair_of_items(item, other_item))
 
   # create a dictionary with the item indexes and their similarity
   index_similarities = dict(zip(other_items_ids, other_items_similarity))
 
   # sort by the similarity
   index_similarity_sorted = sorted(index_similarities.items(), key=lambda x: x[1], reverse=True)
 
   # get the first K items and their similarity
   top_item_similarity = index_similarity_sorted[:topK]
 
   # get the ids for the top K items
   top_item_ids = [itm[0] for itm in top_item_similarity]
 
   return top_item_ids

Just like with user-based collaborative filtering, at this stage, similarity between items can be calculated as follows:

example_item_id = 29
recommended_items = similar_items(example_item_id, URM)

And finally, getting recommendations:

example_item_id = 29
# recommended items
items_info.loc[items_info['movie_id'].isin(recommended_items)]

After seeing how collaborative filtering works, you might still have a few pending questions, such as: 

  • What is the difference between item-based and user-based recommendations?
  • What is the difference between content-based and item-based recommendations?

To answer these questions, we need to look at how similarity is calculated — the cosine similarity formula can be adapted to calculate the similarity between users and the similarity between items starting from the user rating matrix or URM.

Considering the similarity between items, the approach to the problem is the same as what we’ve seen with content-based filtering techniques. In content-based filtering, we can recommend items to the user based on what he or she has previously liked. The similarity is measured by taking into account attributes. If two items have a lot of attributes in common, we can say that these two items are similar. 

Can we measure the similarity between the items without knowing the attributes?

Yes, we don’t need to know anything about the attributes of the items as we can compute our similarity matrix knowing the ratings coming from the user rating matrix or URM.

Further classification of recommendation approaches

Memory-based vs. Model-based

Up to now, we’ve used memory-based techniques for our collaborative filtering algorithms, in which we predict ratings on the basis of user similarity. They use the ratings of the user rating matrix or URM directly in production and they are easier to implement. 

However, there is a second technique commonly used in collaborative filtering: model-based methods. 

Model-based techniques do not rely on the whole data set when recommendations are computed, but they extract information from the data set to build a model. These techniques need two steps for the prediction: (1) the first step is to build the model; (2) the second step is to estimate ratings using a function that takes as inputs the model built in the first step, and the user profile.

 

 

What happens if we have a new user in the system?

In the case of memory-based techniques, we need to add the new user to the URM and recompute the similarity between the new user and all the other users. This operation is computationally heavy and we can make recommendations only to users present in the model. 

In the case of model-based techniques, we don’t need to add the user to the URM, if the URM is big enough. We don’t have to recompute the similarity matrix; it can be updated once a day, or every week or month. Therefore, we can make recommendations even to users who are not in the model.

Hybridization

At a minimum, a hybrid recommender system combines one collaborative filtering technique with one content-based technique. However, we can build a hybrid recommender system by putting together as many techniques as we want.

There are dozens of different approaches we can use to build a hybrid recommender system and all these approaches can be roughly classified into five families:

  • Linear Combination
    In a linear combination, we first estimate the ratings for a user by using algorithm A. Then we compute the estimated ratings using algorithm B. Finally, we compute the weighted sum between the two estimated ratings.

    In this type of hybrid system, each algorithm is used in an off-the-shelf fashion, which means that each algorithm is used as-is, without any modification. In fact, only their output is used to provide a combined prediction. This construction allows an easy combination of the algorithms and their parallel execution. As an example: we can build the combination of a content-based algorithm, A, with a collaborative algorithm, B. Note the different types of input data that are used: an ICM for algorithm A, a URM for algorithm B.

  • List Combination
    Assuming that we have 2 recommendation lists, each produced by a different algorithm, what we can do is take only the highest-ranked items from each list and merge them. One simple yet effective approach is to use round-robin: items are selected starting from the top of each list in a circular fashion.

  • Pipelining
    Pipelining consists in chaining two or more algorithms, so that the output ratings of one algorithm is fed as input into the next algorithm in the pipeline. For example, let’s consider a generic recommendation algorithm A. This algorithm can take as input data either a URM or an ICM.After building the model, we are able to compute some estimated ratings. In the pipelining approach, the ratings estimated with algorithm A can be used to enrich an existing user-rating matrix. This enriched user-rating matrix becomes the input for a second algorithm B.The result is the final matrix of estimated ratings. The output of algorithm A must be computed by using the user profiles taken from the URM that we wish to enrich for algorithm B. A practical application of pipelining is to use a content-based filtering algorithm as the first stage to enrich a user rating matrix, which is later used by a collaborative filtering algorithm.

  • Merging
    In some cases, the model of algorithm A can be combined with the model of algorithm B, to obtain a new model that is used for recommendation. We can use this approach only if the models have the same structure. For example, we can merge ACF item-based algorithm A with a CBF algorithm B: the first model receives as input the URM, and the second one takes the ICM. The important thing is that both models are of the same type. It is therefore not possible to merge a similarity matrix of a collaborative algorithm with the factor matrices of a matrix factorization algorithm.

  • Co-Training
    The last hybrid approach is based on co-training. It is similar to the merging method, the differences being that with co-training, the merged model is obtained directly during the training phase and the algorithms A and B must be trained together.

Factorization machines

The Factorization Machines algorithm is a general-purpose supervised learning algorithm that can be used for both classification and regression tasks, making it attractive for a collaborative filtering approach. The input for our algorithm is structured as a table, in which all the columns, except the last one, are divided into groups. For example, we have two groups: users and items. Each group corresponds to one dimension of the URM. These columns represent the features of the data set. The last column contains the ratings of the users for items, which makes it the target to be predicted in a recommender system scenario.

 

In order to represent users and items, we use One Hot Encoding. Using this method, we assign a column to every user and item. By setting the proper column to 1 and all the others to 0, we can select a specific user and a specific item. For example, the first column could be the first user in the URM, the second column could be the second user, and so on. By setting the first column to one and the others to zero, we select the first user. Similarly, in the item group, if we set the first column to one, and others to zero, we specify the first item.

Compared to the URM, this format uses a larger table with more zeroes and this makes the table more sparse than the URM. According to this representation, for each interaction in the URM we insert a row in the table. For instance, if the URM has 1000 interactions, we should insert 1000 rows in our table. Once we have defined the data structure of factorization machines, we can start studying the model used to estimate unknown ratings. 

How to choose the best recommendation technique?

As much as we’d like to have one general recommendation system that fits all real life scenarios, it’s more realistic to get into the mindset of having a family of recommendation models, each one built for a specific purpose.

For example, if you’re an eCommerce business, you might want to use collaborative filtering techniques to offer item-based recommendations, just like we offer with Algolia Recommend. However, if you’re a media company with a portfolio of digital publications, offering content-based recommendations might make more sense.

Or, in case you’re looking for personalized recommendations, then user-based collaborative filtering is the way to go. Also, deciding on model-based vs. memory-based techniques is more of a technical consideration which comes into play when scalability becomes an issue.

Most of the time a hybrid approach is the best strategy and clearly the way to go for companies that have a machine learning team capable of iterating and fine tuning recommender models.

Recommender Systems Cheat Sheet

In the next blog post in this series we’ll focus on the actions various recommender systems can facilitate via the predictions they’re generating. Stay tuned! And if you have any questions: https://twitter.com/cborodescu

About the authorCiprian Borodescu

Ciprian Borodescu

AI Product Manager | On a mission to help people succeed through the use of AI

Recommended Articles

Powered by Algolia AI Recommendations

The anatomy of high-performance recommender systems - Part 1
AI

The anatomy of high-performance recommender systems - Part 1

Ciprian Borodescu

Ciprian Borodescu

AI Product Manager | On a mission to help people succeed through the use of AI
What is a recommender system (or recommendation engine)?
Product

What is a recommender system (or recommendation engine)?

Catherine Dee

Catherine Dee

Search and Discovery writer
Cosine similarity: what is it and how does it enable effective (and profitable) recommendations?
AI

Cosine similarity: what is it and how does it enable effective (and profitable) recommendations?

Vincent Caruana

Vincent Caruana

Senior Digital Marketing Manager, SEO