You’re on a blog surfing for articles to read. The blog displays a list of enticing recommendations to help you choose. Your choices feel unpredictable and driven by, among other things, your current mood – which changes with every new article you read. What are the odds the machine recommends the best articles? What amount of past history (yours and collectively others) could possibly teach a machine how to predict a future subjective choice based on taste? How can a machine anticipate, or participate in, spontaneity?
We can ask the same questions about recommending a movie to watch or a podcast to listen to.
To help us answer these questions, one of our ML/AI engineers used our Recommendation API to develop an app that proposes next-articles-to-read for readers of a technical blog. The scenario goes like this: When an engineer finishes reading an article, the recommender system displays the most relevant articles to read next. The app pulls its recommendations from a dataset of related articles generated by ML models that Algolia Recommend built based on the prior collective activity of all readers of the blog.
The goal here is to raise questions and provide a frame in which to answer them. It’s also about the simplicity of implementing and evaluating our recommendations. While we often talk about recommendations for ecommerce, our API can handle any use case. As discussed below, the important part is capturing the right signals and “scoring” the relevance of each signal to determine the strength of the relationship between two or more articles.
Combining our Recommendations API with the Clicks & Conversions (Insights) API, we were able to dive deep into answering the kinds of questions raised above.
While the experiment did not answer all of these questions, nor did it succeed in creating an app that can see into the future, we believe it confirmed that one can, on both a conceptual and a concrete level, build a content/media-based website that provides relevant and useful recommendations to its readers.
Recommending books is tricky. It takes a very intuitive and adept bookseller or librarian. Normally, they start with a few questions about intent – What genre of book or author are you looking for? Is it fiction or non-fiction you seek? What subject? Then they move to preferences – questions of taste, style, era, genre. The best advisors listen closely while consulting their inner “database” of experience and knowledge. They get a sense of the reader and how they frame their query. With all this, the personal advisor is ready to recommend the “right” or “best” books. Will their recommendations be perfect? Often, yes. Or at least inspiring, insightful. You can never replace the human touch… Or can you?
How does a recommender system compete with this personal touch? For example, a person has just read Moby Dick and wants to read something else. What’s the next book?
Who knows? There are too many possibilities. It seems random or a shot in the dark to recommend anything. Sometimes even the reader doesn’t know… And yet, that’s where it gets interesting:
When recommending books, booksellers and a good recommender system are not solely concerned with suggesting books; they want to inspire us to make our own best choice about which book to read next.
Enough said about the challenges. Now, how did we face them?
Let’s start by comparing three different use cases: a library catalog serving students, a general-audience blog such as Medium, and a technical blog.
We examine each scenario, but only experiment with the third. We’ll use the first two to lay down the foundation for the third – a technical blog. Each scenario gives a recommender system more or less context, so we’ll see how important context is in avoiding randomness. But first, let’s look at the scoring heuristics that can be used to help build the models.
To replace the verbal chat between a reader and a bookseller, an online system must leverage user signals that communicate intent and preference. This is done with clicks (what users click on before, during, and after they read) and conversions (what users read and for how long). For example, if many users read articles X and Y in the same reading session, we can conclude two things: the two articles are likely related, and future readers of X will likely be interested in Y.
Another example is when a user clicks on a link inside an article, reads the linked article, and then returns to continue reading the first one. This behavior is a strong signal that the two articles are related.
These conclusions about intent are heuristics. Let’s go further and give certain actions more weight: if we believe that multiple clicks on links within the same article, each followed by a long reading time, indicate a strong intent to treat all these articles as related, then we can confidently give this set of actions more weight than quick surfing with shorter reading times. This is another heuristic – identifying which behaviors show strong or weak intent, and weighing or scoring them accordingly.
This is how you can work with our Algolia Recommend product. Algolia customers can consider several factors before sending click and conversion events to Recommend. For example, they can strengthen or weaken a signal by not sending it at all, or by sending it once or twice depending on its strength.
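To make this concrete, here is a minimal sketch of sending these signals with Algolia’s search-insights library. The index name, event names, and user token are hypothetical, and sending a conversion twice to strengthen a signal is simply the weighting heuristic described above, not a required pattern:

```ts
import aa from "search-insights";

// Initialize the Insights client (credentials are placeholders).
aa("init", { appId: "YOUR_APP_ID", apiKey: "YOUR_SEARCH_API_KEY" });

// A reader clicks an embedded link inside an article: a weaker "related" signal.
aa("clickedObjectIDs", {
  userToken: "reader-42",
  index: "articles",
  eventName: "Article Link Clicked",
  objectIDs: ["article-123"],
});

// The reader then spends real time on the linked article: a stronger signal,
// sent as a conversion. Per the heuristic above, a very long read could be
// sent twice to double its weight in the model.
aa("convertedObjectIDs", {
  userToken: "reader-42",
  index: "articles",
  eventName: "Article Read",
  objectIDs: ["article-123"],
});
```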
One danger with heuristics, however, is that if the underlying assumption about a signal is false, then doubling the score of an already incorrect interpretation doubles the mistake. This is the gamble.
Let’s go back to two of the above contexts:
Students offer a clearly defined and delimited universe that gives a recommender system a good head start. Students have a narrow set of choices, and their intentions are clear and converge on the same goal: learning a particular subject.
A library’s online catalog can display “Recommended Next Books” and “Books Less Likely to Help” with confidence. Accuracy can be improved further by embedding numerous related-article hyperlinks within the text of each article. (Note that these hyperlinks can be refreshed over time using the recommender system itself.)
We’ll collect the links students click on and their reading times to record what they actually read. This data collection is referred to as “sending events”. As stated, there are two events to send: clicks and conversions. Both establish a “related book” relationship between two or more books; conversions additionally build up a “frequently read together” relationship.
For our purposes, we make a few assumptions about how to interpret clicks and conversions, treating a click as a weaker signal of interest and a sustained read (a conversion) as a stronger one.
These are just suggestions to illustrate the importance of treating each event differently. We can use other signals as well. Medium, for example, allows comments not only on the whole article but on selected parts of the text. If a user highlights a sentence to “like” or “comment on” it, we can send a click event for all commented articles in a reading session, strengthening the relationship between them. The assumption is that during a single session, a student is reading about one subject, so all the articles relate back to that subject.
One final variation is to recommend parts of an article or book. We can break books down into related paragraphs, which provides a particularly useful level of detail for next reading: we can recommend “related paragraphs to read”.
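In practice, paragraph-level recommendations simply mean indexing finer-grained records, so that events and models can point to a passage rather than a whole work. The record shape and objectID scheme below are assumptions for illustration:

```ts
// Hypothetical paragraph-level records: one object per paragraph,
// so clicks, conversions, and recommendations can target a passage
// rather than a whole article or book.
const paragraphRecords = [
  {
    objectID: "moby-dick#ch1-p1",
    bookTitle: "Moby Dick",
    chapter: 1,
    paragraph: 1,
    text: "Call me Ishmael. Some years ago...",
  },
  {
    objectID: "moby-dick#ch1-p2",
    bookTitle: "Moby Dick",
    chapter: 1,
    paragraph: 2,
    text: "There now is your insular city of the Manhattoes...",
  },
];
```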
Now, you may agree or disagree with these examples, but the point is that you need to build the model with the actions you think best capture readers’ intentions as they move through the articles of the library catalog. We are trying to avoid randomness by identifying what single-minded students choose to read in succession.
Recommending a pedagogical reading journey for students contains a lot of built-in reliability: the context and the committed goal of learning a specific subject are shared among the students. But what do we do with an audience that has a variety of interests and no obvious affinity in its reading choices? While we can still rely on hyperlink clicking, we need to be aware that the next article a reader opens may be unrelated to any clicked link. So there’s one additional factor that can help us build our model:
If every article is categorized accurately, then these categories can serve as an additional signal – for example, by filtering or boosting recommendations so that suggested articles share a category with the article just read.
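One way to apply this with Algolia Recommend is sketched below: request related items, but filter them to the current article’s category. The index name, objectID, and category value are hypothetical, and the Related Products model is used here because it works on any kind of record, not just products:

```ts
import recommend from "@algolia/recommend";

const client = recommend("YOUR_APP_ID", "YOUR_SEARCH_API_KEY");

async function relatedInSameCategory(objectID: string, category: string) {
  const { results } = await client.getRecommendations([
    {
      indexName: "articles",
      model: "related-products", // works on any records despite the name
      objectID,
      maxRecommendations: 5,
      // Only recommend articles sharing the reader's current category.
      queryParameters: { filters: `category:"${category}"` },
    },
  ]);
  return results[0].hits;
}

// e.g. relatedInSameCategory("article-123", "Engineering");
```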
As suggested above, it should be “easier” to recommend articles on a single-topic blog such as a technical blog, because readers share the same interest in technology. While this cohesion is not as strong as with students – where the recommender system gets the added benefit of closer shared intent (same course materials, same tests) – a technical blog does have some similar intent and preferences built in.
There are still challenges, however: readers within the same domain come with varying intentions, profiles, and levels of expertise. These challenges indicate that it may not be easy to recommend articles even to an audience within the same domain of activity. But it’s not impossible, as you’ll see. The experimental app our AI engineer built investigates the degree of success we can achieve at this point in time.
In our Frequently Read Together scenario, we can consider:
We’ll use the following signal:
Factors not taken into account:
Note: the full results, technical details, and code will be presented in our next article.
The percentages below are confidence scores: how certain the model is that someone who interacted with article X will also engage with article Y when presented with it. You can help users understand this in your UI by labeling it a “Match score”, “Relevance score”, or “Likelihood” (of the user engaging with the content).
[Table: “If you read…” – for each source article, the recommended next articles with their match scores]
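As a rough sketch of surfacing that score in a UI: Algolia Recommend returns a _score (0-100) with each recommended hit, which you can round and label however suits your interface. The hit shape here is a simplified assumption:

```ts
// Simplified shape of a recommendation hit; _score is the model's
// confidence (0-100) that the reader will engage with this article.
type RecommendationHit = {
  objectID: string;
  title: string;
  _score?: number;
};

function matchScoreLabel(hit: RecommendationHit): string {
  const score = Math.round(hit._score ?? 0);
  return `${hit.title} (Match score: ${score}%)`;
}

// e.g. matchScoreLabel({ objectID: "article-456", title: "Evaluating Recommenders", _score: 87 })
// -> "Evaluating Recommenders (Match score: 87%)"
```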
In the end, how do we know we are right? How do we assess the quality of our recommendations? This gets us into the evaluation phase of any recommender system. We’ll soon publish a complete treatment of how to evaluate a recommender system.
But you can get an idea of the success by mentally assessing the numbers: Do the top-scoring recommendations make intuitive sense? Do obviously related articles score higher than unrelated ones? Are the confidence scores strong enough to act on?
By asking yourself questions like these, you can already get a good hunch of the quality of your recommendations.
There’s another approach: take small steps, then wait and see how users respond to your recommendations. See whether readers click on and read your recommendations. You can also run A/B tests or live user testing, measure conversions, or look at which articles are never clicked. Small steps like these go a long way. Experimentation, practical analysis, and evaluation are central to the success of any recommender system. Happy reading!
Peter Villani
Sr. Tech & Business Writer