You’re on a blog surfing for articles to read. The blog displays a list of enticing recommendations to help you choose. Your choices feel unpredictable and driven by, among other things, your current mood – which changes with every new article you read. What are the odds the machine recommends the best articles? What amount of past history (yours and collectively others) could possibly teach a machine how to predict a future subjective choice based on taste? How can a machine anticipate, or participate in, spontaneity?
We can ask the same questions about recommending a movie to watch or a podcast to listen to.
To help us answer these questions, one of our ML/AI engineers used our Recommendation API to develop an app that proposes next-articles-to-read for readers of a technical blog. The scenario goes like this: When an engineer finishes reading an article, the recommender system displays the most relevant articles to read next. The app pulls its recommendations from a dataset of related articles generated by ML models that Algolia Recommend built based on the prior collective activity of all readers of the blog.
The goal here is to raise questions and provide a frame in which to answer them. It’s also about the simplicity of implementing and evaluating our recommendations. While we often talk about recommendations for ecommerce, our API can handle any use case. As discussed below, the important part is capturing the right signals and “scoring” the relevance of each signal to determine the strength of the relationship between two or more articles.
Combining our Recommendations API with the Clicks & Conversions (Insights) API, we were able to dive deep into answering the kinds of questions raised above.
While the experiment did not answer all of these questions, nor did it succeed in creating an app that can see into the future, we believe it confirmed that one can, on both a conceptual and a concrete level, build a content/media-based website that provides relevant and useful recommendations to its readers.
Recommending books is tricky. It takes a very intuitive and adept bookseller or librarian. Normally, they start with a few questions about intent – What genre of book or author are you looking for? Is it fiction or non-fiction you seek? What subject? Then they move to preferences – questions of taste, style, era, genre. The best advisors listen closely while consulting their inner “database” of experience and knowledge. They get a sense of the reader and how they frame their query. With all this, the personal advisor is ready to recommend the “right” or “best” books. Will their recommendations be perfect? Often, yes. Or at least inspiring, insightful. You can never replace the human touch… Or can you?
How does a recommender system compete with this personal touch? For example, a person has just read Moby Dick and wants to read something else. What’s the next book?
Who knows? There are too many possibilities. It seems random or a shot in the dark to recommend anything. Sometimes even the reader doesn’t know… And yet, that’s where it gets interesting:
When recommending books, booksellers and a good recommender system are not solely concerned with suggesting books; they want to inspire us to make our own best choice about which book to read next.
Enough said about the challenges. Now, how did we face them?
Let’s start by comparing three different use cases: a library catalog serving students, a general-audience blog such as Medium, and a technical blog.
We examine each scenario, but only experiment with the third. We’ll use the first two to lay down the foundation for the third – a technical blog. Each scenario gives a recommender system more or less context, so we’ll see how important context is in avoiding randomness. But first, let’s look at the scoring heuristics that can be used to help build the models.
To replace the verbal chat between a reader and a bookseller, an online system must leverage user signals that communicate intent and preference. This is done with clicks (what users click on before, during, and after they read) and conversions (what users read and for how long). For example, if many users read articles X and Y in the same reading session, we can conclude two things: the two articles are likely related, and future readers of X will likely be interested in Y.
Another example is when a user clicks on a link inside an article, reads the linked article, and then returns to continue reading the first one. This behavior is a strong signal that the two articles are related.
These conclusions about intent are heuristics. Let’s go further and give certain actions more weight: if we believe that multiple clicks on links within the same article, each followed by a long reading time, indicate a strong intent to treat all these articles as related, then we can confidently give this set of actions more weight than quick surfing with shorter reading times. This is another heuristic – identifying which behaviors show strong or weak intent, and weighing or scoring them accordingly.
This is how you can work with our Algolia Recommend product. Algolia customers can consider several factors before sending click and conversion events to Recommend. For example, they can strengthen or weaken a signal by not sending it at all, or by sending it once or twice depending on its strength.
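To make this concrete, here is a minimal sketch of sending these signals with Algolia’s search-insights library. The index name, event names, and user token are hypothetical, and sending a conversion twice to strengthen a signal is simply the weighting heuristic described above, not a required pattern:

```ts
import aa from "search-insights";

// Initialize the Insights client (credentials are placeholders).
aa("init", { appId: "YOUR_APP_ID", apiKey: "YOUR_SEARCH_API_KEY" });

// A reader clicks an embedded link inside an article: a weaker "related" signal.
aa("clickedObjectIDs", {
  userToken: "reader-42",
  index: "articles",
  eventName: "Article Link Clicked",
  objectIDs: ["article-123"],
});

// The reader then spends real time on the linked article: a stronger signal,
// sent as a conversion. Per the heuristic above, a very long read could be
// sent twice to double its weight in the model.
aa("convertedObjectIDs", {
  userToken: "reader-42",
  index: "articles",
  eventName: "Article Read",
  objectIDs: ["article-123"],
});
```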
One danger with heuristics, however, is that if the underlying assumption about a signal is false, then doubling the score of an already incorrect interpretation doubles the mistake. This is the gamble.
Let’s go back to two of the above contexts:
Students offer a clearly defined and delimited universe that gives a recommender system a good head start. Students have a narrow set of choices, and their intentions are clear and converge on the same goal: learning a particular subject.
A library’s online catalog can display “Recommended Next Books” and “Books Less Likely to Help” with confidence. Accuracy can be improved further by embedding numerous related-article hyperlinks within the text of each article. (Note that these hyperlinks can be refreshed over time using the recommender system itself.)
We’ll collect the links students click on and their reading times to record what they actually read. This data collection is referred to as “sending events”. As stated, there are two events to send: clicks and conversions. Both establish a “related book” relationship between two or more books; conversions additionally build up a “frequently read together” relationship.
For our purposes, we make a few assumptions about how to interpret clicks and conversions, treating a click as a weaker signal of interest and a sustained read (a conversion) as a stronger one.
These are just suggestions to illustrate the importance of treating each event differently. We can use other signals as well. Medium, for example, allows comments not only on the whole article but on selected parts of the text. If a user highlights a sentence to “like” or “comment on” it, we can send a click event for all commented articles in a reading session, strengthening the relationship between them. The assumption is that during a single session, a student is reading about one subject, so all the articles relate back to that subject.
One final variation is to recommend parts of an article or book. We can break books down into related paragraphs, which provides a particularly useful level of detail for next reading: we can recommend “related paragraphs to read”.
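In practice, paragraph-level recommendations simply mean indexing finer-grained records, so that events and models can point to a passage rather than a whole work. The record shape and objectID scheme below are assumptions for illustration:

```ts
// Hypothetical paragraph-level records: one object per paragraph,
// so clicks, conversions, and recommendations can target a passage
// rather than a whole article or book.
const paragraphRecords = [
  {
    objectID: "moby-dick#ch1-p1",
    bookTitle: "Moby Dick",
    chapter: 1,
    paragraph: 1,
    text: "Call me Ishmael. Some years ago...",
  },
  {
    objectID: "moby-dick#ch1-p2",
    bookTitle: "Moby Dick",
    chapter: 1,
    paragraph: 2,
    text: "There now is your insular city of the Manhattoes...",
  },
];
```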
Now, you may agree or disagree with these examples, but the point is that you need to build the model with the actions you think best capture readers’ intentions as they move through the articles of the library catalog. We are trying to avoid randomness by identifying what single-minded students choose to read in succession.
Recommending a pedagogical reading journey for students contains a lot of built-in reliability: the context and the committed goal of learning a specific subject are shared among the students. But what do we do with an audience that has a variety of interests and no obvious affinity in its reading choices? While we can still rely on hyperlink clicking, we need to be aware that the next article a reader opens may be unrelated to any clicked link. So there’s one additional factor that can help us build our model:
If every article is categorized accurately, then these categories can serve as an additional signal – for example, by filtering or boosting recommendations so that suggested articles share a category with the article just read.
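One way to apply this with Algolia Recommend is sketched below: request related items, but filter them to the current article’s category. The index name, objectID, and category value are hypothetical, and the Related Products model is used here because it works on any kind of record, not just products:

```ts
import recommend from "@algolia/recommend";

const client = recommend("YOUR_APP_ID", "YOUR_SEARCH_API_KEY");

async function relatedInSameCategory(objectID: string, category: string) {
  const { results } = await client.getRecommendations([
    {
      indexName: "articles",
      model: "related-products", // works on any records despite the name
      objectID,
      maxRecommendations: 5,
      // Only recommend articles sharing the reader's current category.
      queryParameters: { filters: `category:"${category}"` },
    },
  ]);
  return results[0].hits;
}

// e.g. relatedInSameCategory("article-123", "Engineering");
```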
As suggested above, it should be “easier” to recommend articles on a single-topic blog such as a technical blog, because readers share the same interest in technology. While this cohesion is not as strong as with students – where the recommender system gets the added benefit of closer shared intent (same course materials, same tests) – a technical blog does have some similar intent and preferences built in.
There are still challenges, however: readers within the same domain come with varying intentions, profiles, and levels of expertise. These challenges indicate that it may not be easy to recommend articles even to an audience within the same domain of activity. But it’s not impossible, as you’ll see. The experimental app our AI engineer built investigates the degree of success we can achieve at this point in time.
In our Frequently Read Together scenario, we can consider:
We’ll use the following signal:
Factors not taken into account:
Note: the full results, technical details, and code will be presented in our next article.
The percentages below are confidence scores: how certain the model is that someone who interacted with article X will also engage with article Y when presented with it. You can help users understand this in your UI by labeling it a “Match score”, “Relevance score”, or “Likelihood” (of the user engaging with the content).
[Table: “If you read…” – for each source article, the recommended next articles with their match scores]
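As a rough sketch of surfacing that score in a UI: Algolia Recommend returns a _score (0-100) with each recommended hit, which you can round and label however suits your interface. The hit shape here is a simplified assumption:

```ts
// Simplified shape of a recommendation hit; _score is the model's
// confidence (0-100) that the reader will engage with this article.
type RecommendationHit = {
  objectID: string;
  title: string;
  _score?: number;
};

function matchScoreLabel(hit: RecommendationHit): string {
  const score = Math.round(hit._score ?? 0);
  return `${hit.title} (Match score: ${score}%)`;
}

// e.g. matchScoreLabel({ objectID: "article-456", title: "Evaluating Recommenders", _score: 87 })
// -> "Evaluating Recommenders (Match score: 87%)"
```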
In the end, how do we know we are right? How do we assess the quality of our recommendations? This gets us into the evaluation phase of any recommender system. We’ll soon publish a complete treatment of how to evaluate a recommender system.
But you can get an idea of the success by mentally assessing the numbers: Do the top-scoring recommendations make intuitive sense? Do obviously related articles score higher than unrelated ones? Are the confidence scores strong enough to act on?
By asking yourself questions like these, you can already get a good hunch of the quality of your recommendations.
There’s another approach: take small steps, then wait and see how users respond to your recommendations. See whether readers click on and read your recommendations. You can also run A/B tests or live user testing, measure conversions, or look at which articles are never clicked. Small steps like these go a long way. Experimentation, practical analysis, and evaluation are central to the success of any recommender system. Happy reading!
Peter Villani
Sr. Tech & Business Writer