How are you thinking about personalization?
That’s how I started most of my customer conversations over the past months. I’m still doing it because it’s fascinating to me to understand the various perspectives on personalization customers have. From accomplishing personalization by adding {first name} to an email subject, to manually bucketing users based on demographics and exposing them to bespoke content or adding a “recently viewed” widget on a product detail page of an e-commerce site, the plethora of practices that fall under the broad topic of personalization is staggering.
And yes, recommendation engines and even more sophisticated AI-based personalization strategies have been mentioned as a way to solve for scalability. But these technologies are by no means new. Gartner released its first Magic Quadrant report on Personalization Engines back in 2018, and long before that, in 2003, a paper called “Amazon.com Recommendations: Item-to-Item Collaborative Filtering”, by then Amazon researchers Greg Linden, Brent Smith, and Jeremy York, was published.
Yet, personalization remains the holy grail of customer experiences, creating frustrations as well as fueling myths about a button, a checkbox or a toggle hidden somewhere, deep in the realm of personalization engines out there, that would auto-magically cause an AI overlord to devise ideal experiences for all. This skewed perspective is leading the majority of marketers to abandon their personalization efforts, as Gartner uncovered back in 2019.
Let’s get pragmatic about it.
Personalization is NOT a tick-box exercise, it’s a complex undertaking, a Venn diagram representing three questions that need to be answered:
If that’s the case it becomes clear that personalization is no longer synonymous with marketing, as it was considered in the early days of personalization. Instead, it requires a product mindset, a close collaboration between the business team, product managers and the engineering team in solving the problems associated with personalization.
After numerous conversations and feedback sessions with customers, especially in the e-commerce industry, I managed to dissect the broad & complex topic of personalization into 7 subproblems:
I propose we dive deeper into analyzing each of the 7 dysfunctions of personalization engines and their potential solutions. By the end of the article you’ll be better equipped to understand and apply the right strategy for your own “project personalization”. By the way, if you want to watch a video on the same topic, I recommend this presentation at DevCon as it represents the basis for this article.
83% of end-users expect personalization within moments and hours, but at the same time, up to 80% of consumers are sensitive to companies’ security and privacy practices regarding their online data.
At first, this might seem like a paradox: personalizing users’ experiences while protecting their data privacy? If this is indeed the case, how can we solve it?
One proposal is to make the personalization engine privacy-aware by allowing users to opt in and out and to decide the conditions under which their user data profiles can be used. Users can “activate”, even partially, the data that is necessary for the level of personalization they deem comfortable and appropriate for the stage they’re at. An end-user might have access to a privacy settings page that looks something like this:
Notice the informational, navigational and transactional profiles. And behind the scenes we might have the following JSON representation of those user profiles:
[
  {
    "user": "user-123",
    "informational-profile": {
      "properties": {
        "raw": {
          "device": "mobile",
          "sessionCount": 12,
          "timeOnSite": "02:03:10",
          "browser": "chrome",
          "pageviews": 32,
          "avgSessionDuration": 102,
          "lastVisit": "2022-09-11T10:12:37Z"
        }
      }
    }
  },
  {
    "user": "user-123",
    "navigational-profile": {
      "products": {
        "value": [
          { "name": "Jirgi Half-Zip T-Shirt", "objectID": "D05927-8161-111", "url": "men/t-shirts/d05927-8161-111" },
          { "name": "Boys T-Shirt", "objectID": "D12461-8136-211", "url": "boys/t-shirts/d12461-8136-21" },
          { "name": "Men shorts", "objectID": "D12345-5678-910", "url": "men/shorts/d12345-5678-910" }
        ],
        "lastUpdatedAt": "2021-07-11T07:07:00Z"
      }
    }
  },
  {
    "user": "user-123",
    "commercial-profile": {
      "orders": {
        "value": [
          {
            "total": 159,
            "products": {
              "value": [
                { "name": "T-Shirts", "objectID": "D05927-8161-111", "size": "L", "quantity": 1, "price": 99.00 },
                { "name": "Hats", "objectID": "D12461-8136-211", "size": "M", "quantity": 1, "price": 15.00 },
                { "name": "Shorts", "objectID": "D12345-5678-910", "url": "men/shorts/d12345-5678-910", "size": "L", "quantity": 1, "price": 45.00 }
              ]
            }
          }
        ],
        "lastUpdatedAt": "2021-07-12T10:03:37Z"
      }
    }
  }
]
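To make the opt-in idea concrete, here is a minimal Python sketch of how consent could gate which profiles ever reach the personalization engine. The `filter_profiles_by_consent` helper and the shape of the `consent` dictionary are assumptions made for illustration. Only the profile keys are taken from the JSON representation.

```python
from typing import Any

# Profile keys as they appear in the JSON representation of the user profiles.
PROFILE_KEYS = ("informational-profile", "navigational-profile", "commercial-profile")


def filter_profiles_by_consent(
    profiles: list[dict[str, Any]], consent: dict[str, bool]
) -> list[dict[str, Any]]:
    """Keep only the profile documents the user has explicitly opted in to.

    `consent` maps a profile key to True/False (hypothetical format).
    A missing key is treated as "opted out".
    """
    allowed = []
    for doc in profiles:
        # Each document carries "user" plus exactly one profile key.
        key = next((k for k in PROFILE_KEYS if k in doc), None)
        if key is not None and consent.get(key, False):
            allowed.append(doc)
    return allowed


profiles = [
    {"user": "user-123", "informational-profile": {"properties": {"raw": {"device": "mobile"}}}},
    {"user": "user-123", "navigational-profile": {"products": {"value": []}}},
    {"user": "user-123", "commercial-profile": {"orders": {"value": []}}},
]

# Hypothetical settings: the user shares browsing data but not order history.
consent = {
    "informational-profile": True,
    "navigational-profile": True,
    "commercial-profile": False,
}

visible = filter_profiles_by_consent(profiles, consent)
```

Defaulting a missing key to "opted out" is a deliberate choice in this sketch: a profile type added later stays private until the user activates it.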
Ideally, you’d want your personalization engine to cover: (1) real-time (online) predictions and (2) real-time (continual) learning. As such, there may be 3 levels in terms of real-time readiness:
The impact of real-time on the quality of predictions varies with the user type:
Data is one of the most underutilized assets that companies possess. It comes in all sizes and shapes, structured or unstructured. But in the context of a personalization system we’re mostly referring to user-centric data: behavioral, events, clickstreams. That’s what’s needed to build a centralized user profile that can then be leveraged by other products and services to personalize user experiences.
Depending on where we fit in the Gartner AI maturity model, we have to take into consideration a few critical aspects:
There seems to be the expectation that new users should have the same (personalized) experience as returning users, when in fact new users have little to no data. Obviously this is not possible, and the classic solution is a hybrid approach: 1) start with a content-centric approach for a user landing on your website for the first time, and then 2) switch to a user-centric approach once you know more about the user.
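The hybrid approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a recommended algorithm: the `recommend` function, the `min_events` threshold and the tiny catalog are all hypothetical, and "user-centric" is reduced here to "the category the user viewed most".

```python
from collections import Counter


def recommend(
    viewed_categories: list[str],
    catalog: list[tuple[str, str]],
    current_category: str,
    min_events: int = 3,
    k: int = 2,
) -> list[str]:
    """Hybrid recommendation: content-centric for new users, user-centric otherwise.

    `catalog` is a list of (item name, category) pairs.
    `min_events` is an arbitrary illustrative threshold.
    """
    if len(viewed_categories) < min_events:
        # Cold start: rely only on the content the user is looking at right now.
        target = current_category
    else:
        # Enough history: use the category the user interacted with most.
        target = Counter(viewed_categories).most_common(1)[0][0]
    return [name for name, category in catalog if category == target][:k]


catalog = [
    ("Red sneakers", "shoes"),
    ("Trail shoes", "shoes"),
    ("Half-zip T-shirt", "t-shirts"),
    ("Boys T-shirt", "t-shirts"),
]

# A first-time visitor on a shoes page versus a returning T-shirt browser on the same page.
for_new_user = recommend([], catalog, current_category="shoes")
for_returning_user = recommend(["t-shirts", "t-shirts", "shoes"], catalog, current_category="shoes")
```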
To clarify what we mean by “user profile” we need to understand the different types of user identifiers:
A performant personalization engine should consider “session unification” and handle, or at least support, user identity consistency:
Data exists in multiple analytics platforms (Google Analytics 4.0/360, BigQuery), CRMs, CDPs (Segment), etc. That raises the question: what is the single source of truth? And if the data is complementary, how do you stitch it together?
We’ve seen that the first step is to ensure user identity reconciliation. The second step is to build the feature store for the personalization system.
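As a rough illustration of identity reconciliation, the sketch below maps anonymous identifiers (cookie or session IDs) to a canonical user ID and then groups events under that ID. The `IdentityResolver` class, its method names and the sample identifiers are hypothetical. A production system would also need persistence, conflict handling and consent checks.

```python
from collections import defaultdict


class IdentityResolver:
    """Maps anonymous identifiers (cookie or session IDs) to a canonical user ID."""

    def __init__(self) -> None:
        self._canonical: dict[str, str] = {}

    def resolve(self, identifier: str) -> str:
        # Follow links until we reach an identifier with no further mapping.
        while identifier in self._canonical and self._canonical[identifier] != identifier:
            identifier = self._canonical[identifier]
        return identifier

    def link(self, anonymous_id: str, known_id: str) -> None:
        """Record that `anonymous_id` belongs to the same person as `known_id`."""
        self._canonical[self.resolve(anonymous_id)] = self.resolve(known_id)


resolver = IdentityResolver()
resolver.link("cookie-abc", "user-123")     # the user logs in while carrying this cookie
resolver.link("session-789", "cookie-abc")  # an earlier session used the same cookie

events = [
    ("cookie-abc", "product_view"),
    ("session-789", "add_to_cart"),
    ("user-123", "transaction"),
    ("cookie-zzz", "product_view"),  # never linked, so it stays anonymous
]

# Group events under the canonical identifier to get one unified timeline per person.
unified: dict[str, list[str]] = defaultdict(list)
for identifier, event in events:
    unified[resolver.resolve(identifier)].append(event)
```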
The ways features are maintained and served can differ significantly across projects and teams. This introduces infrastructure complexity and often results in duplication of work. Some of the challenges faced by distributed organizations include:
To address these issues, a feature store acts as a central vault for storing documented, curated, and access-controlled features within an organization.
Essentially, a feature store allows data engineers to insert features. In turn, data analysts and machine-learning engineers use an API to get feature values they deem relevant.
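A toy in-memory version shows the two sides of that contract: a write path for data engineers and a read path for analysts and ML engineers. The `FeatureStore` class and its `register`, `insert` and `get` methods are hypothetical names for this sketch, not the API of any particular product. Access control and versioning are left out.

```python
from typing import Any


class FeatureStore:
    """In-memory stand-in for a central, documented feature vault."""

    def __init__(self) -> None:
        self._descriptions: dict[str, str] = {}
        self._values: dict[str, dict[str, Any]] = {}

    def register(self, name: str, description: str) -> None:
        """Document a feature before any value can be stored for it."""
        self._descriptions[name] = description

    def insert(self, entity_id: str, name: str, value: Any) -> None:
        """Write path, used by data engineers."""
        if name not in self._descriptions:
            raise KeyError(f"feature {name!r} is not registered")
        self._values.setdefault(entity_id, {})[name] = value

    def get(self, entity_id: str, names: list[str]) -> dict[str, Any]:
        """Read path; features with no stored value come back as None."""
        stored = self._values.get(entity_id, {})
        return {name: stored.get(name) for name in names}


store = FeatureStore()
store.register("sessionCount", "Number of sessions recorded for the user")
store.register("avgSessionDuration", "Average session duration (unit assumed to be seconds)")
store.insert("user-123", "sessionCount", 12)
store.insert("user-123", "avgSessionDuration", 102)

features = store.get("user-123", ["sessionCount", "avgSessionDuration", "pageviews"])
```

Requiring registration before insertion is one simple way to keep every stored feature documented, which is the property that helps avoid duplicated work across teams.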
Machine-learning engineers spend 80% of their time doing feature engineering because it’s a time-consuming and difficult process. But they are doing it because, as it has been shown in a paper from 2014 “Practical Lessons from Predicting Clicks on Ads at Facebook”, having the right features is the most important thing in developing their ML models.
In conclusion, when it comes to data and machine learning, there’s one rule to remember: garbage in, garbage out. We cannot brute-force our way out by throwing data at a personalization system and hoping it will produce good results. We need to clean the data first and then use it.
User intent is defined as the purpose of a user’s series of actions. Marketers have been traditionally working with a standard set of intents, mainly inspired by Google’s search algorithm: navigational, informational and transactional. In reality the user intent is more complex than that and it varies from session to session, website to website and industry to industry.
When most marketers hear about “intent-based personalization” they think about recommendations. But a performant personalization system shouldn’t limit itself to just recommending items.
For ecommerce journeys, we might be looking at the following types of intents: (1) goal-oriented intents; (2) affinity-oriented intents; (3) metrics-oriented intents. And it’s important to note that we’re expecting users to manifest a combination of intents, not just a dominant one.
User intents can be represented as a graph: we can imagine that each user (U) is linked to their sessions (S) and during each session the user interacts with products (P) — each with their own attributes (A).
There are certain events that users can do in their session that are not necessarily linked to a specific item: signing up, churning, or browsing. While others are linked to the items and cart: abandoning the cart, adding to the cart, and checkout.
Users can also search, in which case there are common queries linked to items and categories of items and that’s where the complexity of the graph increases even more.
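The user, session, product and attribute part of that graph can be sketched with a small adjacency structure. The `IntentGraph` class, the `(kind, identifier)` node encoding and the sample nodes are assumptions for illustration. Events and search queries would be further node kinds in a fuller model.

```python
from collections import defaultdict

# A node is (kind, identifier); kind is "U", "S", "P" or "A" as in the text.
Node = tuple[str, str]


class IntentGraph:
    """Directed graph: users -> sessions -> products -> attributes."""

    def __init__(self) -> None:
        self._edges: dict[Node, set[Node]] = defaultdict(set)

    def add_edge(self, source: Node, target: Node) -> None:
        self._edges[source].add(target)

    def neighbors(self, node: Node, kind: str) -> set[Node]:
        # .get avoids creating empty entries for unknown nodes.
        return {n for n in self._edges.get(node, set()) if n[0] == kind}

    def walk(self, start: Node, kinds: list[str]) -> set[Node]:
        """Follow edges from `start` through the given sequence of node kinds."""
        frontier = {start}
        for kind in kinds:
            frontier = {n for node in frontier for n in self.neighbors(node, kind)}
        return frontier


graph = IntentGraph()
user = ("U", "user_123")
graph.add_edge(user, ("S", "session_1"))
graph.add_edge(user, ("S", "session_2"))
graph.add_edge(("S", "session_1"), ("P", "D05927-8161-111"))
graph.add_edge(("S", "session_2"), ("P", "D12345-5678-910"))
graph.add_edge(("P", "D05927-8161-111"), ("A", "color:red"))
graph.add_edge(("P", "D12345-5678-910"), ("A", "color:red"))
graph.add_edge(("P", "D12345-5678-910"), ("A", "category:shorts"))

# Which products did the user touch, and which attributes do those products share?
products = graph.walk(user, ["S", "P"])
attributes = graph.walk(user, ["S", "P", "A"])
```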
Being able to accurately predict user intents is critical for personalization systems, if you want to go beyond product recommendations. In a REST API format, you’d expect something like /1/users/{identifier}/fetch to respond with:
{
  "user": "user_123",
  "intents": [
    {
      "intent-type": "goals",
      "value": [
        { "name": "product_view", "probability": 0.56 },
        { "name": "add_to_cart", "probability": 0.32 },
        { "name": "transaction", "probability": 0.12 },
        { "name": "cart_abandonment", "probability": 0.42 }
      ]
    },
    {
      "intent-type": "metrics",
      "value": [
        { "name": "next_order_value", "value": 100 },
        { "name": "session_duration", "value": 125 },
        { "name": "cognitive_load", "value": 0.12 }
      ]
    },
    {
      "intent-type": "affinities",
      "value": [
        { "name": "color", "value": "red", "probability": 0.56 },
        { "name": "brand", "value": "adidas", "probability": 0.55 },
        { "name": "category", "value": "shoes", "probability": 0.67 }
      ]
    }
  ]
}
In practice, you want to be able to explore the user intent graph and easily extract users based on any given combination of intents:
intents.cart_abandonment.probability: 0.5 TO 0.9 AND intents.next_order_value.value >= 50 AND intents.affinities.brand.value = 'adidas' AND intents.affinities.brand.probability > 0.5
This is where intent-based segmentation comes into play and there are 4 levels that we should consider for our personalization system:
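The same combination of conditions can be checked in plain Python against a response shaped like the one from the intents endpoint. This is only a sketch: `index_intents`, `matches_filter` and `make_response` are hypothetical helpers, and reading the `0.5 TO 0.9` range as inclusive on both ends is an assumption.

```python
from typing import Any


def index_intents(response: dict[str, Any]) -> dict[str, dict[str, dict[str, Any]]]:
    """Turn the list-based response into {intent-type: {name: entry}} for easy lookup."""
    return {
        group["intent-type"]: {entry["name"]: entry for entry in group["value"]}
        for group in response["intents"]
    }


def matches_filter(response: dict[str, Any]) -> bool:
    """Likely cart abandoners with a decent next order value and an adidas affinity."""
    intents = index_intents(response)
    abandonment = intents.get("goals", {}).get("cart_abandonment", {}).get("probability", 0.0)
    order_value = intents.get("metrics", {}).get("next_order_value", {}).get("value", 0)
    brand = intents.get("affinities", {}).get("brand", {})
    return (
        0.5 <= abandonment <= 0.9  # range assumed inclusive
        and order_value >= 50
        and brand.get("value") == "adidas"
        and brand.get("probability", 0.0) > 0.5
    )


def make_response(user: str, abandonment: float, order_value: float, brand_probability: float) -> dict[str, Any]:
    """Build a small response in the same shape as the API example (sample data)."""
    return {
        "user": user,
        "intents": [
            {"intent-type": "goals", "value": [{"name": "cart_abandonment", "probability": abandonment}]},
            {"intent-type": "metrics", "value": [{"name": "next_order_value", "value": order_value}]},
            {"intent-type": "affinities", "value": [{"name": "brand", "value": "adidas", "probability": brand_probability}]},
        ],
    }


responses = [
    make_response("user_1", abandonment=0.70, order_value=80, brand_probability=0.60),
    make_response("user_2", abandonment=0.42, order_value=100, brand_probability=0.55),
]
matching_users = [r["user"] for r in responses if matches_filter(r)]
```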
This brings us to the next dysfunction: a personalization system that is “not composable”. If we admit that personalizing the journey of an end-user implies more than just displaying a recommendation widget on a product detail page, then, as developers, we need a way to orchestrate personalized experiences, to create intelligent triggers based on predicted intents.
For example, if a user is interested in items with the following characteristics: color: red (56% probability), brand: Adidas (67% probability) we should have the means to do something like this:
For that, we need a composable approach and the API-first architecture is the developer-friendly way to accomplish this. At the minimum you’d want to have access to:
This is where you’d be able to configure the machine learning models that are part of the personalization system, whether we’re talking about: frequently bought together, related products or intent predictions.
[
  {
    "name": "Related Products",
    "type": "related_products",
    "compatibleSources": ["bigquery"],
    "dataRequirements": { "minUsers": 10000, "minDays": 90 },
    "frequency": "weekly"
  },
  {
    "name": "Affinities",
    "type": "affinities",
    "compatibleSources": ["bigquery"],
    "dataRequirements": { "minUsers": 50000, "minDays": 30 },
    "frequency": "daily"
  },
  …
]
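A configuration like this lends itself to a simple readiness check before a model is scheduled. The sketch below assumes that `minUsers` and `minDays` are minimum amounts of data a source must hold, which the configuration suggests but does not state. `trainable_models` is a hypothetical helper.

```python
from typing import Any

# Two model configurations mirroring the fields of the example configuration.
MODELS: list[dict[str, Any]] = [
    {
        "name": "Related Products",
        "type": "related_products",
        "compatibleSources": ["bigquery"],
        "dataRequirements": {"minUsers": 10000, "minDays": 90},
        "frequency": "weekly",
    },
    {
        "name": "Affinities",
        "type": "affinities",
        "compatibleSources": ["bigquery"],
        "dataRequirements": {"minUsers": 50000, "minDays": 30},
        "frequency": "daily",
    },
]


def trainable_models(models: list[dict[str, Any]], source: str, users: int, days: int) -> list[str]:
    """Return the names of models whose data requirements the given source meets."""
    ready = []
    for model in models:
        requirements = model["dataRequirements"]
        if (
            source in model["compatibleSources"]
            and users >= requirements["minUsers"]
            and days >= requirements["minDays"]
        ):
            ready.append(model["name"])
    return ready


# A source with 20,000 users and 120 days of history (sample numbers).
ready_now = trainable_models(MODELS, source="bigquery", users=20000, days=120)
```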
This API would allow you to request raw and/or predicted properties for an authenticated user (userID) or an anonymous user (cookieID, sessionID).
{
  "user": "user_1",
  "properties": {
    "raw": {
      "lastUpdatedAt": "2021-07-11T10:12:37Z",
      "device": "mobile",
      "sessionCount": 12,
      "timeOnSite": "02:03:10",
      "browser": "chrome",
      "pageviews": 32,
      "avgSessionDuration": 102,
      "lastVisit": "2021-07-11T10:12:37Z",
      ...
    },
    "predicted": { ... }
  }
}
Segments are used to group and filter users based on raw and predicted values.
[
  {
    "segmentID": "segment_1",
    "name": "Mobile users that will complete a purchase",
    "conditions": "predictions.funnel_stage.value:transaction AND (predictions.funnel_stage.probability: 0.5 TO 0.9) AND raw.device = 'mobile'",
    "type": "computed"
  },
  {
    "segmentID": "segment_3",
    "name": "Users that are interested in red Adidas shoes",
    "conditions": "predictions.affinities.color.value = 'red' AND predictions.affinities.brand.value = 'adidas' AND predictions.affinities.category.value = 'shoes' AND predictions.affinities.color.probability > 0.5 AND predictions.affinities.brand.probability > 0.5 AND predictions.affinities.category.probability > 0.5",
    "type": "computed"
  },
  ...
]
A personalization engine needs to be trustworthy in measuring and delivering business results. It needs to be verifiable in all aspects of its impact: able to explain why the inputs that went into building the ML model are important and how they affect the outputs (predictions), in terms of both offline and online metrics.
Here are some of the things you should be expecting from a transparent AI-based personalization engine:
When it comes to measuring the business results (KPI) after deploying a personalization system, we must be careful not to be myopic about it. Let me give you an example:
Variant | Click Through Rate | Conversion Rate | Average Order Value
Variant A | 10% | 10% | $50
Variant B | 20% | 5% | $100
A lot of companies would evaluate only the CTR and would conclude that Variant B is the winning one, but if you inspect it closer, you see that, in terms of CR, Variant A is the best. Is it? If you do the math, assuming the conversion rate is measured on the users who clicked, you realize it’s essentially the same thing: 10% × 10% = 1% of visitors place an order with Variant A, and 20% × 5% = 1% with Variant B.
But if you take a step further and look at the average order value, Variant B produces the most revenue: roughly $1.00 per visitor versus $0.50 for Variant A. Not because of better conversion rates, but because users add more items or more expensive items to the cart, increasing the average order value.
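The comparison is easy to reproduce. The sketch below uses the numbers from the table and assumes that the conversion rate is measured on users who clicked, so that orders per visitor is CTR × CR. The function names are illustrative.

```python
# Metrics from the table, expressed as fractions and dollars.
VARIANTS = {
    "A": {"ctr": 0.10, "cr": 0.10, "aov": 50.0},
    "B": {"ctr": 0.20, "cr": 0.05, "aov": 100.0},
}


def orders_per_visitor(ctr: float, cr: float) -> float:
    """Share of visitors who end up ordering (assumes CR is measured on clickers)."""
    return ctr * cr


def revenue_per_visitor(ctr: float, cr: float, aov: float) -> float:
    """Expected revenue per visitor: orders per visitor times average order value."""
    return orders_per_visitor(ctr, cr) * aov


summary = {
    name: {
        "orders_per_visitor": orders_per_visitor(v["ctr"], v["cr"]),
        "revenue_per_visitor": revenue_per_visitor(v["ctr"], v["cr"], v["aov"]),
    }
    for name, v in VARIANTS.items()
}

# Pick the variant by revenue per visitor rather than by CTR or CR alone.
winner = max(summary, key=lambda name: summary[name]["revenue_per_visitor"])
```

Both variants convert about 1% of visitors into orders, so neither CTR nor CR alone separates them. Revenue per visitor does, and it favors Variant B.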
In conclusion, a personalization system should be transparent about the performance of the machine learning models that sit under the hood, as well as about the business results once the personalization has been implemented and deployed into production.
Imagine you’re building a system to rank items on a user’s newsfeed. Your goal is to maximize user engagement, meaning the likelihood that users click on an item. But you soon realize that optimizing for engagement alone raises ethical concerns, because extreme posts tend to get more engagement and the algorithm will learn to prioritize extreme content. Sounds familiar?
Unfortunately, the ethical implications of personalization systems are often an afterthought, and companies pay attention only when it hurts their bottom line. Mind the ethical gap from the scoping stage of your “project personalization” and don’t let it surprise you later on.
The one question that usually helps me identify ethical blindspots is: “If we can build it, should we?” In other words, AI is like a knife that can be used both in surgery and in a fight. And I believe developers have a responsibility in leading the way AI is being applied.
In light of all that we discussed, it’s clear that building a performant personalization engine is not a trivial task. And there’s a good reason for that: personalization is not a tick-box exercise. It’s a complex undertaking due to the uniqueness of every user and the need to respect their privacy while providing a personalized experience at the same time.
Let’s agree that, when it comes to personalization, building for the end-user is our primary goal.
We can also acknowledge that this process requires multiple iterations, and that we, as developers, seldom navigate it by looking at the data or by considering the ethical implications of our code. Usually, there’s somebody else, be it a product manager, marketing analyst or even a data scientist, scrutinizing and summarizing the data and translating it into a feature list for the next product release.
There lies the gap between developers (builders) and end-users, leading to the development of suboptimal products.
What if developers could be better equipped to identify firsthand how users interact with the product they’re building? What if data could be available to developers directly in the user-facing product components? And what if developers could build these components to automatically adapt to user behaviors based on their intents, ultimately providing a better user experience?
It’s clear that building a performant personalization engine is not a trivial task. That’s why for the past year we’ve been working on a new product that fixes the seven dysfunctions we’ve talked about in this article. If you’re interested in getting early access, here’s where you can sign up for the waiting list: https://alg.li/fixperso
Ciprian Borodescu
AI Product Manager | On a mission to help people succeed through the use of AI