
The 7 Dysfunctions of Personalization Engines

How are you thinking about personalization?

That’s how I’ve started most of my customer conversations over the past months, and I still do, because the range of perspectives customers have on personalization fascinates me. From adding {first name} to an email subject line, to manually bucketing users by demographics and exposing them to bespoke content, to adding a “recently viewed” widget on the product detail page of an e-commerce site, the plethora of practices that fall under the broad topic of personalization is staggering.

And yes, recommendation engines and even more sophisticated AI-based personalization strategies have been mentioned as a way to solve for scalability. But these technologies are by no means new. Gartner released its first Magic Quadrant report on Personalization Engines back in 2018, and long before that, in 2003, then-Amazon researchers Greg Linden, Brent Smith, and Jeremy York published the paper “Amazon.com Recommendations: Item-to-Item Collaborative Filtering”.

Yet personalization remains the holy grail of customer experiences, creating frustration and fueling the myth of a button, a checkbox, or a toggle hidden somewhere deep inside a personalization engine that would auto-magically summon an AI overlord to devise the ideal experience for everyone. This skewed perspective is leading the majority of marketers to abandon their personalization efforts, as Gartner found back in 2019.

Let’s get pragmatic about it.

Personalization is NOT a tick-box exercise, it’s a complex undertaking, a Venn diagram representing three questions that need to be answered:

  • WHY are we doing personalization?
  • WHAT are we building?
  • HOW are we building it?

Seen this way, it becomes clear that personalization is no longer synonymous with marketing, as it was considered in the early days. Instead, it requires a product mindset: close collaboration between the business team, product managers, and the engineering team in solving the problems associated with personalization.

After numerous conversations and feedback sessions with customers, especially in the e-commerce industry, I managed to dissect the broad & complex topic of personalization into 7 subproblems:

  1. It doesn’t respect user privacy
  2. It doesn’t support real-time (the cold-start problem)
  3. It doesn’t support user identity consistency
  4. It can’t accurately predict user intents and segments (for first-time or repeat visitors)
  5. It lacks flexibility in orchestrating more complex personalized experiences
  6. It isn’t transparent in terms of model metrics and business metrics
  7. It isn’t used ethically

I propose we dive deeper into analyzing each of the 7 dysfunctions of personalization engines and their potential solutions. By the end of the article you’ll be better equipped to understand and apply the right strategy for your own “project personalization”. By the way, if you want to watch a video on the same topic, I recommend this presentation at DevCon as it represents the basis for this article.

1. Not respecting user privacy or the privacy-friendly flaw

83% of end-users expect personalization within moments and hours, but at the same time, up to 80% of consumers are sensitive to companies’ security and privacy practices regarding their online data.

At first, this might seem like a paradox: personalizing users’ experiences while protecting their data privacy? If this is indeed the case, how can we solve it?

One proposal is to make the personalization engine privacy-aware by allowing users to opt in and out, and to decide the conditions under which their user data profiles can be used. Users can “activate”, even partially, the data necessary for the level of personalization they deem comfortable and appropriate for the stage they’re at. An end-user might have access to a privacy settings page that looks something like this:

Notice the informational, navigational, and commercial (transactional) profiles. Behind the scenes we might have the following JSON representation of those user profiles:

[{
  "user": "user-123",
  "informational-profile": {
    "properties": {
      "raw": {
        "device": "mobile",
        "sessionCount": 12,
        "timeOnSite": "02:03:10",
        "browser": "chrome",
        "pageviews": 32,
        "avgSessionDuration": 102,
        "lastVisit": "2022-09-11T10:12:37Z"
      }
    }
  }
},
{
  "user": "user-123",
  "navigational-profile": {
    "products": {
      "value": [
        {
          "name": "Jirgi Half-Zip T-Shirt",
          "objectID": "D05927-8161-111",
          "url": "men/t-shirts/d05927-8161-111"
        },
        {
          "name": "Boys T-Shirt",
          "objectID": "D12461-8136-211",
          "url": "boys/t-shirts/d12461-8136-21"
        },
        {
          "name": "Men shorts",
          "objectID": "D12345-5678-910",
          "url": "men/shorts/d12345-5678-910"
        }
      ],
      "lastUpdatedAt": "2021-07-11T07:07:00Z"
    }
  }
},
{
  "user": "user-123",
  "commercial-profile": {
    "orders": {
      "value": [
        {
          "total": 159,
          "products": {
            "value": [
              {
                "name": "T-Shirts",
                "objectID": "D05927-8161-111",
                "size": "L",
                "quantity": 1,
                "price": 99.00
              },
              {
                "name": "Hats",
                "objectID": "D12461-8136-211",
                "size": "M",
                "quantity": 1,
                "price": 15.00
              },
              {
                "name": "Shorts",
                "objectID": "D12345-5678-910",
                "url": "men/shorts/d12345-5678-910",
                "size": "L",
                "quantity": 1,
                "price": 45.00
              }
            ]          
          }
        }
      ],
      "lastUpdatedAt": "2021-07-12T10:03:37Z"
    }
  }
}]
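To make those profiles privacy-aware, the engine should only ever see the sections a user has consented to. Here is a minimal sketch of that filtering step; the consent flags and the `redact_profiles` helper are hypothetical, not part of any real API:

```python
# Hypothetical consent flags: which profile sections user-123 has opted into.
CONSENT = {
    "user-123": {
        "informational-profile": True,
        "navigational-profile": True,
        "commercial-profile": False,  # user opted out of purchase data
    }
}

def redact_profiles(profiles, consent):
    """Keep only the profile sections each user has consented to (default deny)."""
    allowed = []
    for profile in profiles:
        user = profile["user"]
        flags = consent.get(user, {})
        kept = {"user": user}
        for key, value in profile.items():
            if key != "user" and flags.get(key, False):
                kept[key] = value
        if len(kept) > 1:  # drop profiles where nothing was consented
            allowed.append(kept)
    return allowed

profiles = [
    {"user": "user-123", "informational-profile": {"device": "mobile"}},
    {"user": "user-123", "commercial-profile": {"orders": []}},
]
print(redact_profiles(profiles, CONSENT))
# only the informational profile survives
```

The important design choice is "default deny": any profile section without an explicit opt-in is stripped before personalization runs.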

2. No real-time support or the cold start problem

Ideally, you’d want your personalization engine to cover: (1) real-time (online) predictions and (2) real-time (continual) learning. As such, there may be 3 levels in terms of real-time readiness:

  • Level-0: Batch processing. Predictions are generated on a schedule (e.g. once per day, or more frequently) and saved into the database. When an API request comes in, the pre-generated, saved predictions are served.
  • Level-1: Online/real-time inference. Predictions are generated (and saved, or even cached) at the moment the API request is made.
  • Level-2: Real-time learning. Besides inference being real-time, the ML models themselves are updated continuously.
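Level-1 can be sketched as a thin caching layer in front of an on-demand model. Everything here is illustrative: the `OnlinePredictor` class, the TTL value, and the toy model are assumptions, not a real engine’s API:

```python
import time

class OnlinePredictor:
    """Level-1 sketch: compute predictions at request time, cache with a TTL."""

    def __init__(self, model_fn, ttl_seconds=300):
        self.model_fn = model_fn  # called on demand with session events
        self.ttl = ttl_seconds
        self.cache = {}           # user_id -> (timestamp, prediction)

    def predict(self, user_id, session_events):
        cached = self.cache.get(user_id)
        now = time.time()
        if cached and now - cached[0] < self.ttl:
            return cached[1]  # serve the cached prediction
        prediction = self.model_fn(session_events)
        self.cache[user_id] = (now, prediction)
        return prediction

# Toy "model": add-to-cart likelihood grows with the number of view events.
toy_model = lambda events: {"add_to_cart": min(1.0, events.count("view") * 0.1)}

predictor = OnlinePredictor(toy_model)
print(predictor.predict("user-1", ["view", "view", "view"]))
```

Level-2 would additionally feed each request’s events back into model updates; that training loop is omitted here.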

The impact of real-time on the quality of predictions varies with the user type:

  • For first-time visitors that generated a single session, session-based predictions are essential. The real-time personalization engine will generate better predictions for users that have a rich session in terms of duration and events that are being sent.
  • For repeat visitors or non-authenticated users that generated multiple sessions, the real-time personalization engine will be able to more accurately predict their preferences.
  • And finally, authenticated users not only allow us to reconcile user identity across multiple devices but also enable the personalization engine to perform at its best when surfacing user intents.

3. Poor data quality or user identity inconsistency

Data is one of the most underutilized assets that companies possess. It comes in all sizes and shapes, structured or unstructured. But in the context of a personalization system we’re mostly referring to user-centric data: behavioral data, events, clickstreams – that’s what’s needed to build a centralized user profile that can then be leveraged by other products and services to personalize user experiences.

Depending on where we fit in the Gartner AI maturity model, we have to take into consideration a few critical aspects:

  • The Cold-Start Problem

There seems to be an expectation that new users should have the same (personalized) experience as returning users, when in fact new users have little to no data. Obviously this is not possible, and the classic solution is a hybrid approach: 1) a content-centric approach for a user landing on your website for the first time, then 2) a user-centric approach once you know more about the user.
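The hybrid fallback can be sketched in a few lines. The `MIN_EVENTS` threshold and function names are assumptions for illustration:

```python
MIN_EVENTS = 5  # assumed threshold; tune per site and traffic

def recommend(user_events, popular_items, personalized_fn):
    """Content-centric fallback for cold users, user-centric otherwise."""
    if len(user_events) < MIN_EVENTS:
        return popular_items             # 1) content-centric: e.g. best sellers
    return personalized_fn(user_events)  # 2) user-centric: model-driven

popular = ["best-seller-1", "best-seller-2"]
personal = lambda events: [f"similar-to-{events[-1]}"]

print(recommend([], popular, personal))                          # new visitor
print(recommend(["p1", "p2", "p3", "p4", "p5"], popular, personal))
```

In practice the switch is rarely binary; many systems blend the two scores as event history accumulates.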

  • User Identity Reconciliation

To clarify what we mean by “user profile” we need to understand the different types of user identifiers:

    • Session ID = a unique identifier that is created for a single session.
    • Device ID = a browser-based or mobile-app-based identifier for a unique, anonymous website or mobile app user. This is the default identifier that Google Analytics uses to distinguish site visitors.
    • User ID = a custom-generated ID used for session unification across different devices and it is set when a user authenticates.

A performant personalization engine should consider “session unification” and handle, or at least support, user identity consistency:

    • Across sessions: sessions from the same user should be tracked under a single device ID if the user is not authenticated or a single user ID if the user is authenticated.
    • Across users’ states (authenticated, non-authenticated): sessions created while the user is not authenticated should not be lost if the user authenticates.
    • Across devices: sessions from different devices should be merged under a single user ID once the user authenticates.
  • Combining Different Data Sources

Data exists in multiple analytics platforms (Google Analytics 4.0/360, BigQuery), CRMs, CDPs (Segment), etc. That begs the question: what is the single source of truth? And if the data is complementary, how do you stitch it together?
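The session-unification rules above boil down to a merge step: once a device ID can be tied to an authenticated user ID, every session tracked under that device is re-keyed. A hedged sketch, with illustrative data shapes:

```python
def unify_sessions(sessions, device_to_user):
    """Re-key sessions from device IDs to user IDs where a mapping is known.

    Sessions whose device was never tied to an authenticated user
    stay grouped under their device ID, so no data is lost.
    """
    unified = {}
    for session in sessions:
        owner = device_to_user.get(session["deviceID"], session["deviceID"])
        unified.setdefault(owner, []).append(session["sessionID"])
    return unified

sessions = [
    {"sessionID": "s1", "deviceID": "laptop-abc"},  # pre-login, laptop
    {"sessionID": "s2", "deviceID": "phone-xyz"},   # other device
    {"sessionID": "s3", "deviceID": "laptop-abc"},  # post-login
]
# the user authenticated on both devices at some point
device_to_user = {"laptop-abc": "user-123", "phone-xyz": "user-123"}

print(unify_sessions(sessions, device_to_user))
# all three sessions end up under "user-123"
```

This covers all three consistency cases: across sessions, across authentication states, and across devices.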

We’ve seen that the first step is to ensure user identity reconciliation. The second step is to build the feature store for the personalization system.

The ways features are maintained and served can differ significantly across projects and teams. This introduces infrastructure complexity and often results in duplication of work. Some of the challenges faced by distributed organizations include:

    • Features are not reused
    • Feature definitions vary
    • Features take a long time to be computed
    • There is inconsistency between training and serving
    • Feature decay is unknown

To address these issues, a feature store acts as a central vault for storing documented, curated, and access-controlled features within an organization.

Essentially, a feature store allows data engineers to insert features. In turn, data analysts and machine-learning engineers use an API to get feature values they deem relevant.
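A toy version of that interface might look like the sketch below. Real feature stores (Feast, for example) add versioning, TTLs, and point-in-time correctness; this only illustrates the put/get contract described above:

```python
from collections import defaultdict

class FeatureStore:
    """Minimal feature store sketch: per-entity named feature values."""

    def __init__(self):
        self._store = defaultdict(dict)  # entity_id -> {feature_name: value}

    def put(self, entity_id, feature_name, value):
        """Data engineers insert features."""
        self._store[entity_id][feature_name] = value

    def get(self, entity_id, feature_names):
        """ML engineers read back the features they deem relevant."""
        row = self._store[entity_id]
        return {name: row.get(name) for name in feature_names}

store = FeatureStore()
store.put("user-123", "session_count_7d", 12)
store.put("user-123", "avg_order_value", 53.0)
print(store.get("user-123", ["session_count_7d", "avg_order_value"]))
```

The value of centralizing this is exactly the list of challenges above: one definition per feature, reused everywhere, identical at training and serving time.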

Machine-learning engineers spend up to 80% of their time on feature engineering because it’s a time-consuming and difficult process. They do it because, as the 2014 paper “Practical Lessons from Predicting Clicks on Ads at Facebook” showed, having the right features is the most important factor in developing ML models.

In conclusion, when it comes to data and machine learning, there’s one rule to remember: garbage in, garbage out. We cannot brute-force our way out by throwing data at a personalization system and hoping it will produce good results. We need to clean the data first, then use it.

4. Poor understanding of user intents

User intent is defined as the purpose of a user’s series of actions. Marketers have been traditionally working with a standard set of intents, mainly inspired by Google’s search algorithm: navigational, informational and transactional. In reality the user intent is more complex than that and it varies from session to session, website to website and industry to industry.

When most marketers hear about “intent-based personalization” they think about recommendations. But a performant personalization system shouldn’t limit itself to just recommending items.

For e-commerce journeys, we might be looking at the following types of intents: (1) goal-oriented intents; (2) affinity-oriented intents; (3) metrics-oriented intents. And it’s important to note that users typically manifest a combination of intents, not just a dominant one.

User intents can be represented as a graph: we can imagine that each user (U) is linked to their sessions (S) and during each session the user interacts with products (P) — each with their own attributes (A).

A representation of a user intent graph: user-sessions-products-attributes

There are certain events users can trigger in a session that are not necessarily linked to a specific item: signing up, churning, or browsing. Others are linked to items and the cart: adding to the cart, abandoning the cart, and checking out.

Users can also search, in which case common queries are linked to items and categories of items, and that’s where the complexity of the graph increases even more.

A representation of a user intent graph: user-sessions-products-attributes, events and search queries
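One simple way to encode such a graph is with plain adjacency lists, prefixing nodes by type (U = user, S = session, P = product, A = attribute, E = event, Q = query). The data below is made up for illustration:

```python
# Illustrative intent graph: users -> sessions -> products -> attributes,
# with events (E:) and search queries (Q:) hanging off sessions.
graph = {
    "U:user-123": ["S:s1", "S:s2"],
    "S:s1": ["P:shirt-8161", "E:add_to_cart"],
    "S:s2": ["P:shorts-5678", "Q:mens shorts"],
    "P:shirt-8161": ["A:color=red", "A:brand=adidas"],
    "P:shorts-5678": ["A:color=blue"],
}

def attributes_touched(graph, user):
    """Walk user -> sessions -> products -> attributes."""
    attrs = set()
    for session in graph.get(user, []):
        for node in graph.get(session, []):
            if node.startswith("P:"):
                attrs.update(a for a in graph.get(node, []) if a.startswith("A:"))
    return attrs

print(sorted(attributes_touched(graph, "U:user-123")))
```

Walks like this one are the raw material for affinity predictions: the attributes a user keeps touching across sessions hint at their preferences.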

Being able to accurately predict user intents is critical for personalization systems, if you want to go beyond product recommendations. In a REST API format, you’d expect something like /1/users/{identifier}/fetch to respond with:

{
  "user": "user_123",
  "intents": [
    {
        "intent-type": "goals",
        "value": [
            {
                "name": "product_view",
                "probability": 0.56
            },
            {
                "name": "add_to_cart",
                "probability": 0.32
            },
            {
                "name": "transaction",
                "probability": 0.12
            },
            {
                "name": "cart_abandonment",
                "probability": 0.42
            }
        ]
    },
    {
        "intent-type": "metrics",
        "value": [
            {
                "name": "next_order_value",
                "value": 100
            },
            {
                "name": "session_duration",
                "value": 125
            },
            {
                "name": "cognitive_load",
                "value": 0.12
            }
        ]
    },
    {
        "intent-type": "affinities",
        "value": [
            {
                "name": "color",
                "value": "red",
                "probability": 0.56
            },
            {
                "name": "brand",
                "value": "adidas",
                "probability": 0.55
            },
            {
                "name": "category",
                "value": "shoes",
                "probability": 0.67
            }
        ]    
      }
  ]
}

In practice, you want to be able to explore the user intent graph and easily extract users based on any given combination of intents:

intents.cart_abandonment.probability: 0.5 TO 0.9
AND intents.next_order_value.value >= 50
AND intents.affinities.brand.value = "adidas"
AND intents.affinities.brand.probability > 0.5
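A filter like that could be evaluated as a plain predicate over the intent payload shown earlier. This is a hand-rolled sketch, not the query language of any real engine:

```python
def matches(user):
    """True if the user satisfies the cart-abandonment + adidas filter above."""
    goals = {g["name"]: g["probability"] for g in user["goals"]}
    affinities = {a["name"]: a for a in user["affinities"]}
    brand = affinities.get("brand", {})
    return (
        0.5 <= goals.get("cart_abandonment", 0.0) <= 0.9
        and user["next_order_value"] >= 50
        and brand.get("value") == "adidas"
        and brand.get("probability", 0.0) > 0.5
    )

user = {
    "goals": [{"name": "cart_abandonment", "probability": 0.6}],
    "affinities": [{"name": "brand", "value": "adidas", "probability": 0.55}],
    "next_order_value": 100,
}
print(matches(user))  # True
```

At scale you would push such predicates down into the storage or search layer rather than scanning users in application code.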

This is where intent-based segmentation comes into play, and there are four levels we should consider for our personalization system:

  • Level-0: Segmentation based on raw user attributes (device, age, city, etc.)
  • Level-1: Segmentation based on predicted values, like we’ve seen above
  • Level-2: Segmentation based on raw AND predicted values
  • Level-3: Lookalike segmentation – identify other users that are similar to the ones in the “target” segment
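Level-3 lookalike segmentation is often done by representing each user as a vector of predicted intent probabilities and ranking candidates by similarity to the target segment’s centroid. A sketch with made-up numbers and cosine similarity (one common choice among several):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Intent vectors: [p(add_to_cart), p(transaction), p(cart_abandonment)]
segment = {"u1": [0.8, 0.6, 0.1], "u2": [0.7, 0.7, 0.2]}
candidates = {"u3": [0.75, 0.65, 0.15], "u4": [0.1, 0.05, 0.9]}

dims = 3
centroid = [sum(v[i] for v in segment.values()) / len(segment) for i in range(dims)]
ranked = sorted(candidates, key=lambda u: cosine(centroid, candidates[u]), reverse=True)
print(ranked)  # u3 is the closest lookalike
```

Production systems typically use approximate nearest-neighbor indexes instead of a full sort, but the similarity idea is the same.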

5. Not composable (API-first)

This brings us to the next dysfunction: a personalization system that is “not composable”. If we admit that personalizing the journey of an end-user implies more than just displaying a recommendation widget on a product detail page, then, as developers, we need a way to orchestrate personalized experiences, to create intelligent triggers based on predicted intents.

For example, if a user is interested in items with the characteristics color: red (56% probability) and brand: Adidas (67% probability), we should have the means to act on those predictions directly.

For that, we need a composable approach and the API-first architecture is the developer-friendly way to accomplish this. At the minimum you’d want to have access to:

  • Models API

This is where you’d be able to configure the machine learning models that are part of the personalization system, whether we’re talking about: frequently bought together, related products or intent predictions.

[
  {
    "name": "Related Products",
    "type": "related_products",
    "compatibleSources": ["bigquery"],
    "dataRequirements": {
      "minUsers": 10000,
      "minDays": 90
    },
    "frequency": "weekly"
  },
  {
    "name": "Affinities",
    "type": "affinities",
    "compatibleSources": ["bigquery"],
    "dataRequirements": {
      "minUsers": 50000,
      "minDays": 30
    },
    "frequency": "daily"
  },
  …
]
  • User Profiles API

This API would allow you to request raw and/or predicted properties for an authenticated user (userID) or an anonymous user (cookieID, sessionID).

{
  "user": "user_1",
  "properties": {
    "raw": {
      "lastUpdatedAt": "2021-07-11T10:12:37Z",
      "device": "mobile",
      "sessionCount": 12,
      "timeOnSite": "02:03:10",
      "browser": "chrome",
      "pageviews": 32,
      "avgSessionDuration": 102,
      "lastVisit": "2021-07-11T10:12:37Z",
      ...
    },
    "predicted": {
      ...
    }
  }
}
  • Segments API

Segments are used to group and filter users based on raw and predicted values.

[
{
  "segmentID": "segment_1",
  "name": "Mobile users that will complete a purchase",
  "conditions": "predictions.funnel_stage.value:transaction AND (predictions.funnel_stage.probability: 0.5 TO 0.9) AND raw.device = 'mobile'",
  "type": "computed"
},
{
  "segmentID": "segment_3",
  "name": "Users that are interested in red Adidas shoes",
  "conditions": "predictions.affinities.color.value = 'red' AND predictions.affinities.brand.value = 'adidas' AND predictions.affinities.category.value = 'shoes' AND predictions.affinities.color.probability > 0.5 AND predictions.affinities.brand.probability > 0.5 AND predictions.affinities.category.probability > 0.5",
  "type": "computed"
},
...
]

6. Lack of transparency in terms of performance metrics

A personalization engine needs to be trustworthy in terms of measuring and delivering business results, it needs to be verifiable in all aspects of its impact – able to explain why the inputs that went into building the ML model are important and how they impact the outputs/predictions, both in terms of offline and online metrics.

Here are some of the things you should be expecting from a transparent AI-based personalization engine:

  • How much each feature contributes to a model’s single prediction (correlation matrix)
  • Accuracy, precision, recall, F1 score (confusion matrix)
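For a binary prediction such as “will this user convert?”, those confusion-matrix metrics reduce to a few formulas. A self-contained sketch with made-up counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Example: 80 true positives, 20 false positives, 40 false negatives, 860 true negatives
print(classification_metrics(tp=80, fp=20, fn=40, tn=860))
```

Note how accuracy alone can flatter a model when negatives dominate; that is exactly why a transparent engine should expose all four numbers.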

When it comes to measuring the business results (KPI) after deploying a personalization system, we must be careful not to be myopic about it. Let me give you an example:

| Metric              | Variant A | Variant B |
|---------------------|-----------|-----------|
| Click-through rate  | 10%       | 20%       |
| Conversion rate     | 10%       | 5%        |
| Average order value | $50       | $100      |

A lot of companies would evaluate only the CTR and conclude that Variant B is the winner; inspect the conversion rate and Variant A looks best instead. Is it? Do the math and you realize they’re essentially the same: per 1,000 visitors, Variant A gets 100 clicks converting at 10% and Variant B gets 200 clicks converting at 5%, so both produce 10 orders.

But if you take a step further and look at the average order value, Variant B produces more revenue. Not because of a better conversion rate, but because users add more items, or more expensive ones, to the cart, increasing the average order value.
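The arithmetic behind this comparison can be checked with a short script, assuming 1,000 visitors per variant:

```python
visitors = 1000
variants = {
    "A": {"ctr": 0.10, "cr": 0.10, "aov": 50},
    "B": {"ctr": 0.20, "cr": 0.05, "aov": 100},
}

results = {}
for name, v in variants.items():
    clicks = visitors * v["ctr"]
    orders = clicks * v["cr"]  # conversion rate measured on clicks
    results[name] = {"clicks": clicks, "orders": orders, "revenue": orders * v["aov"]}

print(results)
# both variants yield ~10 orders, but B roughly doubles the revenue
```

This is the whole point of not being myopic about one KPI: the “winner” changes depending on whether you stop at CTR, CR, or revenue.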

In conclusion, a personalization system should be transparent with the performance of machine learning models that sit under the hood as well as the business results once the personalization has been implemented and deployed into production.

7. AI ethics immaturity

Imagine you’re building a system to rank items in a user’s newsfeed. Your goal is to maximize engagement: the likelihood that users click. But you soon realize that optimizing for engagement alone raises ethical concerns, because extreme posts tend to get more engagement, so the algorithm learns to prioritize extreme content. Sounds familiar?

Unfortunately, the ethical implications of personalization systems are an afterthought, and companies pay attention only when it hurts their bottom line. Mind the ethical gap from the scoping stage of your “project personalization” and don’t let it surprise you later on.

The one question that usually helps me identify ethical blind spots is: “If we can build it, should we?” In other words, AI is like a knife that can be used both in surgery and in a fight. And I believe developers have a responsibility to lead the way AI is applied.

In light of all that we discussed, it’s clear that building a performant personalization engine is not a trivial task. And there’s a good reason for that: personalization is not a tick-box exercise. It’s a complex undertaking due to the uniqueness of every user and the need to respect their privacy while still providing a personalized experience.

Let’s agree that, when it comes to personalization, building for the end-user is our primary goal.

We can also acknowledge that this process requires multiple iterations, a cycle that we, as developers, seldom navigate by looking at the data ourselves, let alone by considering the ethical implications of our code. Usually somebody else, be it a product manager, marketing analyst, or even a data scientist, scrutinizes and summarizes the data and translates it into the feature list for the next product release.

There lies the gap between developers (builders) and end-users, leading to the development of suboptimal products.

What if developers could be better equipped to see firsthand how users interact with the product they’re building? What if data were available to developers directly in the user-facing product components? And what if developers could build these components to automatically adapt to user behavior based on intents, ultimately providing a better user experience?

It’s clear that building a performant personalization engine is not a trivial task. That’s why for the past year we’ve been working on a new product that fixes the seven dysfunctions we’ve talked about in this article. If you’re interested in getting early access, here’s where you can sign up for the waiting list: https://alg.li/fixperso

About the author
Ciprian Borodescu

AI Product Manager | On a mission to help people succeed through the use of AI

