AI at scale: Managing ML models over time & across use cases

Just a few years ago, building a new AI service from scratch required considerable resources. That has changed. Yet getting a service running is only a small first step: the real challenge of running AI at scale is sustaining quality over time and across varied use cases.

Managing the lifecycle of ML models over time and across use cases is essential to the long-term success of investments in AI. For specific tasks such as translating languages or answering questions, minimal knowledge of Python is all it takes to interact with powerful pre-trained ML models, easily found on repositories such as Hugging Face.

Integrating such a model at the heart of an API is also relatively easy. Running AI-powered services in production does not differ much from running “conventional” services: they may be more CPU-intensive than typical CRUD applications, but serving a large number of requests with acceptable latency still boils down to how many machines you use, and hence how much money you spend.

However, while it may be easy to get started, it’s much harder to maintain, optimize, and scale AI over time. Managing the lifecycle of machine learning models over time and across use cases is essential for long-term success.

The challenges of AI over time

There are scores of new AI models, each more capable than the last, with more hidden layers, more parameters, and different architectures. Game-changing ML models appear regularly, and because adjusting an architecture is trivial, new variants appear constantly. Not all of them are efficient or even relevant to every business use case, but some can significantly improve results. How can you know whether a new model is better than a previous one? Deploying ML models and comparing their performance is crucial.
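One simple way to approach the "is the new model better?" question offline is to score both models on the same labelled hold-out set before any traffic reaches them. The sketch below is only an illustration under stated assumptions: models are plain Python callables, accuracy is the metric, and the names `accuracy`, `compare_models` and `min_gain` are hypothetical, not part of any Algolia tooling.

```python
from typing import Callable, Sequence

# Hypothetical model interface: maps an input text to a predicted label.
Model = Callable[[str], str]


def accuracy(model: Model, inputs: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of inputs for which the model predicts the expected label."""
    if len(inputs) != len(labels) or not inputs:
        raise ValueError("inputs and labels must be non-empty and of equal length")
    correct = sum(1 for x, y in zip(inputs, labels) if model(x) == y)
    return correct / len(inputs)


def compare_models(current: Model, candidate: Model,
                   inputs: Sequence[str], labels: Sequence[str],
                   min_gain: float = 0.01) -> dict:
    """Score both models on the same data; flag the candidate if it wins by min_gain."""
    current_score = accuracy(current, inputs, labels)
    candidate_score = accuracy(candidate, inputs, labels)
    return {
        "current": current_score,
        "candidate": candidate_score,
        "promote_candidate": candidate_score - current_score >= min_gain,
    }


# Toy stand-ins for real models, for illustration only.
def current_model(text: str) -> str:
    return "shoes" if "shoe" in text else "other"


def candidate_model(text: str) -> str:
    return "shoes" if ("shoe" in text or "sneaker" in text) else "other"


queries = ["red shoe", "white sneaker", "blue jeans", "running shoes"]
expected = ["shoes", "shoes", "other", "shoes"]
report = compare_models(current_model, candidate_model, queries, expected)
print(report)
```

An offline score like this is only a first filter: a candidate that wins on a hold-out set still has to prove itself on live traffic and on business KPIs.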

As an additional complication, the performance of a given ML model is known to change over time: its predictive ability or classification power decays. The reasons for this decay, known as concept drift, are beyond the scope of this article. It can be conceptualized as a consequence of changes in the “global context”: new habits appearing, usage of words evolving, seasons changing, people’s preoccupations shifting. To adapt, existing ML models must be monitored over time and retrained, manually or continuously, before being redeployed and compared.
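As a rough illustration of what such monitoring can look like, the sketch below tracks accuracy over a sliding window of recent predictions and flags the model for retraining once it falls noticeably below the accuracy measured at deployment time. It assumes that ground-truth labels (or a proxy such as user feedback) eventually become available; the class name `DriftMonitor` and its thresholds are hypothetical.

```python
from collections import deque
from typing import Optional


class DriftMonitor:
    """Tracks accuracy over a sliding window of labelled predictions."""

    def __init__(self, baseline_accuracy: float, window_size: int = 100,
                 tolerance: float = 0.05) -> None:
        self.baseline_accuracy = baseline_accuracy  # measured when the model shipped
        self.tolerance = tolerance                  # accepted drop before alerting
        self.outcomes = deque(maxlen=window_size)   # True when a prediction was correct

    def record(self, predicted, actual) -> None:
        self.outcomes.append(predicted == actual)

    def window_accuracy(self) -> Optional[float]:
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        """True once the window is full and accuracy fell below baseline - tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return self.window_accuracy() < self.baseline_accuracy - self.tolerance


monitor = DriftMonitor(baseline_accuracy=0.90, window_size=10, tolerance=0.05)
for _ in range(10):
    monitor.record("a", "a")          # ten correct predictions fill the window
healthy = monitor.needs_retraining()

for _ in range(4):
    monitor.record("a", "b")          # four wrong predictions evict four correct ones
drifted = monitor.needs_retraining()
print(healthy, drifted, monitor.window_accuracy())
```

A fixed threshold on windowed accuracy is the simplest possible detector; it says nothing about why the model degraded, only that it is time to look.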

Besides, these considerations hold for any single “intent”, and an application has many such intents. For example, in the world of Search:

Properly trained, some ML models, like RetinaNet or YOLO, can label items of interest in images, thereby enabling textual search over a set of images;
Others, like BART for NLI, can measure the probability that a text relates to specific labels, thereby enabling content categorization.

Last, business key performance indicators are far from unique, and their importance varies depending on the concrete use case. Continuing the example of search:

For some businesses, the conversion rate is the most important metric to increase;
For other businesses, the generated revenue is the one to optimize.

Running AI at scale means accepting all these variables and navigating a multi-dimensional landscape.

Operating AI at scale at Algolia

At Algolia, we handle all of this complexity on behalf of our customers, so that they can focus on their core business and get meaningful outcomes. Each customer is unique: their audience, their content, their preferred business KPIs… everything varies from one customer to another. Running AI at scale means supporting this variability while continuing to introduce new ML models or refining existing models.

We have also been developing proprietary models for years now to solve precise problems such as search personalization, query understanding and matching, and ranking. We also augment our pipelines with existing pre-trained models – for example, we started our semantic search efforts with the Universal Sentence Encoder suite. Today, Algolia NeuralSearch uses a combination of several ML models to solve very specific search intent for very different use cases, and we will continue to introduce new models to increase the power of our search.

Much as software versions are tracked in production, we keep extensive track of the ML models being used over time. This means that we can understand which instances run which combination of models, and therefore which customers are using which versions. As we leverage these models to build dedicated data structures, this tracking is also key to triggering updates of this derived data (e.g. indices).
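To make the idea concrete, here is a minimal sketch of such tracking. It is not Algolia's actual system: the `ModelVersion`, `Instance` and `ModelRegistry` names, and the rule that any version change requires rebuilding derived data, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class ModelVersion:
    name: str      # hypothetical model name, e.g. "encoder"
    version: str   # e.g. "1.0.0"


@dataclass
class Instance:
    instance_id: str
    customers: Set[str]
    # Models running on this instance, keyed by model name.
    models: Dict[str, ModelVersion] = field(default_factory=dict)


class ModelRegistry:
    """Records which instance runs which model versions."""

    def __init__(self) -> None:
        self.instances: Dict[str, Instance] = {}

    def register(self, instance: Instance) -> None:
        self.instances[instance.instance_id] = instance

    def customers_using(self, model: ModelVersion) -> Set[str]:
        """All customers served by an instance running exactly this version."""
        found: Set[str] = set()
        for inst in self.instances.values():
            if inst.models.get(model.name) == model:
                found |= inst.customers
        return found

    def upgrade(self, instance_id: str, new_model: ModelVersion) -> bool:
        """Swap in a version; return True when derived data (e.g. indices) must be rebuilt."""
        inst = self.instances[instance_id]
        changed = inst.models.get(new_model.name) != new_model
        inst.models[new_model.name] = new_model
        return changed


registry = ModelRegistry()
v1 = ModelVersion("encoder", "1.0.0")
v2 = ModelVersion("encoder", "2.0.0")
registry.register(Instance("instance-a", {"acme"}, {"encoder": v1}))
registry.register(Instance("instance-b", {"globex", "initech"}, {"encoder": v1}))

needs_reindex = registry.upgrade("instance-b", v2)
on_v1 = registry.customers_using(v1)
on_v2 = registry.customers_using(v2)
print(needs_reindex, on_v1, on_v2)
```

Keeping the version change and the "rebuild needed" signal in the same call is a deliberate choice in this sketch: it makes it hard to update a model without also noticing that the data built from it is stale.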

Perhaps the most important aspect of improving ML models over time is tracking how models perform in helping customers achieve their business KPIs. Algolia customers configure their search and recommendation pipelines with events (clicks, conversions, purchases, ratings, add-to-cart, and so forth), and these events are key to the success of an implementation.
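As a simplified illustration of how events turn into KPIs, the sketch below derives two of the metrics mentioned earlier, conversion rate and revenue, from a flat list of events. The `Event` shape and the event kinds used here are hypothetical and deliberately minimal; they are not the schema of any Algolia API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    kind: str             # hypothetical kinds: "search", "click", "conversion"
    revenue: float = 0.0  # only meaningful for conversions


def conversion_rate(events: List[Event]) -> float:
    """Conversions divided by searches (0.0 when there were no searches)."""
    searches = sum(1 for e in events if e.kind == "search")
    conversions = sum(1 for e in events if e.kind == "conversion")
    return conversions / searches if searches else 0.0


def revenue_per_search(events: List[Event]) -> float:
    """Total conversion revenue divided by searches (0.0 when there were no searches)."""
    searches = sum(1 for e in events if e.kind == "search")
    revenue = sum(e.revenue for e in events if e.kind == "conversion")
    return revenue / searches if searches else 0.0


events = [
    Event("search"), Event("click"),
    Event("search"),
    Event("search"), Event("click"), Event("conversion", revenue=50.0),
    Event("search"),
]
kpis = {
    "conversion_rate": conversion_rate(events),
    "revenue_per_search": revenue_per_search(events),
}
print(kpis)
```

The same event stream yields different KPIs, which is why the metric to optimize has to be chosen per customer rather than fixed once for everyone.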

When deploying new ML models, we first monitor their impact on these KPIs for a small but significant part of customers’ traffic, and for a significant amount of time. Depending on the customer, it may take a couple of weeks to confirm that a particular model is improving the relevance of their search experience.
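A common way to implement this kind of gradual rollout, sketched here under assumptions, is to route a stable fraction of users to the candidate model by hashing their identifier, then compare conversion rates with a two-proportion z-test. The function names, the 10% share and the sample counts below are illustrative only; the source does not describe which statistical method Algolia uses.

```python
import hashlib
import math
from typing import Tuple


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.1) -> str:
    """Deterministically route a stable fraction of users to the candidate model."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform value in [0, 1)
    return "candidate" if bucket < treatment_share else "control"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates b - a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("cannot test: pooled conversion rate is 0 or 1")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# The same user always lands in the same variant for a given experiment.
variants = [assign_variant(f"user-{i}", "new-ranking-model") for i in range(10_000)]
candidate_share = variants.count("candidate") / len(variants)

# Made-up counts: control converts 1000 of 20000 searches, candidate 130 of 2000.
z, p_value = two_proportion_z_test(1000, 20_000, 130, 2_000)
print(round(candidate_share, 3), round(z, 2), round(p_value, 4))
```

Hashing on the experiment name as well as the user ID keeps assignments independent across experiments, and the small treatment share is why it can take weeks to collect enough traffic for a confident answer.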

What does it mean to monitor ML? Developers are familiar with how conventional software is monitored to find errors. Inputs and outputs are generally clear and deterministic, so many errors can be detected and captured as test cases. ML models, on the other hand, are non-deterministic by nature: they are expected to answer in ways that cannot be predicted. Identifying incorrect behavior and alerting accordingly is an extremely complex problem, one that only AI experts who know their models can solve appropriately.

NeuralSearch with Algolia

With Algolia NeuralSearch, customers get state-of-the-art AI-based search together with Algolia’s renowned performance, reliability and quality. All this complexity, from the selection of ML models to their deployment, monitoring and management over time, is handled by Algolia.

Learn more about the tradeoffs of buying vs building AI search from scratch, or sign up today to see how NeuralSearch can work for your use case.
