Multi-cluster management (MCM)

Multi-cluster management is deprecated and will be sunset. If you have issues with your Algolia infrastructure due to large volumes of data, contact the Algolia support team.

What is multi-cluster management?

If all of your data fits onto one cluster, you don’t need MCM. When your data becomes too large for one cluster, Multi-Cluster Management (MCM) lets you split your data so that:

Your data spans more than one cluster,
Algolia can perform a full query on only one cluster.

Consider an email system in which users search their own (massive) email history. Eventually, one cluster can’t store all those emails. Before MCM, you could have added more clusters, but, as discussed below, managing multiple clusters without MCM is difficult. MCM helps you manage the redistribution of the emails, so you can put some users on cluster 1, others on cluster 2, and still others on additional clusters. This scales as the number of email users grows.

It’s also transparent. Without MCM, you would need to manage the multiple clusters yourself: keeping track of the clusters and their different application IDs, tracking which users are on which clusters, and moving data from one machine to another to balance the load or when adding new clusters.

With MCM, all you need to do is tell the MCM API which unique attribute to split your data on (here, the user’s email), and then make sure that every indexing and search operation includes that attribute (all must include the email). The MultiClusters API manages the rest: cluster management, index mapping, search mapping, and load balancing.

Other use-cases

Managing user emails is only one use case.

SaaS providers like Salesforce or Dropbox can use MCM to manage their data. This is similar to the email system, but with a unique customer ID: each cluster gets a different set of customers.
Music streaming services like Deezer and Spotify can start with a single cluster that stores the full collection of music (public playlists and user playlists). When user playlists no longer fit on one cluster, they can move to multiple clusters. Each cluster still contains the full collection of music and public playlists, while MCM distributes private playlists across clusters. This lets a company start with one cluster and add more clusters as its customer base grows. The only requirement is that you can split the data into discrete chunks (for example, by UserID or customerID).

Essentially, MCM comes into play whenever a set of data can be logically split into smaller chunks, such that a full search can be performed on only one chunk at a time.

One search, one cluster

MCM doesn’t enable searching across clusters, nor does it merge or aggregate results from different clusters. The distribution relies on a split where a user’s entire dataset fits on a single cluster, so that only one cluster is needed to perform the complete search.

This works for an email system, because users have exclusive access to their own data. In contrast, consider an online library that enables full-text book searching. You couldn’t spread the books across different clusters and expect complete results from a single-cluster search: you would need to search all of the clusters and then aggregate the results.

Multiple clusters in more detail

Load balancing

A natural consequence of splitting up your data is that the split doesn’t stay balanced: some users have more data than others. You base your initial cluster distribution on current usage but, as usage grows over time, that initial balance drifts. MCM simplifies load balancing by letting you move user data between clusters to keep them balanced.
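As a rough illustration of what rebalancing means, the following Python sketch moves one user at a time from the fullest cluster to the emptiest one. It is a toy in-memory model, not Algolia’s implementation: the `Cluster` class, the `rebalance_once` function, and the record counts are hypothetical names and numbers chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster: maps user ID -> record count."""
    name: str
    users: Dict[str, int] = field(default_factory=dict)

    @property
    def load(self) -> int:
        return sum(self.users.values())


def rebalance_once(clusters: List[Cluster]) -> Optional[Tuple[str, str, str]]:
    """Move one whole user from the fullest to the emptiest cluster.

    Returns (user, source, target), or None if no move narrows the gap.
    """
    source = max(clusters, key=lambda c: c.load)
    target = min(clusters, key=lambda c: c.load)
    gap = source.load - target.load
    # Only users smaller than the gap reduce the imbalance when moved.
    candidates = {u: n for u, n in source.users.items() if 0 < n < gap}
    if not candidates:
        return None
    # Prefer the user whose size is closest to half the gap.
    user = min(candidates, key=lambda u: abs(candidates[u] - gap / 2))
    target.users[user] = source.users.pop(user)
    return user, source.name, target.name


c1 = Cluster("c1", {"alice": 900, "bob": 300, "carol": 100})
c2 = Cluster("c2", {"dave": 200})
move = rebalance_once([c1, c2])  # moves "bob" from c1 to c2
```

Each move relocates a whole user, never part of one, so a user’s data always stays on a single cluster.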

More about your data

Just to reiterate how MCM works:

First, slice your full index into smaller index subsets, where each subset is tagged with a UserID (that is, the index is user-partitioned)
The indexing operations then load these index subsets onto different clusters
Once done, every search and indexing operation must include the UserID

Some important points:

Clusters can (and will) contain more than one user
No user can be on more than one cluster
No single user’s data can be larger than a single cluster
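The three points above can be sketched as a small placement routine. This is only an illustration of the constraints, assuming a simple "largest user first, least-loaded cluster" strategy. The function name `assign_users`, the `capacity` parameter, and the sizes are hypothetical and not part of MCM.

```python
from typing import Dict, List


def assign_users(user_sizes: Dict[str, int],
                 cluster_names: List[str],
                 capacity: int) -> Dict[str, str]:
    """Place every user on exactly one cluster without exceeding capacity."""
    load = {name: 0 for name in cluster_names}
    placement: Dict[str, str] = {}
    # Place the largest users first, each on the currently least-loaded cluster.
    for user, size in sorted(user_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity:
            # A single user's data must fit on a single cluster.
            raise ValueError(f"{user} does not fit on a single cluster")
        target = min(cluster_names, key=lambda name: load[name])
        if load[target] + size > capacity:
            raise ValueError("not enough capacity; add a cluster")
        load[target] += size
        placement[user] = target  # one entry per user: never split across clusters
    return placement


placement = assign_users(
    {"alice": 60, "bob": 50, "carol": 30, "dave": 20},
    ["c1", "c2"],
    capacity=100,
)
```

Because `placement` maps each user to a single cluster name, a cluster can hold several users but a user can never end up on two clusters.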

Global vs private data

As mentioned in the music-streaming use case, clusters can hold shared data as well as different data. This happens when you let users search public data (music collection, public playlists) as well as private data (private playlists). Private data is tagged with the UserID of the user it belongs to. With MCM, you add and manage global data across every index by setting the UserID to *.
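To make the distinction concrete, here is a minimal sketch of which records a given user can search on their cluster. It models records as plain dictionaries. The `userID` field name, the `searchable_records` helper, and the sample playlists are assumptions made for this example, not the actual storage format.

```python
GLOBAL_USER_ID = "*"  # per the text above, "*" marks global data


def searchable_records(cluster_records, user_id):
    """Return the records a user can search: their own plus the global ones."""
    return [
        record for record in cluster_records
        if record["userID"] in (user_id, GLOBAL_USER_ID)
    ]


# Hypothetical contents of one cluster in the music-streaming example.
cluster_1 = [
    {"userID": "*", "title": "Top 50 (public playlist)"},
    {"userID": "alice", "title": "Alice's running mix"},
    {"userID": "bob", "title": "Bob's road trip"},
]

titles = [r["title"] for r in searchable_records(cluster_1, "alice")]
```

Here, Alice sees the public playlist and her own private playlist, but not Bob’s.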

Mapping users to their cluster

Without MCM, every Algolia cluster is assigned a unique application ID. If you use multiple clusters, you must keep track of which application ID maps to which cluster, and of which user is on which cluster. To manage multiple clusters yourself, you therefore need a mapping table of three items: user, cluster, and application ID. Every indexing and search operation needs to go through this mapping.

With MCM, all clusters in a multi-cluster configuration use the same application ID, and every cluster contains the user-to-cluster mapping. You only need to send the UserID to Algolia, which then redirects the request to the correct cluster. At most two clusters are involved in each search. The first cluster is the one that receives the request: either it is the correct cluster for that user, or it redirects the request to the correct cluster using its local map.
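The redirect behavior can be sketched with a small in-memory model. This is not Algolia’s implementation: the `Cluster` class, its `search` method, and the substring matching are hypothetical simplifications that only show why a search touches at most two clusters.

```python
from typing import Dict, List, Tuple


class Cluster:
    """Hypothetical cluster holding a copy of the full user -> cluster map."""

    def __init__(self, name: str, user_map: Dict[str, str]):
        self.name = name
        self.user_map = user_map                  # local copy of the mapping
        self.records: Dict[str, List[str]] = {}   # user ID -> that user's records

    def search(self, user_id: str, query: str,
               peers: Dict[str, "Cluster"]) -> Tuple[List[str], List[str]]:
        """Return (hits, names of the clusters involved in the request)."""
        home = self.user_map[user_id]
        if home == self.name:
            # This cluster owns the user: run the complete search locally.
            hits = [r for r in self.records.get(user_id, [])
                    if query.lower() in r.lower()]
            return hits, [self.name]
        # Otherwise redirect once, straight to the user's cluster.
        hits, involved = peers[home].search(user_id, query, peers)
        return hits, [self.name] + involved


user_map = {"alice": "c1", "bob": "c2"}
clusters = {name: Cluster(name, user_map) for name in ("c1", "c2")}
clusters["c2"].records["bob"] = ["Invoice March", "Holiday photos"]

# The request lands on c1, which redirects it to c2.
hits, involved = clusters["c1"].search("bob", "invoice", clusters)
```

Because every cluster holds the same map, the first cluster always knows the user’s home cluster, so one redirect is enough.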

Moving data and scalability

MCM scales because every cluster is independent from a data point of view: your data is split across multiple clusters, and every search is performed on a single cluster. Moving data from one cluster to another (for load balancing or adding new clusters) is a simple process.

Without MCM, moving data requires a number of precise steps in the right order to avoid any downtime. Bringing in new clusters requires a number of move operations, which is both time-consuming and resource-intensive. Error handling and rollbacks can be difficult. One of the primary goals for MCM was to simplify this process of moving data and adding new clusters by minimizing the number of API calls and parameters you need to use.

The MultiClusters API

The MultiClusters API hides the difficulties associated with multiple clusters by adding a layer on top of the classical API. To work with the API, you’ll need to ensure that each record is tagged with a UserID. Once that tagging is complete, the API takes over managing user-to-cluster mapping, redirecting every update and query to its proper cluster.
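The tagging and redirection described above might look like the following sketch. It doesn’t call the real MultiClusters API: `tag_records`, `route_updates`, and the `userID` field name are hypothetical helpers that only model the idea of tagging each record and grouping updates by the owning user’s cluster.

```python
from collections import defaultdict
from typing import Dict, List


def tag_records(records: List[dict], user_id: str) -> List[dict]:
    """Return copies of the records, each tagged with the owning user's ID."""
    return [{**record, "userID": user_id} for record in records]


def route_updates(records: List[dict],
                  user_to_cluster: Dict[str, str]) -> Dict[str, List[dict]]:
    """Group tagged records by the cluster that owns each record's user."""
    batches = defaultdict(list)
    for record in records:
        batches[user_to_cluster[record["userID"]]].append(record)
    return dict(batches)


updates = (
    tag_records([{"subject": "Welcome"}], "alice")
    + tag_records([{"subject": "Invoice"}, {"subject": "Receipt"}], "bob")
)
batches = route_updates(updates, {"alice": "c1", "bob": "c2"})
```

In this model, the caller only supplies the user ID. Which cluster receives each batch is decided entirely by the mapping.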
