What is multi-cluster management?
Normally, all of your data can fit onto one cluster. However, when your data becomes too large for one cluster, Multi-Cluster Management (MCM) offers a way to logically break up your data so that:
- Your data spans more than one cluster,
- Algolia can perform a full query on only one cluster.
Other use-cases
Managing user emails is only one use case.
- SaaS providers like Salesforce or Dropbox can use MCM to manage their data. This is similar to the email system, but with a unique customer ID: each cluster gets a different set of customers.
- Music streaming services like Deezer and Spotify. Initially, a single cluster may be sufficient to house the full collection of music: all private and public playlists. However, when the number of private playlists becomes too large to fit on one cluster, you would move to multiple clusters. Every cluster will still contain the full collection of music and public playlists, but MCM will enable distributing the private playlists over different clusters.
MCM is easily scalable. Any company with only one cluster can scale up to as many clusters as it needs to match the growth of its customer base. The only requirement is that its data can be split into discrete chunks (by UserID or customerID).
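As a rough illustration of what "discrete chunks" means, the sketch below groups records by user ID and rejects any record that has none. The field name `userID`, the helper `partition_by_user`, and the sample playlists are hypothetical, chosen only for this example; this is not Algolia's API.

```python
from collections import defaultdict


def partition_by_user(records, key="userID"):
    """Group records into per-user chunks.

    Every record must carry a user ID; otherwise it cannot be
    placed in exactly one chunk.
    """
    chunks = defaultdict(list)
    for record in records:
        user_id = record.get(key)
        if user_id is None:
            raise ValueError(f"record has no {key!r}: {record}")
        chunks[user_id].append(record)
    return dict(chunks)


# Hypothetical private playlists, each owned by a single user.
playlists = [
    {"userID": "alice", "title": "Morning run"},
    {"userID": "bob", "title": "Focus"},
    {"userID": "alice", "title": "Road trip"},
]

chunks = partition_by_user(playlists)
for user_id, user_records in chunks.items():
    print(user_id, [r["title"] for r in user_records])
```

If a record cannot be attributed to exactly one user or customer, the data is not yet in a shape that can be split this way.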
One search, one cluster
MCM does not enable searching across clusters, nor does it merge results from different clusters. The distribution is based on a split where a user’s entire dataset can be placed on a single cluster, so that only one cluster is needed to perform the complete search. This works well for an email system, because users have exclusive access to their own data. In contrast, consider an online library that enables full-text book searching. You couldn’t spread the books across different servers and expect complete results from a single-cluster search. You would need to search all of the clusters and then aggregate the results.
Multiple clusters in more detail
Load balancing
One of the natural consequences of splitting up your data is that the split will not always stay balanced: some users will have significantly more data than others. MCM simplifies load balancing. You will base your initial cluster distribution on current usage but, over time, as usage grows, that initial balance will erode. With MCM, you can move user data between clusters to restore a balanced state.
More about your data
Just to reiterate how MCM works:
- You’ll first want to slice up your full index into smaller index-subsets, where each subset is tagged with a user-id (that is, user-partitioned)
- The indexing operations will then load these index-subsets to different clusters
- Once done, every search and indexing operation must include the UserID
- Clusters can (and will) contain more than one user
- No user can be on more than one cluster
- No single user’s data can be larger than a single cluster
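The rules above can be modeled with a small in-memory mapping from users to clusters. This is only a sketch under stated assumptions: the `ClusterMap` class, its methods, the cluster names, and the size units are all hypothetical and do not reflect how Algolia implements or exposes MCM. The "most free space" placement rule is likewise just one simple choice.

```python
class ClusterMap:
    """Toy model of a user-to-cluster mapping (not the Algolia API)."""

    def __init__(self, capacities):
        # cluster name -> maximum data size it can hold (arbitrary units)
        self.capacities = dict(capacities)
        self.user_cluster = {}  # user ID -> the single cluster holding that user
        self.user_size = {}     # user ID -> size of that user's data

    def load(self, cluster):
        """Total size of all user data currently on a cluster."""
        return sum(size for user, size in self.user_size.items()
                   if self.user_cluster[user] == cluster)

    def free(self, cluster):
        return self.capacities[cluster] - self.load(cluster)

    def assign(self, user_id, size):
        """Place a new user on the cluster with the most free space."""
        if user_id in self.user_cluster:
            raise ValueError(f"{user_id} is already on a cluster")
        if size > max(self.capacities.values()):
            raise ValueError("a single user's data must fit on one cluster")
        candidates = [c for c in self.capacities if self.free(c) >= size]
        if not candidates:
            raise RuntimeError("no cluster has enough free space")
        target = max(candidates, key=self.free)
        self.user_cluster[user_id] = target
        self.user_size[user_id] = size
        return target

    def route(self, user_id):
        """Every search or indexing call needs the user ID to find its cluster."""
        return self.user_cluster[user_id]

    def move(self, user_id, target):
        """Rebalance by moving a user's whole dataset to another cluster."""
        if self.user_cluster[user_id] == target:
            return
        if self.free(target) < self.user_size[user_id]:
            raise RuntimeError(f"{target} cannot hold {user_id}'s data")
        self.user_cluster[user_id] = target


clusters = ClusterMap({"c1": 100, "c2": 100})
clusters.assign("alice", 60)
clusters.assign("bob", 30)
clusters.assign("carol", 30)
clusters.move("carol", "c1")  # manual rebalancing step
print(clusters.route("carol"), clusters.load("c1"), clusters.load("c2"))
```

Because each user maps to exactly one cluster, `route` always returns a single target, which is what makes a one-cluster search complete for that user.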