MultiClusters API client methods

List of methods

A brief technical overview

How to split the data (logical split)

The data is split logically. Algolia decided against a hash-based split, which would require aggregating answers from multiple servers and would add network latency to the response time. Normally, the data is user-partitioned, that is, split according to a userID.

Uses a single appID

If one appID were used per cluster, a multi-cluster setup would require many appIDs. This would be difficult to manage, especially when moving data from one cluster to another to balance the load. The API therefore relies on a single appID: the engine routes each request to a specific destination cluster, using a new HTTP header, X-ALGOLIA-USER-ID, and a mapping that associates each userID with a cluster.
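The routing idea can be illustrated with a toy model. This is only a sketch: the function name routeRequest, the in-memory mapping, and the cluster names are hypothetical, and the real engine keeps the mapping server-side.

```php
<?php
declare(strict_types=1);

/**
 * Toy model of MCM routing: one appID, many clusters.
 * $userToCluster and the cluster names are assumptions for illustration.
 */
function routeRequest(array $userToCluster, array $headers): string
{
    // HTTP header names are case-insensitive, so normalize before the lookup.
    $normalized = array_change_key_case($headers, CASE_LOWER);
    $userID = $normalized['x-algolia-user-id'] ?? null;

    if ($userID === null) {
        throw new InvalidArgumentException('Missing X-Algolia-User-ID header');
    }
    if (!isset($userToCluster[$userID])) {
        throw new RuntimeException("No cluster mapped for userID '$userID'");
    }

    // Exactly one cluster serves the request; nothing is aggregated.
    return $userToCluster[$userID];
}

$mapping = ['user123' => 'cluster-eu-1', 'user456' => 'cluster-us-2'];
echo routeRequest($mapping, ['X-ALGOLIA-USER-ID' => 'user123']), PHP_EOL;
```

In this sketch the call prints cluster-eu-1. Moving a user to another cluster only changes the mapping entry; the appID and the client code stay the same.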

What MCM doesn’t do

Since the data is split logically, only one server is required to perform a complete search. The API doesn’t aggregate responses from multiple clusters, so the feature stays fast even with many clusters in multiple regions.

Shared configuration

With MCM, all the settings, rules, synonyms, and API key operations are replicated on all the machines, so that every cluster has the same configuration. Only the records stored in the index differ between two clusters.

Shared data

For some use cases, there are two types of data:

  • Public data
  • Private user data

The public data can be searched at the same time as private user data. With MCM, you can create public records by using the special userID value *: the record is then replicated on all the clusters and made available for search.
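A minimal sketch of this replication rule, assuming a simplified in-memory model: the function storeRecord, the cluster names, and the mapping are hypothetical and only illustrate where a record ends up.

```php
<?php
declare(strict_types=1);

const PUBLIC_USER_ID = '*'; // special userID value described above

/**
 * Toy model: return the clusters after storing $record for $userID.
 * Public records go to every cluster, private records to the user's cluster only.
 */
function storeRecord(array $clusters, array $userToCluster, string $userID, array $record): array
{
    if ($userID === PUBLIC_USER_ID) {
        $targets = array_keys($clusters);
    } elseif (isset($userToCluster[$userID])) {
        $targets = [$userToCluster[$userID]];
    } else {
        throw new RuntimeException("No cluster mapped for userID '$userID'");
    }

    foreach ($targets as $name) {
        $clusters[$name][] = $record;
    }
    return $clusters;
}

$clusters = ['cluster-a' => [], 'cluster-b' => []];
$mapping = ['user123' => 'cluster-a'];

$clusters = storeRecord($clusters, $mapping, '*', ['objectID' => 'faq-1']);
$clusters = storeRecord($clusters, $mapping, 'user123', ['objectID' => 'note-1-user123']);

foreach ($clusters as $name => $records) {
    echo $name, ': ', implode(', ', array_column($records, 'objectID')), PHP_EOL;
}
```

Here cluster-a ends up with faq-1 and note-1-user123, while cluster-b holds only faq-1. A search for user123 can therefore be answered entirely by cluster-a, public data included.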

Object IDs

The objectIDs need to be unique across userIDs, so that a record of one userID cannot override the record of another userID.

The objectIDs also need to be unique because shared data can be retrieved at the same time as the data of one specific user. The recommendation is to append the userID of the specific user to the objectID: this ensures the objectID is unique.
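One way to follow this recommendation is sketched below. The helper name scopedObjectID and the "-" separator are assumptions; any scheme works as long as the resulting IDs cannot collide.

```php
<?php
declare(strict_types=1);

/**
 * Build an objectID that is unique across users by appending the userID.
 * The '-' separator is an arbitrary choice for this sketch.
 */
function scopedObjectID(string $localID, string $userID): string
{
    return $localID . '-' . $userID;
}

// Two users can reuse the same local ID without overriding each other.
echo scopedObjectID('note-1', 'user123'), PHP_EOL; // note-1-user123
echo scopedObjectID('note-1', 'user456'), PHP_EOL; // note-1-user456
```

If local IDs or userIDs can themselves contain the separator, pick a separator that cannot appear in either part, otherwise two different pairs could still produce the same objectID.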

Number of indices

MCM is designed to work on a small number of indices (< 100). This limitation is mainly there to preserve the performance of user migration. To migrate a user from one cluster to another, the engine needs to enumerate all the records of this specific user and send them to the destination cluster. This means looping over all the indices, so the cost of the operation is directly linked to the number of indices.

A small number of indices also allows the engine to further optimize indexing by batching the operations of one index together.
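The cost argument can be made concrete with a toy model. This is not the engine's implementation: the function collectUserRecords, the in-memory indices, and the userID field on each record are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Toy model of the enumeration step of a user migration.
 * Returns [records to move, grouped per index; number of indices scanned].
 */
function collectUserRecords(array $indices, string $userID): array
{
    $toMove = [];
    $indicesScanned = 0;

    // Every index has to be visited, even those with no record for this user.
    foreach ($indices as $indexName => $records) {
        $indicesScanned++;
        foreach ($records as $record) {
            if ($record['userID'] === $userID) {
                // Grouping per index lets each index be sent as one batch.
                $toMove[$indexName][] = $record;
            }
        }
    }
    return [$toMove, $indicesScanned];
}

$indices = [
    'products' => [
        ['objectID' => 'p1-user123', 'userID' => 'user123'],
        ['objectID' => 'p1-user456', 'userID' => 'user456'],
    ],
    'orders' => [
        ['objectID' => 'o1-user123', 'userID' => 'user123'],
    ],
    'reviews' => [],
];

[$toMove, $scanned] = collectUserRecords($indices, 'user123');
echo $scanned, ' indices scanned, ', count($toMove), ' batches to send', PHP_EOL;
```

With this data the sketch prints "3 indices scanned, 2 batches to send": the scan count grows with the number of indices regardless of how many of them actually hold data for the user.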

Check out the tutorial

Perhaps the best way to understand the MultiClusters API is to check out the MCM tutorial, where code samples are used to explain the most important endpoints.

Limitation v0.1

For v0.1, the assignment of users to clusters won’t be automatic: if a user is not properly assigned, or not found, the call will be rejected.

As you will notice, the documentation is actually using the REST API endpoints directly.

How to get the feature

MCM needs to be enabled on your cluster. Contact the Algolia support team for more information.

Multi-cluster usage

With a multi-cluster setup, the userID needs to be specified for each of the following methods:

Each of these methods allows you to pass any extra header to the request. For MCM, the header to pass is X-Algolia-User-ID.

Here is an example of the search method, but the principle is the same for all the methods listed in the preceding sections:

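// Target the index as usual; the appID is the same for every cluster.
$index = $client->initIndex('your_index_name');

// Pass the userID as an extra header so the engine can route the search
// to the cluster that holds this user's data.
$res = $index->search('query string', [
  'X-Algolia-User-ID' => 'user123'
]);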

You can find an example of how to pass extra headers for the other methods in their respective documentation.
