Implementing MCM

When your data no longer fits on a single machine, you have to add new clusters. Multi-Cluster Management (MCM) simplifies this process by letting you distribute and manage your data across several machines.

For example, let’s say you have a music streaming application that lets users create public and private playlists. With MCM, the number of playlists can grow without the risk of exceeding your current cluster’s size limit. You also get dedicated user access out of the box.

This guide compares each step with and without MCM to highlight what you need to change if you’re already using multiple clusters. If you’re implementing multiple clusters for the first time, you can look only at the MCM implementation.

Enabling MCM

First, you must enable MCM on your account. To do so, contact your Solutions Engineer or CSM, or send us an email at enterprise@algolia.com.

Splitting data across multiple clusters

Assigning user data to a cluster

Without MCM, an application is bound to a single cluster. Splitting data across several clusters therefore requires a separate application ID for each cluster, and you have to maintain and orchestrate the distribution yourself.

With MCM, all clusters use the same application ID. MCM keeps a mapping on each cluster to route requests to the correct cluster. All you need to do is assign a cluster to a user with the assignUserId method.

$client->assignUserId('user42', 'd4242-eu');
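
How you choose the cluster for a new user is up to you. The sketch below shows one possible policy, picking the cluster that currently holds the fewest records. The pickCluster helper, the shape of the clusters array (clusterName, nbRecords), and the sample values are assumptions made for this illustration; adapt them to whatever cluster statistics you have available.

```javascript
// Hypothetical helper: choose the cluster with the fewest records.
// The shape of `clusters` is an assumption made for this sketch.
function pickCluster(clusters) {
  if (clusters.length === 0) {
    throw new Error('No cluster available');
  }
  return clusters.reduce((smallest, cluster) =>
    cluster.nbRecords < smallest.nbRecords ? cluster : smallest
  ).clusterName;
}

// Example values, made up for illustration
const clusters = [
  { clusterName: 'd4242-eu', nbRecords: 1200 },
  { clusterName: 'd4243-eu', nbRecords: 300 },
];

console.log(pickCluster(clusters)); // prints "d4243-eu"
```

You would then pass the returned cluster name as the second argument of assignUserId.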

Adding private user data

Imagine that you want to index your users’ private playlists.

const playlists = [
  {
    user: 'user42',
    name: 'My peaceful playlist',
    tracks: [
      // ...
    ],
    createdAt: 1500000181
  },
  {
    user: 'user4242',
    name: 'My workout playlist',
    tracks: [
      // ...
    ],
    createdAt: 1500040452
  }
];

Without MCM, every record needs an attribute that identifies the user it belongs to (user in these snippets). You also need to declare that attribute in attributesForFaceting so that you can filter on it at query time.

playlists.forEach((playlist) => {
  // Fetch from your own data storage and with your own code
  // the associated application ID and API key for this user
  const appID = getAppIDFor(playlist.user);
  const apiKey = getIndexingApiKeyFor(playlist.user);

  const client = algoliasearch(appID, apiKey);
  const index = client.initIndex('playlists');

  index
    .setSettings({ attributesForFaceting: ['filterOnly(user)'] })
    .then(() => {
      console.log('Done');
    });

  index
    .saveObject(playlist, {
      autoGenerateObjectIDIfNotExist: true,
    })
    .then(({ objectID }) => {
      console.log(objectID);
    });
});

With MCM, all records are automatically tagged with the userID from the X-Algolia-User-ID header you send with each indexing request. The engine adds an extra __userID__ attribute to each record to identify the user it belongs to.

const client = algoliasearch('YourApplicationID', 'YourAdminAPIKey');
const index = client.initIndex('playlists');

playlists.forEach((playlist) => {
  index
    .saveObject(playlist, {
      autoGenerateObjectIDIfNotExist: true,
      // Send the user ID with this request only, instead of
      // changing a header shared by the whole client
      headers: {
        'X-Algolia-User-ID': playlist.user,
      },
    })
    .then(({ objectID }) => {
      console.log(objectID);
    });
});

For simplicity’s sake, the above snippets add records one by one. For better performance, we recommend grouping your records by userID first, then batching them.
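
The grouping step recommended here could look like the following sketch. The groupByUser and saveByUser helpers are hypothetical names introduced for this example. The sketch also assumes that the index object exposes saveObjects(objects, requestOptions) and accepts per-request headers in its request options.

```javascript
// Group records by their `user` attribute so that each group
// can be sent as one batch with a single X-Algolia-User-ID header.
function groupByUser(records) {
  const groups = new Map();
  for (const record of records) {
    if (!groups.has(record.user)) {
      groups.set(record.user, []);
    }
    groups.get(record.user).push(record);
  }
  return groups;
}

// Send one batch per user. `index` is assumed to expose
// saveObjects(objects, requestOptions) and to return a promise.
function saveByUser(index, records) {
  const requests = [];
  for (const [userID, userRecords] of groupByUser(records)) {
    requests.push(
      index.saveObjects(userRecords, {
        autoGenerateObjectIDIfNotExist: true,
        headers: { 'X-Algolia-User-ID': userID },
      })
    );
  }
  return Promise.all(requests);
}
```

With this approach, the number of indexing requests depends on the number of distinct users rather than on the number of records.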

Adding public data

Now, let’s say you want to index public playlists as well.

const playlists = [
  {
    user: 'public',
    name: 'Hot 100 Billboard Charts',
    tracks: [
      // ...
    ],
    createdAt: 1500240452,
  },
];

Without MCM, you need to tag every public record with a particular value (for example, “public”) in the attribute that identifies the user, and send these records to every cluster yourself. Then, you need to filter on that value to search for public records.

// Fetch from your own data storage and with your own code
// the list of application IDs and API keys to target each cluster
const configurations = getAllAppIDConfigurations();

// Send the records to each cluster
configurations.forEach(({ appID, apiKey }) => {
  const client = algoliasearch(appID, apiKey);
  const index = client.initIndex('playlists');

  index
    .saveObjects(playlists, { autoGenerateObjectIDIfNotExist: true })
    .then(({ objectIDs }) => {
      console.log(objectIDs);
    });
});

With MCM, you can use the userID value * to flag records as public, allowing all users to see them. Public records are automatically replicated on all clusters to avoid network latency during search.

const client = algoliasearch('YourApplicationID', 'YourAdminAPIKey');
const index = client.initIndex('playlists');

index
  .saveObjects(playlists, {
    autoGenerateObjectIDIfNotExist: true,
    // The userID value '*' flags the records as public
    headers: {
      'X-Algolia-User-ID': '*',
    },
  })
  .then(({ objectIDs }) => {
    console.log(objectIDs);
  });

Searching the data

Without MCM, you need to handle the filtering logic yourself, so that a search returns both the user’s private data and the public data.

// Fetch from your own data storage and with your own code
// the associated application ID and API key for this user
const appID = getAppIDFor('user42');
const apiKey = getSearchOnlyApiKeyFor('user42');

const client = algoliasearch(appID, apiKey);
const index = client.initIndex('playlists');

index
  .search('peace', {
    // The nested array means OR: the user's own playlists or public ones
    facetFilters: [['user:user42', 'user:public']],
  })
  .then(({ hits }) => {
    console.log(hits);
  });

With MCM, all the engine needs is the userID (with the X-Algolia-User-ID header) that the query targets, so it can route the request to the right cluster. It automatically adds the right filters to retrieve both the user’s data and public data.

const client = algoliasearch('YourApplicationID', 'YourSearchOnlyAPIKey');
const index = client.initIndex('playlists');

index
  .search('peace', {
    headers: {
      'X-Algolia-User-ID': 'user42',
    },
  })
  .then(({ hits }) => {
    console.log(hits);
  });
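
As a mental model only (this is not how the engine is implemented), the filtering that MCM applies for you can be pictured as follows. The visibleTo helper and the sample records are hypothetical; the __userID__ attribute and the * value for public records are the ones described in this guide.

```javascript
// Conceptual model: a record is visible to a user when it is tagged
// with that user's ID or with '*', the value that marks public records.
function visibleTo(userID, records) {
  return records.filter(
    (record) => record.__userID__ === userID || record.__userID__ === '*'
  );
}

// Sample records, made up for illustration
const records = [
  { name: 'My peaceful playlist', __userID__: 'user42' },
  { name: 'My workout playlist', __userID__: 'user4242' },
  { name: 'Hot 100 Billboard Charts', __userID__: '*' },
];

console.log(visibleTo('user42', records).map((record) => record.name));
```

In this model, user42 sees their own playlist and the public one, but not the playlist that belongs to user4242.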

Accessing secured data

You can’t rely on filters alone to select what data to return to a specific user because end users can change query parameters. If query-time filters are your only access control, anyone can alter them to access another user’s data.

To properly restrict a user’s access to their own data (and the public data), you need to use secured API keys.

Without MCM, you need to set the authorizing filters manually.

const securedApiKey = client.generateSecuredApiKey('YourSearchOnlyApiKey', {
  // OR, not AND: a record belongs either to the user or to "public"
  filters: 'user:user42 OR user:public',
});
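
If you generate such keys for many users, it can help to build the filter string in one place on your server. The following is a minimal sketch. The buildUserFilter helper is hypothetical, and the restriction on the characters allowed in a user ID is an assumption made to keep the example simple, not a rule taken from this guide.

```javascript
// Hypothetical helper: build the filter that matches one user's
// private records plus the records tagged as public.
function buildUserFilter(userID) {
  // Assumption for this sketch: user IDs only contain letters, digits,
  // hyphens, and underscores, so the value needs no quoting and
  // can't inject extra filter clauses.
  if (!/^[A-Za-z0-9_-]+$/.test(userID)) {
    throw new Error(`Unsupported userID: ${userID}`);
  }
  return `user:${userID} OR user:public`;
}

console.log(buildUserFilter('user42')); // prints "user:user42 OR user:public"
```

The returned string can then be passed as the filters value when generating the secured API key.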

With MCM, you still need to generate secured API keys, but all you have to do is provide the userID. The engine then adds the right authorizing filters automatically.

const securedApiKey = client.generateSecuredApiKey('YourSearchOnlyApiKey', {
  userID: 'user42',
});

With MCM, you always need to provide the X-Algolia-User-ID header, even when the query contains the userID parameter. We need the header to route the requests efficiently.

Index configuration and API keys

When performing actions that have global impact on all clusters, such as using addApiKey or setSettings, you don’t need to provide the X-Algolia-User-ID header.
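
One way to keep this distinction explicit in your own code is to build the request options in a single place. The requestOptionsFor helper below is a hypothetical name introduced for this sketch, and it assumes your client accepts per-request headers in its request options.

```javascript
// Hypothetical helper: add the X-Algolia-User-ID header only when
// the request targets a specific user. Global operations pass no userID.
function requestOptionsFor(userID) {
  if (userID === undefined) {
    return {};
  }
  return { headers: { 'X-Algolia-User-ID': userID } };
}

console.log(requestOptionsFor('user42'));
console.log(requestOptionsFor());
```

A user-scoped call would then pass requestOptionsFor('user42') as its request options, while a global operation would pass requestOptionsFor() or no request options at all.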
