17 Oct 2018

How to implement Multi Cluster Management on an existing implementation

Multi Cluster Management (MCM) with the MultiClusters API

When your data no longer fits on one machine, you’ll need to start adding new clusters. MCM makes this easy for you by helping you distribute and manage your data over multiple machines.

The following tutorial should help you get started using MCM.

But first, you’ll need to enable MCM on your account. Once that’s done, you can start using this feature.

The tutorial

Our use case shows how to use multiple clusters with both private and public data: a music streaming application that lets users create public and private playlists. Private playlists are accessible only to the users who created them.

We follow a Without MCM / With MCM approach to highlight what you’ll need to change if you are already using multiple clusters without MCM. If you are implementing multiple clusters for the first time, you can look only at the MCM implementation.

Splitting data across multiple clusters

Assign user data to a cluster

Without MCM, you assign data to an APPID, because an APPID can have only one cluster assigned to it. If you have many clusters, you will need a different APPID per cluster. This requires you to maintain on your own servers a 3-part mapping (user/cluster/APPID).
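
To make the 3-part mapping concrete, here is a minimal sketch of what you would have to maintain yourself without MCM. The `userDirectory` table, the cluster and APPID values, and the `lookupUser` helper are all hypothetical placeholders; in practice this mapping would live in your own database.

```javascript
// Hypothetical user -> (cluster, APPID) mapping kept on your own servers.
// Every identifier below is a placeholder for illustration only.
const userDirectory = new Map([
  ['user42', { cluster: 'cluster-a', appID: 'APPID_A' }],
  ['user4242', { cluster: 'cluster-b', appID: 'APPID_B' }],
]);

// Resolve which application (and therefore which cluster) holds a user's data.
function lookupUser(userID) {
  const entry = userDirectory.get(userID);
  if (!entry) {
    throw new Error('No cluster assigned to user: ' + userID);
  }
  return entry;
}

console.log(lookupUser('user42').appID);
```

Every indexing and search call then has to start with a lookup like this one, which is the bookkeeping MCM is meant to remove.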

With MCM, every cluster uses the same APPID, and your data only needs to be sent to the right cluster. MCM keeps a mapping on each cluster to route requests to the cluster that holds the user’s data. All you need to do is assign each user’s data to a cluster, which you can do with the assign userID endpoint:

# Not yet implemented in the JS client
curl -X POST \
     -H "X-Algolia-User-ID: user42" \
     -H "X-Algolia-API-Key: ${API_KEY}" \
     -H "X-Algolia-Application-Id: ${APPLICATION_ID}" \
     --data-binary "{ \
     \"cluster\":\"d4242-eu\"}" \
    "https://${APPLICATION_ID}.algolia.net/1/clusters/mapping"

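Since the JS client does not expose this endpoint yet, one option is to build the same request by hand. The sketch below only assembles the request described by the curl command and does not send it; the `buildAssignUserRequest` name and the `MY_ADMIN_API_KEY` placeholder are assumptions, and the resulting object would still need to be passed to the HTTP client of your choice.

```javascript
// Build (but do not send) the assignment request shown in the curl command.
// buildAssignUserRequest is a hypothetical helper, not part of the Algolia client.
function buildAssignUserRequest(applicationID, apiKey, userID, cluster) {
  return {
    method: 'POST',
    url: 'https://' + applicationID + '.algolia.net/1/clusters/mapping',
    headers: {
      'X-Algolia-User-ID': userID,
      'X-Algolia-API-Key': apiKey,
      'X-Algolia-Application-Id': applicationID,
    },
    // The body names the cluster that should hold this user's data.
    body: JSON.stringify({ cluster: cluster }),
  };
}

const assignRequest = buildAssignUserRequest(
  'MY_ONLY_APPID',
  'MY_ADMIN_API_KEY', // placeholder key
  'user42',
  'd4242-eu'
);
console.log(assignRequest.method, assignRequest.url);
console.log(assignRequest.body);
```
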
Add private user data

Without MCM, every record needs a userID attribute to tag the user. Additionally, the userID attribute needs to be set as a filter at query time.

With MCM, records are tagged automatically from the X-Algolia-User-ID header sent with the indexing request. The engine adds an extra __userID__ attribute to each record, which identifies the user the record belongs to and makes it possible to filter on that user.

var playlists = [{
  user: 'user42',
  name: 'My peaceful playlist',
  songs: [/* ... */],
  createdAt: 1500000181
}, {
  user: 'user4242',
  name: 'My workout playlist',
  songs: [/* ... */],
  createdAt: 1500040452
}];

// #############################
// # Without multiClusters API #
// #############################

playlists.forEach(function(playlist) {
    // Fetch from the database the associated appID and apiKey for this user
    // using your own functions (my_*)
    var appID = my_getAppIDFor(playlist.user);
    var apiKey = my_getIndexingApiKeyFor(playlist.user);
    var client = algoliasearch(appID, apiKey);
    var index = client.initIndex('playlists');

    index.setSettings({attributesForFaceting: ["filterOnly(user)"]}, function(err, content) {
      console.log(content);
    });
    index.addObject(playlist, function(err, content) {
      console.log(content);
    });
});


// ##########################
// # With multiClusters API #
// ##########################

var client = algoliasearch("MY_ONLY_APPID", "MY_INDEXING_API_KEY");
var index = client.initIndex('playlists');
playlists.forEach(function(playlist) {
    // Tag the record with its owner through the header
    client.setExtraHeader("X-Algolia-User-ID", playlist.user);
    index.addObject(playlist, function(err, content) {
      console.log(content);
    });
});
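
Because the user is carried by a request header rather than by the record itself, one possible refinement is to group records per user first, so that each user's records can go out as a single batch under one X-Algolia-User-ID value. The `groupByUser` helper and the sample data are hypothetical and not part of the Algolia client; sending each batch is left out of this sketch.

```javascript
// Hypothetical helper: group records by owner so each batch shares one header value.
function groupByUser(records) {
  const batches = new Map();
  records.forEach(function (record) {
    if (!batches.has(record.user)) {
      batches.set(record.user, []);
    }
    batches.get(record.user).push(record);
  });
  return batches;
}

// Illustrative sample data only.
const samplePlaylists = [
  { user: 'user42', name: 'My peaceful playlist' },
  { user: 'user4242', name: 'My workout playlist' },
  { user: 'user42', name: 'My focus playlist' },
];

groupByUser(samplePlaylists).forEach(function (batch, userID) {
  console.log(userID + ': ' + batch.length + ' record(s)');
});
```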

Add public data

Without MCM, you need to tag every public record with a special value like “public” in the userID attribute. You then need to filter on that value to search the public records.

With MCM, you use the special userID value * to flag records as public, allowing all users to see them. Public records are automatically replicated on all the clusters of your multi-cluster setup to avoid adding network latency during the search.

var public_playlists = [{
  user: 'public',
  name: 'TOP50 songs',
  songs: [/* ... */],
  createdAt: 1500240452
}];

// #############################
// # Without multiClusters API #
// #############################

// Fetch the list of appID and apiKey to target every cluster
// using your own functions (my_*)
var AppIDConfigurations = my_getAllAppIDConfigurations();

// Send the record to every cluster
AppIDConfigurations.forEach(function(AppIDConfiguration) {
    var appID = AppIDConfiguration.appID;
    var apiKey = AppIDConfiguration.apiKey;
    var client = algoliasearch(appID, apiKey);
    var index = client.initIndex('playlists');
    index.addObjects(public_playlists, function(err, content) {
      console.log(content);
    });
});

// ##########################
// # With multiClusters API #
// ##########################

var client = algoliasearch("MY_ONLY_APPID", "MY_INDEXING_API_KEY");
var index = client.initIndex('playlists');
// The special userID value * flags the records as public
client.setExtraHeader("X-Algolia-User-ID", '*');
index.addObjects(public_playlists, function(err, content) {
  console.log(content);
});
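
If you are migrating from the Without MCM setup, the legacy “public” tag has to become the special userID value * when you choose the header. A minimal sketch of that translation, assuming your existing records use the string 'public' as in the example above; the `toMcmUserID` helper and the constant names are hypothetical:

```javascript
// Legacy tag used in the "Without MCM" examples, and the MCM public userID.
const LEGACY_PUBLIC_TAG = 'public';
const MCM_PUBLIC_USER_ID = '*';

// Hypothetical helper: pick the X-Algolia-User-ID value for a legacy record owner.
function toMcmUserID(legacyUser) {
  return legacyUser === LEGACY_PUBLIC_TAG ? MCM_PUBLIC_USER_ID : legacyUser;
}

console.log(toMcmUserID('public'));
console.log(toMcmUserID('user42'));
```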

Search inside the data

Without MCM, the search needs to filter on the specific user and public data.

With MCM, the engine just needs to know which userID (with the header X-Algolia-User-ID) is targeted by the query in order to route the request to the cluster holding the data. The engine will automatically add the right filters to retrieve both the user’s data and the public data.

// #############################
// # Without multiClusters API #
// #############################

// Fetch from the database the associated appID and apiKey for this user
// using your own functions (my_*)
var appID = my_getAppIDFor('user42');
var apiKey = my_getSearchOnlyApiKeyFor('user42');
var client = algoliasearch(appID, apiKey);
var index = client.initIndex('playlists');
index.search('peace', {
  // The nested array means OR: the user's own records or the public ones
  facetFilters: [["user:user42", "user:public"]]
}, function searchDone(err, content) {
  if (err) {
    console.error(err);
    return;
  }

  content.hits.forEach(function(hit) {
    console.log('Hit(' + hit.objectID + '): ' + JSON.stringify(hit));
  });
});

// ##########################
// # With multiClusters API #
// ##########################

var client = algoliasearch("MY_ONLY_APPID", "MY_SEARCH_API_KEY");
var index = client.initIndex('playlists');
client.setExtraHeader("X-Algolia-User-ID", 'user42')
index.search('peace', function searchDone(err, content) {
  if (err) {
    console.error(err);
    return;
  }

  content.hits.forEach(function(hit) {
    console.log('Hit(' + hit.objectID + '): ' + JSON.stringify(hit));
  });
});
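
The filtering that MCM applies can be pictured with a small local model. This only illustrates the visibility rule described above (a user sees their own records plus the public ones) and is not the engine's implementation; the `visibleTo` function and the in-memory records are hypothetical, and the `__userID__` attribute is used as the text describes it.

```javascript
const PUBLIC_USER_ID = '*';

// In-memory stand-in for records as they might look once tagged by the engine.
const storedRecords = [
  { __userID__: 'user42', name: 'My peaceful playlist' },
  { __userID__: 'user4242', name: 'My workout playlist' },
  { __userID__: PUBLIC_USER_ID, name: 'TOP50 songs' },
];

// Hypothetical model of the rule: own records plus public records.
function visibleTo(records, userID) {
  return records.filter(function (record) {
    return record.__userID__ === userID || record.__userID__ === PUBLIC_USER_ID;
  });
}

const visibleNames = visibleTo(storedRecords, 'user42').map(function (record) {
  return record.name;
});
console.log(visibleNames);
```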

Secured data access

In both cases, you will need to use secured API keys. Otherwise, private data is at risk, because a user can change the parameters of a query to access another user’s data. A secured API key restricts each user to their own data plus the public data.

// #############################
// # Without multiClusters API #
// #############################

// Fetch from the database the associated appID and apiKey for this user
// using your own functions (my_*)
var appID = my_getAppIDFor('user42');
var apiKey = my_getSearchOnlyApiKeyFor('user42');
var client = algoliasearch(appID, apiKey);
var securedApiKey = client.generateSecuredApiKey(apiKey, {filters: 'user:user42 OR user:public'});

// ##########################
// # With multiClusters API #
// ##########################

client.generateSecuredApiKey('YourSearchOnlyApiKey', {userID: 'user42'});
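
To see why a user cannot simply edit the restrictions of such a key, it can help to think of it as restrictions signed with the parent key. The sketch below illustrates that general idea with an HMAC from Node's crypto module. It is an illustration under assumptions, not the client's actual generateSecuredApiKey implementation: the function names, the token layout, and the `PARENT_SEARCH_KEY` placeholder are all hypothetical.

```javascript
const nodeCrypto = require('crypto');

// Hypothetical helper: sign a restriction string with the parent key.
function signRestrictions(parentKey, restrictions) {
  const signature = nodeCrypto
    .createHmac('sha256', parentKey)
    .update(restrictions)
    .digest('hex');
  return Buffer.from(signature + restrictions, 'utf8').toString('base64');
}

// Hypothetical helper: return the restrictions if the signature matches, else null.
function verifyRestrictions(parentKey, token) {
  const decoded = Buffer.from(token, 'base64').toString('utf8');
  const signature = decoded.slice(0, 64); // a SHA-256 hex digest is 64 characters
  const restrictions = decoded.slice(64);
  const expected = nodeCrypto
    .createHmac('sha256', parentKey)
    .update(restrictions)
    .digest('hex');
  const a = Buffer.from(signature, 'utf8');
  const b = Buffer.from(expected, 'utf8');
  const valid = a.length === b.length && nodeCrypto.timingSafeEqual(a, b);
  return valid ? restrictions : null;
}

const signedToken = signRestrictions('PARENT_SEARCH_KEY', 'userID=user42');
console.log(verifyRestrictions('PARENT_SEARCH_KEY', signedToken));
```

In this model, changing user42 to another user inside the token invalidates the signature, because only the holder of the parent key can produce a matching one.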

Note that with MCM, the X-Algolia-User-ID header is always required, even if the query contains the userID parameter, so that the request can be routed efficiently.

Index configuration and API keys

For endpoints that affect all clusters, like Add API key and setSettings, the API key stays the same, and you don’t need to provide the X-Algolia-User-ID header: these jobs are replicated on all clusters so that every cluster has the same API keys and index configuration.
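
One way to keep this distinction in one place is a small helper that decides whether a call needs the user header. This is only a sketch: the operation labels in `GLOBAL_OPERATIONS` and the `headersFor` helper are hypothetical names chosen for illustration, not identifiers from the API.

```javascript
// Hypothetical labels for operations that are replicated on every cluster.
const GLOBAL_OPERATIONS = ['addApiKey', 'setSettings'];

// Return the extra headers a call needs: none for global operations,
// X-Algolia-User-ID for everything that targets one user's data.
function headersFor(operation, userID) {
  const headers = {};
  if (GLOBAL_OPERATIONS.indexOf(operation) === -1) {
    if (!userID) {
      throw new Error('A userID is required for operation: ' + operation);
    }
    headers['X-Algolia-User-ID'] = userID;
  }
  return headers;
}

console.log(headersFor('setSettings'));
console.log(headersFor('search', 'user42'));
```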
