17 Oct 2018

DSN (Distributed Search Network)


The reach and power of DSN

DSN extends regional access

Algolia’s Distributed Search Network (DSN) adds one or more satellite servers to a cluster. This extends the reach of an Algolia cluster into other regions, closer to a customer’s end-users.

Take the example of an Algolia customer on the East Coast of the US, whose cluster is close to their servers in NY. They might want to add a DSN server on the West Coast to bring their data closer to their West Coast users. Placing a DSN server there reduces network latency for those users, which improves search performance.

So even though we already address network latency by placing our clusters in 15 regions around the world, we go one step further, bringing individual DSN servers into regions closer to your users. In this way, DSN reinforces and extends the reach of your search.

DSN boosts the engine’s processing power

In addition to bringing the engine closer to your users, DSN servers also extend the processing power of your clusters. They can be used to share the load of large cluster activity: a customer can offload requests to the DSN whenever its cluster(s) reach peak usage.

What is a DSN server, and how does it get its data?

DSN servers

A DSN server is a powerful, fully functioning, self-sufficient bare-metal machine. It is a replica of your main cluster, which we call the primary cluster. Each DSN runs independently and contains the full data and settings of its primary cluster.

The significant difference between an Algolia 3-server cluster and a DSN is that a DSN is a single machine. As a result, DSNs do not provide cluster-level redundancy.

However, as discussed below, a DSN setup can be equally reliable when accessed through our API clients. Unlike direct calls to the REST API, our API clients use retry and fallback logic that switches to the cluster if a DSN server goes down.

Getting data to the DSN

DSN servers are not a backup of their clusters. They build their own indices. Here are some points to consider.

  • Regarding data: A DSN gets its data by processing indexing jobs on its own. A primary cluster sends all indexing jobs to its satellite DSNs, and each DSN processes them independently. This is how a DSN gets its data: not from a backup, but by repeating the same indexing process as the servers of its primary cluster.
  • Regarding synchronization: The primary cluster does not send an indexing job to a DSN until it has finished processing it. More specifically, the DSN gets the job only after the machines in the cluster have reached consensus. DSNs are therefore not immediately in sync with their clusters: there is a slight delay (seconds or minutes, depending on the size of the indexing job). To estimate that delay, add the network latency between the cluster and its DSN to the time the DSN takes to process the indexing job.
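The delay estimate described above is a simple sum. Here is a minimal sketch of it; the function name and the sample numbers are illustrative assumptions, not measurements from Algolia's infrastructure.

```python
def estimate_dsn_delay(network_latency_s: float, dsn_processing_s: float) -> float:
    """Rough delay between the cluster finishing a job and the DSN reflecting it.

    Both inputs are values you would measure or estimate yourself.
    """
    if network_latency_s < 0 or dsn_processing_s < 0:
        raise ValueError("durations must be non-negative")
    return network_latency_s + dsn_processing_s


# Illustrative numbers only: 80 ms between regions, 12 s to process the job.
delay = estimate_dsn_delay(0.08, 12.0)
print(f"estimated delay: {delay:.2f} s")
```

For a large indexing job, the processing time dominates and the network latency becomes negligible.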

How do you activate DSN?

Today, DSN is accessible on all paid plans. To activate it, go to the “Infra” section of your Algolia dashboard. There, you will see a map and a selection of your top countries in terms of search traffic.

[Image: DSN dashboard, Infra map]

Just select a DSN data center on the map, or in the control panel below.

[Image: DSN dashboard, Infra controls]

Algolia will then automatically take care of the distribution and synchronization of your indices around the world. End-users’ queries will be automatically routed to the closest data center among those you’ve selected, ensuring the best possible experience.

Note that on some plans, a customer can choose to have more than one DSN in the same region. The right setup depends on the customer's needs. A customer with a worldwide client base will typically have DSN servers distributed over many regions, and may also want several DSNs within the same region to handle heavy usage.

You can also use the dashboard to monitor your DSNs.

DSN Front-end implementation for better proximity

For the DSN to improve network latency, it needs to be implemented on the front end, using the JavaScript, Android, or iOS API clients.

Why is this? For the simple reason that if you are using DSN to bring servers closer to your end-users, you will need to use an end-user’s IP to determine the closest server - which can only be done via the front end.

Doing it from the server-side would defeat this purpose because it would require your end-users to first contact your server - wherever it is in the world - before contacting the Algolia server.

In this case, when you are using DSN for purposes of improved proximity, the best scenario is to have your server and primary cluster near each other, to speed up back-end indexing operations; and to have your end-users contact the closest DSN for all their search requests.
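To make the proximity argument concrete, here is a small sketch comparing the two request paths. The round-trip times are made-up illustrative values for a West Coast user with a back end in NY, not measurements, and the function names are hypothetical.

```python
def backend_search_rtt(user_to_backend_ms: float, backend_to_algolia_ms: float) -> float:
    """Server-side search: the user's request goes to your server, which then queries Algolia."""
    return user_to_backend_ms + backend_to_algolia_ms


def frontend_search_rtt(user_to_dsn_ms: float) -> float:
    """Front-end search: the user's device queries the closest DSN directly."""
    return user_to_dsn_ms


# Illustrative round-trip times in milliseconds (assumptions, not benchmarks).
via_backend = backend_search_rtt(user_to_backend_ms=70, backend_to_algolia_ms=2)
via_dsn = frontend_search_rtt(user_to_dsn_ms=10)
print(f"via your server: {via_backend} ms, via closest DSN: {via_dsn} ms")
```

Whatever the actual numbers, the server-side path always includes the trip from the end user to your server, which is the distance DSN is meant to remove.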

That said, if you are using DSN for more processing power, you can use either a client-side or a server-side search solution.

Retries and fallback (failover) logic

All our API clients implement a retry logic that uses up to 4 URLs for every search request: 1 for the DSN and 3 for the cluster. Here’s how it works.

The very first request is always to the closest server. We call this the smart server because it could be any one of the 3 servers in your cluster or your DSN, if you have one. If this first attempt works, perfect, the search goes through. But if it fails, there is a 3-part retry logic:

  1. Try to connect to the cluster using 1 of its 3 URLs. If that fails,
  2. Use a 2nd URL of the same cluster. If that fails,
  3. Use a 3rd URL of the cluster. If that fails, return a timeout error.

With this 3-step fallback logic, Algolia ensures a high degree of availability over a widely distributed infrastructure.
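The retry sequence can be sketched as a loop over up to 4 hosts. This is a simplified illustration, not the code of the actual API clients: `search_with_fallback`, `AllHostsFailed`, and the simulated `fake_send` transport are hypothetical names, and real clients also handle details such as per-host timeouts.

```python
from typing import Callable, Sequence


class AllHostsFailed(Exception):
    """Raised when the first attempt and every fallback attempt fail."""


def search_with_fallback(
    smart_host: str,
    fallback_hosts: Sequence[str],
    send: Callable[[str], dict],
) -> dict:
    """Try the smart host first, then each cluster fallback host in order."""
    failures = []
    for host in [smart_host, *fallback_hosts]:
        try:
            return send(host)  # first success wins
        except (ConnectionError, TimeoutError) as exc:
            failures.append((host, exc))
    # Every URL failed: surface a single error to the caller.
    raise AllHostsFailed(failures)


# Simulated transport (hypothetical): the smart host and first fallback are down.
DOWN = {"appid-dsn.algolia.net", "appid-1.algolianet.com"}


def fake_send(host: str) -> dict:
    if host in DOWN:
        raise ConnectionError(f"{host} unreachable")
    return {"served_by": host}


result = search_with_fallback(
    "appid-dsn.algolia.net",
    ["appid-1.algolianet.com", "appid-2.algolianet.com", "appid-3.algolianet.com"],
    fake_send,
)
print(result["served_by"])  # prints "appid-2.algolianet.com"
```

Passing the transport in as `send` keeps the retry order separate from how a request is actually made, which also makes the logic easy to exercise with a fake.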

Note: Only our API clients provide this kind of failover reliability. Because of this, we consistently encourage our customers to use our API clients instead of the REST API.

Technical Details

The Algolia infrastructure on which the customer’s data is located is addressable by 5 different URLs:

  • smart records (NS1)

    • appid-dsn.algolia.net
      • This record tries to find the closest server to perform the search query.
      • It always contains all the servers of the cluster and, if DSN is configured, the additional DSN servers as well.
      • All servers in the configured pool are considered equal, and their location is taken into account. For example, with a configuration of 1 cluster + 1 DSN, the pool is treated as 4 identical servers that can all serve searches. As long as 1 of them is available, the record returns an address.
    • appid.algolia.net
      • This record is used for indexing.
  • fallback records (Cloudflare) - designed to address the availability zones of the clusters

    • appid-1.algolianet.com
    • appid-2.algolianet.com
    • appid-3.algolianet.com
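
The behavior of the smart search record described above (an equal pool, location taken into account, an answer as long as 1 server is up) can be sketched as follows. This is an illustration under stated assumptions, not how the DNS provider implements it: the `Server` type, the `distance_km` proximity metric, and the server names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Server:
    name: str
    distance_km: float  # hypothetical proximity metric relative to the end user
    available: bool = True


def resolve_smart_record(pool: Sequence[Server]) -> Optional[Server]:
    """Return the closest available server, or None if the whole pool is down."""
    candidates = [s for s in pool if s.available]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.distance_km)


# 1 cluster (3 servers) + 1 DSN = a pool of 4 equal servers.
pool = [
    Server("cluster-1", 4000),
    Server("cluster-2", 4010),
    Server("cluster-3", 4020),
    Server("dsn-1", 50, available=False),  # the closest server is down
]
chosen = resolve_smart_record(pool)
print(chosen.name if chosen else "no server available")  # prints "cluster-1"
```

Because the DSN is just one more member of the pool, losing it only changes which server is closest; the record still resolves.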