Distributed Search Network (DSN)
The reach and power of DSN
DSN extends regional access
Algolia’s Distributed Search Network (DSN) adds one or more satellite servers to a cluster. This extends the reach of an Algolia cluster into other regions, closer to end users.
Take the example of an Algolia customer on the East Coast of the United States, whose cluster is close to their servers in New York. Not all their end users are on the East Coast, though: they might have a significant customer base in California, for instance. With only a single cluster on the East Coast, Californian users may experience slightly slower search performance than users in New York.
For that reason, putting a DSN server on the West Coast might be a good idea to bring data closer to the West Coast users. Adding DSN servers in strategic regions reduces network latency, improves performance, and enhances user experience.
We already reduce network latency by placing our clusters in many regions around the world. DSN goes one step further by letting you add individual DSN servers in regions closer to your users.
DSN boosts the engine’s processing power
In addition to bringing the engine closer to your users, DSN servers also extend the processing power of your clusters. They can share the load during heavy cluster activity: a customer can offload requests to a DSN whenever their clusters reach peak usage.
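To illustrate the idea of offloading at peak usage, here is a minimal Python sketch of threshold-based routing. Everything in it is hypothetical: the `Target` class, the `load` utilization metric, and the 0.8 peak threshold are assumptions for illustration, not a description of how Algolia actually balances traffic between a cluster and its DSNs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    name: str
    load: float  # hypothetical utilization metric, from 0.0 (idle) to 1.0 (saturated)


def pick_target(cluster: Target, dsns: List[Target], peak: float = 0.8) -> Target:
    """Route to the primary cluster until it reaches peak usage, then to the least-loaded DSN."""
    if cluster.load < peak or not dsns:
        # Below peak (or no DSN attached): the cluster handles the request itself.
        return cluster
    # At peak: offload the request to whichever DSN currently has the most headroom.
    return min(dsns, key=lambda dsn: dsn.load)
```

With a cluster at 90% load and two DSNs at 30% and 10%, the sketch sends the request to the DSN at 10%; with the cluster at 50%, the request stays on the cluster.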
What is a DSN server, and how does it get its data?
A DSN server is a powerful, fully functional, self-sufficient bare-metal machine that replicates your primary cluster. Each DSN runs independently and contains the full data and settings of its primary cluster.
When accessed with our API clients, the DSN network remains reliable: unlike direct calls to our REST API, our official API clients implement a retry strategy that switches to the primary cluster whenever a DSN server goes down.
Getting data to the DSN
DSN servers aren’t a backup of their primary clusters: they build their own indices.
A DSN gets its data by processing indexing jobs on its own. A primary cluster sends all its indexing jobs to its DSNs, which process them independently. In other words, a DSN doesn't receive a backup: it repeats the same indexing process as its primary cluster.
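The replication model described above, where the DSN replays indexing jobs instead of copying a backup, can be sketched in Python as a toy model. It rests on simplifying assumptions: jobs are reduced to `(object_id, record)` upserts, and the `PrimaryCluster` and `Dsn` classes are hypothetical names, not part of any Algolia API.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Job = Tuple[str, dict]  # simplified indexing job: an (object_id, record) upsert


class Dsn:
    """Hypothetical DSN replica that builds its own index by replaying jobs."""

    def __init__(self) -> None:
        self.pending: Deque[Job] = deque()
        self.index: Dict[str, dict] = {}

    def receive(self, job: Job) -> None:
        # Jobs are queued, not applied on arrival: the DSN works at its own pace.
        self.pending.append(job)

    def process_pending(self) -> None:
        # Repeat the same indexing work the primary cluster already did.
        while self.pending:
            object_id, record = self.pending.popleft()
            self.index[object_id] = record


class PrimaryCluster:
    """Hypothetical primary cluster that forwards every indexing job to its DSNs."""

    def __init__(self, dsns: List[Dsn]) -> None:
        self.dsns = dsns
        self.index: Dict[str, dict] = {}

    def index_job(self, job: Job) -> None:
        object_id, record = job
        self.index[object_id] = record  # the cluster processes the job first
        for dsn in self.dsns:           # then sends the same job, not a copy of its index
            dsn.receive(job)
```

After `primary.index_job(...)`, the DSN holds the job in `pending` but its `index` is still empty; only after `process_pending()` does it match the cluster, which mirrors the short replication delay discussed next.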
The primary cluster doesn’t send an indexing job to a DSN until it has finished processing it. More specifically, the DSN gets the job only once the machines on the cluster have achieved consensus. Therefore, DSNs aren’t immediately in sync with their clusters: there is a slight delay, from a few seconds to a few minutes, depending on the size of the indexing job.
To estimate that delay, add the network latency between a cluster and its DSN to the time it takes the DSN to process the indexing job.
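As a worked example of that estimate, the helper below adds the two terms. The function name and the sample figures (70 ms of network latency, 2 s of DSN indexing time) are hypothetical; real values depend on the regions involved and on the size of the job.

```python
def estimate_dsn_delay(network_latency_s: float, dsn_processing_s: float) -> float:
    """Rough estimate, in seconds, of how long after cluster consensus a job is live on the DSN.

    delay ≈ network latency (cluster to DSN) + time for the DSN to process the job
    """
    if network_latency_s < 0 or dsn_processing_s < 0:
        raise ValueError("durations must be non-negative")
    return network_latency_s + dsn_processing_s


# Hypothetical figures: 70 ms between regions, 2 s for the DSN to index the job.
delay = estimate_dsn_delay(0.070, 2.0)
print(f"Estimated DSN delay: {delay:.2f} s")
```

For small jobs the network term matters most; for large jobs, the DSN's own processing time dominates.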
How do you activate a DSN?
DSN servers are available on our current Standard and Premium plans with an annual commitment, as well as on some paid legacy plans (plans from before July 1, 2020). Since adding DSN servers to your application requires us to provision additional infrastructure, DSN servers cost extra. Please reach out to firstname.lastname@example.org for more information.
Once a DSN server is attached, Algolia takes care of the distribution and synchronization of your indices around the world. We automatically route queries to the closest data center among those you’ve selected, ensuring the best possible experience.
On certain plans, you can choose to have more than one DSN in the same region. Users with a worldwide client base may need DSN servers in many regions, as well as several DSNs in the same region to handle heavy usage.
You can monitor your DSNs via the Algolia dashboard.
Front-end implementation for reduced latency
A DSN can only improve network latency with front-end search implementations (web and mobile).
Why is this? If you’re using a DSN to bring data closer to your end users, we need their IP address to determine the closest server. With a back-end search implementation, your end users first contact your server, wherever it is in the world. Then your server performs the calls to Algolia with its own IP address, so the closest server is determined by your server’s location, not your users’.
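The following Python sketch shows why the requester’s location matters. It assumes the requester’s IP address has already been geolocated to latitude/longitude coordinates, and it picks the server with the shortest great-circle (haversine) distance. The server names and coordinates are hypothetical, and distance is only a stand-in for whatever mechanism Algolia actually uses to find the closest server.

```python
import math
from typing import Dict, Tuple

Coords = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical setup: primary cluster in New York, DSN on the West Coast.
SERVERS: Dict[str, Coords] = {
    "cluster-us-east": (40.71, -74.01),
    "dsn-us-west": (37.77, -122.42),
}


def haversine_km(a: Coords, b: Coords) -> float:
    """Great-circle distance between two points on Earth, in kilometers."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_server(requester: Coords) -> str:
    """Pick the server nearest to whoever sends the request."""
    return min(SERVERS, key=lambda name: haversine_km(SERVERS[name], requester))


user_in_los_angeles = (34.05, -118.24)  # front-end search: the user's own location
backend_in_new_york = (40.71, -74.01)   # back-end search: your server's location
```

With a front-end implementation, `closest_server(user_in_los_angeles)` selects `"dsn-us-west"`. With a back-end implementation hosted in New York, the same Californian user’s search is routed by `closest_server(backend_in_new_york)`, which selects `"cluster-us-east"`, so the DSN brings no latency benefit.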
When you’re using a DSN server to reduce latency, we also recommend having your server and your primary Algolia cluster near each other to speed up back-end indexing operations.
If, however, you’re using a DSN for more processing power, you can use either a front-end or a back-end search implementation.
Retries and fallback (failover) logic
All our API clients implement a retry strategy that uses up to four different URLs for every search request: one for the DSN and three for the cluster.
The very first request always goes to the closest server. We call it the smart server because it could be any one of the three servers in your cluster, or it could be your DSN if you have one. If this first attempt works, the search goes through. If it fails, the clients activate their retry logic:
- Try to connect to the primary cluster using one of its three URLs.
- If that fails, use a second URL of the same cluster.
- If that fails, use a third URL of the cluster.
- If that fails, return a timeout error.
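The retry sequence above can be sketched in Python as follows. This is a minimal sketch, not the implementation used by Algolia’s API clients: the `send` callable stands in for a real HTTP request, treating `ConnectionError` and `TimeoutError` as retryable is an assumption, and the host names are placeholders.

```python
from typing import Callable, Dict, Sequence


class SearchTimeout(Exception):
    """Raised when the smart host and all cluster hosts have failed."""


def search_with_retries(
    query: str,
    smart_host: str,
    cluster_hosts: Sequence[str],
    send: Callable[[str, str], Dict],
) -> Dict:
    # First attempt: the closest ("smart") server, which may be a DSN.
    # Then up to three cluster URLs, tried one after another.
    hosts = [smart_host, *cluster_hosts[:3]]
    for host in hosts:
        try:
            return send(host, query)
        except (ConnectionError, TimeoutError):
            continue  # this host is unreachable: fall back to the next one
    raise SearchTimeout(f"all {len(hosts)} hosts failed for query {query!r}")
```

For example, if `send` fails for `"smart.example"` (a DSN that is down) and for `"cluster-1.example"`, the search is answered by `"cluster-2.example"`; only when all four hosts fail does the caller see `SearchTimeout`.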
With this fallback logic, Algolia ensures a high degree of availability over a widely-distributed infrastructure.
Only our API clients provide this kind of failover reliability. Because of this, we strongly encourage our customers to use our API clients instead of the REST API directly.
The Algolia infrastructure (where your data lives) is addressable by five different URLs:
- Smart records (NS1)
- This record tries to find the closest server to perform a query.
- It contains all the cluster’s servers and, if a DSN is configured, it also considers the additional servers.
- All servers in the configured pool are considered equal, and their location is taken into account. This means that if the configuration is one cluster plus one DSN, they’re treated as four identical servers that can process searches. As long as one of them is available, the record returns its address.
- This record is used for indexing.
- Fallback records (Cloudflare), designed to address the availability zones of the clusters
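The smart-record behavior described above, where the cluster and DSN servers form one pool of equals and the closest available one is returned, can be sketched like this. It’s a hypothetical model: “closest” is approximated by a measured latency in milliseconds, and the `Server` class, addresses, and numbers are illustrative assumptions, not how NS1 or Algolia actually resolve the record.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    address: str
    latency_ms: float  # hypothetical proxy for location relative to the requester
    available: bool


def resolve_smart_record(pool: List[Server]) -> Optional[str]:
    # Cluster servers and DSN servers are treated as equals: no preference by role.
    candidates = [s for s in pool if s.available]
    if not candidates:
        return None  # no server in the pool can answer
    return min(candidates, key=lambda s: s.latency_ms).address


# One cluster (three servers) plus one DSN: four interchangeable servers.
pool = [
    Server("cluster-1", 80.0, True),
    Server("cluster-2", 82.0, True),
    Server("cluster-3", 85.0, True),
    Server("dsn-1", 12.0, True),
]
```

Here `resolve_smart_record(pool)` returns `"dsn-1"`; if the DSN becomes unavailable, the same call returns `"cluster-1"`, the next closest server in the pool.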