Algolia’s storied infrastructure
Algolia has pieced together an infrastructure built for maximum speed and reliability. It has carefully chosen and configured a large number of bare metal servers. It has built a reliable 3-server cluster configuration from scratch. And it runs around the clock all over the world, with servers in over 15 regions and 60 datacenters.
In this part of our guide, we take a look at this infrastructure: built from scratch and designed carefully over a long period of reflection and experimentation.
This is only a summary. For the full story, check out our CTO’s account of the decisions and history behind Algolia’s architecture.
The key features of Algolia’s infrastructure
- Low-level access to bare metal servers, as opposed to virtual machines - optimizing performance
- Carefully chosen server specifications - for reliability, security, and full control over performance
- 3-server clusters that offer exceptional reliability
- Worldwide, regional access that reduces latency and increases performance
- 24/7 availability
- SOC 2 compliance for security, availability, and privacy
We have 2 standard server setups. The larger configuration:
- 6-core CPU, 12 threads
- 128 GB memory
- 2 × 800 GB SSD in RAID-0

And the smaller one:
- 4-core CPU, 8 threads
- 64 GB memory
- 2 × 400 GB SSD in RAID-0
Initially, we started quite big, with 4 cores, 8 threads, 32 GB RAM, and 2 × 120 GB SSDs from the 320 series. But we found that each element needed improvement. For example:
We discovered that 4 cores and 8 threads were not always enough to handle large numbers of operations running in parallel. We ultimately went with 6 cores and 12 threads, which can process an ongoing flow of indexing operations without impacting search speed. This choice also gives us ample headroom to manage our servers’ system resources.
For the disks, we experimented with many SSDs in RAID-0 before choosing the right one. The current S3500-series SSDs give us faster disk I/O, removing a serious performance bottleneck.
To strike the right balance of RAM and disk space, for both caching and data capacity, we ran through a number of use cases, engine tweaks, and countless performance tests before arriving at the right amount of memory and disk size - no more, no less. For example, we needed enough memory to process large indexes while leaving enough room to perform in-memory searches.
To get an idea of how we balance RAM size with the right number of cores and threads, consider large indexes. Algolia puts every customer’s full index in RAM. Additionally, it breaks up large indexes into smaller pieces called shards, where each shard (up to 4 per index) gets a dedicated thread on the same server. This permits faster updates and searches. Meanwhile, other threads are dedicated to processing searches in parallel, as well as system monitoring and consensus management. As you can see, our machines require a large baseline of RAM and cores, with a reasonable buffer for flexibility.
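To make the sharding idea concrete, here is a minimal sketch in Python. It is purely illustrative - the function names, the round-robin partitioning, and the substring matching are our assumptions for the example, not Algolia’s actual engine (which is C++ and far more sophisticated). It only shows the principle described above: split an index into up to 4 shards and give each shard its own thread.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_SHARDS = 4  # per the text above: a large index is split into up to 4 shards


def split_into_shards(records, num_shards=MAX_SHARDS):
    """Partition an index's records into shards (round-robin, for illustration)."""
    shards = [[] for _ in range(num_shards)]
    for i, record in enumerate(records):
        shards[i % num_shards].append(record)
    return shards


def search_shard(shard, query):
    """Toy per-shard search: each shard is scanned on its own thread."""
    return [record for record in shard if query in record]


def search_index(records, query):
    shards = split_into_shards(records)
    # One dedicated worker per shard, all queried in parallel,
    # then the per-shard hits are merged into a single result list.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard_hits = pool.map(search_shard, shards, [query] * len(shards))
    return [hit for hits in per_shard_hits for hit in hits]


hits = search_index(["red shoe", "blue shoe", "red hat", "green sock"], "red")
print(hits)
```

Because each shard lives on its own thread, an indexing update to one shard does not block searches on the others - the same property the paragraph above describes.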
To get far more detail about our engine, take a look at our CTO’s 8-part series Inside the Engine.
Thus, careful configuration serves our goal of offering reliable, fast search, with enough capacity to handle all our customers.
Algolia’s bare metal servers
While virtualization is the choice of the vast majority of SaaS services, Algolia has decided to use bare metal servers. Bare metal servers give applications direct access to the physical and software resources of a computer. For example, all Algolia search and indexing operations are processed by the Algolia engine, which in turn directly interacts with a computer’s essential resources, such as the operating system, CPU, RAM, and disk.
With a virtual machine, on the other hand, a user needs to pass through one or more additional layers of software before reaching the services of the underlying physical server. This slows things down, but it also creates flexibility by spreading a single server’s capacity over many discrete use cases: one customer might want to run a massive SQL Server database on a Windows machine; another might want to perform CPU-intensive calculations on an old version of Unix; and another might want to simulate a Macintosh - all of which can be done on a shared server using virtualization.
For Algolia, virtual machines are not necessary. Operating systems have offered time and task slicing on bare metal for years, without the need to virtualize. With powerful components, a single server can handle countless customers - especially when they are all doing the same thing, which is the case with Algolia.
Additionally, many of our larger customers use dedicated servers (or more precisely, dedicated clusters), giving them exclusive access to the entire cluster. Most of our smaller accounts share servers - meaning they share the same cluster and the same Algolia engine.
Note that when we say server, we are actually referring to a cluster of 3 identical servers. Thanks to clusters, Algolia can offer an SLA of 99.99% availability: the cluster stays up as long as at least 1 of its 3 servers is available.
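A back-of-envelope calculation shows why redundancy helps so much. Assuming server failures are independent (a simplifying assumption) and a hypothetical per-server availability of 99% (our illustrative figure, not Algolia’s published number), the cluster is down only when all 3 servers are down at once:

```python
def cluster_availability(server_availability: float, n_servers: int = 3) -> float:
    """Probability that at least one of n independent servers is up.

    The cluster is unavailable only if every server fails simultaneously,
    so availability = 1 - (probability one server is down) ** n.
    """
    p_down = 1.0 - server_availability
    return 1.0 - p_down ** n_servers


# Hypothetical 99% per-server availability: 1 - 0.01**3 = 0.999999
print(f"{cluster_availability(0.99):.6f}")
```

Even with modest individual machines, three-way replication pushes the combined availability well past the 99.99% SLA target, which is the intuition behind the 3-server cluster design.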
If you want more hardware and system software details, or to learn more about Algolia’s architectural decision-making process, take a look at the following articles.
- The history of Algolia’s architecture, as told by our CTO/architect
- Our architecture in even more detail
- How our architecture is specifically designed for Search-as-a-Service
- A word about our solid-state drives
- An article about how we achieve ultra-low latency
You can monitor your servers and clusters via the dashboard: go to Dashboard -> API Status, then click on a cluster name.
For Enterprise customers, we offer a Monitoring API that provides a window into all cluster and DSN activity.