Algolia vs Elasticsearch: End-to-end Latency

Before comparing Algolia to Elasticsearch, it’s good to understand a few things about the nature of search.

Search architecture is unique

The type and quality of search experience you can deliver depend heavily on your choice of search engine, hardware, datacenter region, and front-end web and mobile development frameworks. It’s important to make the right choice for each part of the stack, but it’s equally important to make a set of choices that work together as a whole. Because search user experience goals are so demanding, a vertically integrated approach to architecture is more important than for other types of applications. Latency, for example, is a function not only of the search engine but of every step between the search user interface and the backend infrastructure.

Search is mission critical

Search is one of the hardest features to get right, both because users benchmark search experiences against Google and Amazon, and because search is a balance of multiple disciplines, including but not limited to UX, relevance tuning and performance optimization. Development teams often delay building search because they lack confidence in getting it right and fear that it will take longer than expected. Yet search is often the most mission-critical feature: the quality of an application’s search has a big influence on the perception of the application’s overall quality. In domains like e-commerce, introducing a search bug can cost millions of dollars.

The combination of these factors makes search one of the riskiest areas of development for business and consumer applications. When comparing different ways of delivering a solution, like Algolia and Elasticsearch, we want to look at how each approach specifically addresses the full, end-to-end set of risks. In this comparison, we will look not only at the search engine but at the full search architecture, starting with end-to-end latency.

Mission-critical search for a global user base

There are many different types of search applications. To focus the comparison of Algolia and Elasticsearch, we want to home in on a specific family of use cases, which we refer to as consumer-grade search. Consumer-grade search is the type of experience delivered by companies like Google, Amazon and Facebook to billions of people worldwide. It connects people with products, content and key pieces of structured data. It is fast, reliable, works on multiple platforms and the results are highly relevant.

The search tolerates misspellings, alternate phrasings or user mistakes. Relevance is not caveat emptor, it’s caveat venditor – the search must adapt to the user, not the other way around. Consumers have high expectations of relevance but equally demanding expectations for the user interface. They expect an effortless, multi-faceted search and browsing experience, the kind pioneered by sites like Amazon.

Consumer-grade search doesn’t just apply to consumer-facing applications. Today’s business application users have become just as demanding, in part because many business applications are now distributed in app stores and compete directly with consumer versions.

The expectations of the average user can seem unattainably high, but this is why Algolia exists. Algolia is laser-focused on helping customers meet the perfectly unreasonable search expectations of the average Internet user.

About Elasticsearch

As a search engine that also functions as a scalable NoSQL database, Elasticsearch accommodates many different types of applications while not being opinionated toward one specific case. Elasticsearch is used for search but also log processing, real-time analytics, running map-reduce and other distributed algorithms, and even as an application’s primary database. The breadth of Elasticsearch is impressive and it does things that Algolia is not well suited for – streaming logs, map/reduce querying, complex aggregations and operating on billions of documents at a time. Algolia itself has used Elasticsearch internally for tasks like storing logs and computing rollups.

In this comparison, however, we are focusing on consumer-grade search. This is the comparison we are most often asked to make. Building a consumer-grade search application with Elasticsearch requires a nontrivial amount of backend and front-end software engineering. There are many more steps than just provisioning and operating an Elasticsearch cluster.

In this series we’ll dive into what some of those steps are; however, you can already take a look at how Algolia solves for these steps in our Inside the Engine series. In it we explore implementation details like I/O optimization, query tokenization, multi-attribute relevance, highlighting and synonym handling. These are features that must be accounted for in any search project, including those with Elasticsearch at the core.

End-to-end latency budget

The first feature of search is speed. Whole-transaction latency, from keystroke to visible search result, is what forms the user’s first impression of a search. A search application architect needs to have this in mind from the beginning, as a huge number of factors can affect the end-to-end latency.

To make things more difficult, for consumer-grade search the upper bound on satisfactory end-to-end latency is very, very low. Most consumer search experiences, including Google, Facebook and those of Algolia customers, deliver new results with every keystroke. This type of experience, known as instant search, is loved by users for its interactive feel but it only works if search results can be returned in the blink of an eye. Less, even: a human eye blink takes 300-400 milliseconds. An instant search starts to feel laggy at only 100 milliseconds.

For as-you-type search to be as satisfying as possible, Algolia recommends the end-to-end latency be no more than 50ms. This is the speed at which search feels truly real-time, where the user feels in full control of the experience. Under these conditions, users are likely to keep reformulating their query until they find what they’re looking for, rather than abandon or bounce.

If you’re using Elasticsearch or Algolia to power an as-you-type search, these are the important numbers to keep in mind as you design your architecture. It is possible to consistently reach these numbers if you know 1) where latency is likely to accumulate and 2) how to reduce or eliminate it.
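One way to keep these numbers in view is to write the budget down as data and check it. The sketch below is illustrative only: the stage names and per-stage figures are assumptions invented for the example, not measurements from Algolia or Elasticsearch. Only the 50 ms target and the 100 ms "laggy" threshold come from the text above.

```python
# Illustrative end-to-end latency budget check for as-you-type search.
# The 50 ms and 100 ms thresholds come from the article; every stage name
# and per-stage figure below is a made-up assumption for the example.
from typing import Dict, Tuple

TARGET_MS = 50    # recommended end-to-end latency
LAGGY_MS = 100    # point at which instant search starts to feel laggy


def classify(total_ms: float) -> str:
    """Place a total end-to-end latency into one of three bands."""
    if total_ms <= TARGET_MS:
        return "real-time"
    if total_ms < LAGGY_MS:
        return "acceptable"
    return "laggy"


def check_budget(stages: Dict[str, float]) -> Tuple[float, str]:
    """Sum per-stage latencies (in ms) and classify the total."""
    total = sum(stages.values())
    return total, classify(total)


# Hypothetical measurements for one keystroke-to-result round trip.
example_stages = {
    "dns_lookup": 0.0,            # assumed already cached on the device
    "network_round_trip": 20.0,
    "load_balancer": 2.0,
    "query_execution": 10.0,
    "sorting": 3.0,
    "rendering": 8.0,
}

total_ms, verdict = check_budget(example_stages)
print(f"{total_ms:.0f} ms -> {verdict}")
```

Listing the stages explicitly makes it easier to see which layer to attack first when the total drifts past the target.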

That’s what we’ll look at in the following side-by-side table: how Algolia reduces latency in each layer of the stack, where latency can accumulate inside of Elasticsearch, and what work can be done inside or on top of an Elasticsearch implementation to reduce the risk of added latency.

For each source of latency below, the Algolia approach is listed first and the Elasticsearch approach second.

Global User Base

Low device-to-datacenter latency requires infrastructure in multiple regions.

Tip: add 1-2ms for every 124 miles of distance over fiber.

Algolia: Automatically replicate indices to any of 15 regions throughout the world using our Distributed Search Network (DSN).

Elasticsearch: It is possible to cluster Elasticsearch across multiple data centers, but it is not recommended. The recommended solutions involve replicating manually via a messaging queue to clusters that are not aware of each other.
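The rule of thumb in the tip above turns into a quick estimate of what distance costs. The sketch below simply applies the tip's figures (1-2 ms per 124 miles, which is roughly 200 km); the function name and the example distance are hypothetical.

```python
# Rough propagation-delay estimate from the article's rule of thumb:
# 1-2 ms for every 124 miles of fiber. The function name is illustrative.
from typing import Tuple

MILES_PER_STEP = 124.0
MS_PER_STEP_LOW = 1.0
MS_PER_STEP_HIGH = 2.0


def fiber_latency_ms(distance_miles: float) -> Tuple[float, float]:
    """Return the (low, high) added latency in ms for a fiber path."""
    if distance_miles < 0:
        raise ValueError("distance must be non-negative")
    steps = distance_miles / MILES_PER_STEP
    return steps * MS_PER_STEP_LOW, steps * MS_PER_STEP_HIGH


# Example: a user 3,100 miles from the nearest datacenter (assumed figure).
low, high = fiber_latency_ms(3100)
print(f"adds roughly {low:.0f}-{high:.0f} ms")
```

At that distance the estimate is 25-50 ms, so the network alone can consume the entire 50 ms budget. That is the case for serving each user from a nearby region.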

RAM

If a query’s data is not all in RAM, it may have to load data from the much slower disk.

Algolia: Indices are stored in RAM (256GB or more) and memory mapped to the nginx process. No pre-loading (warming) is required to get great performance for the first query.

Elasticsearch: The ES cluster must have enough RAM and be properly tuned to make sure large indices stay in memory. If you are also supporting an analytics workload, you risk large analytics queries evicting data for searches.

Virtualization

In shared hosting environments like AWS, performance can fluctuate because of contention with other customers.

Algolia: Algolia runs on bare metal hardware with high-frequency 3.5GHz-3.9GHz Intel Xeon E5-1650v3 or v4 CPUs. Clock speed is directly related to search speed.

Elasticsearch: Elasticsearch can be deployed on bare metal and optimized hardware, but at a premium cost compared to AWS or cloud-based solutions.

Sorting

Before results are presented to the user, they have to be put in the right order.

Algolia: Results are presorted at indexing time according to the relevance formula and custom ranking. There is a minimal sorting step at the end to account for dynamic criteria like the number of typos and the proximity of words.

Elasticsearch: Sorting is done at the end of each query. Depending on the number of results to be sorted, this can impact latency.
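To illustrate the difference between the two strategies, here is a toy sketch. It is not Algolia's or Elasticsearch's implementation; the record fields (`popularity`, `name`), the matching rules and the function names are assumptions made for the example. It shows the general idea: if records are stored in ranking order when they are indexed, a query only needs a cheap, stable re-sort on per-query criteria.

```python
# Toy comparison of index-time presorting vs. query-time sorting.
# Not either engine's real implementation; fields and names are hypothetical.
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def build_presorted_index(records: List[Record]) -> List[Record]:
    """Index time: store records in static ranking order (highest first)."""
    return sorted(records, key=lambda r: r["popularity"], reverse=True)


def query_presorted(index: List[Record],
                    matches: Callable[[Record], bool],
                    typos: Callable[[Record], int]) -> List[Record]:
    """Query time: filtering keeps index order, so only a stable sort on the
    dynamic criterion (typo count) is needed to finish the ranking."""
    hits = [r for r in index if matches(r)]
    return sorted(hits, key=typos)  # Python's sort is stable


def query_unsorted(records: List[Record],
                   matches: Callable[[Record], bool],
                   typos: Callable[[Record], int]) -> List[Record]:
    """Query time: the full ranking is computed for every query."""
    hits = [r for r in records if matches(r)]
    return sorted(hits, key=lambda r: (typos(r), -r["popularity"]))


def matches(record: Record) -> bool:
    """Hypothetical matcher for the query 'iphone', tolerating one typo."""
    return "iphone" in record["name"] or "ipone" in record["name"]


def typos(record: Record) -> int:
    """Hypothetical typo count for the query 'iphone'."""
    return 0 if "iphone" in record["name"] else 1


records = [
    {"name": "iphone case", "popularity": 50},
    {"name": "iphone", "popularity": 90},
    {"name": "ipone cable", "popularity": 70},
]

index = build_presorted_index(records)
presorted_names = [r["name"] for r in query_presorted(index, matches, typos)]
unsorted_names = [r["name"] for r in query_unsorted(records, matches, typos)]
print(presorted_names == unsorted_names, presorted_names)
```

Both paths produce the same order, but the presorted path does the expensive part of the ranking once, at indexing time, instead of on every keystroke.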

Relevance

Speed is often traded off to get better relevance.

Algolia: The tokenization required for partial word matching and typo tolerance is done mostly at indexing time.

Elasticsearch: Advanced techniques like ngrams, shingles and fuzzy matching make indices larger and also require analysis at query time.
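As a rough illustration of why these techniques enlarge an index, the sketch below generates edge n-grams (prefixes) and word shingles for a short product title. The helper names and the minimum prefix length are assumptions for the example; the analyzers in a real search engine are configurable and considerably more sophisticated.

```python
# Toy term expansion with edge n-grams and shingles.
# Helper names and the min_len default are assumptions for illustration.
from typing import List


def edge_ngrams(word: str, min_len: int = 2) -> List[str]:
    """All prefixes of `word` from min_len characters up to the full word."""
    return [word[:i] for i in range(min_len, len(word) + 1)]


def shingles(words: List[str], size: int = 2) -> List[str]:
    """Overlapping runs of `size` consecutive words."""
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def index_terms(text: str) -> List[str]:
    """Terms a prefix- and phrase-aware index might store for one field."""
    words = text.lower().split()
    terms: List[str] = []
    for word in words:
        terms.extend(edge_ngrams(word))
    terms.extend(shingles(words))
    return terms


terms = index_terms("Wireless Mouse")
print(len(terms), terms)
```

Two words expand to twelve stored terms here (seven prefixes of "wireless", four of "mouse" and one shingle), which is the kind of growth that makes index size and analysis cost worth watching.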

DNS

DNS can be slow before it’s cached by the user’s device. If a DNS provider is under a DDoS attack, requests will be slow or fail to complete.

Algolia: Algolia uses two DNS providers to increase reliability. Logic to fall back from one to the other is part of all API clients.

Elasticsearch: Elasticsearch does not provide out-of-the-box support for redundant DNS, but you could build it yourself.
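Client-side fallback of this kind can be sketched generically. The example below is not Algolia's client code: the hostnames are placeholders standing in for names served by two independent DNS providers, the path is made up, and `fetch` is an injected function standing in for a real HTTP call so that the sketch stays self-contained.

```python
# Generic host fallback sketch. Hostnames, path and fake_fetch are
# placeholders for illustration, not any real client's behavior.
from typing import Callable, List, Sequence


class AllHostsFailed(Exception):
    """Raised when every candidate host has been tried without success."""


def request_with_fallback(hosts: Sequence[str], path: str,
                          fetch: Callable[[str], str]) -> str:
    """Try each host in order and return the first successful response."""
    errors: List[str] = []
    for host in hosts:
        url = f"https://{host}{path}"
        try:
            return fetch(url)
        except OSError as exc:  # DNS failures and timeouts subclass OSError
            errors.append(f"{host}: {exc}")
    raise AllHostsFailed("; ".join(errors))


# Placeholder hostnames, assumed to be served by two separate DNS providers.
HOSTS = ["search.example.com", "search.example.net"]


def fake_fetch(url: str) -> str:
    """Simulate the first provider being unreachable."""
    if "example.com" in url:
        raise OSError("name resolution failed")
    return f"200 OK from {url}"


print(request_with_fallback(HOSTS, "/search?q=shoes", fake_fetch))
```

In a real client, `fetch` would wrap an HTTP library call with a short timeout, so that a slow or failing provider costs as little of the latency budget as possible before the second host is tried.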

Load Balancing

Load balancing & coordination can cause network congestion and add latency.

Algolia: API clients connect directly to the server with data on it. There is no network hop or single point of failure for reaching a cluster.

Elasticsearch: An ES cluster needs the right ratio of data nodes and coordinating nodes to avoid adding latency. 10G network bandwidth is recommended for large clusters.

Garbage Collection

Applications running in the JVM require momentary pauses to free up used memory. During these pauses, requests are queued.

Algolia: The Algolia engine is written in C++ and does not use the JVM.

Elasticsearch: The JVM can be tuned to reduce the frequency and impact of GC pauses. The tuning depends on the workload and server resources available. This is a painstaking process about which much has been written.

Sharding

Sharding allows data to be scaled across multiple indices. Overloaded shards exhibit degraded performance.

Algolia: Algolia handles any required sharding behind the scenes; it is invisible to the user. Shards can be dynamically rebalanced to avoid hot spots.

Elasticsearch: If original shard assumptions are wrong, such as the choice of a shard key, an Elasticsearch cluster will have to be rebuilt or rebalanced down the road to alleviate performance hotspots.
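A toy sketch shows how a poorly chosen shard key creates a hot spot. It hashes a routing value to a shard number, which is the general idea behind hash-based routing. The document fields, the shard count and the data are assumptions for the example, and the hash function is not the one either engine uses.

```python
# Toy demonstration of shard-key skew. The catalog, the field names and the
# hash function are assumptions for illustration only.
import hashlib
from collections import Counter
from typing import Dict, Iterable, List


def shard_for(routing_value: str, num_shards: int) -> int:
    """Map a routing value to a shard using a stable hash."""
    digest = hashlib.sha256(routing_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_load(docs: Iterable[Dict[str, str]], key: str,
               num_shards: int) -> List[int]:
    """Count how many documents land on each shard for a given shard key."""
    counts = Counter(shard_for(doc[key], num_shards) for doc in docs)
    return [counts.get(shard, 0) for shard in range(num_shards)]


# Hypothetical catalog: 1,000 products, 90% of them from a single brand.
docs = [{"id": str(i), "brand": "acme" if i % 10 else f"brand-{i}"}
        for i in range(1000)]

by_brand = shard_load(docs, "brand", 4)
by_id = shard_load(docs, "id", 4)
print("sharded by brand:", by_brand)
print("sharded by id:   ", by_id)
```

Routing by `brand` sends at least 900 of the 1,000 documents to one shard, while routing by `id` spreads them roughly evenly. Changing the key after the fact means moving data, which is why the original choice matters.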

Heavy Indexing

Large indexing operations can negatively impact search performance because they compete for the same CPU and memory.

Algolia: Algolia splits search and indexing into separate processes with different CPU priorities.

Elasticsearch: The Elasticsearch cluster must be configured to use different nodes for searching and indexing.

Conclusion

Latency can creep in from any number of places. Great care needs to be taken at each layer of the stack to avoid exceeding the latency budget and causing users to abandon their search. Algolia’s hosted search approach means that we can give our customers the benefit of our expertise in reducing latency. For users of Elasticsearch, latency needs to be understood and addressed by the implementing engineering team.

Read other parts of the Comparing Algolia and Elasticsearch for Consumer-Grade Search series:

Part 1 – End-to-end Latency
Part 2 – Relevance Isn’t Luck
Part 3 – Developer Experience and UX
