What Caused Today’s Search Performance Issues In Europe and Why It Will Not Happen Again

Mar 17th 2014 product

For a few hours on March 17th, you may have noticed longer response times for some of the queries sent by your users.

[Figure: Average latency for one of our European clusters on March 17th, showing slower-than-average search performance]

As you can see above, our slowest average response time (measured from the user’s browser to our servers and back to the user’s browser) on one of our European clusters peaked at 858ms. On a normal day, this peak is usually no higher than 55ms.

This was clearly not normal behavior for our API, so we investigated.

How indexing and search calls share resources

Each cluster handles two kinds of calls on our REST API: the ones to build and modify the indexes (Writes) and the ones to answer users’ queries (Search). The resources of each cluster are shared between these two uses. As Write operations are far more expensive than Search calls, we designed our API so that indexing should never use more than 10% of these resources.

Until now, we limited the rate of Writes per HTTP connection. There was no such limit for queries (Search); we limited only Write calls, to preserve search quality. To avoid reaching the Write rate limit too quickly, we recommended that users batch their Writes, with up to 1GB of operations per call, rather than sending them one by one. (A batch could, for example, add 1M products to an index in a single network call.) A loophole in this recommendation was the origin of yesterday’s issues.

Yesterday, on one of our European clusters, one customer pushed so many unbatched indexing calls over different HTTP connections that they massively outnumbered the search calls of the other users on the cluster.

This eventually increased the average response time for queries on this cluster, degrading our usual search performance.

The Solution

As of today, we set the Write rate limit per account rather than per HTTP connection. This prevents anyone from using multiple connections to bypass the limit. It also means that customers who want to push a lot of operations in a short time simply need to send their calls in batches.
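To make the difference concrete, here is a minimal sketch of a rate limiter whose counting key can be either the connection or the account. It is an illustration only, not our production code: the `WriteRateLimiter` class, the fixed-window counting, the limit of 10 Writes per window, and the `acme` account are all assumptions made for the example.

```ruby
# Hypothetical fixed-window limiter that counts Write calls per key.
# The key is whatever we choose to throttle on: a connection id
# (the previous behavior) or an account id (the new behavior).
class WriteRateLimiter
  def initialize(max_writes_per_window)
    @max = max_writes_per_window
    @counts = Hash.new(0)
  end

  # Returns true if the Write is accepted, false if the key is over its limit.
  def allow?(key)
    return false if @counts[key] >= @max

    @counts[key] += 1
    true
  end

  # Would be called at the start of each new time window.
  def reset!
    @counts.clear
  end
end

# Simulate one account sending 10 unbatched Writes on each of 5 connections
# and count how many Writes the limiter lets through.
def accepted_writes(limiter, key_by)
  accepted = 0
  5.times do |conn|
    10.times do
      request = { account: "acme", connection: "conn-#{conn}" }
      accepted += 1 if limiter.allow?(request[key_by])
    end
  end
  accepted
end

PER_CONNECTION = accepted_writes(WriteRateLimiter.new(10), :connection)
PER_ACCOUNT    = accepted_writes(WriteRateLimiter.new(10), :account)

puts "keyed by connection: #{PER_CONNECTION} Writes accepted"
puts "keyed by account:    #{PER_ACCOUNT} Writes accepted"
```

With the same limit, keying by connection lets all 50 Writes through because each connection gets its own budget, while keying by account accepts only 10 no matter how many connections are opened.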

How would you batch your calls? The explanation is in our documentation, which includes an example with our Ruby client.
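As a rough illustration of the idea (not the official client code), the sketch below groups records with Ruby's `each_slice` and sends each group in a single call. `FakeIndex` is a hypothetical stand-in that only counts calls, and both its `add_objects` method and the batch size of 1,000 are assumptions for the example; refer to the documentation for the real client's method names and limits.

```ruby
# Hypothetical stand-in for an index client. It performs no network I/O
# and only records how many calls and records it received.
class FakeIndex
  attr_reader :calls, :stored

  def initialize
    @calls = 0
    @stored = 0
  end

  # One invocation represents one network call carrying many records.
  def add_objects(records)
    @calls += 1
    @stored += records.size
  end
end

# Send records in groups of batch_size instead of one call per record.
def push_in_batches(index, records, batch_size)
  records.each_slice(batch_size) { |batch| index.add_objects(batch) }
end

# Example data: 2,500 product records.
products = (1..2_500).map { |i| { "objectID" => i.to_s, "name" => "Product #{i}" } }

INDEX = FakeIndex.new
push_in_batches(INDEX, products, 1_000)

puts "#{INDEX.stored} records sent in #{INDEX.calls} calls"
```

Here 2,500 records go out in 3 calls instead of 2,500, which keeps the number of Write calls well under a per-account rate limit.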

About the author
Julien Lemoine

Co-founder & former CTO at Algolia

