Testing for Failure in a 99.999% Reliability World

Even in a perfect world where everyone practices test-driven development (TDD), where everything is well planned and those plans succeed, things will still fail. Bugs happen. Inevitably, some little thing gets forgotten. This post is about that little thing.

In our years of building Algolia, we’ve had our share of learning experiences with reliable infrastructure, especially hardware and networks. As we aim for a five-nines (99.999%) SLA, we need as many failover mechanisms as possible. Part of this failover happens in our API clients, which automatically retry in case of TCP or DNS failures.

TCP is complex and can fail. The TCP stack of most programming languages is sound, but not all HTTP clients know how to handle failure correctly. That is why we need to thoroughly test how our API clients behave in case of network failures.

If something fails or is too slow, we want to make sure a fallback is used. That is why, in our view, timeouts are the most important factor, and we need to be sure the HTTP clients we use handle them correctly.
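The following minimal Python sketch makes the failover idea concrete. The client tries hosts in order and moves on when one fails or times out. It assumes a hypothetical `send(host)` callable that performs a single request and raises `OSError` on network failure. `OSError` is Python’s base class for connection errors, `getaddrinfo` DNS errors, and socket timeouts. The host names in the usage are placeholders, not real endpoints.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(hosts: Sequence[str], send: Callable[[str], T]) -> T:
    """Try each host in order; on a network error or timeout, fall back to the next one.

    `send` is a hypothetical single-request function supplied by the caller.
    """
    errors = []
    for host in hosts:
        try:
            return send(host)
        except OSError as exc:  # covers refused connections, DNS errors, and timeouts
            errors.append(f"{host}: {exc!r}")
    raise ConnectionError("all hosts failed: " + "; ".join(errors))
```

For example, if the first placeholder host raises `TimeoutError`, the call transparently returns the second host’s response. The timeout itself is what turns a hung request into an exception that triggers the fallback.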

With our knowledge of TCP, we knew that the timeouts we wanted to enforce were:

  • Connection timeout: the maximum time allowed to establish the initial connection, i.e. to complete the TCP handshake
  • Read timeout: the maximum time to wait when reading data, i.e. the longest allowed delay between two bytes sent by the server
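This minimal Python sketch shows how the two timeouts map onto a plain socket client, using only the standard library. The connection timeout is passed to `socket.create_connection`. The read timeout is then set with `settimeout`, which bounds every later blocking call, including each `recv`. The `fetch` helper is hypothetical, and real HTTP clients expose these settings under their own names.

```python
import socket


def fetch(host: str, port: int, request: bytes,
          connect_timeout: float, read_timeout: float) -> bytes:
    """Send `request` and read until the server closes the connection.

    connect_timeout bounds the TCP handshake; read_timeout bounds each
    recv(), i.e. the longest silence tolerated between two chunks of data.
    """
    # create_connection applies this timeout to the connect() call.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    try:
        # From here on, every blocking operation uses the read timeout.
        sock.settimeout(read_timeout)
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)  # raises socket.timeout after read_timeout of silence
            if not chunk:  # empty bytes: the server closed the connection
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sock.close()
```

Because `settimeout` applies per blocking call, a server that sends one byte just before each deadline never trips the read timeout. Capping the total request time needs a separate overall deadline.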

For DNS, it’s a bit more complex. Most of the time, DNS is not a connection-oriented protocol: it runs over UDP. We saw that name resolution is handled very differently in each programming language, so we needed to make sure our API clients behaved the same way regardless of language. Hence, we decided to enforce only one timeout: the time to resolve a hostname.
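Python’s `socket.getaddrinfo` takes no timeout argument, which shows why DNS timeouts vary so much across languages. One way to enforce a single resolution timeout is to run the lookup in a background thread and stop waiting after a deadline, as in this sketch. The function name and the injectable `resolver` parameter are assumptions for illustration. The parameter exists so the timeout path can be tested without a real DNS server.

```python
import socket
import threading


def resolve_with_timeout(host, port, timeout, resolver=socket.getaddrinfo):
    """Resolve `host`, but give up after `timeout` seconds.

    socket.getaddrinfo() has no timeout parameter, so the lookup runs in a
    daemon thread and the caller stops waiting once the deadline passes.
    `resolver` is injectable (hypothetical design choice) for testing.
    """
    result = {}

    def worker():
        try:
            result["addrs"] = resolver(host, port)
        except Exception as exc:  # handed back to the caller's thread below
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The lookup is still running; abandon it instead of blocking.
        raise TimeoutError(f"resolving {host!r} took longer than {timeout}s")
    if "error" in result:
        raise result["error"]
    return result["addrs"]
```

The abandoned lookup keeps running in its daemon thread until the system resolver gives up. The caller simply stops waiting for it.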

Simulating network errors

With that in mind, how could we simulate network errors easily and in a language-agnostic way?

First, the connection timeout. This one is quite easy: you only need a host that resolves to an IP that doesn’t answer. Some IPv4 ranges are reserved for private networks, so you only need a host that resolves to an address in one of those ranges; we picked 10.255.255.1 at random.
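This Python sketch checks the connection timeout against the address 10.255.255.1. Where packets to that address are silently dropped, the expected outcome is a timeout after roughly the configured delay. Some environments instead fail immediately, for example with “network unreachable”. For that reason the helper reports which case occurred rather than assuming one. The function name and port are illustrative.

```python
import socket
import time

# Address used in the post: in the private 10.0.0.0/8 range, nothing answers.
BLACKHOLE_IP = "10.255.255.1"


def probe_connect_timeout(host: str = BLACKHOLE_IP, port: int = 80,
                          timeout: float = 0.5):
    """Attempt a TCP connection; return (outcome, elapsed_seconds).

    outcome is "timeout" if the handshake did not complete within `timeout`,
    "error" if the OS failed fast (e.g. no route to host), or "connected".
    """
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:  # must come before OSError, its base class
        outcome = "timeout"
    except OSError:
        outcome = "error"
    else:
        sock.close()
        outcome = "connected"
    return outcome, time.monotonic() - start
```

In a client test suite, the assertion would be that the outcome is "timeout" and that the elapsed time is close to the configured connection timeout. Close means neither far shorter (the timeout was ignored) nor far longer (the OS default was used instead).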

Second, the read timeout. For this one, we need a host that resolves to an IP that accepts connections but never answers when we ask for data.
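One way to get a server that accepts connections but never answers is a tiny TCP server that accepts and then stays silent. This Python sketch runs such a server on localhost and shows the client’s `recv` raising `socket.timeout`. The helper names are hypothetical. In the setup described above, a real host behind the test hostname would play the server’s role.

```python
import socket
import threading


def start_silent_server(host: str = "127.0.0.1"):
    """Accept TCP connections on an ephemeral port but never send a byte.

    Returns (server_socket, port). Accepted connections are kept open, so the
    client sees an established but permanently silent connection.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))
    srv.listen()
    held = []  # keep references so accepted sockets are not garbage-collected

    def accept_forever():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # server socket was closed
            held.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return srv, srv.getsockname()[1]


def read_with_timeout(host: str, port: int, read_timeout: float) -> bytes:
    """Connect, send a request, and wait at most `read_timeout` for a reply."""
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.settimeout(read_timeout)
        sock.sendall(b"GET / HTTP/1.1\r\nHost: example.test\r\n\r\n")
        return sock.recv(1024)  # raises socket.timeout: the server never answers
```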

Third, the DNS timeout, which is a bit trickier than the first two. To test for this condition, we need a host whose DNS resolution times out. So we created a new domain whose DNS resolution is handled by a nameserver that never answers. Ring a bell? It’s the same trick as for the connection timeout: our domain’s nameserver is the same IP, 10.255.255.1.
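The same trick can be reproduced locally. A nameserver that never replies is just a UDP socket that receives queries and ignores them. This Python sketch builds a minimal RFC 1035 A-record query by hand and waits for an answer with a timeout. A silent local UDP socket stands in for 10.255.255.1, and the helper names are assumptions.

```python
import socket
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (RFC 1035) for the A record of `hostname`."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def query_with_timeout(server, hostname: str, timeout: float) -> bytes:
    """Send one query over UDP and wait at most `timeout` seconds for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(hostname), server)
        reply, _ = sock.recvfrom(512)  # raises socket.timeout if nobody answers
        return reply
```

Because UDP has no handshake, the client cannot tell “slow” from “never”. The single resolution timeout described above is the only signal it gets.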

With all of this, we could test timeouts in any programming language.

Simulating user input

We operate a public-facing API, so anyone can send us a request, and a small fraction of those requests are invalid:

  • Invalid JSON
  • Bad UTF-8 characters
  • SQL injection, remote code execution, or similar attacks

For these, plenty of resources already exist online, so we used them.

For JSON, we use YAJL, so we were already fairly confident in our JSON handling. For various reasons, we also worked on our own JSON parser, so we wanted to make sure it handled both valid and invalid JSON correctly. We stumbled upon this article and this test suite, and used them to test both our JSON parser and YAJL. Funnily enough, we discovered that YAJL accepts form feed (\f) as a valid whitespace character, whereas the JSON standard doesn’t.
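A JSON test suite of this kind typically pairs inputs a parser must accept with inputs it must reject. The Python sketch below runs a handful of such cases against a parser function. The cases are a small hypothetical sample, not the suite referred to above. Python’s `json.loads` stands in for the parser under test. `lenient_loads` is a deliberately non-conforming stand-in that treats form feed as whitespace, as YAJL does in the anecdote above, so the harness flags it.

```python
import json

# A tiny hypothetical sample of accept ("y") / reject ("n") cases.
CASES = [
    ("y", '{"a": [1, 2.5, -3e2, true, false, null]}'),
    ("y", ' \t\n\r[] \t\n\r'),   # the four whitespace characters JSON allows
    ("n", '{"a": 1,}'),          # trailing comma
    ("n", "[1, 2"),              # unterminated array
    ("n", "{'a': 1}"),           # single-quoted strings
    ("n", "\f[]"),               # form feed is not JSON whitespace (RFC 8259)
]


def accepts(parse, text):
    """True if `parse` returns normally, False if it rejects the input."""
    try:
        parse(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def run_suite(parse, cases=CASES):
    """Return the (expected, text) cases where `parse` got it wrong."""
    return [(expected, text) for expected, text in cases
            if accepts(parse, text) != (expected == "y")]


def lenient_loads(text):
    # Hypothetical non-conforming parser: treats form feed as whitespace.
    return json.loads(text.replace("\f", " "))
```

`run_suite(json.loads)` reports no failures, while `run_suite(lenient_loads)` reports exactly the form-feed case.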

UTF-8 is a complex encoding, and it’s quite easy to produce a byte sequence that is not valid UTF-8. For this, we aggregated multiple sources of invalid sequences and used them in our tests.
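This minimal Python sketch shows a UTF-8 corpus check. It pairs a few well-known classes of invalid byte sequences with valid ones and validates them using Python’s strict UTF-8 decoder. The invalid classes are lone continuation bytes, overlong encodings, truncated sequences, encoded surrogates, and code points beyond U+10FFFF. The sample is illustrative, not the aggregated corpus mentioned above. In a real test, each sequence would be sent to the API and the response checked.

```python
# A few classic invalid UTF-8 byte sequences (a sample, not an exhaustive list).
INVALID_UTF8 = {
    "lone continuation byte": b"\x80",
    "overlong encoding of '/'": b"\xc0\xaf",
    "truncated 3-byte sequence": b"\xe2\x82",
    "UTF-16 surrogate half (U+D800)": b"\xed\xa0\x80",
    "code point above U+10FFFF": b"\xf4\x90\x80\x80",
    "invalid lead byte 0xFF": b"\xff",
}

VALID_UTF8 = ["é".encode("utf-8"), "€".encode("utf-8"),
              "𝄞".encode("utf-8"), b"plain ascii"]


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True
```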

Last but not least, we tested with “naughty strings”: strings that are likely to trigger bugs or security flaws, such as those collected at https://github.com/minimaxir/big-list-of-naughty-strings.
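Naughty strings are easy to wire into a test loop: feed each one to the code under test and record anything that raises. The Python sketch below assumes input in the layout of the repository’s `blns.txt` file, with one string per line and lines starting with `#` as comments. It uses a hypothetical, deliberately fragile `ascii_only_handler`. `SAMPLE` holds a few hand-picked strings of the same kind, not the real file.

```python
def load_naughty_strings(lines):
    """Parse blns.txt-style content: one string per line; '#' and empty lines are skipped."""
    strings = []
    for line in lines:
        s = line.rstrip("\n")
        if s and not s.startswith("#"):
            strings.append(s)
    return strings


def find_crashes(handler, inputs):
    """Feed every input to `handler`; return (input, exception name) for each crash."""
    crashes = []
    for s in inputs:
        try:
            handler(s)
        except Exception as exc:
            crashes.append((s, type(exc).__name__))
    return crashes


def ascii_only_handler(query):
    # Hypothetical fragile handler: breaks on any non-ASCII input.
    return query.encode("ascii").lower()


SAMPLE = """#\tReserved Strings
undefined
null

#\tSpecial characters
Ω≈ç√
'; DROP TABLE users; --
<script>alert(1)</script>
""".splitlines(keepends=True)
```

Here `find_crashes(ascii_only_handler, load_naughty_strings(SAMPLE))` flags only the non-ASCII string. A real handler should survive every line of the full list.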

So, with little effort, we managed to add quite a few tests to ensure that we handle corner cases correctly.

Discovering Failure

The failures above were ones we knew about beforehand, so it was quite easy to know what to test and how to simulate the errors. But what about the unexpected?

Let’s take an example. We have an internal application that reads logs in the following format: “key1=value1;key2=value2”. This format is quite straightforward: key/value pairs separated by semicolons, so not much code is needed to parse it. But this application is business-critical and must handle incorrect logs gracefully, i.e. without crashing.
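Here is a minimal Python sketch of a parser for this format that degrades gracefully instead of crashing. Segments without an `=` or with an empty key are skipped, and everything after the first `=` is kept as the value. These policies are assumptions made for the example; the post does not say how the real application treats malformed lines.

```python
def parse_log(line: str) -> dict:
    """Parse 'key1=value1;key2=value2' into a dict without ever raising.

    Malformed segments (no '=', or an empty key) are skipped; the value is
    everything after the first '='.
    """
    fields = {}
    for segment in line.split(";"):
        if not segment:
            continue  # empty segment, e.g. from ';;' or a trailing ';'
        key, sep, value = segment.partition("=")
        if not sep or not key:
            continue  # no '=' or empty key: ignore instead of crashing
        fields[key] = value
    return fields
```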

To make sure it doesn’t crash, we can add some basic unit tests for the corner cases we thought of, but there are probably many more that we didn’t think of.

One way to find them is property testing: a way to test code in which you let the computer generate the test data. It comes from functional languages, where it works particularly well because functions are typically pure and can be fully described by their inputs and outputs.

Property testing works like this: you describe properties of your code and how to generate input data, then the property-testing framework generates the data and checks whether the properties hold for it.
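To show the mechanics without depending on a particular framework, here is a hand-rolled Python property checker. A generator produces random inputs from a seeded RNG, and the checker returns the first input that violates the property. Real frameworks, such as QuickCheck in Haskell or Hypothesis in Python, do much more, notably shrinking a failing input to a minimal counterexample. The names here are illustrative.

```python
import random
import string
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def check_property(prop: Callable[[T], bool],
                   gen: Callable[[random.Random], T],
                   runs: int = 1000, seed: int = 0) -> Optional[T]:
    """Return the first generated input that violates `prop`, or None if all pass."""
    rng = random.Random(seed)  # fixed seed: a failure can be replayed exactly
    for _ in range(runs):
        value = gen(rng)
        if not prop(value):
            return value
    return None


def random_text(rng: random.Random) -> str:
    """Generator: strings of 0-20 characters drawn from letters, ';' and '='."""
    alphabet = string.ascii_letters + ";="
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))


def split_join_roundtrip(s: str) -> bool:
    """Property: splitting on ';' and joining back yields the original string."""
    return ";".join(s.split(";")) == s
```

`check_property(split_join_roundtrip, random_text)` returns `None` because the property always holds. A false property such as “no string contains ';'” comes back with a concrete counterexample.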

Let’s walk through a full example with our log-parsing application. One property could be: “I should not throw an exception if I receive an invalid log.”

So what is a log?

  • It’s a string
  • It’s a sequence of strings separated by semicolons
  • It’s a sequence of “key=value” pairs separated by semicolons
  • We could then generate the specific keys and values we expect, and so on
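The layered definition of a log above translates naturally into generators of increasing structure. In this Python sketch, each of the first three bullets becomes one generator, and the property under test is “parsing does not raise”. `naive_parse` is a hypothetical, deliberately fragile parser. It survives well-formed key=value logs but crashes on arbitrary strings, which is exactly the kind of gap the looser generators are there to find.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits + "=;. "


def gen_any_string(rng):
    """Level 1: a log is any string."""
    return "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, 30)))


def gen_segments(rng):
    """Level 2: a log is strings separated by semicolons."""
    return ";".join(gen_any_string(rng) for _ in range(rng.randint(0, 5)))


def gen_key_values(rng):
    """Level 3: a log is 'key=value' pairs separated by semicolons."""
    def word():
        return "".join(rng.choice(string.ascii_lowercase)
                       for _ in range(rng.randint(1, 8)))
    return ";".join(f"{word()}={word()}" for _ in range(rng.randint(1, 5)))


def naive_parse(line):
    # Hypothetical fragile parser: assumes every segment is exactly "key=value".
    return dict(segment.split("=") for segment in line.split(";"))


def find_crash(parse, gen, runs=500, seed=0):
    """Property 'parse never raises': return (input, exception) on failure, else None."""
    rng = random.Random(seed)
    for _ in range(runs):
        line = gen(rng)
        try:
            parse(line)
        except Exception as exc:
            return line, exc
    return None
```

`find_crash(naive_parse, gen_key_values)` finds nothing. Run against the looser generators, the same check returns a crashing input almost immediately.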

Then we run the testing framework, which tests our application with data constrained by this definition of a log. This way, we found some corner cases in our log parsing. For example, one field was expected to contain an IP address, but we didn’t check that it was in the correct format.
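The missing IP-format check has a compact fix in Python via the standard `ipaddress` module. In this sketch, the field name `ip` is an assumption. So is the choice to return `None` for a missing or malformed value. The post doesn’t say which field it was or how the real application reacts.

```python
import ipaddress
from typing import Optional, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_ip_field(fields: dict, key: str = "ip") -> Optional[IPAddress]:
    """Return the parsed address, or None if the field is missing or malformed.

    `key` defaults to a hypothetical field name.
    """
    raw = fields.get(key)
    if raw is None:
        return None
    try:
        return ipaddress.ip_address(raw)  # accepts IPv4 and IPv6 text forms
    except ValueError:  # e.g. "999.1.1.1", "not-an-ip", ""
        return None
```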

In Summary

Testing for failure is not a lot of work if you know which areas to look at. Once you know them, you can find good documentation of their corner cases. For everything else, with little effort, you can write property tests that exercise your software in new and unexpected ways.

About the author
Rémy-Christophe Schermesser

Staff Software Engineer
