Jun 13th 2017 engineering
Even in a perfect world where everyone does test-driven development (TDD), where everything is well planned and those plans succeed, things will fail. Bugs happen. Inevitably, there is always a little thingy that was forgotten. That little thingy is what this post is about.
In our years of building Algolia, we’ve had our share of learning experiences in building reliable infrastructure, especially when it comes to hardware and networks. As we strive for a five 9s SLA, we need as many failovers as possible. Part of this failover happens in our API clients, where we implement automatic retries in case of TCP or DNS failures.
TCP is complex and can fail. The TCP stack of most programming languages is sound, but not all HTTP clients know how to handle failure correctly. That is why we need to thoroughly test how our API clients are behaving in case of network failures.
If something fails or is not fast enough, we want to make sure that another method is used. So, the most important factors, in our view, are timeouts. We need to be sure they are handled correctly by the HTTP clients we are using.
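The retry idea can be sketched in a few lines. This is a minimal illustration, not the actual logic of the Algolia API clients: `request_with_failover`, `AllHostsFailed`, the `send` callable, and the host names are all hypothetical, and the only assumption is that the transport raises `OSError` (which includes `TimeoutError`) on a network failure.

```python
from typing import Callable, List, Sequence, Tuple


class AllHostsFailed(Exception):
    """Raised when every host in the list failed or timed out."""


def request_with_failover(
    hosts: Sequence[str],
    send: Callable[[str, float], str],
    timeout: float = 2.0,
) -> str:
    """Try each host in order and fall through to the next one on failure.

    `send(host, timeout)` is a hypothetical transport function that returns
    a response or raises OSError (TimeoutError is a subclass of OSError).
    """
    errors: List[Tuple[str, OSError]] = []
    for host in hosts:
        try:
            return send(host, timeout)
        except OSError as exc:
            errors.append((host, exc))  # remember why this host was skipped
    raise AllHostsFailed(errors)


def fake_send(host: str, timeout: float) -> str:
    """Stand-in transport: only the third (hypothetical) host answers."""
    if host != "host-3.example":
        raise TimeoutError(f"{host} did not answer within {timeout}s")
    return f"200 OK from {host}"


HOSTS = ["host-1.example", "host-2.example", "host-3.example"]
response = request_with_failover(HOSTS, fake_send)
```

The sketch only works if the HTTP client really raises when a timeout expires, which is exactly what the tests described next are meant to verify.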
With our knowledge of TCP, we knew which timeouts we wanted to enforce: the connection timeout and the read timeout, both of which are tested below.
For DNS, it’s a bit more complex. Most of the time, DNS is not connection-oriented and uses UDP. We saw that it is handled very differently in each programming language, so we needed to make sure our API clients behaved the same way whatever their programming language. Hence, we wanted to enforce only one timeout: the time to resolve a hostname.
With what we wanted to test in mind, how could we simulate network errors easily and in a language-agnostic way?
First, the connection timeout. This one is quite easy, as you only need a host that resolves to an IP that doesn’t answer. Some IPv4 ranges are reserved for private networks, so you only need one host that resolves into such a range. We arbitrarily chose 10.255.255.1.
Second, the read timeout. For this one, we need a host that resolves to an IP that accepts connections but never answers when we ask for data.
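The read-timeout condition can be reproduced locally without a dedicated host. The sketch below is an assumption-laden stand-in for the test host described above: `start_silent_server` and `read_times_out` are hypothetical helper names, and the "server" is just a listening socket on 127.0.0.1 that never calls `accept()` or sends anything. The operating system completes the TCP handshake from the listen backlog, so the client connects successfully but never receives data.

```python
import socket
from typing import Tuple


def start_silent_server() -> Tuple[socket.socket, int]:
    """Listen on an ephemeral local port and never send a single byte."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(5)               # handshakes complete from the backlog
    return server, server.getsockname()[1]


def read_times_out(host: str, port: int, timeout: float) -> bool:
    """Return True if the connection opens but no data arrives in time."""
    with socket.create_connection((host, port), timeout=timeout) as client:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        try:
            client.recv(1024)
        except socket.timeout:
            return True  # connected, asked for data, got nothing back
    return False


server, port = start_silent_server()
timed_out = read_times_out("127.0.0.1", port, timeout=0.2)
server.close()
```

A client under test would be pointed at such an endpoint, and the test would check that it gives up after its configured read timeout instead of hanging.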
Third, the DNS timeout, which is a bit trickier than the first two tests. To test for this condition, we need a host whose DNS resolution times out. So we created a new domain whose DNS resolution is handled by a server that times out. Ring a bell? It’s the same as the connection timeout. The resolver of our domain has the same IP as the one used for the connection timeout: 10.255.255.1.
With all of this we could test timeouts in every language possible.
We are operating a public facing API, so anyone can send us a request. And a small part of those are invalid:
For this, there were already a lot of resources on the Internet, so we used them.
For JSON, we use YAJL, so we were pretty confident in the handling of JSON. For various reasons, we also tried developing our own JSON parser, and we wanted to make sure it handled bad and good JSON correctly. We stumbled upon this article and this test suite. We used them to test both our JSON parser and YAJL. Funny thing: we discovered that YAJL accepts form feed (\f) as a valid whitespace character, whereas the JSON standard doesn’t.
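A conformance check of this kind can be sketched as a table of documents paired with the expected verdict. The example below runs against Python's standard `json` module rather than YAJL or the parser mentioned above, and the small `CASES` list is a hypothetical stand-in for a real test suite; `parses` and `failing_cases` are illustrative names.

```python
import json

# Hypothetical mini-corpus: (document, should_parse)
CASES = [
    ('{"a": 1}', True),
    (' \t\r\n[1, 2]', True),  # the four whitespace characters JSON allows
    ('\f[1, 2]', False),      # form feed is not JSON whitespace
    ('[1, 2,]', False),       # trailing comma
    ("{'a': 1}", False),      # single-quoted strings
    ('', False),              # empty input is not a JSON value
]


def parses(document: str) -> bool:
    """Return True if the parser under test accepts the document."""
    try:
        json.loads(document)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def failing_cases(cases):
    """List the documents where the parser disagrees with the expectation."""
    return [doc for doc, expected in cases if parses(doc) != expected]


mismatches = failing_cases(CASES)
```

To check another parser, only the body of `parses` needs to change; a parser that accepts form feed as whitespace would show up in `mismatches` through the third case.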
UTF-8 is a complex encoding format, and it’s quite easy to generate a sequence of bytes that results in an invalid UTF-8 character. For this, we aggregated multiple sources of bad sequences so we could use them in our tests.
Last but not least, we evaluated naughty strings. These are strings that could trigger a security issue or flaw: https://github.com/minimaxir/big-list-of-naughty-strings.
So with little effort, we managed to add quite a few tests to ensure that we handle corner cases correctly.
The previous failures were ones we knew about beforehand, so it was quite easy to know what to test and how to simulate errors. But what happens when the unexpected happens?
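To give an idea of what such bad sequences look like, here is a short, hand-picked list checked against Python's strict UTF-8 decoder. The entries are well-known classes of invalid input, not the aggregated sources mentioned above, and `BAD_SEQUENCES` and `is_valid_utf8` are hypothetical names.

```python
# A few classic invalid UTF-8 byte sequences (hand-picked illustration).
BAD_SEQUENCES = {
    "truncated 3-byte sequence": b"\xe2\x82",
    "lead byte without continuation": b"\xc3\x28",
    "overlong encoding of '/'": b"\xc0\xaf",
    "UTF-16 surrogate U+D800": b"\xed\xa0\x80",
    "5-byte sequence": b"\xf8\x88\x80\x80\x80",
}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 (decoding is strict by default)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Names of the bad sequences that the decoder wrongly accepts (ideally none).
wrongly_accepted = [name for name, raw in BAD_SEQUENCES.items() if is_valid_utf8(raw)]
```

The same byte strings can be sent to any component that claims to validate UTF-8, and each one should be rejected.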
Let’s take an example. We have an internal application that reads logs in the following format: “key1=value1;key2=value2”. This format is quite straightforward: key/value pairs separated by semicolons. So, there isn’t a ton of code needed to parse it. But this application is business critical and should handle incorrect logs properly, that is, without crashing.
To ensure it doesn’t crash, we can add some basic unit tests for the corner cases we thought of, but there are probably a lot more that we didn’t think about.
One way to find them is to use property testing. It’s a way to test code where you let the computer generate the testing data. It comes from functional languages, where it works pretty well because functions are pure and can be described by their inputs and outputs.
Property testing works as follows: you describe properties of your code, you describe how to generate the data, and then you let the property testing framework generate the data and check whether it satisfies the properties.
Let’s take a full example with our log parsing application. One property could be “I should not throw an exception if I receive an invalid log”.
So what is a log?
Then we run the testing framework, and it tests our application with data constrained by our definition of a log. With this, we managed to find some corner cases in our log parsing. For example, one field was expecting an IP address, but we didn’t check that it was in the correct format.
Testing for failures is not a lot of work if you know which areas to look at. Once you know them, you can find good documentation of corner cases. For all the rest, with little effort, you can write property tests that will test your software in new and unexpected ways.
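The "should not throw on an invalid log" property can be illustrated with a hand-rolled generator loop. This is a simplified sketch using only the standard library, not a property testing framework (real frameworks add smarter generators and shrinking of failing inputs), and `parse_log` is a hypothetical parser for the “key1=value1;key2=value2” format, not the internal application.

```python
import random
from typing import Dict, Optional


def parse_log(line: str) -> Optional[Dict[str, str]]:
    """Parse "key1=value1;key2=value2"; return None for malformed input."""
    fields: Dict[str, str] = {}
    for pair in line.split(";"):
        key, sep, value = pair.partition("=")
        if not sep or not key:
            return None  # missing "=" or empty key: reject, do not raise
        fields[key] = value
    return fields


# Characters chosen to hit separators, whitespace, NUL and non-ASCII input.
ALPHABET = "abc123=;. \n\x00é"


def random_line(rng: random.Random, max_len: int = 20) -> str:
    """Generate an arbitrary, mostly malformed, log line."""
    length = rng.randint(0, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def check_never_raises(runs: int = 1000, seed: int = 0) -> Optional[str]:
    """Property: parse_log never raises. Return a counterexample or None."""
    rng = random.Random(seed)  # seeded so a failure can be replayed
    for _ in range(runs):
        line = random_line(rng)
        try:
            parse_log(line)
        except Exception:
            return line
    return None


counterexample = check_never_raises()
```

If `counterexample` is not `None`, it holds an input that crashed the parser, which can then be turned into a regular unit test.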
Staff Software Engineer