Summary and key takeaways

  • Two root certification authorities expired on May 30, 2020.
  • Some of our customers experienced service outages for up to 1.5 hours (if they had outdated OpenSSL libraries), and others up to 3 hours (if they also had outdated certificate stores).
  • The issue has been fully mitigated, and the service availability was restored for everyone.
    Although related to OpenSSL, HTTPS and PKI certificates, this was not a security incident.

On May 30, 2020, 10:48 UTC, we experienced a rare situation in the Public Key Infrastructure of the Internet when two of the root certification authorities expired, one cross-signed by the other.

In theory, this isn’t anything unusual. Certificates expire all the time (normally after a year, or after just 90 days if you use Let’s Encrypt), and certification authorities expire once every many years. But on May 30, the expiring certification authority exposed an underlying issue that made our API unavailable for some of our customers.

On May 30, a minute after the certification authorities expired at 10:48 UTC, our Site Reliability Engineering (SRE) team was notified of a certificate problem with our service. This was an unexpected message because we carefully check that our certificates are valid and don’t expire any time soon. A quick verification in the browser confirmed that everything seemed normal and that the API was responding correctly. However, a second alert from a different service then notified the SRE team, again claiming that the certificate had expired. Another verification in the browser again confirmed that the service was working and responding. For some reason, the Pingdom monitoring service saw the certificate as expired, but everything was working for us in the browsers.

Looking at the traffic graphs on our infrastructure, we noticed a decrease in the number of API calls, but the numbers were nowhere close to zero. A complete outage is often easier to diagnose than a partial one, where some things work and some things don’t.

The direction of the investigation quickly changed when we tested the impacted services from the command line of our macOS laptops. Suddenly, a simple curl to the domain failed, saying the certificate was not valid. We were on to something, and the Qualys SSL Labs scanner showed many interesting things:

  1. Our certificate was valid.
  2. The certificate of our intermediate certification authority Sectigo RSA Organization Validation Secure Server CA was valid.
  3. In Path 1, the root certificate USERTrust RSA Certification Authority was valid, and the full chain was valid.
  4. In Path 2, the certificate of our root certification authority USERTrust RSA Certification Authority expired on May 30, 10:48 UTC.
  5. The certificate of root certification authority AddTrust External CA Root cross-signing the certificate of USERTrust RSA expired on May 30, 10:48 UTC.

This was an interesting situation. There was a valid path to the USERTrust RSA Certification Authority, and there was also an expired path. The browser was able to find the valid chain, but curl was not. The Qualys SSL Labs test reported that the certificate configuration was fine.

We updated the status page with the information we had, dedicated people to respond to emails coming to our support mailbox and started to work on mitigations of what we thought was the source of the problem.

In our testing environment, we verified that removing the expired certificates from the certificate chain served by our servers was a good approach. The servers were still available from the browser and now even curl in the command line was able to verify the certificate chain and let the requests through. This was a promising solution worth deploying to production and so we went ahead and started deploying.

Once the change hit the first production servers, we saw the traffic levels recover to what we would expect at that time of the day. With every additional server getting the now shorter version of the certificate chain, the situation got better and better, until we thought everything was done and we could close the incident. However, there was still a small group of customers telling us they were unable to verify the certificate and connect to the API.

With this in mind, we returned to the Qualys test of the new configuration.

The test indicated that there was one valid trusted path, as well as one new trusted path that was not visible before. This one did not finish at AddTrust External CA Root, but at AAA Certificate Services. Where did this new path come from? As it turned out, the USERTrust RSA Certification Authority certificate exists in 3 versions:

  • self-signed root certification authority version created in 2010
  • cross-signed by AddTrust External CA root version created in 2000
  • cross-signed by AAA Certificate Services version created in 2019

At this point, the vast majority of our customers recognized the self-signed root version from their certificate stores. For the rest, we could no longer use the AddTrust path, which left only the AAA Certificate Services path. It was unclear why the self-signed version wasn’t recognized everywhere, since it had been in the commonly available certificate stores for some time. A quick look at the AAA Certificate Services certificate showed it was created in 2004, so there was a strong likelihood that it was present in more certificate stores and had been there longer.

Our SRE team deployed a second change to our public certificate to restore the service for the remaining customers and, shortly after deployment, confirmed with them that they could reach the service again. The incident was now finally over and the service was restored for all customers. But why did some of our customers lose access in the first place, when even the Qualys scanner said the certificate was valid?

The certificate was indeed valid, and the whole certificate chain was generated more or less correctly by the Comodo/Sectigo certification authority. (Expiring any certificate during the weekend is not a great practice, and neither is expiring a certification authority before its leaf certificates, but technically it’s not incorrect.) What was not envisioned was how client libraries would handle this situation, primarily OpenSSL, which powers the vast majority of HTTPS on Earth. During the analysis of the incident, we landed on an interesting OpenSSL fix from 2014 which says:

Don’t use expired certificates if possible.
When looking for the issuer of a certificate, if current candidate is expired, continue looking. Only return an expired certificate if no valid certificates are found.

This means that before this change was implemented, whenever OpenSSL encountered an expired certificate while building the chain, it declared the certificate invalid and refused the connection. After this change, OpenSSL skips the expired certificate and correctly continues looking for other certificates that can prove the certificate is valid. This tiny change reflects the nature of certificate chains: a single certification authority can be signed by multiple certification authorities, some of them still valid and some of them not. Everything looks great, so why the impact then?

Digging further, we found that this change is only part of OpenSSL 1.1.1 and is not part of OpenSSL 1.0.x. This means, for example, that all versions of Ubuntu 16.04 and older, Debian 9 and older, CentOS 7 and older, and Red Hat 7 and older are impacted, even though all of these are still generally supported versions, at least from a security point of view. On our macOS laptops, we use LibreSSL 2.x, and LibreSSL has a similar entry in the release notes of LibreSSL 3.2.0:

* Use non-expired certificates first when building a certificate chain.

We’ll now have to wait for new LibreSSL to appear in new versions of macOS.
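
The difference between the two issuer-lookup behaviours described above can be illustrated with a small model. This is only a sketch under simplifying assumptions, not OpenSSL's or LibreSSL's actual code: the `Cert` class, both function names, and the far-future expiry date are hypothetical, and real libraries also check signatures, key identifiers, and trust stores. Only the certificate names and the May 30, 10:48 UTC expiry come from the incident itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Cert:
    """Toy certificate: just the fields needed to show issuer selection."""
    subject: str
    issuer: str
    not_after: datetime

    def expired(self, now: datetime) -> bool:
        return now > self.not_after


def find_issuer_first_match(cert: Cert, candidates: List[Cert]) -> Optional[Cert]:
    """Older behaviour (simplified): take the first candidate whose subject
    matches the issuer name, whether or not it has expired."""
    for candidate in candidates:
        if candidate.subject == cert.issuer:
            return candidate
    return None


def find_issuer_prefer_valid(cert: Cert, candidates: List[Cert],
                             now: datetime) -> Optional[Cert]:
    """Newer behaviour (simplified): keep looking past expired candidates and
    return an expired one only if no valid candidate is found."""
    fallback = None
    for candidate in candidates:
        if candidate.subject != cert.issuer:
            continue
        if not candidate.expired(now):
            return candidate
        if fallback is None:
            fallback = candidate
    return fallback


USERTRUST = "USERTrust RSA Certification Authority"
EXPIRY = datetime(2020, 5, 30, 10, 48, tzinfo=timezone.utc)  # from the incident
FAR_FUTURE = datetime(2038, 1, 1, tzinfo=timezone.utc)       # illustrative placeholder

intermediate = Cert("Sectigo RSA Organization Validation Secure Server CA",
                    USERTRUST, FAR_FUTURE)
cross_signed = Cert(USERTRUST, "AddTrust External CA Root", EXPIRY)
self_signed = Cert(USERTRUST, USERTRUST, FAR_FUTURE)

# The cross-signed version is listed first, as if the server had sent it.
candidates = [cross_signed, self_signed]
now = datetime(2020, 5, 30, 11, 0, tzinfo=timezone.utc)

old_choice = find_issuer_first_match(intermediate, candidates)
new_choice = find_issuer_prefer_valid(intermediate, candidates, now)

print("first match :", old_choice.issuer, "| expired:", old_choice.expired(now))
print("prefer valid:", new_choice.issuer, "| expired:", new_choice.expired(now))
```

In this model, the first-match lookup lands on the expired cross-signed certificate and verification would fail, while the prefer-valid lookup reaches the self-signed root that is still valid.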

But why did the browsers verify the certificates correctly? Browsers ship with their own SSL/TLS libraries and their own ways of verifying PKI. Chrome ships with BoringSSL and Firefox with NSS, independently of the SSL/TLS libraries of the underlying operating system. These libraries don’t have the same bug and are updated much more often.

And here we are, finally with the full picture of what happened: why only the back-end implementations of some customers were impacted, and why some systems with outdated certificate stores needed a much older root certificate.

What is next? We’re reaching out to impacted customers and explaining what happened, and we’re improving our certificate checking tool to verify expiration in the full chain, not just our leaf certificate. Last but not least, we’ll include OpenSSL in our annual Open Source donations and will financially support the work of the OpenSSL team.
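
As a rough idea of what checking expiration across the full chain could look like, here is a minimal sketch. It is not our actual tool: the `ChainEntry` type, the `expiring_entries` function, the 30-day warning window, and the leaf and intermediate expiry dates are all assumptions made for illustration. Only the cross-signed certificate's expiry time comes from the incident.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple


class ChainEntry(NamedTuple):
    """One certificate in a served chain, reduced to a name and an expiry."""
    name: str
    not_after: datetime


def expiring_entries(chain: List[ChainEntry], now: datetime,
                     warn_within: timedelta) -> List[ChainEntry]:
    """Return every entry that is already expired or will expire
    within `warn_within` of `now`."""
    deadline = now + warn_within
    return [entry for entry in chain if entry.not_after <= deadline]


utc = timezone.utc
chain = [
    # Leaf and intermediate dates are hypothetical placeholders.
    ChainEntry("leaf certificate", datetime(2021, 3, 1, tzinfo=utc)),
    ChainEntry("Sectigo RSA Organization Validation Secure Server CA",
               datetime(2030, 12, 31, tzinfo=utc)),
    # Expiry time taken from the incident.
    ChainEntry("USERTrust RSA Certification Authority (cross-signed by AddTrust External CA Root)",
               datetime(2020, 5, 30, 10, 48, tzinfo=utc)),
]

now = datetime(2020, 5, 1, tzinfo=utc)
window = timedelta(days=30)

leaf_only = expiring_entries(chain[:1], now, window)   # what a leaf-only check sees
full_chain = expiring_entries(chain, now, window)      # what a full-chain check sees

print("leaf-only check flags :", [e.name for e in leaf_only])
print("full-chain check flags:", [e.name for e in full_chain])
```

Run a month before the expiry, the leaf-only check reports nothing, while the full-chain check flags the cross-signed certificate.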

—————–

FAQ

  • Was the service unavailability caused by an issue on the Algolia servers?
    No, the service continued working and was unavailable only for outdated systems.
  • What should I do to avoid a similar situation happening in the future with Algolia or other services?
    Update your OpenSSL to at least version 1.1.1 and LibreSSL to at least 3.2.0. Also update your certificate stores; on Linux, this is often the “ca-certificates” package.
  • Did Algolia’s certificate expire?
    No, Algolia’s certificate didn’t expire and is still valid. What expired was one of the 3 versions of the certification authority certificate that signs our certificate.
  • Was any other provider impacted or was the issue specific to Algolia?
    There were unfortunately other providers impacted: Heroku, Stripe, kernel.org, Datadog, Gandi.net, and many others.
  • What was the impact to customers?
    We detected that approximately 10% of our application clusters were impacted for about 1.5 hours. After this period, a single-digit number of customers remained impacted for up to 3 hours. During the incident, no search queries coming from browsers were impacted.
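
For the version advice in the FAQ above, one quick way to see which TLS library a Python runtime is linked against is the standard `ssl.OPENSSL_VERSION` string. The sketch below compares it with the minimum versions named in this post. The `MINIMUMS` table and the helper names are assumptions for illustration, and the result says nothing about other clients on the same machine, such as curl, which may link a different library.

```python
import re
import ssl
from typing import Optional, Tuple

# Minimum versions as recommended in this post.
MINIMUMS = {"OpenSSL": (1, 1, 1), "LibreSSL": (3, 2, 0)}


def parse_version(version_string: str) -> Optional[Tuple[str, Tuple[int, ...]]]:
    """Extract the library name and numeric version from strings such as
    'OpenSSL 1.0.2g  1 Mar 2016' or 'LibreSSL 2.8.3'."""
    match = re.match(r"(OpenSSL|LibreSSL)\s+(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        return None
    version = tuple(int(match.group(i)) for i in (2, 3, 4))
    return match.group(1), version


def meets_minimum(version_string: str) -> Optional[bool]:
    """True or False for a recognized library, None if the string is unknown."""
    parsed = parse_version(version_string)
    if parsed is None:
        return None
    library, version = parsed
    return version >= MINIMUMS[library]


if __name__ == "__main__":
    # Report on the library this interpreter is actually linked against.
    print(ssl.OPENSSL_VERSION, "->", meets_minimum(ssl.OPENSSL_VERSION))
```

A result of `False` suggests the runtime predates the releases mentioned above, and `None` means the version string was not recognized by this sketch.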
About the author
Adam Surak

Director of Infrastructure & Security @ Algolia
