Salt Incident: May 3rd 2020 Retrospective and Update
Summary & Key Takeaways

  • On May 3rd, 2020, Algolia’s infrastructure was attacked through the Salt configuration management vulnerability CVE-2020-11651. Through this vulnerability, two types of malware got into Algolia’s configuration manager: one designed to mine crypto-currencies and another to act as a backdoor server.
  • Given the nature of the malware, there was no data breach, and no data was collected, altered, destroyed or damaged. On May 3rd, less than 2% of Algolia’s search service customers experienced service disruption for more than 5 minutes, and less than 1% for more than 10 minutes.
  • As of 17:33 UTC on May 3rd, 2020, all of Algolia’s search services across its 9,000+ customers were up and running as normal.
  • As an outcome from this experience, Algolia has reviewed and updated its security protocols, measures, and monitoring practices to further protect and strengthen its services.
  • Algolia holds the trust of its customers and the security and performance of its services paramount. As such, it is providing the detailed account of the incident below, along with an FAQ, to its customers and the community.

 

You may have heard about the Salt configuration management vulnerability (CVE-2020-11651 and CVE-2020-11652), which impacted several large-scale organizations. The vulnerability lies in the communication protocol of the salt master, and specifically in its authenticated protocol. A major flaw allowed attackers to bypass sanity checks and call sensitive internal functions, which could expose communication keys and let them go even further by issuing commands without authenticating.

We were indeed hit by an attack exploiting this very vulnerability during the night of May 3rd, 2020.

We at Algolia have always been candid and transparent about problems. We are publishing this post-mortem to give our users full and detailed information about the incident, and also to improve ourselves in the long run.

Waking up to the incident

It all started on Sunday, May 3, at 3:12am Paris time, when the phone of one of our core engineers suddenly rang. Multiple alerts were firing at the same time because the API was no longer available for a number of customers. The scale of the incident was large enough to justify waking up our infrastructure team to assist, and two additional engineers quickly joined to help investigate the issue in the middle of the night. 

Servers continued to fire alerts and shut down one by one without a clear cause, and remote access to these servers was impossible. We initially suspected a large-scale network outage, and one of our providers, OVH, quickly assisted in the investigation and ruled out this hypothesis. At the same time, we found suspicious commands while auditing our logs.

The culprit: configuration management

We were able to quickly determine that our configuration manager had been the victim of an attack that propagated malware commands to a number of servers in our Europe clusters. Part of our infrastructure was no longer running only our code.

Configuration management is sometimes overlooked, but it is a critical part of an infrastructure, especially in a large cloud. Being able to deploy packages, set up services and maintain them on thousands of servers is a must-have. But it can become an Achilles’ heel, because it involves a central server with privileged access to a whole lot of servers. One mistake can be amplified to painful extents.

What was impacted

More than five hundred servers were impacted, most of them temporarily losing indexing service, but some of them also losing search capabilities. Thanks to the way that we designed our architecture, this didn’t significantly impact our customers as additional servers stayed healthy and took over the search traffic.

While recovering the servers one by one, our main concern was to accurately evaluate the exact scale of the attack. And the first question is always: was any data breached? Analyzing the payloads executed by the malware, we concluded that the only goal of the attack was to mine crypto-currencies, and not to collect, alter, destroy or damage data. We could have been less fortunate, and this is a lesson we are not going to forget any time soon.

In addition, a portion of our monitoring network was overloaded by the detected failures, yet the status page showed that everything was green. This was confusing and misleading. We are working on fixing it, and we’ll retrospectively update the status indicators to reflect the issues we detected.

Here is the downtime that a number of our users experienced, mostly on indexing but in some cases also on the search service:

  • 15 clusters out of over 700 (~2%) were impacted by a search downtime longer than 5 minutes.
  • 6 clusters (less than 1%) were impacted by a search downtime longer than 10 minutes.
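
As a quick check of the percentages above, the arithmetic can be sketched as follows. The post says "over 700" clusters; this sketch assumes exactly 700, so the computed shares are upper bounds on the real ones.

```python
# Cluster counts quoted in the post. "Over 700" is approximated as 700
# (an assumption), which makes the computed shares upper bounds.
TOTAL_CLUSTERS = 700


def share(impacted: int, total: int = TOTAL_CLUSTERS) -> float:
    """Return the impacted clusters as a percentage of all clusters."""
    return 100.0 * impacted / total


over_5_min = share(15)   # search downtime longer than 5 minutes
over_10_min = share(6)   # search downtime longer than 10 minutes

print(f"{over_5_min:.2f}%")   # 2.14%
print(f"{over_10_min:.2f}%")  # 0.86%
```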

How we responded: reclaiming the infrastructure

We started by shutting down the configuration manager involved in the incident across all of our infrastructure, keeping files for later forensic analysis. Then, we teamed up with our different providers, rebooted the impacted servers one by one, and investigated their state. We identified that two pieces of malware had been injected: one to mine crypto-currencies and another to act as a backdoor server. We killed all malware processes, restored files to their original state, and then built a plan to reinstall all the impacted servers one by one.

As part of our standard protocol, public announcements were regularly updated on https://status.algolia.com/, and we informed customers reaching out to our support team of the situation.

Seven hours after the first alert was triggered, we rebooted the last server after removing all injected malware, and started working on the configuration management side. A dozen people were involved in the recovery, working tirelessly to rebuild damaged services.

How did we get here?

The impacted component was an older tool that had been running fine, but we had been planning to rework it over the following two quarters. We have now put in place a temporary fix that secures the environment, and we’re actively reworking the system sooner than planned.

What we have done so far:

  • We’ve secured the impacted SaltStack service by updating it and adding IP filtering that allows only our servers to connect to it.
  • We’ve reinstalled the SaltStack servers from scratch.
  • We’re rotating security keys necessary for communication with the SaltStack service, both on our application servers and on SaltStack servers.
  • We’re rotating secrets on all of our servers.
  • We’re restricting the access keys of our control plane services to the specific IPs of the servers from which they’re expected to be used.
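
The IP filtering described in the list above can be illustrated with a minimal sketch. This is not Algolia's actual setup: the networks below are documentation-only ranges (RFC 5737) standing in for a real server fleet, and `is_allowed` is a hypothetical helper. In practice, such a rule would more likely be enforced at the firewall level than in application code.

```python
import ipaddress

# Hypothetical allowlist. These are documentation-only ranges (RFC 5737),
# used as placeholders for the networks of servers allowed to connect.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]


def is_allowed(source_ip: str) -> bool:
    """Return True only if source_ip belongs to an allowlisted network."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        # Malformed input is rejected rather than raising.
        return False
    return any(addr in network for network in ALLOWED_NETWORKS)


print(is_allowed("192.0.2.10"))   # True: inside the first range
print(is_allowed("203.0.113.5"))  # False: not in any allowlisted range
```

Denying by default, including on malformed input, means a new or unknown address has to be added to the allowlist explicitly before it can reach the service.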

What we plan to do over the coming days:

  • We’re going to review all of our control plane services and add an additional layer of access control to ensure that a single control failure will not lead to exposure of the service.
  • As an added security measure, we’re going to reinstall all of the servers in our infrastructure that had contact with the SaltStack service to ensure all servers are clean of any unexpected changes.
  • We’ll establish a process to monitor and review Common Vulnerabilities and Exposures (CVEs) even during holidays and weekends, to cover combinations such as a May 1st holiday followed by a weekend and to shorten the reaction time between CVEs being discovered and patches being applied.
  • If the SLA of your application has been impacted, we’ll provide an appropriate service credit in accordance with our applicable Service Level Agreement.

The security and uptime of our services are two of our core priorities. Therefore, there is no acceptable excuse for this outage. You can count on us to learn from this, and to improve our setup and infrastructure so this never happens again.

____________

 

FAQs

  • What was the impact to customers and their users’ search experience during this attack?
    • Less than 1% of customers experienced degraded search experience or search outage.
  • How do I know if my site was impacted during this attack?
    • If you’re not aware of any issues, you were most likely not impacted and all the safety mechanisms worked. If you want to see the status of your application during that time, you can find it in the Dashboard, under the Monitoring section, in the Status tab.
  • You mentioned that part of the infrastructure was not running just Algolia’s code? What exactly was the other code that was running and what did it do?
    • The code was trying to get as much CPU as possible to mine crypto-currencies and prevent the server from restarting.
  • What percentage of Algolia’s servers were impacted from the attack?
    • Less than 20%.
  • How was the search downtime only approximately ten minutes if the entire process through rebooting the last server took seven hours?
    • Thanks to our architecture, there is a high level of redundancy in our application clusters, and in the vast majority of cases the whole cluster was not impacted. We designed this architecture to sustain datacenter outages, and this situation was very similar. Some of the servers didn’t successfully reboot on their own, and the team worked on these isolated cases over the following hours.
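
The redundancy argument in the last answer can be made concrete with a toy model. This is a sketch under stated assumptions, not a description of Algolia's real routing: the three-replica cluster, the `Server` class and the helper functions are all hypothetical. The point is only that search stays available for a cluster as long as at least one of its servers is healthy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    healthy: bool = True


def cluster_can_search(servers: List[Server]) -> bool:
    """Search is available while at least one server is healthy."""
    return any(server.healthy for server in servers)


def pick_server(servers: List[Server]) -> Optional[Server]:
    """Return the first healthy server, or None if the whole cluster is down."""
    for server in servers:
        if server.healthy:
            return server
    return None


# Hypothetical cluster of three replicas (the replica count is an assumption).
cluster = [Server("replica-1"), Server("replica-2"), Server("replica-3")]

# One server is taken down, as happened to the servers hit by the malware.
cluster[0].healthy = False

print(cluster_can_search(cluster))  # True: two replicas still serve search
print(pick_server(cluster).name)    # replica-2
```

Under this model, a cluster only shows search downtime when every one of its servers is unavailable at the same time, which fits the small number of clusters reported with search downtime despite several hours of recovery work.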

____________

References

https://nvd.nist.gov/vuln/detail/CVE-2020-11651

https://nvd.nist.gov/vuln/detail/CVE-2020-11652

https://thehackernews.com/2020/05/saltstack-rce-exploit.html

About the author
Julien Lemoine

Co-founder & former CTO at Algolia
