Sep. 26, 2019

Data Sanitization

Algolia Does Not Sanitize Your Data

Algolia accepts any data and stores it without alteration. The same goes for the response: Algolia returns the data in your index as is. It therefore saves and returns HTML and XML tags along with their attributes.

That said, Algolia’s search algorithm ignores HTML and XML tags, so users can’t search for the tags themselves.

Let’s take a look at an example. Algolia has no problem saving a record that contains the HTML tag <strong>. However, because Algolia ignores tags during search, a search for the word “strong” won’t find the following record.

{
  "description": "She is amazingly <strong>powerful</strong>, deeply visionary."
}

Sanitizing the query response

Some characters are systematically removed (not escaped) from the API’s response:

  • Control characters (U+0000 to U+001F)
  • Delete (U+007F)
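
As a rough illustration of the effect (not Algolia’s actual implementation), the Python sketch below removes the same code points from a string. The function name `strip_control_characters` is hypothetical. Note that the range U+0000 to U+001F includes tabs and newlines.

```python
import re

# Matches the control characters U+0000..U+001F and Delete (U+007F).
CONTROL_CHARACTERS = re.compile(r"[\u0000-\u001f\u007f]")

def strip_control_characters(text: str) -> str:
    """Remove (not escape) control characters, mimicking the behavior described above."""
    return CONTROL_CHARACTERS.sub("", text)

stored = "deeply\tvisionary\n\u007f"
returned = strip_control_characters(stored)
```

A comparison between the stored value and the value in the response would therefore differ wherever the original contained one of these characters.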

Security

Clean your indices

Because Algolia does not sanitize your data and returns it as is, you need to sanitize it yourself. Otherwise, you run the risk of a cross-site scripting (XSS) attack.

To avoid this, you have two options:

  1. Escape or strip potentially dangerous characters before indexing
  2. Escape or strip them before displaying results
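
Either option can be sketched with Python’s standard `html.escape`. The helper below is a minimal example, assuming records are plain dictionaries, lists, and scalars; the name `escape_record` is hypothetical and not part of any Algolia client. The same helper could be applied to a record before indexing or to a hit before rendering it.

```python
import html

def escape_record(value):
    """Recursively HTML-escape every string value in a record.

    Dictionary keys and non-string values (numbers, booleans, None) are left untouched.
    """
    if isinstance(value, str):
        return html.escape(value, quote=True)
    if isinstance(value, dict):
        return {key: escape_record(item) for key, item in value.items()}
    if isinstance(value, list):
        return [escape_record(item) for item in value]
    return value

record = {
    "description": "She is amazingly <strong>powerful</strong>, deeply visionary."
}
safe_record = escape_record(record)
```

Escaping at display time keeps the indexed data unchanged, while escaping before indexing means every consumer of the index receives already-escaped text. Which trade-off fits depends on how many places render the results.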

Clean your user search input

You also need to handle user search input. Any HTML or code that users enter in the search bar exposes you to an XSS attack, because Algolia sends the query back in its response. Escape or strip tags and code from the query before displaying it.
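
A minimal sketch of this, assuming the query is echoed back into an HTML heading by server-side Python code; the function name `render_query_heading` and the markup are illustrative only:

```python
import html

def render_query_heading(query: str) -> str:
    """Build a results heading with the user's query escaped for safe display."""
    return "<h2>Results for " + html.escape(query, quote=True) + "</h2>"

heading = render_query_heading("<script>alert(1)</script>")
```

With the escaping in place, the browser displays the markup as text instead of executing it.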
