
Formatting Data

Last updated 01 August 2017

Accepted datatypes

Records in Algolia are modeled as JSON objects, which makes semi-structured data easy to index. Both requests to the Algolia API and its responses are sent as JSON objects.

Algolia is schemaless and can index attributes of the following types:

  • string "foo"
  • integer / float 12 or 12.34
  • boolean true
  • nested objects { "sub": { "a1": "v1", "a2": "v2" } }
  • array (of string, integers, floats, booleans, nested objects) ["foo", "bar"]
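As a rough illustration, the list above can be expressed as a small pre-indexing check. This is only a sketch: `is_accepted` is a hypothetical helper, not part of any Algolia client, and it mirrors the types listed here rather than the engine's own validation.

```python
def is_accepted(value):
    """Return True if value matches one of the types listed above."""
    # Scalars: string, integer, float, boolean.
    if isinstance(value, (str, bool, int, float)):
        return True
    # Nested object: string keys mapping to accepted values.
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_accepted(v) for k, v in value.items())
    # Array of scalars or nested objects (arrays of arrays are not in the list).
    if isinstance(value, list):
        return all(not isinstance(item, list) and is_accepted(item) for item in value)
    return False


record = {
    "name": "foo",
    "price": 12.34,
    "in_stock": True,
    "sub": {"a1": "v1", "a2": "v2"},
    "tags": ["foo", "bar"],
}
print(all(is_accepted(v) for v in record.values()))
```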

Unique identifier - objectID

Every object is uniquely identified by an objectID.

If you don’t provide one, Algolia generates one automatically. However, records are easier to update or remove later if you store your own unique identifier in the objectID attribute: if your objects already have unique IDs, specify them as the objectID of the records you push to Algolia. The value can be an integer or a string.

Because the objectID attribute is used as a unique identifier for your objects, it is treated specially by Algolia:

  • It can be searched by declaring it in searchableAttributes.
  • It can be used as a facet filter.
  • It can be neither highlighted nor snippeted. If objectID is declared in attributesToHighlight or attributesToSnippet, it will be ignored.
  • It cannot be excluded from the results. If objectID is declared in unretrievableAttributes or omitted from attributesToRetrieve, it will still be returned.
  • It cannot be faceted. If objectID is declared in attributesForFaceting, it will be ignored. (Faceting on a unique identifier makes little sense anyway, since every facet count would be equal to one.)
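A minimal sketch of reusing an existing unique key as the objectID before pushing records. The helper name `with_object_id`, the `id` field, and the sample rows are assumptions made for illustration; no Algolia client call is shown.

```python
def with_object_id(row, key="id"):
    """Return a copy of row with objectID set from the row's own unique key."""
    value = row[key]
    # objectID values can be an integer or a string (bool is excluded explicitly,
    # because Python treats bool as a subclass of int).
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError("objectID must be an integer or a string")
    record = dict(row)  # copy so the source row is left untouched
    record["objectID"] = value
    return record


# Hypothetical rows coming from an application database.
rows = [
    {"id": 42, "title": "Breaking Bad"},
    {"id": "show-7", "title": "Another Show"},
]
records = [with_object_id(row) for row in rows]
print(records)
```

Keeping the original key alongside objectID is a design choice here; it could also be dropped from the copy if it is not needed in search results.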

Formatting Recommendations

There are a few extra cases to consider as data objects become more complex, along with requirements from the Algolia engine on record size, non-standard types, and unique identifiers.

Dates

Date attributes should be formatted as Unix timestamps (e.g. 1435735848) if you want to filter or sort by date. By default, the Algolia engine doesn’t interpret strings in the ISO date format, so you must convert your dates into numeric values.
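The conversion can be sketched with the Python standard library. The helper name `iso_to_timestamp` is hypothetical, and treating dates without a time zone as UTC is an assumption you may need to change for your data.

```python
from datetime import datetime, timezone


def iso_to_timestamp(iso_date):
    """Convert an ISO 8601 date string to a Unix timestamp in seconds."""
    parsed = datetime.fromisoformat(iso_date)
    if parsed.tzinfo is None:
        # Assumption: dates without an explicit offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


record = {
    "title": "Breaking Bad",
    "air_date": iso_to_timestamp("2012-12-30T05:42:37+00:00"),
}
print(record["air_date"])  # 1356846157
```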

Size Limit

Algolia limits the size of a record to 10 KB for performance reasons. If a record is larger than the limit, Algolia returns the error message "Record is too big". If your record is much larger than 10 KB, you can break it up into smaller records and de-duplicate the results using distinct. For more information, read about the distinct concept.
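To catch oversized records before sending them, you can estimate the serialized size locally. This is a sketch under stated assumptions: it treats the limit as 10,000 bytes of compact UTF-8 JSON, which may not match exactly how the engine measures a record, and `record_size` and `is_too_big` are hypothetical helper names.

```python
import json

MAX_RECORD_BYTES = 10_000  # assumption: the 10 KB limit read as 10,000 bytes


def record_size(record):
    """Return the size in bytes of the record serialized as compact UTF-8 JSON."""
    encoded = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def is_too_big(record, limit=MAX_RECORD_BYTES):
    """Return True if the serialized record exceeds the limit."""
    return record_size(record) > limit


small = {"objectID": 42, "title": "Breaking Bad"}
large = {"objectID": 43, "body": "x" * 20_000}
print(is_too_big(small), is_too_big(large))
```

Records flagged this way are candidates for being split into smaller records, as described above.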

Example Record

{
  "objectID": 42,             // record identifier
  "title": "Breaking Bad",    // string attribute
  "episodes": [               // array of strings attribute
    "Crazy Handful of Nothin'",
    "Gray Matter"
  ],
  "like_count": 978,          // integer attribute
  "avg_rating": 1.23456,      // float attribute
  "air_date": 1356846157,     // date as a timestamp attribute
  "featured": true,           // boolean attribute
  "actors": [                 // nested objects attribute
    {
      "name": "Walter White",
      "portrayed_by": "Bryan Cranston"
    },
    {
      "name": "Skyler White",
      "portrayed_by": "Anna Gunn"
    }
  ]
}

Indexing of HTML tags

To keep relevance optimal, we chose to exclude HTML/XML tags, and their attributes, from the data that is indexed and searchable.

In the following example, users will be able to search for any word in the description except the link tag itself. This means that the following record will not be returned for a query of “href”, “target” or “_blank”.

{
  "name": "Myth #9",
  "description": "In-house <a href=\"http://…\" target=\"_blank\" rel=\"noopener\">experts</a> are essential to get search right"
}

Sanitizing HTML content for security risks remains best practice, both before displaying it on the front end and before indexing it with Algolia.
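To get a feel for which words remain searchable in the example description, the tag removal can be approximated locally. This sketch uses Python's built-in HTML parser and is not Algolia's actual implementation; `strip_tags` is a hypothetical helper.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text between tags, dropping tags and their attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    parser = _TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


description = (
    'In-house <a href="http://…" target="_blank" rel="noopener">experts</a> '
    "are essential to get search right"
)
text = strip_tags(description)
print(text)  # In-house experts are essential to get search right
```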

