
Reducing Record Size

Algolia limits the size of a record for performance reasons. The limits depend on your plan. If a record is larger than the limit, Algolia returns the error message "Record is too big". Besides upgrading your plan, we recommend a few techniques that can help you reformat your records or break them up into smaller ones, depending on your use case.
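To catch oversized records before sending them, you can approximate a record's size locally. The sketch below is only an estimate: it measures the byte length of the record's compact UTF-8 JSON, which may differ slightly from how Algolia measures it. The names `MAX_RECORD_BYTES`, `estimated_record_size`, and `is_probably_too_big` are hypothetical, and the limit value is a placeholder that you should replace with the limit of your own plan.

```python
import json

# Placeholder value (assumption): the real limit depends on your plan.
MAX_RECORD_BYTES = 10_000


def estimated_record_size(record: dict) -> int:
    """Approximate the record size as the byte length of its compact UTF-8 JSON."""
    encoded = json.dumps(record, ensure_ascii=False, separators=(",", ":"))
    return len(encoded.encode("utf-8"))


def is_probably_too_big(record: dict, limit: int = MAX_RECORD_BYTES) -> bool:
    """Return True when the estimated size exceeds the given limit."""
    return estimated_record_size(record) > limit
```

Running this check in your indexing script lets you log or transform the offending records instead of discovering them through the API error.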

If you need to index long documents, we usually recommend splitting your data into several records rather than trying to index a full document in one record. See our how-to guide on indexing long documents.

If your records have many attributes, there are probably many that you don’t use, or that you could optimize by restructuring the data.

Modifying the data: Removing unused attributes

You might not need to index every attribute from your data source into Algolia. We strongly encourage you to rework your raw data before indexing records to Algolia and eliminate what you don’t need. For example, imagine you’re developing a website that displays tweets on specific topics. Twitter may send you a lot of metadata, most of which is likely useless in the context of search.

Before

An excerpt of the API response for a single tweet might look like this:

[
  {
    "text": "Good morning #LaraconEU ! Swing by the Algolia booth for some vintage video games, awesome stickers and lightning fast search 😎",
    "truncated": false,
    "in_reply_to_user_id": null,
    "in_reply_to_status_id": null,
    "pictures": true,
    "pictures_urls": [
      "https://pbs.twimg.com/media/Dl1UftbX4AAF1Fj.jpg",
      "https://pbs.twimg.com/media/Dl1Um6OXsAAa1JS.jpg"
    ],
    "source": "<a href=\"http://twitter.com/\" rel=\"nofollow\">Twitter for Web</a>",
    "in_reply_to_screen_name": null,
    "in_reply_to_status_id_str": null,
    "retweeted": false,
    "retweet_count": 10,
    "like_count": 24,
    "place": null,
    "geolocated": false,
    ...
  }
]

Indexing everything would increase your record’s size, even though you might not need all of it to build your search experience. Furthermore, you could make one attribute serve several purposes: instead of testing the pictures attribute to decide whether to show a gallery, you could add pictures_urls only when the array isn’t empty, and test for the presence of that attribute instead.
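As a minimal sketch of that idea, the snippet below adds pictures_urls to a record only when the source array isn't empty, so the front end can test for the attribute instead of relying on a separate pictures flag. The helper names `add_optional_pictures` and `has_gallery` are hypothetical and not part of any Algolia client.

```python
def add_optional_pictures(record: dict, raw_tweet: dict) -> dict:
    """Copy pictures_urls into the record only when there is at least one URL."""
    urls = raw_tweet.get("pictures_urls") or []
    if urls:
        record["pictures_urls"] = list(urls)
    return record


def has_gallery(record: dict) -> bool:
    """The presence of pictures_urls replaces the boolean pictures attribute."""
    return "pictures_urls" in record
```

With this approach, a tweet without pictures carries neither the flag nor an empty array, which saves a few bytes per record.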

After

We recommend that you cherry-pick only the information you need for searching, displaying, and ranking, and leave everything else out. After transforming the data, a record could look like this:

[
  {
    "text": "Good morning #LaraconEU ! Swing by the Algolia booth for some vintage video games, awesome stickers and lightning fast search 😎",
    "pictures_urls": [
      "https://pbs.twimg.com/media/Dl1UftbX4AAF1Fj.jpg",
      "https://pbs.twimg.com/media/Dl1Um6OXsAAa1JS.jpg"
    ],
    "retweet_count": 10,
    "like_count": 24
  }
]

Before reworking the data, the record carried a lot of information that bloated it without helping with searching, displaying, or ranking. After selecting only the necessary data, we’ve significantly reduced the number of attributes (and consequently the record size) without hurting the quality of our search.
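One way to apply this transformation before indexing is to keep an explicit allowlist of attributes. The following is a sketch under stated assumptions, not a definitive implementation: `KEPT_ATTRIBUTES` and `to_search_record` are hypothetical names, the allowlist mirrors the attributes kept in the example above, and the sample input uses a shortened text for brevity.

```python
import json

# Attributes needed for searching, displaying, and ranking (assumption: adjust to your use case).
KEPT_ATTRIBUTES = ("text", "pictures_urls", "retweet_count", "like_count")


def to_search_record(raw_tweet: dict, kept=KEPT_ATTRIBUTES) -> dict:
    """Keep only allowlisted attributes, skipping missing, null, and empty-list values."""
    record = {}
    for name in kept:
        value = raw_tweet.get(name)
        if value is None or value == []:
            continue
        record[name] = value
    return record


if __name__ == "__main__":
    raw = {
        "text": "Good morning #LaraconEU !",
        "truncated": False,
        "in_reply_to_user_id": None,
        "pictures": True,
        "pictures_urls": ["https://pbs.twimg.com/media/Dl1UftbX4AAF1Fj.jpg"],
        "retweeted": False,
        "retweet_count": 10,
        "like_count": 24,
        "place": None,
        "geolocated": False,
    }
    slim = to_search_record(raw)
    # Compare the serialized sizes before and after the transformation.
    print(len(json.dumps(raw).encode("utf-8")), "->", len(json.dumps(slim).encode("utf-8")))
```

An allowlist is a deliberate choice over a denylist here: new metadata fields added by the data source are excluded by default, so records don't grow silently over time.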
