
Reducing Record Size

Algolia limits the size of records for performance reasons. The limit depends on your plan. If a record exceeds it, Algolia returns the error message Record is too big. Besides upgrading your plan, you can reduce record size by restructuring your data or by splitting large records into smaller ones, depending on your use case.
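To catch oversized records before sending them, you could check their size yourself. The following Python sketch approximates a record's size as the UTF-8 byte length of its compact JSON encoding. This is an assumption and not necessarily how Algolia measures records. `MAX_RECORD_BYTES` is a hypothetical placeholder: replace it with the limit for your plan.

```python
import json

# Hypothetical placeholder: replace with the per-record limit for your plan.
MAX_RECORD_BYTES = 10_000


def record_size_bytes(record: dict) -> int:
    """Approximate a record's size as the UTF-8 length of its compact JSON encoding."""
    encoded = json.dumps(record, ensure_ascii=False, separators=(",", ":"))
    return len(encoded.encode("utf-8"))


def oversized_records(records: list[dict], limit: int = MAX_RECORD_BYTES) -> list[dict]:
    """Return the records whose approximate size exceeds the limit."""
    return [record for record in records if record_size_bytes(record) > limit]


records = [{"objectID": "1", "text": "short"}, {"objectID": "2", "text": "x" * 20_000}]
for record in oversized_records(records):
    print(record["objectID"], record_size_bytes(record))
```

Records flagged this way are candidates for the techniques below: removing attributes or splitting the record.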

If you need to index long documents, split each one into several records instead of indexing the full document as a single record. For details, see the how-to guide on indexing long documents.
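As a rough illustration of splitting, the following Python sketch groups a document's paragraphs into chunks of at most `max_chars` characters and turns each chunk into its own record. The attribute names `parent_id`, `content`, and `position` are hypothetical, and the linked guide may use a different approach. A shared attribute like `parent_id` is what you could set as `attributeForDistinct` (with `distinct` enabled) so that search returns only the best-matching chunk per document.

```python
def split_document(doc_id: str, title: str, body: str, max_chars: int = 1000) -> list[dict]:
    """Split a long body into paragraph-based records that share a parent identifier."""
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        # Close the current chunk if adding this paragraph (plus separator) exceeds max_chars.
        # A single paragraph longer than max_chars still becomes one chunk on its own.
        if current and len(current) + 2 + len(paragraph) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return [
        {
            "objectID": f"{doc_id}-{i}",
            "parent_id": doc_id,  # hypothetical attribute shared by every chunk of the document
            "title": title,  # repeated so each chunk is self-describing in results
            "content": chunk,
            "position": i,  # order of the chunk within the original document
        }
        for i, chunk in enumerate(chunks)
    ]
```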

If your records have many attributes, you probably don't use some of them, or you could optimize them by restructuring your data.

Removing unused attributes

You might not need to index every attribute from your data source. Rework your raw data before sending it to Algolia, and remove what you don't need. For example, imagine you're building a website that displays tweets on specific topics. Twitter may send you a lot of metadata, most of which is probably irrelevant to search.

Before removing attributes

An excerpt of the API response for a single tweet might look like this:

[
  {
    "text": "Good morning #LaraconEU ! Swing by the Algolia booth for some vintage video games, awesome stickers and lightning fast search 😎",
    "truncated": false,
    "in_reply_to_user_id": null,
    "in_reply_to_status_id": null,
    "pictures": true,
    "pictures_urls": [
      "https://pbs.twimg.com/media/Dl1UftbX4AAF1Fj.jpg",
      "https://pbs.twimg.com/media/Dl1Um6OXsAAa1JS.jpg"
    ],
    "source": "<a href=\"http://twitter.com/\" rel=\"nofollow\">Twitter for Web</a>",
    "in_reply_to_screen_name": null,
    "in_reply_to_status_id_str": null,
    "retweeted": false,
    "retweet_count": 10,
    "like_count": 24,
    "place": null,
    "geolocated": false
  }
]

Indexing everything would increase your record's size, even though you might not need all of it to build your search experience. You can also make one attribute serve several purposes. Instead of checking the pictures attribute to decide whether to show a gallery, include the pictures_urls attribute only when the array isn't empty, and check for the presence of that attribute instead.
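Here is a minimal Python sketch of that idea, assuming you build records in a preprocessing step. The function names are hypothetical, and in practice the gallery check would usually live in your front end; it's shown in Python only to illustrate that the attribute's presence replaces the separate boolean.

```python
def with_optional_pictures(record: dict, pictures_urls: list[str]) -> dict:
    """Return a copy of the record that includes pictures_urls only when it's non-empty."""
    result = dict(record)
    if pictures_urls:
        result["pictures_urls"] = pictures_urls
    return result


def should_show_gallery(record: dict) -> bool:
    # The attribute's presence replaces a separate boolean flag such as "pictures".
    return "pictures_urls" in record
```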

After removing attributes

You can cherry-pick only the information you need for searching, displaying, and ranking, and leave everything else out. After transforming the data, a record could look like this:

[
  {
    "text": "Good morning #LaraconEU ! Swing by the Algolia booth for some vintage video games, awesome stickers, and lightning-fast search 😎",
    "pictures_urls": [
      "https://pbs.twimg.com/media/Dl1UftbX4AAF1Fj.jpg",
      "https://pbs.twimg.com/media/Dl1Um6OXsAAa1JS.jpg"
    ],
    "retweet_count": 10,
    "like_count": 24
  }
]

Before reworking the data, your record contained information that bloated it without helping with searching, displaying, or ranking. By selecting only the necessary data, you've reduced the number of attributes and the record size without hurting search quality.
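The transformation from the raw tweet to the smaller record can be sketched in Python as an allowlist of attributes. `KEPT_ATTRIBUTES` and `to_search_record` are hypothetical names for this example; adapt the list to the attributes your own search experience needs.

```python
import json

# Attributes this example keeps for searching, displaying, and ranking.
KEPT_ATTRIBUTES = ("text", "pictures_urls", "retweet_count", "like_count")


def to_search_record(tweet: dict) -> dict:
    """Keep only the listed attributes; drop pictures_urls when it's missing or empty."""
    record = {key: tweet[key] for key in KEPT_ATTRIBUTES if key in tweet}
    if not record.get("pictures_urls"):
        record.pop("pictures_urls", None)
    return record


raw_tweet = {
    "text": "Good morning #LaraconEU ! Swing by the Algolia booth for some vintage video games, awesome stickers and lightning fast search 😎",
    "truncated": False,
    "in_reply_to_user_id": None,
    "in_reply_to_status_id": None,
    "pictures": True,
    "pictures_urls": [
        "https://pbs.twimg.com/media/Dl1UftbX4AAF1Fj.jpg",
        "https://pbs.twimg.com/media/Dl1Um6OXsAAa1JS.jpg",
    ],
    "source": '<a href="http://twitter.com/" rel="nofollow">Twitter for Web</a>',
    "in_reply_to_screen_name": None,
    "in_reply_to_status_id_str": None,
    "retweeted": False,
    "retweet_count": 10,
    "like_count": 24,
    "place": None,
    "geolocated": False,
}

search_record = to_search_record(raw_tweet)
print(sorted(search_record))  # ['like_count', 'pictures_urls', 'retweet_count', 'text']
# Compare the JSON-encoded lengths as a rough measure of the size reduction.
print(len(json.dumps(raw_tweet)), "->", len(json.dumps(search_record)))
```

An allowlist is a deliberate choice here: new metadata fields that the source API adds later stay out of your records unless you opt in.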
