Prepare your records for indexing

How you format and structure your attributes and records in your index will affect your search results.

Algolia records

An Algolia record is a collection of attributes where each attribute has a name and a value (a key-value pair).

Here is an example record with four different kinds of attributes (string, integer, array, and boolean):

{
  "title": "Blackberry and blueberry pie",
  "description": "A delicious pie recipe that combines blueberries and blackberries.",
  "image": "https://yourdomain.com/blackberry-blueberry-pie.jpg",
  "likes": 1128,
  "sales": 284,
  "categories": ["pie", "dessert", "sweet"],
  "gluten_free": false
}

Your records should only include information that helps with searching, showing results, sorting, and relevance. You can leave everything else out.

Attributes don’t have to follow a specific schema (pattern): they can differ for each record.

Searchable attributes

All attributes are searchable by default. To make search results more relevant, set only some attributes as searchable with the searchable attributes feature. You can also use this setting to rank your searchable attributes, making some more relevant than others.

Textual, descriptive attributes, such as summaries, brands, or colors, can be helpful as searchable attributes. For instance, in a recipe app, to help users search for “blueberry pie”, you need an attribute containing those words, such as a title attribute.

Structuring records

When adding data to your records, be selective. For example, if you’re working with a product line, you don’t need to send every piece of information about your products: only what serves the purposes of search. Include all the necessary information to find products, rank them, and display them on your website or app.

Building records involves:

  • Extracting valuable data
  • Reworking the data to remove unnecessary elements
  • Adding or computing extra information that improves the chances of finding the most relevant results.

Simplify your records

When making an Algolia index, make your records as simple as possible.

When users search for something, Algolia returns the records that match their search as results. Each record should therefore contain all the information users need to find it quickly and all the information you need to display its content.

Don’t worry about relational database principles, such as not repeating data or creating hierarchical structures with primary and foreign keys.

Rework your data

Think about what needs to happen if you want to create a movie UI for users to:

  • Search and view titles, synopsis, and distribution details
  • Display (but not search) images and country of release
  • Filter on genre or a range of dates
  • Rank based on review scores

In this scenario, you don’t care about technical information, such as how long the movie is.

Assuming your data comes from a relational database, with the information you need in different tables, you need to query the data from these tables. After fetching it, a record for one movie may look like the following:

[
  {
    "title": "Spirited Away",
    "synopsis": "During her family's move to the suburbs, a sullen 10-year-old girl wanders into a world ruled by gods, witches, and spirits, and where humans are changed into beasts.",
    "director": "Hayao Miyazaki",
    "cast": [
      {
        "name": "Rumi Hiiragi",
        "birth_date": "August 1, 1987",
        "birth_place": "Tokyo, Japan"
      },
      {
        "name": "Miyu Irino",
        "birth_date": "February 19, 1988",
        "birth_place": "Tokyo, Japan"
      },
      {
        "name": "Mari Natsuki",
        "birth_date": "May 2, 1952",
        "birth_place": "Tokyo, Japan"
      }
    ],
    "release_year": 2001,
    "country": "Japan",
    "genres": [
      "Animation",
      "Adventure",
      "Family",
      "Fantasy",
      "Mystery"
    ],
    "runtime": 125,
    "aspect_ratio": "1.85:1",
    "content_rating": "PG",
    "review_scores": {
      "imdb": 8.6,
      "rotten_tomatoes": {
        "critics": 97,
        "audience": 96
      }
    },
    "images": [
      "https://example.com/spirited-away/image1.jpg",
      "https://example.com/spirited-away/image2.jpg"
    ]
  }
]

The example contains several kinds of content, some useful for a search experience, some that aren’t, and others that require reformatting. For example, you don’t need to keep runtime or aspect_ratio. While they’re helpful in other contexts, they have little value when searching, filtering, ranking, or displaying search results.

While the names of the voice actors in the cast attribute are helpful, you don’t need their birth date and place. Therefore, you can safely remove them and just keep the names. This process removes noise and saves room in records for more valuable data.

{
  "cast": [
    "Rumi Hiiragi",
    "Miyu Irino",
    "Mari Natsuki"
  ]
}
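
The cleanup described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any Algolia client: the function name simplify_movie and the UNUSED_ATTRIBUTES set are hypothetical, and the input is a shortened version of the movie record shown earlier.

```python
from typing import Any

# Attributes with little value for searching, filtering, ranking, or display.
# This set mirrors the prose above; adjust it to your own data.
UNUSED_ATTRIBUTES = {"runtime", "aspect_ratio"}


def simplify_movie(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `raw` without unused attributes and with `cast` reduced to names."""
    record = {key: value for key, value in raw.items() if key not in UNUSED_ATTRIBUTES}
    # Keep only the actor names; birth_date and birth_place are dropped.
    record["cast"] = [member["name"] for member in raw.get("cast", [])]
    return record


raw_movie = {
    "title": "Spirited Away",
    "cast": [
        {"name": "Rumi Hiiragi", "birth_date": "August 1, 1987", "birth_place": "Tokyo, Japan"},
        {"name": "Miyu Irino", "birth_date": "February 19, 1988", "birth_place": "Tokyo, Japan"},
    ],
    "runtime": 125,
    "aspect_ratio": "1.85:1",
}

simplified = simplify_movie(raw_movie)
print(simplified)
```

The function builds a new dictionary instead of modifying its input, so the original data stays available for other uses.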

Attributes for searching

Attributes for searching are the ones that contain the terms that your users look for. If you want to search for a movie by title, plot, genre, or cast, you need attributes that contain these terms. In the preceding example, such attributes are title, synopsis, director, cast, and genres.

Algolia lets you define which attributes to search with the searchableAttributes parameter. By default, the engine searches the entire record. Restricting the search to specific attributes is better for performance and lets you remove noise. Don’t search attributes that aren’t textually relevant or might generate false positives, like images, release_year, review_scores, or country. For example, when searching for “japan”, English-speaking users most likely want to find movies that either have the term in the title or take place in Japan rather than Japanese movies.

You can therefore set title, synopsis, director, cast, and genres as searchableAttributes and leave out the rest for displaying, filtering, and custom ranking.
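
As a rough sketch, the resulting setting can be written as plain data. The attribute names come from the movie record above. The order of the list is an assumption made for illustration, since the order can be used to rank searchable attributes. How you send the settings to your index (dashboard, API client, or another tool) is deliberately left out.

```python
import json

# Index settings expressed as plain data. Only the attributes named in the
# text are searchable; images, release_year, review_scores, and country are
# left out. The order is an assumption: it is just the order used in the prose.
settings = {
    "searchableAttributes": [
        "title",
        "synopsis",
        "director",
        "cast",
        "genres",
    ]
}

print(json.dumps(settings, indent=2))
```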

You can add extra data to improve the discoverability of your record. For example, some users may look for a movie by its original title or by its translated title in their language. Unless these titles are in the record, searching for them would return no results, so it’s a good idea to retrieve them and add them to your objects. You can fetch them from your database, if you have them, or from a third-party source such as an API or a website.

{
  "display_title": "Spirited Away",
  "original_title": "千と千尋の神隠し",
  "alternative_title": [
    "Le voyage de Chihiro",
    "El viaje de Chihiro",
    "Chihiros Reise ins Zauberland"
  ]
}
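
One possible way to merge such titles into an existing record is sketched below. The helper add_titles is hypothetical, and renaming title to display_title is an assumption based on the attribute names in the example above. Where the translations come from (your database or a third-party source) is outside the scope of the sketch.

```python
from typing import Any


def add_titles(record: dict[str, Any], original_title: str, translations: list[str]) -> dict[str, Any]:
    """Return a copy of `record` with the extra title attributes from the example above."""
    enriched = {key: value for key, value in record.items() if key != "title"}
    enriched["display_title"] = record["title"]
    enriched["original_title"] = original_title
    # Skip duplicates and titles identical to the display title, keeping order.
    seen = {record["title"]}
    enriched["alternative_title"] = []
    for title in translations:
        if title not in seen:
            seen.add(title)
            enriched["alternative_title"].append(title)
    return enriched


movie = add_titles(
    {"title": "Spirited Away", "director": "Hayao Miyazaki"},
    original_title="千と千尋の神隠し",
    translations=["Le voyage de Chihiro", "El viaje de Chihiro", "Le voyage de Chihiro"],
)
print(movie["alternative_title"])
```

For the new attributes to be searchable, they also need to be listed in searchableAttributes.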

Attributes for displaying

Display attributes are attributes that are useful to show in the search results. They can include:

  • Titles
  • Descriptions
  • Attributes used for filtering and custom ranking, such as the number of likes or categories
  • Images. To display images in your results, you need an image URL attribute in your records. This way, Algolia can return them in search results for you to display on the frontend.

Some display attributes, such as title and description, can also be searchable. Others, such as image or likes, shouldn’t be set as searchable.

Attributes for filtering

Sometimes, you might want users to find a specific subset (category) of your records. For example, they may want to find all movies by director Hayao Miyazaki, find new adventure movies to watch, or look for the best motion pictures of the past year.

You can do this by setting some attributes as filters to narrow down search results. The following are examples of attributes that you can filter:

  • Booleans (like whether an item is public)
  • Lists (categories, tags)
  • Numeric attributes (price, rounded rating)
  • Normalized text (colors, types, or enumerated types).

For example, you could use director, cast, country, content_rating, and genres and display them as refinement lists in your search experience, and release_year to display a range slider. Declare them with the attributesForFaceting parameter.
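
A minimal sketch of that declaration, again written as plain data rather than as a call to a specific API client. The attribute names are the ones listed in the paragraph above, and the variable name facet_settings is arbitrary.

```python
import json

# Attributes declared for faceting, taken from the paragraph above.
# release_year is included so a range slider can be built on it.
facet_settings = {
    "attributesForFaceting": [
        "director",
        "cast",
        "country",
        "content_rating",
        "genres",
        "release_year",
    ]
}

print(json.dumps(facet_settings, indent=2))
```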

Filterable attributes can be anything, but you should normalize your data to ensure consistency. For example, if the genres attribute contains “Animation” in one record and “Animated picture” in another, these would result in two different facet values.
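
One way to normalize such values before indexing is to map known variants to a canonical form. The sketch below assumes a hand-maintained alias table. GENRE_ALIASES and normalize_genres are hypothetical names, and a real dataset would likely need a longer mapping.

```python
# Hypothetical mapping from lowercase variants to one canonical facet value.
GENRE_ALIASES = {
    "animated picture": "Animation",
    "animation": "Animation",
}


def normalize_genres(genres: list[str]) -> list[str]:
    """Map each genre to its canonical form, dropping duplicates but keeping order."""
    normalized: list[str] = []
    for genre in genres:
        cleaned = genre.strip()
        # Unknown genres pass through unchanged (apart from trimmed whitespace).
        canonical = GENRE_ALIASES.get(cleaned.lower(), cleaned)
        if canonical not in normalized:
            normalized.append(canonical)
    return normalized


print(normalize_genres(["Animated picture", "Adventure", " animation "]))
# ['Animation', 'Adventure']
```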

A good rule of thumb is to add attributes based on how users want to fine-tune their search. If you have a movie reviews website, users likely want to refine on review score or popularity.

Custom ranking attributes

Custom ranking makes Algolia’s search results more relevant for your users by including your business metrics. It’s a good idea to start thinking about such metrics when fetching your data and structuring your records.

For example, if a user looks for “james bond”, all James Bond movies would match equally.

Without anything else to break the tie, Algolia falls back on the objectID in alphanumeric order, which isn’t helpful. A better way to break ties is to compare meaningful information. For movies, you could use the review_scores attribute. However, the preceding example has several scores, so you may want to combine them into a single global score and use that score in custom ranking. The computed attribute would look like this:

{
  "computed_score": 201.6
}

Custom ranking attributes must be either numeric or boolean.
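
The value 201.6 in the example equals the sum of the three review scores from the movie record (8.6 + 97 + 96). Assuming that a plain sum is the intended computation, it could be sketched as follows. The function name compute_score is hypothetical, and the customRanking entry assumes that higher scores should rank first.

```python
from typing import Any


def compute_score(review_scores: dict[str, Any]) -> float:
    """Sum the IMDb score and both Rotten Tomatoes scores into one number."""
    rotten = review_scores["rotten_tomatoes"]
    total = review_scores["imdb"] + rotten["critics"] + rotten["audience"]
    # Round to one decimal to avoid floating-point noise in the stored value.
    return round(total, 1)


review_scores = {"imdb": 8.6, "rotten_tomatoes": {"critics": 97, "audience": 96}}
record = {"title": "Spirited Away", "computed_score": compute_score(review_scores)}

# Custom ranking setting as plain data: sort ties by descending computed_score.
ranking_settings = {"customRanking": ["desc(computed_score)"]}

print(record, ranking_settings)
```

A plain sum gives the two 0 to 100 Rotten Tomatoes scores far more weight than the 0 to 10 IMDb score. If that isn’t what you want, rescale the scores to a common range before combining them.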

Advanced data formatting

Algolia provides a vast collection of settings to help with relevance, and many of these work in combination with how you format your content. For example, you need to decide whether to use one or several attributes for a single piece of information, whether to include long or short descriptions (or both), whether to repeat the same words in the title and description, and how to use custom ranking attributes.
