Jan. 07, 2019

Prepare your data

Key Terms

Index is the data source for all of your searches.
An index is a collection of Records.

A record represents a single item that you search on; it is the data counterpart to a result you see on the screen. Depending on the use case, a record can be:

  • a product, film, song, actor, book.

A record is composed of Attributes.
Attributes describe the record. Some examples:

  • title, name, description, color, size, style, film, actors, music genre.

Fetching and reworking your data for Algolia

The Data Workflow:

  1. Fetch data from your data source (usually your database).
  2. Transform it into Records and Attributes.
  3. Send it to an Algolia index using one of Algolia’s API Clients or the Dashboard.
  4. The index is hosted on Algolia’s servers.

Steps 1, 2, and 3 are done on your end.

1. Fetch data from your data source

Whether your data is in a database, a collection of XML files, spreadsheets, or any other format - this doesn’t concern us. What you need to do is extract data from your data source(s) and format it in a way that Algolia recognizes. Concretely, you need to create records with attributes.

You don’t need to extract everything. The process should be selective about what goes in the record, gathering only information that’s useful for building a search experience.
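As a rough sketch of selective extraction, the snippet below reads only the columns that would matter for search from a SQLite table. The `recipes` table, its columns, and the `fetch_recipes` helper are assumptions made for illustration; your own data source and schema will differ.

```python
import sqlite3


def fetch_recipes(conn: sqlite3.Connection) -> list[dict]:
    """Select only the columns useful for search, display, filtering, or ranking."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    rows = conn.execute(
        "SELECT id, title, description, photo, likes, category FROM recipes"
    ).fetchall()
    return [dict(row) for row in rows]


# In-memory database standing in for a real data source (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recipes (id INTEGER PRIMARY KEY, title TEXT, description TEXT,"
    " photo TEXT, likes INTEGER, category TEXT, internal_notes TEXT)"
)
conn.execute(
    "INSERT INTO recipes VALUES (1, 'Blueberry Pie',"
    " 'A Recipe for Blueberry lovers.',"
    " 'https://www.yourphotos.com/blueberry-pie.jpg', 1000, 'pie',"
    " 'not needed for search')"
)

rows = fetch_recipes(conn)
```

The `internal_notes` column is deliberately left out of the query, since it would not help anyone find or display a recipe.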

2. Transform your extracted data

You need to reformat your extract into an Algolia-compatible format - JSON records. The goal, however, is not only to be compatible. It’s to properly structure your data.

Formatting and structuring your data are two of the most important tools in creating great search and relevance. Therefore, in addition to turning your extract into JSON, you also need to refine the extract by reworking its content, adding new attributes, creating filters, restructuring record relationships, and more.
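The transformation step might look something like the following. This is a minimal sketch, assuming the extract arrives as a Python dict; the input field names (`id`, `labels`) and the `to_record` helper are hypothetical, and the cleanup rules are only examples of the kind of reworking described above.

```python
import json


def to_record(row: dict) -> dict:
    """Turn one extracted row into a JSON-ready record."""
    return {
        "objectID": str(row["id"]),              # record identifier
        "title": row["title"].strip(),           # reworked content: trimmed text
        "description": row["description"].strip(),
        "category": row["category"].lower(),     # normalized so it can act as a filter
        "likes": int(row["likes"]),              # numeric so it can be used for ranking
        # New attribute derived from existing data (illustrative assumption).
        "gluten_free": "gluten-free" in row.get("labels", []),
    }


row = {
    "id": 1,
    "title": " Blueberry Pie ",
    "description": "A Recipe for Blueberry lovers.",
    "category": "Pie",
    "likes": "1000",
    "labels": ["vegetarian"],
}
record = to_record(row)
print(json.dumps(record, indent=2))
```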

3. Push your records to an Algolia Index

Finally, you need to push your records into an Algolia Index, using one of our API clients. We cover this in our Send and Update Your Data guide.

This is all you need to start searching your data.

To get started, you can use the dashboard, which allows you to paste in an example of your data. You can also write a script to push your data using our API. The script will run on your own computer/server, not on Algolia’s. You can write the script in any of the 10 languages of our API clients. Check out our Quick Start Guide.
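A push script often sends records in groups rather than one at a time. The sketch below shows only that grouping logic; it does not call Algolia. The `send_batch` callable is a hypothetical stand-in for whatever call your API client provides, and the batch size of 1000 is an arbitrary assumption.

```python
from typing import Callable, Iterable, Iterator


def batches(records: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` records."""
    batch: list[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, partially filled batch
        yield batch


def push_all(
    records: Iterable[dict],
    send_batch: Callable[[list[dict]], None],
    size: int = 1000,
) -> int:
    """Send every record through `send_batch` and return how many were sent."""
    sent = 0
    for batch in batches(records, size):
        send_batch(batch)  # hypothetical: replace with your API client's call
        sent += len(batch)
    return sent


# Stand-in sender that just collects the batches it receives.
received: list[list[dict]] = []
total = push_all(({"objectID": str(i)} for i in range(2500)), received.append)
```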

Attributes - What to Put in your Record

An Algolia record is composed of key/value pairs called Attributes, which can change from record to record. Essentially, you want your records to contain any information that facilitates search, display, filtering, or relevance. Otherwise, you can leave it out.

Here is an example record containing all four kinds of attributes.

{
  "title": "Blueberry Pie",
  "description": "A Recipe for Blueberry lovers.",
  "photo": "https://www.yourphotos.com/blueberry-pie.jpg",
  "likes": 1000,
  "sales": 200,
  "category": "pie",
  "gluten_free": false,
  ...
}

Algolia needs four categories of Attributes

1. Attributes for searching

If you want to search for “blueberry pie recipe”, you need attributes that contain those words, such as title and description. Other searchable attributes are titles, brands, colors, summary texts - essentially anything that can be used to find a record.

All attributes are searchable by default, which enables search to work right from the start. However, for better relevance, you want to be more selective by setting only some attributes as searchable. You can do this with the searchable attributes feature. You can also use this setting to prioritize your search attributes, making some more relevant than others.
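For the example record, a settings payload restricting search to two attributes could be sketched as follows. The `searchableAttributes` setting name is taken from Algolia's settings API as I understand it; check the current reference before relying on it. The list order expresses priority, with earlier attributes counting more.

```python
import json

# Ordered from most to least important; names come from the example record.
settings = {"searchableAttributes": ["title", "description"]}

record = {
    "title": "Blueberry Pie",
    "description": "A Recipe for Blueberry lovers.",
    "photo": "https://www.yourphotos.com/blueberry-pie.jpg",
    "likes": 1000,
}

# Sanity check before sending: every searchable attribute should exist in the record.
missing = [name for name in settings["searchableAttributes"] if name not in record]

payload = json.dumps(settings)  # JSON body you would send with your API client
```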

2. Attributes for displaying

If you want to display photos in your results, you need an attribute that contains their URLs, so that Algolia can return them in its results. Display attributes include anything that can be useful to see in the results. Note that displayable attributes can also be searchable, like title and description.

3. Attributes for filtering

If you want to search for only a subset of records based on a category, like only pie recipes, or just gluten-free desserts, you can set up matching attributes as filters. This can include color, brand, categories, genres, and so on.
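To make the "only gluten-free pie recipes" case concrete, here is a hedged sketch that declares the filter attributes and builds a filter string. The `attributesForFaceting` setting and the `attribute:value` syntax joined with `AND` reflect my understanding of Algolia's API and should be verified against the current reference; `facet_filter` is a hypothetical helper.

```python
# Attributes must be declared as filterable before they can be used in filters.
settings = {"attributesForFaceting": ["category", "gluten_free"]}


def facet_filter(attribute: str, value) -> str:
    """Format one attribute/value pair as a filter expression."""
    if isinstance(value, bool):
        return f"{attribute}:{str(value).lower()}"  # booleans as true/false
    return f'{attribute}:"{value}"'                  # quote strings (they may contain spaces)


filters = " AND ".join(
    [facet_filter("category", "pie"), facet_filter("gluten_free", True)]
)
print(filters)
```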

4. Attributes for customizing ranking

If you want the most popular recipes to appear first in your results, you can add business-metric attributes such as number of likes, average rating, or sales figures.

Ranking is a way to order your records by relevance. Custom Ranking strengthens and individualizes Algolia’s default ranking formula: you improve on the default ranking by adding your own business metrics to the mix. To do this, use the custom ranking feature.
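A custom ranking on the example's `likes` and `sales` attributes could be sketched like this. The `customRanking` setting and its `desc(...)` syntax are taken from Algolia's settings API as I understand it. The sort that follows is only a plain-Python illustration of how such metrics would order records that are otherwise equally relevant; it is not how the engine is implemented, and the extra recipes are made-up sample data.

```python
# Settings sketch: most-liked first, then best-selling.
settings = {"customRanking": ["desc(likes)", "desc(sales)"]}

# Hypothetical records that match a query equally well.
recipes = [
    {"title": "Lemon Tart", "likes": 400, "sales": 900},
    {"title": "Blueberry Pie", "likes": 1000, "sales": 200},
    {"title": "Apple Pie", "likes": 1000, "sales": 350},
]

# Local illustration: order by likes, break ties with sales (both descending).
ordered = sorted(recipes, key=lambda r: (-r["likes"], -r["sales"]))
titles = [r["title"] for r in ordered]
```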

Records - Keeping it Simple

Simplifying your records

The general rule for creating a searchable index is to simplify your record structure as much as possible. Each record should contain enough information to be discoverable on its own. You don’t need to follow strict relational database principles, such as not repeating data or creating hierarchical table structures with primary and foreign keys.

You can think of it like this: the Algolia engine returns records, so each record in your index should contain enough information to be found and to allow a full display of its content.

Example

A dataset of books can help illustrate this. You can have one record per book, which contains everything about the book, including all its chapters. This is OK, but a search for a common word like “boat” will then match too many books, most of them not about boats.

If you want better, more relevant matches, you can break up the records into individual chapters, creating one record per chapter. This way, you can search for books on boats with far more relevance by searching through their chapters.
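Splitting a book into chapter records might be sketched as below. The input shape, the attribute names, and the `chapter_records` helper are assumptions for illustration, and the book itself is invented sample data. Each chapter record repeats the book-level details so that it can be found and displayed on its own.

```python
def chapter_records(book: dict) -> list[dict]:
    """Create one self-contained record per chapter of a book."""
    return [
        {
            "objectID": f'{book["id"]}-{number}',  # unique per chapter
            "book_title": book["title"],           # repeated on purpose for display
            "author": book["author"],
            "chapter_number": number,
            "chapter_title": chapter["title"],
            "content": chapter["text"],
        }
        for number, chapter in enumerate(book["chapters"], start=1)
    ]


# Hypothetical book used only to show the shape of the output.
book = {
    "id": "book-7",
    "title": "Sailing Basics",
    "author": "A. Writer",
    "chapters": [
        {"title": "Choosing a boat", "text": "Which boat suits a beginner..."},
        {"title": "Reading the wind", "text": "Wind direction matters because..."},
    ],
}
records = chapter_records(book)
```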

There are many considerations to take into account when structuring your records. Here are some common use cases.

The Index

You store all your records in an Algolia Index. An index is a collection of records and is created as soon as you push data to it. You can create several indices that contain different sets of records. All indices reside on Algolia’s servers.

In the initial stages, it’s all about preparing your data - deciding what to send to Algolia and how to format your records. However, once you’ve prepared and pushed your data, it’s all about organizing indices - how many to have, and how to configure each one. We speak about index configuration in the context of managing results. Here, we’ll speak about the organization of your data into one or more indices.

You can put all your records into one index or spread them across several indices. How you organize your indices corresponds to how you want to search and display your records.

Record formatting

Example of records

[
  {
    "objectID": 42,               // record identifier
    "title": "Breaking Bad",      // string attribute (searchable)
    "episodes": [                 // array of strings attribute (searchable)
      "Crazy Handful of Nothin'",
      "Gray Matter"
    ],
    "like_count": 978,            // integer attribute (ranking)
    "avg_rating": 1.23456,        // float attribute (ranking)
    "air_date": 1356846157,       // date as timestamp
    "featured": true,             // boolean attribute
    "actors": [                   // nested objects attribute (searchable)
      {
        "name": "Walter White",
        "portrayed_by": "Bryan Cranston"
      },
      {
        "name": "Skyler White",
        "portrayed_by": "Anna Gunn"
      }
    ],
    "media_category": "tv series",                 // string attribute for filtering
    "subject_category": ["drugs", "divorced dad"], // array of filters
    "_tags": "tv series, drugs"
  },
  {
    "objectID": 43,
    "title": "Mean Streets",
    "like_count": 1201,
    "avg_rating": 2.346,
    "featured": true,
    "director": "Martin Scorsese",
    "media_category": "film",
    "subject_category": ["brooklyn", "heist"],
    "_tags": "film, gangs"
  },
  ...
]
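Because `objectID` identifies a record within an index, two records that share one would collide, with the later record expected to replace the earlier one. A small pre-flight check along these lines can catch that before sending; `duplicate_ids` is a hypothetical helper, not part of any Algolia client.

```python
from collections import Counter


def duplicate_ids(records: list[dict]) -> list[str]:
    """Return the objectID values that appear more than once."""
    # Compare as strings so that 42 and "42" count as the same identifier.
    counts = Counter(str(record["objectID"]) for record in records)
    return sorted(object_id for object_id, n in counts.items() if n > 1)


records = [
    {"objectID": 42, "title": "Breaking Bad"},
    {"objectID": 42, "title": "Mean Streets"},
]
clashes = duplicate_ids(records)
```

Here `clashes` lists the identifier shared by the two records, which signals that one of them needs a different `objectID` before the batch is pushed.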

Record specifications
