Structuring Data

Last updated 07 November 2017

Organizing Your Indices

Even though Algolia is schemaless, we highly recommend that each index contain only one dataset. In nearly all situations, if two items you want to search on go into different tables in a database, they should go into different indices.

For example, if you are building an autocomplete dropdown that searches into movies and actors, you’ll need two indices: one for movies and one for actors. This approach allows each index to have its own settings and ranking strategy. For example, movies might be ranked by rating, while actors would be ranked according to a different popularity metric.

However, if you need results to be returned intermingled (that is, ranked in conjunction with each other), you will need to store all the respective records in one index.
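
As a rough sketch of the two-index case described above, the settings for each index could be kept side by side as plain dictionaries. The index names and the attribute names (title, name, rating, popularity) are assumptions made for illustration; searchableAttributes and customRanking are the Algolia settings they would be applied through.

```python
# Hypothetical per-index settings for the movies/actors example.
# Index names and attribute names are assumptions, not taken from a real dataset.
INDEX_SETTINGS = {
    "movies": {
        "searchableAttributes": ["title"],
        "customRanking": ["desc(rating)"],      # movies ranked by rating
    },
    "actors": {
        "searchableAttributes": ["name"],
        "customRanking": ["desc(popularity)"],  # actors ranked by a popularity metric
    },
}


def settings_for(index_name):
    """Return the settings dictionary intended for one index."""
    return INDEX_SETTINGS[index_name]
```

Keeping the two dictionaries separate mirrors the point of the section: each index can carry its own ranking strategy without affecting the other.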

Indexing Relations

You can technically index the relations of objects with an array of values, as we support indexing of arrays in your JSON. That said, it is usually better to index each element of the array in a separate record, as this will provide the best relevance.

For example, consider the case of a search where users can query by book title or chapter title.

If you index a book with its chapter information, this record might look like this:

{
  "book_name": "Harry Potter and the Philosopher's Stone",
  "popularity": 1000,
  "chapter_titles": [
    "The Boy Who Lived",
    "The Vanishing Glass",
    ...
  ]
}

However, if you split this record into book records and chapter records, your data set will look like this:

[
  {
    "book_name": "Harry Potter and the Philosopher's Stone",
    "popularity": 1000
  },
  {
    "book_of": "Harry Potter and the Philosopher's Stone",
    "chapter_name": "The Boy Who Lived",
    "popularity": 900
  },
  {
    "book_of": "Harry Potter and the Philosopher's Stone",
    "chapter_name": "The Vanishing Glass",
    "popularity": 800
  }
]

By re-structuring your data this way, you can take advantage of a few Algolia settings:

  • By placing book_name higher in the searchable attributes list than chapter_name or book_of, you can ensure the parent book will rank higher than any of its chapters.
  • By breaking up the chapters into their own records, you can add more granular popularity attributes to each record to ensure the most relevant chapters are surfaced, assuming popularity is added as a custom ranking attribute.
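
The restructuring above can be sketched in a few lines of Python. This is a minimal illustration, not an official helper: split_book and the chapter_popularity mapping are hypothetical names, and the per-chapter popularity values are assumed to come from your own data. The settings dictionary at the end shows one way the two bullet points might translate into searchableAttributes and customRanking.

```python
def split_book(book, chapter_popularity):
    """Turn one book record with a chapter_titles array into
    one book record plus one record per chapter."""
    records = [{"book_name": book["book_name"], "popularity": book["popularity"]}]
    for title in book["chapter_titles"]:
        records.append({
            "book_of": book["book_name"],
            "chapter_name": title,
            # Assumption: chapters without a known score default to 0.
            "popularity": chapter_popularity.get(title, 0),
        })
    return records


book = {
    "book_name": "Harry Potter and the Philosopher's Stone",
    "popularity": 1000,
    "chapter_titles": ["The Boy Who Lived", "The Vanishing Glass"],
}
records = split_book(book, {"The Boy Who Lived": 900, "The Vanishing Glass": 800})

# book_name is listed first so a match on the parent book outranks its chapters;
# popularity is used as the custom ranking attribute.
settings = {
    "searchableAttributes": ["book_name", "chapter_name", "book_of"],
    "customRanking": ["desc(popularity)"],
}
```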

Indexing Long Documents

To index long documents with Algolia, you will need to split the document into smaller records. Record size is limited to 10 KB for performance reasons, meaning each new “chunk” should realistically only be a paragraph or two. Note that this approach introduces some redundancy, because shared fields are repeated in every chunk.

Consider the case of a long document with a title, description, and 5 paragraphs. If you choose to split at paragraph boundaries, this document will correspond to 5 records in Algolia. Here’s a quick example of what this might look like:

[
  {
    "title": "Lorem Ipsum",
    "description": "Donec molestie nisl vel sem ultrices laoreet",
    "content": "Suspendisse eget dictum neque, id dapibus ligula. Nullam commodo a nunc sit amet tincidunt.",
    "popularity": 1000
  },
  {
    "title": "Lorem Ipsum",
    "description": "Donec molestie nisl vel sem ultrices laoreet",
    "content": "Morbi mattis malesuada lacus in interdum. Phasellus tempor vel dui eu sodales.",
    "popularity": 1000
  },
  ...
]
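
One possible way to produce records like these is sketched below. The function name split_document is hypothetical, and the size check treats the limit as 10,000 bytes of serialized JSON, which is an assumption about how “10kb” is measured rather than a documented figure.

```python
import json

# Assumption: the record limit is read as 10,000 bytes of UTF-8 encoded JSON.
MAX_RECORD_BYTES = 10_000


def split_document(title, description, paragraphs, popularity):
    """Build one record per paragraph, repeating the shared fields in each."""
    records = []
    for paragraph in paragraphs:
        record = {
            "title": title,
            "description": description,
            "content": paragraph,
            "popularity": popularity,
        }
        size = len(json.dumps(record, ensure_ascii=False).encode("utf-8"))
        if size > MAX_RECORD_BYTES:
            raise ValueError(f"record is {size} bytes; split the paragraph further")
        records.append(record)
    return records
```

Repeating title and description in every record is the redundancy mentioned earlier; it is what lets each chunk match on those fields on its own.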

To ensure that users only see the best matching record when searching, we will need to use the distinct parameter. This allows us to de-duplicate results based on a common “key” (in this case, the document’s title). For more information on using distinct, see the distinct concept.

Additionally, we should set our searchable attributes such that title is most important, followed by description, and then content. This will ensure that a match in the title is ranked higher than a match within the content of a record. However, because the records are split, if a match does occur in the content we can easily display the matching segment highlighted in a snippet.
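
The settings described in the last two paragraphs might be expressed as follows. The snippet length of 20 words is an arbitrary choice, and keep_best_per_key is a hypothetical local helper that only illustrates the idea of de-duplication on an already ranked list; it is not how Algolia implements distinct.

```python
# Settings sketch for the split-document index.
DOCUMENT_INDEX_SETTINGS = {
    # Ordered from most to least important.
    "searchableAttributes": ["title", "description", "content"],
    # De-duplicate on the shared key and keep one record per title.
    "attributeForDistinct": "title",
    "distinct": True,
    # Assumption: a 20-word snippet of the matching content is enough to display.
    "attributesToSnippet": ["content:20"],
}


def keep_best_per_key(ranked_hits, key="title"):
    """Local illustration only: keep the first (best ranked) hit for each key."""
    seen = set()
    best = []
    for hit in ranked_hits:
        if hit[key] not in seen:
            seen.add(hit[key])
            best.append(hit)
    return best
```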

