Grouping records usually refers to the process of combining multiple records into a single result, or consolidating many similar records into two or three results. This kind of de-duplication or aggregation of results has three primary use cases:
- Item Variations, where any item with variations is displayed only once. A t-shirt that comes in five colors should only appear once in the results, with all five color options displayed somewhere in the description.
- Large Records, where you first break up a large record into smaller sub-records. Then, during the search, if several of these sub-records match, you display only the most relevant one.
- Grouping by attribute, where you group records depending on the value of one of their attributes.
Introducing Algolia’s Distinct feature
Algolia’s distinct feature handles all of these scenarios. Distinct is a term we borrowed from the SQL world. Algolia is a search engine and is not meant to be used as a traditional database. Still, it’s sometimes useful to borrow concepts from the database world. Two of those concepts are distinct and group by; both can be achieved using the distinct feature.
Distinct is implemented by the
Handling Item Variations
Anytime you have items that come in variations - for example, in different versions or colors - you most likely have a situation that requires grouping.
This can happen with t-shirts in different colors, or with computers that have different amounts of memory and processing power. Less obvious use cases are TV series with different seasons or episodes, or car parts that are compatible with different models and years of cars.
These examples all share the same problem, which is mostly a UI issue: how to display the variety of a single product without hiding other products.
Consider what happens if you don’t do anything. The most relevant product shows up first in the results, and just below it appear all of its variations, such as the same t-shirt in red, blue, green, and so on. You get a flood of a single product on the first page, pushing all other products off the screen. Another unwanted effect is that the variations of only two or three products end up scattered in seemingly random order - a few green t-shirts here, a yellow one there, and so on. Neither of these results is meaningful. You need some way to avoid this flood or randomness.
Distinct helps you remove the multiplicity and show only a single hit per item.
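To make the effect concrete, here is a minimal Python sketch of what de-duplication does to a ranked list of hits. It only mimics the behavior in plain Python and is not how Algolia implements the feature. The `product_id` attribute, the sample records, and the `distinct_hits` helper are assumptions made for illustration.

```python
from typing import Any


def distinct_hits(
    ranked_hits: list[dict[str, Any]],
    group_attr: str,
    per_group: int = 1,
) -> list[dict[str, Any]]:
    """Keep at most `per_group` hits per value of `group_attr`, preserving rank order."""
    kept_per_key: dict[Any, int] = {}
    result = []
    for hit in ranked_hits:
        key = hit[group_attr]
        if kept_per_key.get(key, 0) < per_group:
            result.append(hit)
            kept_per_key[key] = kept_per_key.get(key, 0) + 1
    return result


# Hypothetical hits, already sorted from most to least relevant.
ranked_hits = [
    {"objectID": "1", "product_id": "tee-classic", "color": "red"},
    {"objectID": "2", "product_id": "tee-classic", "color": "blue"},
    {"objectID": "3", "product_id": "tee-classic", "color": "green"},
    {"objectID": "4", "product_id": "hoodie-zip", "color": "black"},
    {"objectID": "5", "product_id": "tee-vneck", "color": "yellow"},
]

deduplicated = distinct_hits(ranked_hits, "product_id")
print([hit["objectID"] for hit in deduplicated])  # ['1', '4', '5']
```

Because the first hit of each group is the one kept, the best-ranked variation represents the product, and the other products are no longer pushed off the page.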
Handling Large Records
Large records arise from having records with lots of text, or long HTML pages, or similar content that can be better managed when broken down into smaller chunks of data.
You can split large documents into smaller chunks of text, such as paragraphs or sentences. For example, in an index of blog articles, instead of having a single record for each article with a large content attribute (the full content of the article), you’d use one record per paragraph. These paragraph records could then be sorted by an order attribute to reconstruct the original content.
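A minimal sketch of this splitting step in Python might look like the following. It assumes paragraphs are separated by blank lines; the `article_id` attribute, the `objectID` format, and both helper functions are hypothetical names chosen for illustration.

```python
def split_article(article_id: str, title: str, content: str) -> list[dict]:
    """Turn one article into one record per paragraph, numbered by `order`."""
    paragraphs = [p.strip() for p in content.split("\n\n") if p.strip()]
    return [
        {
            "objectID": f"{article_id}-{order}",  # unique per paragraph
            "article_id": article_id,             # shared by all paragraphs of the article
            "title": title,                       # repeated on every record
            "content": paragraph,
            "order": order,
        }
        for order, paragraph in enumerate(paragraphs)
    ]


def rebuild_content(records: list[dict]) -> str:
    """Reassemble the article text by sorting paragraph records on `order`."""
    ordered = sorted(records, key=lambda record: record["order"])
    return "\n\n".join(record["content"] for record in ordered)


article = "Intro paragraph.\n\nDetails paragraph.\n\nClosing paragraph."
records = split_article("post-42", "A sample post", article)
print(len(records))                         # 3
print(rebuild_content(records) == article)  # True
```

Every paragraph record carries the same `article_id`, so that attribute can later serve as the grouping key when several paragraphs of one article match a query.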
Splitting large records into smaller ones not only improves search performance, it also improves search relevance. Imagine if you had an entire book’s content stored in a single content attribute. If the first word of a search query matched a paragraph on page 10, and the second word matched a paragraph on page 90, the whole record would be returned as a match. However, it would likely not be relevant.
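The book example can be reproduced with a toy matcher. The sketch below uses a deliberately naive "all query words must appear" check, which is an assumption for illustration and not Algolia's matching logic. The page texts are invented.

```python
def matches_all_words(text: str, query: str) -> bool:
    """Naive matcher: True when every query word appears somewhere in the text."""
    words = set(text.lower().split())
    return all(term in words for term in query.lower().split())


# Hypothetical page contents from a single book.
pages = {
    10: "the captain studied the weather before sailing",
    90: "she repaired the engine at dawn",
}
query = "weather engine"

# One record holding the whole book: the words match 80 pages apart.
whole_book = " ".join(pages.values())
whole_match = matches_all_words(whole_book, query)

# One record per page: no single page contains both words.
chunk_matches = [page for page, text in pages.items() if matches_all_words(text, query)]

print(whole_match)    # True
print(chunk_matches)  # []
```

With one record for the whole book, the query matches even though the two words are unrelated. With smaller records, a match means the words appear close together.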
Grouping by Attribute
When you have one-to-many relationships in your records, but you want to display data selectively, you need to flatten your records and repeat data. For example, imagine you’re developing a job search with companies and job openings. Several companies may have many job openings, but you only want to show the most relevant ones per company.
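As a rough sketch of the flattening step, the Python below turns nested company data into one record per job opening, repeating the company data on each record. The field names (`company`, `location`, `job_title`), the `objectID` format, and the sample data are assumptions for illustration.

```python
# Hypothetical source data: each company holds a list of job openings.
companies = [
    {"name": "Acme", "location": "Paris", "jobs": ["Backend Engineer", "Data Analyst"]},
    {"name": "Globex", "location": "Berlin", "jobs": ["Designer"]},
]


def flatten_jobs(companies: list[dict]) -> list[dict]:
    """Produce one record per job opening, copying company fields onto each."""
    records = []
    for company in companies:
        for index, job_title in enumerate(company["jobs"]):
            records.append(
                {
                    "objectID": f"{company['name'].lower()}-{index}",
                    "company": company["name"],       # repeated: the grouping attribute
                    "location": company["location"],  # repeated company data
                    "job_title": job_title,
                }
            )
    return records


job_records = flatten_jobs(companies)
print([record["objectID"] for record in job_records])  # ['acme-0', 'acme-1', 'globex-0']
```

Once every job is its own record with a `company` attribute, grouping on that attribute can limit the results to the most relevant openings per company.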