Results deduplication with distinct
Deduplicating records in the results lets you show only one or a few representatives of records that are variants of each other, instead of showing all variations individually. For example, if you sell t-shirts and every t-shirt comes in 10 different colors, you might show just one variant in your search results or category page. You can still include the different color options in the product details. This lets users discover more of your t-shirts, instead of all variations of one t-shirt. For an example, see Handle item variations.
You can use the same technique to index long documents, where you split a page into smaller records, for example, one record per section. When searching, instead of showing all matching records from one page, you show only the top-ranked one. This lets matches from other pages also be included in the search results.
Distinct
With the attributeForDistinct setting, you define an attribute for the deduplication. For example, if you set the url attribute as attributeForDistinct, then Algolia treats all records with the same URL as variants of each other. You can then control how many variants should be included in the search results with the distinct parameter.
If you want to define groups of records, based on queries, or matching filters, see Smart Groups.
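To make the effect of the two settings concrete, here is a minimal in-memory sketch. It is not Algolia's implementation, only an illustration of the observable behavior under stated assumptions: hits arrive already ranked, the key function plays the role of attributeForDistinct, and the integer plays the role of distinct. The SectionHit type, the DistinctSketch class, and the sample URLs are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;

// Usage: five section-level hits, already sorted by relevance (sample data).
var ranked = new List<SectionHit>
{
    new SectionHit("/docs/install", "Requirements"),
    new SectionHit("/docs/install", "Troubleshooting"),
    new SectionHit("/docs/api", "Authentication"),
    new SectionHit("/docs/install", "Upgrading"),
    new SectionHit("/docs/api", "Errors"),
};

// Keep one hit per URL, as if attributeForDistinct = url and distinct = 1.
foreach (var hit in DistinctSketch.Deduplicate(ranked, h => h.Url, 1))
{
    Console.WriteLine($"{hit.Url} - {hit.Section}");
}
// Prints:
// /docs/install - Requirements
// /docs/api - Authentication

// Hypothetical record type for this sketch; not an Algolia client type.
public record SectionHit(string Url, string Section);

public static class DistinctSketch
{
    // Keeps at most `distinct` hits per key value and preserves ranking order.
    // A value of 0 or less is treated as "deduplication off" (assumption
    // mirroring distinct = false).
    public static List<T> Deduplicate<T>(
        IEnumerable<T> rankedHits, Func<T, string> key, int distinct)
    {
        if (distinct <= 0)
        {
            return new List<T>(rankedHits);
        }

        var countsByKey = new Dictionary<string, int>();
        var kept = new List<T>();
        foreach (var hit in rankedHits)
        {
            var k = key(hit);
            countsByKey.TryGetValue(k, out var count); // count is 0 for a new key
            if (count < distinct)
            {
                kept.Add(hit);
                countsByKey[k] = count + 1;
            }
        }
        return kept;
    }
}
```

Passing 2 instead of 1 would keep the two top-ranked sections of each page, which corresponds to showing a few representatives per group rather than one.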
Flatten your records
To effectively use distinct and attributeForDistinct, you need to flatten your records. The following example illustrates the difference between nested and flat record structures and their implications for search.
Before – nested
With a nested approach, your records might look like this:
[
  {
    "company": "Twilio",
    "job_openings": [
      "Staff Software Engineer - Cloud Platform",
      "Lead Frontend Engineer",
      "Senior Data Engineer",
      "Senior Software Engineer, Developer Experience"
    ]
  },
  {
    "company": "Algolia",
    "job_openings": [
      "Full-Stack Software Engineer",
      "Frontend Engineer",
      "Open Source Software Engineer (JavaScript)",
      "Senior Software Engineer - Core API",
      "Senior Systems Engineer - SRE"
    ]
  }
]
With this structure, whenever you match any opening, the entire record with all openings for one company is retrieved. Each record represents a company, which is usually not how users search for job openings. If you want to show the best match per company, this way of structuring your records doesn’t work.
After – flat
Instead of nesting, create one record for each opening, and repeat the company each time:
[
  {
    "company": "Twilio",
    "job_opening": "Staff Software Engineer - Cloud Platform"
  },
  {
    "company": "Twilio",
    "job_opening": "Lead Frontend Engineer"
  },
  {
    "company": "Twilio",
    "job_opening": "Senior Data Engineer"
  },
  {
    "company": "Twilio",
    "job_opening": "Senior Software Engineer, Developer Experience"
  },
  {
    "company": "Algolia",
    "job_opening": "Full-Stack Software Engineer"
  },
  {
    "company": "Algolia",
    "job_opening": "Frontend Engineer"
  },
  {
    "company": "Algolia",
    "job_opening": "Open Source Software Engineer (JavaScript)"
  },
  {
    "company": "Algolia",
    "job_opening": "Senior Software Engineer - Core API"
  },
  {
    "company": "Algolia",
    "job_opening": "Senior Systems Engineer - SRE"
  }
]
Each record now represents a job opening instead of a company. When a user searches for an opening, for example, “engineer”, they get the best matching results (for example, defined by your custom ranking).
If a company has many openings, you might want to limit the number of shown results per company, so that users can explore openings from more companies.
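The flattening step described in this section can be done before indexing. The following is a minimal sketch, assuming the nested records are already loaded into memory. The CompanyRecord and JobRecord types and the Flattener class are hypothetical names for this example, and the sample data is a shortened version of the records above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: two nested company records (shortened sample data).
var nested = new List<CompanyRecord>
{
    new CompanyRecord("Twilio", new List<string>
    {
        "Staff Software Engineer - Cloud Platform",
        "Lead Frontend Engineer",
    }),
    new CompanyRecord("Algolia", new List<string>
    {
        "Full-Stack Software Engineer",
        "Frontend Engineer",
    }),
};

foreach (var record in Flattener.Flatten(nested))
{
    Console.WriteLine($"{record.Company}: {record.JobOpening}");
}
// Prints:
// Twilio: Staff Software Engineer - Cloud Platform
// Twilio: Lead Frontend Engineer
// Algolia: Full-Stack Software Engineer
// Algolia: Frontend Engineer

// Hypothetical types for this sketch; they mirror the JSON shapes above.
public record CompanyRecord(string Company, List<string> JobOpenings);
public record JobRecord(string Company, string JobOpening);

public static class Flattener
{
    // Emits one flat record per opening and repeats the company on each.
    public static List<JobRecord> Flatten(IEnumerable<CompanyRecord> companies) =>
        companies
            .SelectMany(c => c.JobOpenings.Select(o => new JobRecord(c.Company, o)))
            .ToList();
}
```

Because every flat record carries the company, that attribute can then serve as the deduplication attribute.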
Configure distinct in the dashboard
To select an attribute for deduplication:
- Go to the Algolia dashboard and select your Algolia application.
- On the left sidebar, select Search.
- Select your Algolia index.
- On the Configuration tab, select Search behavior > Deduplication and Grouping.
- In the Distinct menu, select “true”.
- In the Attribute for Distinct menu, select “company”.
Now when searching for a job opening, only one job opening per company is included in the search results.
Configure distinct with the API
Use the setSettings method with the attributeForDistinct parameter and the company attribute.
Optional: Set distinct to true using the same method. This deduplicates all following search requests.
var response = await client.SetSettingsAsync(
  "ALGOLIA_INDEX_NAME",
  new IndexSettings { AttributeForDistinct = "company", Distinct = new Distinct(true) }
);
After setting attributeForDistinct, you can set distinct as an additional parameter to the search methods:
var response = await client.SearchSingleIndexAsync<Hit>(
  "ALGOLIA_INDEX_NAME",
  new SearchParams(new SearchParamsObject { Distinct = new Distinct(true) })
);
This deduplicates only the current search.
Handle item variations
Consider a shop that sells t-shirts and sweatshirts in different designs and colors. To make them searchable, you can structure your records in one of two ways:
- One record for each product: product-level model
- One record for each color variant: variant-level model
For more information about each model, see Structure ecommerce product records.
One record per variant
The inventory has:
- 2 t-shirt designs: A and B
- 2 sweatshirt designs: C and D
Each design comes in different colors.
You can represent this as one record for each color variant of each design. Each record describes the type, design, color, and the associated thumbnail:
[
  {
    "type": "t-shirt",
    "design": "B",
    "color": "blue",
    "thumbnail_url": "tshirt-B-blue.png"
  },
  {
    "type": "sweatshirt",
    "design": "C",
    "color": "red",
    "thumbnail_url": "sweatshirt-C-red.png"
  }
]
It might be a good idea to add all possible color variations to each record. This lets you show all variants for an item in your UI, for example, as color swatches below the product thumbnail.
[
  {
    "type": "t-shirt",
    "design": "B",
    "color": "blue",
    "thumbnail_url": "tshirt-B-blue.png",
    "color_variants": ["orange", "teal", "yellow", "red", "green"]
  },
  {
    "type": "t-shirt",
    "design": "B",
    "color": "orange",
    "thumbnail_url": "tshirt-B-orange.png",
    "color_variants": ["blue", "teal", "yellow", "red", "green"]
  }
]
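If you build these records from a list of variants, the color_variants array can be derived rather than maintained by hand. The following is a minimal sketch, assuming each design name is unique across product types, as in the inventory above. The Variant and VariantRecord types and the VariantRecords class are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: three color variants of design B (sample data).
var variants = new List<Variant>
{
    new Variant("t-shirt", "B", "blue", "tshirt-B-blue.png"),
    new Variant("t-shirt", "B", "orange", "tshirt-B-orange.png"),
    new Variant("t-shirt", "B", "teal", "tshirt-B-teal.png"),
};

foreach (var record in VariantRecords.WithColorVariants(variants))
{
    var others = string.Join(", ", record.ColorVariants);
    Console.WriteLine($"{record.Color}: {others}");
}
// Prints:
// blue: orange, teal
// orange: blue, teal
// teal: blue, orange

// Hypothetical types for this sketch; they mirror the JSON shapes above.
public record Variant(string Type, string Design, string Color, string ThumbnailUrl);
public record VariantRecord(
    string Type, string Design, string Color, string ThumbnailUrl,
    List<string> ColorVariants);

public static class VariantRecords
{
    // Adds to each variant the colors of its sibling variants (same design),
    // excluding the variant's own color.
    public static List<VariantRecord> WithColorVariants(IEnumerable<Variant> variants)
    {
        var all = variants.ToList();
        var colorsByDesign = all
            .GroupBy(v => v.Design)
            .ToDictionary(g => g.Key, g => g.Select(v => v.Color).ToList());

        return all
            .Select(v => new VariantRecord(
                v.Type, v.Design, v.Color, v.ThumbnailUrl,
                colorsByDesign[v.Design].Where(c => c != v.Color).ToList()))
            .ToList();
    }
}
```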
Having one record per variation lets you add granular custom ranking attributes, such as number_of_sales.
Again, you can also use distinct to limit the number of variants shown per design. This lets users explore more of your designs instead of all variations of a single design.
Configure distinct in the dashboard
- Go to the Algolia dashboard and select your Algolia application.
- On the left sidebar, select Search.
- Select your Algolia index.
- On the Configuration tab, select Relevance essentials > Searchable attributes.
- Click Add a Searchable Attribute and add the design, type, and color attributes.
- Go to Search behavior > Deduplication and Grouping.
- In the Distinct menu, select “true”.
- In the Attribute for Distinct menu, select “design”.
Now, the results only include one color per design.
To control which one, add an attribute with business metrics, such as number_of_sales, and use it as custom ranking.
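The interaction between custom ranking and deduplication can be pictured with a small in-memory sketch. It assumes that all hits tie on textual relevance, so that number_of_sales alone decides the order; in a real index, custom ranking only breaks ties left by the other ranking criteria. The ShirtHit type, the Representative class, and the sales figures are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: four variants with made-up sales figures.
var hits = new List<ShirtHit>
{
    new ShirtHit("B", "blue", 120),
    new ShirtHit("B", "orange", 340),
    new ShirtHit("A", "red", 75),
    new ShirtHit("A", "green", 50),
};

foreach (var hit in Representative.BestPerDesign(hits))
{
    Console.WriteLine($"{hit.Design} {hit.Color} {hit.NumberOfSales}");
}
// Prints:
// B orange 340
// A red 75

// Hypothetical record type for this sketch.
public record ShirtHit(string Design, string Color, int NumberOfSales);

public static class Representative
{
    // Ranks by sales (descending), then keeps the first hit of each design.
    // GroupBy yields groups in order of first appearance and keeps the source
    // order inside each group, so First() is the best-selling variant.
    public static List<ShirtHit> BestPerDesign(IEnumerable<ShirtHit> hits) =>
        hits.OrderByDescending(h => h.NumberOfSales)
            .GroupBy(h => h.Design)
            .Select(g => g.First())
            .ToList();
}
```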
Configure distinct with the API
Before deduplicating items, restrict what should be searchable. For example, searching within the thumbnail_url attribute might lead to irrelevant noise. The attribute color_variants is added for UI purposes only and could lead to false positive matches. Therefore, only set design, type, and color as searchableAttributes.
var response = await client.SetSettingsAsync(
  "ALGOLIA_INDEX_NAME",
  new IndexSettings
  {
    SearchableAttributes = new List<string> { "design", "type", "color" },
  }
);
Next, set design as attributeForDistinct.
Optional: Set distinct to true. This deduplicates all following search requests.
var response = await client.SetSettingsAsync(
  "ALGOLIA_INDEX_NAME",
  new IndexSettings { AttributeForDistinct = "design", Distinct = new Distinct(true) }
);
After setting attributeForDistinct, you can set distinct as an additional parameter to the search methods:
var response = await client.SearchSingleIndexAsync<Hit>(
  "ALGOLIA_INDEX_NAME",
  new SearchParams(new SearchParamsObject { Distinct = new Distinct(true) })
);
This deduplicates only the current search.