Results deduplication with distinct

Deduplicating records in the results lets you show only one or a few representatives of records that are variants of each other, instead of showing every variation individually. For example, if you sell t-shirts and every t-shirt comes in 10 different colors, you might show just one variant in your search results or category page. You can still include the different color options in the product details. This lets users discover more of your t-shirts, instead of scrolling through every variation of a single t-shirt. For an example, see Handle item variations.

You can use the same technique to index long documents, where you split a page into smaller records, for example, one record per section. When searching, instead of showing all matching records from one page, you show only the top-ranked one. This lets matches from other pages also be included in the search results.

Distinct

With the attributeForDistinct setting, you define an attribute for the deduplication. For example, if you set the url attribute as attributeForDistinct, then Algolia treats all records with the same URL as variants of each other.

You can then control how many variants should be included in the search results with the distinct parameter.
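
To make the interplay of the two settings concrete, here is a small local simulation in C#. It doesn't call Algolia: the `ApplyDistinct` helper and the sample section records are hypothetical, and the sketch assumes the hits are already sorted by rank. It keeps at most `distinct` hits per value of the deduplication attribute (here, the URL), with 1 playing the role of `true`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ranked hits (best match first): section records of two pages.
var rankedHits = new List<(string Url, string Section)>
{
    ("/docs/install", "Requirements"),
    ("/docs/install", "Quick start"),
    ("/docs/search", "Query syntax"),
    ("/docs/install", "Troubleshooting"),
    ("/docs/search", "Filters"),
};

// Local stand-in for deduplication: keep at most `distinct` hits per URL,
// preserving the ranking order. A value of 0 means no deduplication.
List<(string Url, string Section)> ApplyDistinct(
    List<(string Url, string Section)> hits, int distinct)
{
    if (distinct <= 0) return hits.ToList();
    var seen = new Dictionary<string, int>();
    var kept = new List<(string Url, string Section)>();
    foreach (var hit in hits)
    {
        seen.TryGetValue(hit.Url, out var count); // count is 0 for unseen URLs
        if (count >= distinct) continue;
        seen[hit.Url] = count + 1;
        kept.Add(hit);
    }
    return kept;
}

var one = ApplyDistinct(rankedHits, 1); // one hit per page
var two = ApplyDistinct(rankedHits, 2); // up to two hits per page

foreach (var hit in one) Console.WriteLine($"{hit.Url}: {hit.Section}");
```

With a value of 1, each page contributes only its top-ranked section, which leaves room for matches from other pages.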

To define groups of records based on queries or matching filters, see Smart Groups.

Flatten your records

To effectively use distinct and attributeForDistinct, you need to flatten your records. The following example illustrates the difference between nested and flat record structures and their implications for search.

Before – nested

With a nested approach, your records might look like this:

[
  {
    "company": "Twilio",
    "job_openings": [
      "Staff Software Engineer - Cloud Platform",
      "Lead Frontend Engineer",
      "Senior Data Engineer",
      "Senior Software Engineer, Developer Experience"
    ]
  },
  {
    "company": "Algolia",
    "job_openings": [
      "Full-Stack Software Engineer",
      "Frontend Engineer",
      "Open Source Software Engineer (JavaScript)",
      "Senior Software Engineer - Core API",
      "Senior Systems Engineer - SRE"
    ]
  }
]

With this structure, whenever you match any opening, the entire record with all openings for one company is retrieved. Each record represents a company, which is usually not how users search for job openings. If you want to show the best match per company, this way of structuring your records doesn’t work.

After – flat

Instead of nesting, create one record for each opening, and repeat the company each time:

[
  {
    "company": "Twilio",
    "job_opening": "Staff Software Engineer - Cloud Platform"
  },
  {
    "company": "Twilio",
    "job_opening": "Lead Frontend Engineer"
  },
  {
    "company": "Twilio",
    "job_opening": "Senior Data Engineer"
  },
  {
    "company": "Twilio",
    "job_opening": "Senior Software Engineer, Developer Experience"
  },
  {
    "company": "Algolia",
    "job_opening": "Full-Stack Software Engineer"
  },
  {
    "company": "Algolia",
    "job_opening": "Frontend Engineer"
  },
  {
    "company": "Algolia",
    "job_opening": "Open Source Software Engineer (JavaScript)"
  },
  {
    "company": "Algolia",
    "job_opening": "Senior Software Engineer - Core API"
  },
  {
    "company": "Algolia",
    "job_opening": "Senior Systems Engineer - SRE"
  }
]

Each record now represents a job opening instead of a company. When a user searches for an opening, for example, “engineer”, they get the best matching results (for example, defined by your custom ranking).
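
If your source data is nested, the flattening step can be done before indexing. The following is a minimal sketch in C# that works on in-memory data only. The tuple shapes and the shortened sample are assumptions for illustration, not a required record format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Nested input: one entry per company with a list of openings (shortened sample).
var nested = new List<(string Company, string[] JobOpenings)>
{
    ("Twilio", new[] { "Lead Frontend Engineer", "Senior Data Engineer" }),
    ("Algolia", new[] { "Frontend Engineer", "Senior Systems Engineer - SRE" }),
};

// Flatten: emit one record per opening and repeat the company on each record.
var flat = nested
    .SelectMany(c => c.JobOpenings.Select(o => (Company: c.Company, JobOpening: o)))
    .ToList();

foreach (var record in flat)
    Console.WriteLine($"{record.Company}: {record.JobOpening}");
```

Each element of `flat` corresponds to one record in the flat structure, with the company repeated so that it can serve as the deduplication attribute.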

If a company has many openings, you might want to limit the number of shown results per company, so that users can explore openings from more companies.

Configure distinct in the dashboard

To select an attribute for deduplication:

  1. Go to the Algolia dashboard and select your Algolia application.
  2. On the left sidebar, select Search.
  3. Select your Algolia index.
  4. On the Configuration tab, select Search behavior > Deduplication and Grouping.
  5. In the Distinct menu, select “true”.
  6. In the Attribute for Distinct menu, select “company”.

Now when searching for a job opening, only one job opening per company is included in the search results.

Configure distinct with the API

Use the setSettings method to set attributeForDistinct to the company attribute. Optionally, set distinct to true in the same call. This deduplicates all subsequent search requests.

// Group records by company and deduplicate all searches by default.
var response = await client.SetSettingsAsync(
  "ALGOLIA_INDEX_NAME",
  new IndexSettings { AttributeForDistinct = "company", Distinct = new Distinct(true) }
);

After setting attributeForDistinct, you can also pass distinct as a parameter of an individual search request:

// Deduplicate this search request only.
var response = await client.SearchSingleIndexAsync<Hit>(
  "ALGOLIA_INDEX_NAME",
  new SearchParams(new SearchParamsObject { Distinct = new Distinct(true) })
);

This deduplicates only the current search.
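
One way to picture the relationship between the index setting and the search parameter is as a precedence rule: a value sent with the query applies to that request, otherwise the index default applies. The sketch below is a local illustration only. `ResolveDistinct` is a hypothetical helper, not part of the Algolia client.

```csharp
using System;

// Hypothetical helper: a per-query value, when present, wins over the index default.
int ResolveDistinct(int settingsValue, int? queryValue) => queryValue ?? settingsValue;

var settingsDistinct = 0;                                // distinct not enabled in the settings
var thisQuery = ResolveDistinct(settingsDistinct, 1);    // deduplicated for this search only
var nextQuery = ResolveDistinct(settingsDistinct, null); // falls back to the index default

Console.WriteLine($"{thisQuery} {nextQuery}");
```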

Handle item variations

Consider a shop that sells t-shirts and sweatshirts in different designs and colors. To make them searchable, you can structure your records in one of these ways:

  • One record for each product: product-level model
  • One record for each color variant: variant-level model

For more information about each model, see Structure ecommerce product records.

One record per variant

The inventory has:

  • 2 t-shirt designs: A and B
  • 2 sweatshirt designs: C and D

Each design comes in different colors.

[Figure: Two t-shirt and two sweatshirt designs in several colors]

You can represent this as one record for each color variant of each design. Each record describes the type, design, color, and the associated thumbnail:

[
  {
    "type": "t-shirt",
    "design": "B",
    "color": "blue",
    "thumbnail_url": "tshirt-B-blue.png"
  },
  {
    "type": "sweatshirt",
    "design": "C",
    "color": "red",
    "thumbnail_url": "sweatshirt-C-red.png"
  }
]

Consider adding all other available colors to each record. This lets you show all variants for an item in your UI, for example, as color swatches below the product thumbnail.

[
  {
    "type": "t-shirt",
    "design": "B",
    "color": "blue",
    "thumbnail_url": "tshirt-B-blue.png",
    "color_variants": ["orange", "teal", "yellow", "red", "green"]
  },
  {
    "type": "t-shirt",
    "design": "B",
    "color": "orange",
    "thumbnail_url": "tshirt-B-orange.png",
    "color_variants": ["blue", "teal", "yellow", "red", "green"]
  }
]
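
The color_variants attribute can be derived from the variant records themselves before indexing. The following C# sketch shows one way to do this on in-memory data. The tuple shape and the shortened sample are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant records: one per color of each design.
var variants = new List<(string Type, string Design, string Color)>
{
    ("t-shirt", "B", "blue"),
    ("t-shirt", "B", "orange"),
    ("t-shirt", "B", "teal"),
    ("sweatshirt", "C", "red"),
};

// For each record, list the colors of the other records with the same design.
var withSiblings = variants
    .Select(v => (
        v.Type,
        v.Design,
        v.Color,
        ColorVariants: variants
            .Where(o => o.Design == v.Design && o.Color != v.Color)
            .Select(o => o.Color)
            .ToArray()))
    .ToList();

foreach (var r in withSiblings)
    Console.WriteLine($"{r.Design}/{r.Color}: [{string.Join(", ", r.ColorVariants)}]");
```

A design with a single color, like design C in this sample, ends up with an empty list.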

Having one record per variation lets you add granular custom ranking attributes, such as number_of_sales. Again, you can also use distinct to limit the number of variants shown per design. This lets users explore more of your designs instead of all variations of a single design.

Configure distinct in the dashboard

  1. Go to the Algolia dashboard and select your Algolia application.
  2. On the left sidebar, select Search.
  3. Select your Algolia index.
  4. On the Configuration tab, select Relevance essentials > Searchable attributes.
  5. Click Add a Searchable Attribute and add design, type, and color attributes.
  6. Go to Search behavior > Deduplication and Grouping.
  7. In the Distinct menu, select “true”.
  8. In the Attribute for Distinct menu, select “design”.

Now, the results include only one color per design. To control which one, add an attribute with a business metric, such as number_of_sales, and use it for custom ranking.

[Figure: One color for each design]
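
To illustrate how a business metric decides which variant represents a design, the following C# sketch ranks hypothetical records by number_of_sales and keeps the first record per design. It is a local simulation that considers sales only. In a real search, textual relevance is evaluated before custom ranking, so the outcome can differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant records with a sales metric.
var variants = new List<(string Design, string Color, int NumberOfSales)>
{
    ("A", "red", 120),
    ("A", "green", 340),
    ("B", "blue", 75),
    ("B", "orange", 90),
};

// Rank by sales (descending), then keep the first record seen for each design.
var representatives = variants
    .OrderByDescending(v => v.NumberOfSales)
    .GroupBy(v => v.Design)
    .Select(g => g.First())
    .ToList();

foreach (var r in representatives)
    Console.WriteLine($"{r.Design}: {r.Color} ({r.NumberOfSales} sales)");
```

In this sample, the best-selling color of each design (green for A, orange for B) becomes the one shown.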

Configure distinct with the API

Before deduplicating items, restrict what should be searchable. For example, searching within the thumbnail_url attribute might lead to irrelevant noise. The attribute color_variants is added for UI purposes only and could lead to false positive matches. Therefore, only set design, type, and color as searchableAttributes.
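
The false positive described above can be reproduced with a small local simulation. The C# sketch below uses a naive substring match as a stand-in for the search engine, so it only illustrates the effect of restricting attributes, not how Algolia matches text. The `Search` helper and the sample records are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each record maps attribute names to their text values (hypothetical sample).
var records = new List<Dictionary<string, string[]>>
{
    new Dictionary<string, string[]>
    {
        ["design"] = new[] { "B" },
        ["type"] = new[] { "t-shirt" },
        ["color"] = new[] { "blue" },
        ["thumbnail_url"] = new[] { "tshirt-B-blue.png" },
        ["color_variants"] = new[] { "orange", "teal" },
    },
    new Dictionary<string, string[]>
    {
        ["design"] = new[] { "B" },
        ["type"] = new[] { "t-shirt" },
        ["color"] = new[] { "orange" },
        ["thumbnail_url"] = new[] { "tshirt-B-orange.png" },
        ["color_variants"] = new[] { "blue", "teal" },
    },
};

// Naive substring match restricted to the given attributes.
// Returns the color of every matching record.
List<string> Search(string query, string[] attributes) =>
    records
        .Where(r => attributes.Any(a =>
            r[a].Any(v => v.Contains(query, StringComparison.OrdinalIgnoreCase))))
        .Select(r => r["color"][0])
        .ToList();

var allAttributes = new[] { "design", "type", "color", "thumbnail_url", "color_variants" };
var restricted = new[] { "design", "type", "color" };

var noisy = Search("orange", allAttributes); // the blue record also matches
var clean = Search("orange", restricted);    // only the orange record matches

Console.WriteLine($"all attributes: {string.Join(", ", noisy)}");
Console.WriteLine($"restricted: {string.Join(", ", clean)}");
```

When every attribute is searched, the blue t-shirt matches "orange" through its color_variants list. With the restricted set, only the orange t-shirt matches.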

// Restrict matching to the attributes users actually search on.
var response = await client.SetSettingsAsync(
  "ALGOLIA_INDEX_NAME",
  new IndexSettings
  {
    SearchableAttributes = new List<string> { "design", "type", "color" },
  }
);

Next, set design as attributeForDistinct. Optionally, set distinct to true in the same call. This deduplicates all subsequent search requests.

// Group records by design and deduplicate all searches by default.
var response = await client.SetSettingsAsync(
  "ALGOLIA_INDEX_NAME",
  new IndexSettings { AttributeForDistinct = "design", Distinct = new Distinct(true) }
);

After setting attributeForDistinct, you can also pass distinct as a parameter of an individual search request:

// Deduplicate this search request only.
var response = await client.SearchSingleIndexAsync<Hit>(
  "ALGOLIA_INDEX_NAME",
  new SearchParams(new SearchParamsObject { Distinct = new Distinct(true) })
);

This deduplicates only the current search.
