
Query Categorization is a beta feature according to the Algolia Terms of Service (“Beta Services”).

Query Categorization lets you predict the categories to which a search query belongs. To do this, it uses an AI model to create categories from the records in your index. Every index can use its own category hierarchy, depending on your needs, such as product categories for an ecommerce website or genres for a movie app. For example, in an online grocery shop, the query banana can be part of the category Food > Vegetables and fruits.

With Algolia’s Query Categorization feature, you have:

  • A dedicated section in the Algolia dashboard to set up the AI model and explore its predictions
  • Automatic filtering and boosting on predicted categories, without writing extra code, to help increase the relevance of your users’ results
  • Analytics grouped by predicted category to learn how the categories perform and to detect underperforming queries
  • Access to category predictions at query time (with the Search API) so that you can provide a Search and Discovery experience customized for your users

How to set up Query Categorization

To set up Query Categorization, you must send click and conversion events and then configure the AI model. After setup, check the model output and the generated category tree.

You can also use the Query Categorization predictions in your frontend at query time.

Send click and conversion events

To use Query Categorization, you must send click or conversion events. Algolia AI uses this data to train its model to predict categories.

The Query Categorization model is retrained automatically every 24 hours. It always uses events from the last 30 days, meaning it’s based on a sliding window of the most recent analytics data.

Configure the model

To set up Query Categorization, you need to provide the facets on which the model makes predictions. The facets must represent the hierarchy of your categories (up to five levels deep).

Once you’ve entered your facets, click Save to start the model-building process. Depending on the number of categories and traffic, this can take a few minutes to half an hour.

Supported hierarchical facets formats

  • If your records are structured like this:

    {
      "name": "banana",
      "description": "...",
      "price": 3.45,
      "hierarchicalCategories":
      {
        "lvl0": "Food",
        "lvl1": "Fruits"
      }
    }
    

    Set hierarchicalCategories.lvl0 as the first level used by the model and hierarchicalCategories.lvl1 as the second level.

  • If your records are structured like this:

    {
      "name": "banana",
      "description": "...",
      "price": 3.45,
      "group": "Food",
      "section": "Fruits"
    }
    

    Set group as the first level used by the model and section as the second level.

If your records belong to several categories at once and you use arrays to represent each level of depth, the model expects shared prefixes. For example, use Food as the first-level facet value and Food > Fruits as the second-level facet value.

The model doesn’t support records structured with only one attribute for all depth levels. For example:

{
  "name": "banana",
  "description": "...",
  "price": 3.45,
  "categories": ["Food", "Food > Fruits"]
}
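
If your records use the unsupported single-attribute shape, one option is to split the values into one attribute per depth level before indexing. The following is a minimal sketch, not an official migration tool: it assumes that levels are joined with " > " and that the target attribute is named hierarchicalCategories (as in the first supported example). The function name split_levels is hypothetical.

```python
from collections import defaultdict

SEPARATOR = " > "  # assumption: separator used between levels in the combined attribute


def split_levels(record, source="categories", target="hierarchicalCategories"):
    """Return a copy of `record` with one list-valued attribute per depth level."""
    levels = defaultdict(list)
    for value in record.get(source, []):
        depth = value.count(SEPARATOR)  # "Food" -> 0, "Food > Fruits" -> 1
        levels[f"lvl{depth}"].append(value)
    # Copy every attribute except the combined one, then add the per-level attribute.
    converted = {key: val for key, val in record.items() if key != source}
    converted[target] = dict(sorted(levels.items()))
    return converted


record = {
    "name": "banana",
    "price": 3.45,
    "categories": ["Food", "Food > Fruits"],
}
print(split_levels(record))
```

Each level keeps the full prefixed value (Food > Fruits rather than Fruits), which matches the shared-prefix convention described above for records that belong to several categories.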

Model output

After configuration, the AI model will:

  1. Use the provided categories to build a “categories tree” (a hierarchical representation of your categories) based on the different facet values of items in your index.
  2. Extract the most likely categories for the most popular queries (by using the click and conversion events you sent to Algolia)
  3. Train an AI model to predict the categories associated with a query. Each prediction includes a confidence score from 0 to 100 and a type.

The type can be:

  • narrow for queries tied to only one item in your categories tree: a category that’s a “leaf” in the tree (that is, without further sub-categories).
  • broad for queries tied to a category that’s at a higher level in the categories tree (a category with sub-categories)
  • ambiguous for queries tied to several unrelated categories
  • none for queries for which the model couldn’t predict any category

Managing the categories tree

After training the model, you can look at the generated categories tree by selecting the Categories Tree View tab in Algolia’s dashboard (Search > Configure > Query Categorization). Here, you can browse the hierarchy and check that it was correctly generated.

You can choose to ban (exclude) some categories from the predictions. For example, you should exclude values that aren’t actual categories, like “Black Friday” or “On sale”. Removing these values from the categories tree increases the model’s performance.

Retrieve the Query Categorization predictions at query time

You can use the predictions directly in your frontend at query time to implement, for example:

  • Query expansion. When you have limited results for a query, you can expand the results set with more items from the same category
  • Disambiguation. When the query is broad, you can suggest different categories to help narrow down the search
  • A tailored experience. Provide a custom experience based on the user intent by having a specific layout for some categories
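
As an illustration of how these ideas could be wired together, the sketch below maps a prediction to one of the three experiences. It assumes the prediction object carries the type and categories fields shown in the search response format. The function name, the returned labels, and the min_hits cutoff are hypothetical choices, not part of Algolia's API.

```python
def choose_experience(prediction, nb_hits, min_hits=10):
    """Pick a frontend behavior from a Query Categorization prediction.

    `prediction` is the content of extensions.queryCategorization.
    `min_hits` is an arbitrary cutoff for "limited results" (assumption).
    """
    kind = prediction.get("type", "none")
    categories = prediction.get("categories", [])
    if not categories or kind == "none":
        return "default"  # nothing predicted: keep the regular experience
    if kind in ("broad", "ambiguous"):
        return "disambiguate"  # suggest categories to narrow down the search
    if nb_hits < min_hits:
        return "expand"  # few results: add more items from the same category
    return "tailored_layout"  # confident, specific category: custom layout


prediction = {"type": "narrow", "categories": [{"probability": 0.87}]}
print(choose_experience(prediction, nb_hits=3))
```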

Query Categorization populates your search results with the predicted categories for the search query. Predictions are based on the query as normalized by the engine, not on the raw query.

Enabling Query Categorization at query time

To retrieve Query Categorization results at query time, activate this option in one of two ways:

  • In the Query Categorization section of Algolia’s dashboard (Search > Configure > Query Categorization).
  • In query parameters, as a JSON object or a URL-encoded string, for example: extensions%3D%7B%22queryCategorization%22%3A%7B%22enableCategoriesRetrieval%22%3Atrue%7D%7D

    Query Categorization parameters are grouped under the extensions field:

    {
      /* Other standard query parameters... */
      "extensions": {
        "queryCategorization": {
          "enableCategoriesRetrieval": true
          /* Other options to control Automatic Filtering and Boosting are available */
        }
      }
    }
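
The URL-encoded form can be derived from the JSON object with the standard library. This sketch only builds the string; it doesn't call the Search API, and the helper name encode_extensions is hypothetical.

```python
import json
from urllib.parse import quote


def encode_extensions(enable_retrieval=True):
    """Serialize the extensions object compactly and URL-encode it."""
    extensions = {
        "queryCategorization": {"enableCategoriesRetrieval": enable_retrieval}
    }
    # Compact separators avoid spaces, which would otherwise be encoded as %20.
    compact = json.dumps(extensions, separators=(",", ":"))
    return quote("extensions=" + compact, safe="")


print(encode_extensions())
```

With the default argument, the result is the same encoded string as in the example above.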
    

Search response format

The search response has the usual format, with predictions in the attribute extensions.queryCategorization:

{
  /* Regular search answer (like hits) */
  "extensions": {
    "queryCategorization": {
        "normalizedQuery": "banana",
        "count": 2,
        "type": "narrow",
        "categories": [
            {
                "probability": 0.87,
                "hierarchyPath": [
                    {
                        "facetName": "category.lvl0",
                        "facetValue": "Food",
                        "depth": 0
                    },
                    {
                        "facetName": "category.lvl1",
                        "facetValue": "Fruits",
                        "depth": 1
                    }
                ]
            }
        ]
    }
  }
}

On rare occasions, extensions.queryCategorization can be an empty object for queries that the Query Categorization model didn’t categorize.
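
Because the object can be empty, code that reads predictions should tolerate missing fields. The following sketch extracts the most probable category path from a response that has already been parsed into a dictionary. It assumes the field names from the example above; the function name top_category_path is hypothetical.

```python
def top_category_path(response):
    """Return [(facetName, facetValue), ...] for the most probable category.

    Returns None when the response has no prediction (for example, when
    extensions.queryCategorization is an empty object).
    """
    qc = response.get("extensions", {}).get("queryCategorization", {})
    categories = qc.get("categories") or []
    if not categories:
        return None
    best = max(categories, key=lambda cat: cat.get("probability", 0.0))
    # Order the path from the root of the categories tree down to the leaf.
    path = sorted(best.get("hierarchyPath", []), key=lambda level: level["depth"])
    return [(level["facetName"], level["facetValue"]) for level in path]


empty = {"hits": [], "extensions": {"queryCategorization": {}}}
print(top_category_path(empty))
```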

Automatic Filtering and Boosting

Automatic Filtering and Boosting is a search experience that applies filters for user queries based on Query Categorization predictions.

  • Automatic filtering applies a search query filter to remove items not matching the predicted category.
  • Automatic boosting applies an optional filter to the query to boost items matching the predicted category to the top.

The Query Categorization model decides if these predictions should be used as filters, boosts, or not used at all, depending on their confidence scores.
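
The exact decision rule isn't documented here. As a rough mental model only, the sketch below assumes a prediction is used as a filter above one confidence threshold, as a boost above a lower one, and ignored otherwise. The default values mirror the boostThreshold and filterThreshold fields in the response example in this guide; the comparison logic itself is an assumption.

```python
def decide_action(probability, boost_threshold=0.5, filter_threshold=0.9):
    """Classify a prediction as "filter", "boost", or "none" (assumed rule)."""
    if probability >= filter_threshold:
        return "filter"  # high confidence: remove items outside the category
    if probability >= boost_threshold:
        return "boost"  # medium confidence: only rank matching items higher
    return "none"  # low confidence: leave the query untouched


for p in (0.97, 0.87, 0.30):
    print(p, decide_action(p))
```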

Implement Automatic Filtering and Boosting

To use Automatic Filtering and Boosting, you must enable Query Categorization in the Algolia dashboard. You can preview the impact of Automatic Filtering and Boosting from the Query Categorization section.

You can turn on Automatic Filtering and Boosting for an index in the tab with the same name (as the index). Once activated, filters and boosts are automatically injected into your search parameters at query time without requiring any frontend changes.

In the Query Categorization section of Algolia’s dashboard (Search > Configure > Query Categorization), you can exclude (ban) queries and categories that should never be automatically filtered or boosted. Anything specified here overrides your index’s configuration.

Override Automatic Filtering and Boosting at query time

You can override the default configuration for automatic filtering and boosting with query parameters:

{
  /* Other standard query parameters... */
  "extensions": {
    "queryCategorization": {
      "enableAutoFiltering": true|false
    }
  }
}

To let users remove filters applied by Automatic Filtering and Boosting, explicitly turn off automatic filters and boosts on the next search query sent to your index when the user clears the automatic filter. Create an InstantSearch widget to implement a UI for this behavior.
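
A minimal sketch of the parameters for that follow-up query is shown below. It only builds the parameter object described above; the helper name and the auto_filter_cleared flag (which your UI would track) are hypothetical.

```python
def build_search_params(query, auto_filter_cleared):
    """Build search parameters, turning auto filtering off once the user clears it."""
    return {
        "query": query,
        "extensions": {
            "queryCategorization": {
                # False after the user removed the automatic filter in the UI.
                "enableAutoFiltering": not auto_filter_cleared,
            }
        },
    }


print(build_search_params("banana", auto_filter_cleared=True))
```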

Detect impact of Automatic Filtering and Boosting at query time

When Automatic Filtering and Boosting is active for a query, the extensions.queryCategorization.autofiltering section has the following content:

{
  /* Regular search answer (like hits...) */
  "extensions": {
    "queryCategorization": {
      "normalizedQuery": "banana",
      "count": 14870,
      "type": "narrow",
      "categories": [
        {
          "probability": 0.97,
          "hierarchyPath": [
            {
              "facetName": "categories.lvl0",
              "facetValue": "Food",
              "depth": 0
            },
            {
              "facetName": "categories.lvl1",
              "facetValue": "Food > Fruits",
              "depth": 1
            }
          ]
        }
      ],
      "autofiltering": {
        "enabled": true,
        "maxDepth": 5,
        "boostThreshold": 0.5,
        "filterThreshold": 0.9,
        "facetFilters": [
          [
            "categories.lvl0:Food"
          ],
          [
            "categories.lvl1:Food > Fruits"
          ]
        ],
        "optionalFilters": []
      }
    }
  }
}

Automatic Filtering and Boosting can be active without affecting a given query. In this case, you won’t see the additional fields in your search response.
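
To check whether a given response was affected, you can look for the autofiltering section and its filter lists. This sketch assumes the field names from the example above and a response already parsed into a dictionary; the function name autofiltering_impact is hypothetical.

```python
def _flatten(filters):
    """Flatten a list that may mix strings and nested lists of strings."""
    flat = []
    for item in filters:
        flat.extend(item if isinstance(item, list) else [item])
    return flat


def autofiltering_impact(response):
    """Return the filters and boosts that were applied to the query, if any."""
    qc = response.get("extensions", {}).get("queryCategorization", {})
    auto = qc.get("autofiltering") or {}
    return {
        "filtered": _flatten(auto.get("facetFilters", [])),
        "boosted": _flatten(auto.get("optionalFilters", [])),
    }


no_impact = {"extensions": {"queryCategorization": {"type": "none"}}}
print(autofiltering_impact(no_impact))
```

Two empty lists mean the feature had no effect on that query, whether because it's turned off or because no prediction was confident enough.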

Analytics grouped by categories

Once the Query Categorization model is set up, all search queries are grouped under their predicted categories in the Grouped Searches tab of Algolia’s dashboard (under Observe > Analytics). This view doesn’t include browsing queries (the empty query filtered on the category).

You can compare categories or click them to inspect their queries. Inside a category, the queries with a significantly lower click-through or conversion rate are automatically flagged as “underperforming”.

For instance, the two queries blue jeans and denim are flagged as belonging to the same category (pants). Grouped analytics displays the performance of the category pants (aggregating data for blue jeans, denim, and other queries belonging to the pants category). You can then compare the performance of a query with that of its category. For example, the pants category’s click-through rate is 10%, but the click-through rate for blue jeans is only 4%, so the query is identified as underperforming. You can improve the performance of the query by, for example, adding a synonym or a Rule.
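
The comparison can be sketched as follows. The numbers are made up to reproduce the 10% and 4% rates from the example, and the rule for "significantly lower" (less than half of the category's click-through rate) is an assumption for illustration, not Algolia's actual criterion.

```python
def flag_underperforming(stats, ratio=0.5):
    """Return (category_ctr, flagged_queries).

    `stats` maps each query to (searches, clicks). A query is flagged when its
    click-through rate is below `ratio` times the category's aggregate rate.
    """
    total_searches = sum(searches for searches, _ in stats.values())
    total_clicks = sum(clicks for _, clicks in stats.values())
    if total_searches == 0:
        return 0.0, []
    category_ctr = total_clicks / total_searches
    flagged = [
        query
        for query, (searches, clicks) in stats.items()
        if searches > 0 and clicks / searches < ratio * category_ctr
    ]
    return category_ctr, flagged


# Hypothetical data for the pants category: (searches, clicks) per query.
pants = {"blue jeans": (1000, 40), "denim": (1000, 160)}
print(flag_underperforming(pants))
```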

With grouped analytics, you can aggregate your search analytics to gain new insights and optimize your Search and Discovery experience. It simplifies search analysis and helps manage the long tail of search queries.
