Query Categorization lets you predict the categories to which a search query belongs. To do this, it uses an AI model to create categories from the records in your index. Every index can use its own category hierarchy, depending on your needs, such as product categories for an ecommerce website or genres for a movie app. For example, in an online grocery shop, the query banana can be part of the category Food > Vegetables and fruits.

With Algolia’s Query Categorization feature, you have:

  • A dedicated section in the Algolia dashboard to set up the AI model and explore its predictions
  • Automatic filtering and boosting on predicted categories, without writing extra code, to help increase the relevance of your users’ results
  • Analytics grouped by predicted category to learn how the categories perform and to detect underperforming queries
  • Access to category predictions at query time (with the Search API) so that you can provide a Search and Discovery experience customized for your users.

How to set up Query Categorization

To set up Query Categorization, you must send click or conversion events and then configure the AI model. After setup, check the model output and the generated categories tree.

You can also use the Query Categorization predictions in your frontend at query time.

Send click and conversion events

To use Query Categorization, you must send click or conversion events. Algolia AI uses this data to train its model to predict categories.

The Query Categorization model is retrained automatically every 24 hours. It always uses events from the last 30 days, meaning it’s based on a sliding window of the most recent analytics data.
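The sliding window can be pictured as a filter over event timestamps. The sketch below is only an illustration of that idea, not Algolia’s training code: the event shape (a dict with `query` and `timestamp` keys) and the function name `events_in_window` are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Window length stated above: the model uses events from the last 30 days.
WINDOW = timedelta(days=30)


def events_in_window(events, now):
    """Keep only events whose timestamp falls within the 30 days before `now`.

    `events` is a hypothetical list of dicts with a timezone-aware
    `timestamp`; this is not the format of Algolia's Insights events.
    """
    cutoff = now - WINDOW
    return [e for e in events if cutoff <= e["timestamp"] <= now]


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
events = [
    {"query": "banana", "timestamp": datetime(2024, 3, 30, tzinfo=timezone.utc)},
    {"query": "apple", "timestamp": datetime(2024, 2, 1, tzinfo=timezone.utc)},
]
recent = events_in_window(events, now)
```

Each daily retraining moves `now` forward by a day, so events older than 30 days drop out of the training data.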

Configure the model

To set up Query Categorization, you need to provide the facets on which the model makes predictions. The facets must represent the hierarchy of your categories (up to five levels deep).

Once you’ve entered your facets, click Save to start the model-building process. Depending on the number of categories and traffic, this can take a few minutes to half an hour.

Supported hierarchical facets formats

  • If your records are structured like this:

    {
      "name": "banana",
      "description": "...",
      "price": 3.45,
      "hierarchicalCategories":
      {
        "lvl0": "Food",
        "lvl1": "Fruits"
      }
    }
    

    Set hierarchicalCategories.lvl0 as the first level used by the model and hierarchicalCategories.lvl1 as the second level.

  • If your records are structured like this:

    {
      "name": "banana",
      "description": "...",
      "price": 3.45,
      "group": "Food",
      "section": "Fruits"
    }
    

    Set group as the first level used by the model and section as the second level.

If your records belong to several categories at once and you use arrays to represent each level of depth, the model expects shared prefixes (for example, Food as the first-level facet value and Food > Fruits as the second-level facet value).

The model doesn’t support records structured with only one attribute for all depth levels. For example:

{
  "name": "banana",
  "description": "...",
  "price": 3.45,
  "categories": ["Food", "Food > Fruits"]
}
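If your records use a single attribute like the one above, you could reshape them before indexing so that each depth level has its own attribute. The following is a minimal sketch under stated assumptions: the `" > "` separator, the target attribute name `hierarchicalCategories`, and the helper name `split_levels` are illustrative choices, not requirements of Algolia.

```python
def split_levels(record, source="categories", target="hierarchicalCategories", sep=" > "):
    """Move values from one multi-level attribute into one attribute per depth.

    The depth of a value is the number of separators it contains, so
    "Food" lands in lvl0 and "Food > Fruits" in lvl1. Values keep their
    shared prefixes, which is the array format described above.
    """
    levels = {}
    for value in record.get(source, []):
        depth = value.count(sep)
        levels.setdefault(f"lvl{depth}", []).append(value)

    # Copy every other attribute unchanged and drop the unsupported one.
    converted = {key: val for key, val in record.items() if key != source}
    converted[target] = levels
    return converted


record = {
    "name": "banana",
    "description": "...",
    "price": 3.45,
    "categories": ["Food", "Food > Fruits"],
}
converted = split_levels(record)
```

With the converted records, you would set `hierarchicalCategories.lvl0` as the first level and `hierarchicalCategories.lvl1` as the second.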

Model output

After configuration, the AI model will:

  1. Use the provided categories to build a “categories tree” (a hierarchical representation of your categories) based on the different facet values of items in your index.
  2. Extract the most likely categories for the most popular queries (by using the click and conversion events you sent to Algolia)
  3. Train an AI model to predict the categories associated with a query. Each prediction includes a confidence level from very low to certain and a type.

The confidence level can be:

  • very low
  • low
  • high
  • very high
  • certain

The type can be:

  • narrow for queries tied to only one item in your categories tree: a category that’s a “leaf” in the tree (that is, without further sub-categories).
  • broad for queries tied to a category that’s at a higher level in the categories tree (a category with sub-categories)
  • ambiguous for queries tied to several unrelated categories
  • none for queries for which the model couldn’t predict any category
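The narrow and broad types depend on whether a category is a leaf in the categories tree. The sketch below shows that distinction on a toy tree built from records that use the `group` and `section` attributes from the earlier example. It isn’t the model’s implementation: the model assigns types to queries, while this only classifies a category path, and the function names are hypothetical.

```python
def build_tree(records, levels):
    """Nest facet values level by level, for example {"Food": {"Fruits": {}}}."""
    tree = {}
    for record in records:
        node = tree
        for attribute in levels:
            value = record.get(attribute)
            if value is None:
                break  # the record has no deeper level
            node = node.setdefault(value, {})
    return tree


def category_kind(tree, path):
    """Return "narrow" for a leaf category and "broad" for one with sub-categories."""
    node = tree
    for value in path:
        node = node[value]
    return "broad" if node else "narrow"


records = [
    {"name": "banana", "group": "Food", "section": "Fruits"},
    {"name": "carrot", "group": "Food", "section": "Vegetables"},
]
tree = build_tree(records, ["group", "section"])
```

Here, `category_kind(tree, ["Food", "Fruits"])` returns `"narrow"` and `category_kind(tree, ["Food"])` returns `"broad"`.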

Managing events source

Using a different source index for events lets you predict categories from alternative events. For instance, a test index that hasn’t had any user interactions can use the events from a production index. The current index must be a replica of the targeted source index. To use a different source index, go to the Categories Settings tab and select a source index from your application in the Events source index drop-down menu. Saving the configuration triggers a new training, which regenerates the categories tree and the predictions using the source index’s events.

Managing the categories tree

After training the model, you can look at the generated categories tree by selecting the Categories Tree View tab in Algolia’s dashboard (Search > Configure > Query Categorization). Here, you can browse the hierarchy and check that it was correctly generated.

You can choose to ban (exclude) some categories from the predictions. For example, you should exclude values that aren’t actual categories, like “Black Friday” or “On sale”. Removing these values from the categories tree increases the model’s performance.

Retrieve the Query Categorization predictions at query time

You can use the predictions directly in your frontend at query time to implement, for example:

  • Query expansion. When you have limited results for a query, you can expand the results set with more items from the same category
  • Disambiguation. When the query is broad, you can suggest different categories to help narrow down the search
  • A tailored experience. Provide a custom experience based on the user intent by having a specific layout for some categories

Query Categorization populates your search results with the predicted categories for the search query. Predictions are based on the query as normalized by the engine, not on the raw query.

Enabling Query Categorization at query time

To retrieve Query Categorization results at query time, you need to activate this option from the dashboard or in query parameters.

  • In the Query Categorization section of Algolia’s dashboard (Search > Configure > Query Categorization): you can find the Categories with Search API toggle in the Categories Settings tab.
  • In query parameters as a JSON object or a URL encoded string, for example: extensions%3D%7B%22queryCategorization%22%3A%7B%22enableCategoriesRetrieval%22%3Atrue%7D%7D

    Query Categorization parameters are grouped under the extensions field:

    {
      /* Other standard query parameters... */
      "extensions": {
        "queryCategorization": {
          "enableCategoriesRetrieval": true
          /* Other options to control Automatic Filtering and Boosting are available */
        }
      }
    }
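The URL-encoded string shown above is the compact JSON form of the `extensions` object, percent-encoded together with the parameter name. As a sketch of how you might produce it with Python’s standard library (the helper name `encode_extensions` is made up for this example):

```python
import json
from urllib.parse import quote


def encode_extensions(extensions):
    """Serialize the extensions object compactly, then percent-encode it."""
    # separators=(",", ":") removes the spaces json.dumps adds by default.
    compact = json.dumps(extensions, separators=(",", ":"))
    # safe="" also encodes "=", which quote() would otherwise leave readable
    # only if listed as safe; this matches the encoded example above.
    return quote("extensions=" + compact, safe="")


encoded = encode_extensions(
    {"queryCategorization": {"enableCategoriesRetrieval": True}}
)
```

The value of `encoded` is identical to the URL-encoded example string in the list above.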
    

Search response format

The search response has the usual format, with predictions in the attribute extensions.queryCategorization:

{
  /* Regular search answer (like hits) */
  "extensions": {
    "queryCategorization": {
      "normalizedQuery": "banana",
      "count": 2,
      "type": "narrow",
      "categories": [
        {
          "bin": "very high",
          "hierarchyPath": [
            {
              "facetName": "category.lvl0",
              "facetValue": "Food",
              "depth": 0
            },
            {
              "facetName": "category.lvl1",
              "facetValue": "Fruits",
              "depth": 1
            }
          ]
        }
      ]
    }
  }
}

On rare occasions, extensions.queryCategorization can be an empty object for queries that the Query Categorization model didn’t categorize.
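When reading predictions in your own code, it helps to treat every field as optional so that the empty-object case doesn’t raise an error. The following sketch assumes the response has already been parsed into a Python dict with the shape shown above; the function name `predicted_categories` and the returned structure are illustrative.

```python
def predicted_categories(response):
    """Return one entry per predicted category, or an empty list if none."""
    qc = response.get("extensions", {}).get("queryCategorization", {})
    results = []
    for category in qc.get("categories", []):
        # Order the levels from the root of the hierarchy to the deepest one.
        path = sorted(category["hierarchyPath"], key=lambda level: level["depth"])
        results.append(
            {
                "confidence": category["bin"],
                "path": [level["facetValue"] for level in path],
            }
        )
    return results


response = {
    "extensions": {
        "queryCategorization": {
            "normalizedQuery": "banana",
            "type": "narrow",
            "categories": [
                {
                    "bin": "very high",
                    "hierarchyPath": [
                        {"facetName": "category.lvl0", "facetValue": "Food", "depth": 0},
                        {"facetName": "category.lvl1", "facetValue": "Fruits", "depth": 1},
                    ],
                }
            ],
        }
    }
}
predictions = predicted_categories(response)
uncategorized = predicted_categories({"extensions": {"queryCategorization": {}}})
```

For the sample response, `predictions` holds a single entry with the path Food, Fruits, and `uncategorized` is an empty list.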

How to override predictions

To override the Query Categorization predictions, head to the Predictions Explorer tab of the Query Categorization section.

  • To modify an override or replace the predicted categories generated by the model, click the edit (pencil) icon.
  • To revert an override to the predicted categories generated by the model, click the zap (lightning) icon.
  • To remove a model prediction, click the trash icon.

Changes are displayed in the predictions list. To confirm the changes, click Save changes at the bottom of the page.

Modifying the index classification in the Categories Settings tab will delete any override affected by this change. For example, if you remove the second facet level from the index classification, overrides with two levels of depth like Food > Fruits will be deleted, and the query will be reset to automatic predictions.

Automatic Filtering and Boosting

Automatic Filtering and Boosting is a search experience that applies filters for user queries based on Query Categorization predictions.

  • Automatic filtering applies a search query filter to remove items not matching the predicted category.
  • Automatic boosting applies an optional filter to the query to boost items matching the predicted category to the top.

The Query Categorization model decides if these predictions should be used as filters, boosts, or not used at all, depending on their confidence levels.

By default, only automatic boosting is activated. See Configure Automatic Filtering and Boosting impact to learn how to enable automatic filtering as well.

Implement Automatic Filtering and Boosting

To use Automatic Filtering and Boosting, you must enable Query Categorization in the Algolia dashboard. You can preview the impact of Automatic Filtering and Boosting from the Query Categorization section.

You can turn on Automatic Filtering and Boosting for an index in the Automatic filtering & boosting Settings tab. Once activated, filters and boosts are automatically injected into your search parameters at query time without requiring any frontend changes. In the Query Categorization section of Algolia’s dashboard (Search > Configure > Query Categorization), you can exclude (ban) queries and categories that should never be automatically filtered or boosted. Anything specified here overrides your index’s configuration.

Override Automatic Filtering and Boosting at query time

You can override the default configuration for automatic filtering and boosting with query parameters:

{
  /* Other standard query parameters... */
  "extensions": {
    "queryCategorization": {
      "enableAutoFiltering": true /* or false */
    }
  }
}

To let users remove filters applied by Automatic Filtering and Boosting, you must explicitly turn off automatic filters and boosts on the search query targeting your index (when the user clears the automatic filter). Create an InstantSearch widget to implement a UI for this behavior.
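One way to handle a cleared automatic filter is to resend the search with the override set to `false`. The sketch below only builds the query parameters and doesn’t call any Algolia client. It uses `enableAutoFiltering`, the one override shown above. The helper name `with_auto_filtering` and the sample parameters are assumptions, and this page doesn’t confirm whether further parameters are needed to turn off boosts as well.

```python
import copy


def with_auto_filtering(params, enabled):
    """Return a copy of the search parameters with the override applied."""
    updated = copy.deepcopy(params)  # leave the caller's parameters untouched
    qc = updated.setdefault("extensions", {}).setdefault("queryCategorization", {})
    qc["enableAutoFiltering"] = enabled
    return updated


# Hypothetical parameters for the user's current search.
params = {"query": "banana", "hitsPerPage": 20}

# The user cleared the automatic filter: repeat the search without it.
cleared = with_auto_filtering(params, False)
```

Copying the parameters keeps the original search state reusable, for example if the user later turns the automatic filter back on.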

Detect impact of Automatic Filtering and Boosting at query time

When Automatic Filtering and Boosting is active for a query, the extensions.queryCategorization.autofiltering section has the following content:

{
  /* Regular search answer (like hits...) */
  "extensions": {
    "queryCategorization": {
      "normalizedQuery": "banana",
      "count": 14870,
      "type": "narrow",
      "categories": [
        {
          "bin": "certain",
          "hierarchyPath": [
            {
              "facetName": "categories.lvl0",
              "facetValue": "Food",
              "depth": 0
            },
            {
              "facetName": "categories.lvl1",
              "facetValue": "Food > Fruits",
              "depth": 1
            }
          ]
        }
      ],
      "autofiltering": {
        "enabled": true,
        "maxDepth": 5,
        "facetFilters": [
          [
            "categories.lvl0:Food"
          ],
          [
            "categories.lvl1:Food > Fruits"
          ]
        ],
        "optionalFilters": []
      }
    }
  }
}

Automatic Filtering and Boosting can be active without affecting a given query. In this case, you won’t see the additional fields in your search response.
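To check in code whether a query was affected, you can look for the `autofiltering` section and read the filters it lists. This is a sketch that assumes the response is a parsed dict shaped like the example above; the function name `applied_auto_filters` is hypothetical, and the `optionalFilters` entries are returned untouched because the example only shows an empty list.

```python
def applied_auto_filters(response):
    """Return the automatically applied filters, or empty lists if there are none."""
    qc = response.get("extensions", {}).get("queryCategorization", {})
    auto = qc.get("autofiltering")
    if not auto or not auto.get("enabled"):
        return {"facetFilters": [], "optionalFilters": []}
    return {
        # facetFilters is a list of groups; flatten it for display purposes.
        "facetFilters": [f for group in auto.get("facetFilters", []) for f in group],
        "optionalFilters": auto.get("optionalFilters", []),
    }


response = {
    "extensions": {
        "queryCategorization": {
            "autofiltering": {
                "enabled": True,
                "maxDepth": 5,
                "facetFilters": [
                    ["categories.lvl0:Food"],
                    ["categories.lvl1:Food > Fruits"],
                ],
                "optionalFilters": [],
            }
        }
    }
}
applied = applied_auto_filters(response)
no_impact = applied_auto_filters({"extensions": {"queryCategorization": {}}})
```

A frontend could use the flattened `facetFilters` list to display a removable "filtered by" label for each automatic filter.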

Configure Automatic Filtering and Boosting impact

You can fine-tune the impact of Automatic Filtering and Boosting by changing two values in the Automatic filtering & boosting Settings tab:

  • The minimum expected confidence level for filtering
  • The minimum expected confidence level for boosting.

These values allow you to configure when filters or boosts are applied based on the predictions’ confidence levels.

The feature boosts predictions with a confidence level equal to or above the confidence level for boosting but below the confidence level for filtering. It filters on predictions with a confidence level equal to or above the confidence level for filtering.

For instance, setting the confidence level for boosting to high and the confidence level for filtering to certain boosts high and very high predictions and filters on certain predictions.

The confidence level for boosting must be less than the confidence level for filtering.
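The threshold rules above can be written as a small decision function. This is a sketch of the documented behavior, not Algolia’s code: the name `action_for` and the returned labels are illustrative, and the level order follows the confidence levels listed earlier.

```python
# Confidence levels from lowest to highest.
LEVELS = ["very low", "low", "high", "very high", "certain"]


def action_for(confidence, boost_min, filter_min):
    """Return "filter", "boost", or "none" for a prediction's confidence level."""
    rank = LEVELS.index
    if rank(boost_min) >= rank(filter_min):
        raise ValueError("the boosting level must be lower than the filtering level")
    if rank(confidence) >= rank(filter_min):
        return "filter"
    if rank(confidence) >= rank(boost_min):
        return "boost"
    return "none"


# The example above: boost from "high", filter from "certain".
actions = {level: action_for(level, "high", "certain") for level in LEVELS}
```

With these settings, `actions` maps very low and low to `"none"`, high and very high to `"boost"`, and certain to `"filter"`.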

You can turn off filtering or boosting using their respective disable option.

Preview Automatic Filtering and Boosting

You can preview Automatic Filtering and Boosting for any index from the Automatic filtering & boosting Preview tab of the Query Categorization section in the dashboard. As long as the selected index has category predictions, this screen lets you preview results for any query with predicted categories and shows how Automatic Filtering and Boosting affects them, without activating the feature on your production traffic.

The Automatic Filtering and Boosting Preview also shows how Promotion Rules and Dynamic Re-Ranking impact the results. You can turn off the Rules and Dynamic Re-Ranking in the preview using the Rules and Dynamic Re-Ranking toggles.

A/B test Automatic Filtering and Boosting

You can use A/B testing to evaluate Automatic Filtering and Boosting on an index and accurately measure its impact on your search. To do this, click the Launch an A/B test button from the Automatic filtering & boosting Settings tab of the Query Categorization section in the dashboard.

Analytics grouped by categories

Once the Query Categorization model is set up, all search queries are grouped under their predicted categories in the Grouped Searches tab of Algolia’s dashboard (under Observe > Analytics). This view doesn’t include browsing queries (the empty query filtered on the category).

You can compare categories or click them to inspect their queries. Inside a category, the queries with a significantly lower click-through or conversion rate are automatically flagged as “underperforming”.

For instance, the queries blue jeans and denim are both predicted as belonging to the same category (pants). Grouped analytics displays the performance of the pants category, aggregating data for blue jeans, denim, and any other queries in that category. You can then compare each query with its category. For example, if the pants category’s click-through rate is 10% but the click-through rate for blue jeans is only 4%, blue jeans is identified as underperforming. You can improve the performance of the query by, for example, adding a synonym or a Rule.
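The comparison between a query and its category can be sketched as follows. The numbers are made up to reproduce the 10% and 4% rates from the example, and the cut-off (`ratio=0.5`, meaning less than half the category’s click-through rate) is an assumption for illustration only: this page doesn’t state how Algolia decides that a rate is significantly lower.

```python
def flag_underperforming(queries, ratio=0.5):
    """Aggregate a category's click-through rate and flag weak queries.

    `queries` is a hypothetical list of dicts with `query`, `searches`
    (assumed greater than zero), and `clicks` for one predicted category.
    """
    searches = sum(q["searches"] for q in queries)
    clicks = sum(q["clicks"] for q in queries)
    category_ctr = clicks / searches
    flagged = [
        q["query"]
        for q in queries
        if q["clicks"] / q["searches"] < ratio * category_ctr
    ]
    return category_ctr, flagged


pants = [
    {"query": "blue jeans", "searches": 1000, "clicks": 40},  # 4% click-through
    {"query": "denim", "searches": 1000, "clicks": 160},      # 16% click-through
]
category_ctr, flagged = flag_underperforming(pants)
```

With this data, the category’s click-through rate is 10% and only blue jeans falls below the assumed cut-off.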

With grouped analytics, you can aggregate your search analytics to gain new insights and optimize your Search and Discovery experience. It simplifies search analysis and helps manage the long tail of search queries.
