Query Categorization
Query Categorization lets you predict the categories to which a search query belongs. To do this, it uses an AI model to create categories from the records in your index. Every index can use its own category hierarchy, depending on your needs, such as product categories for an ecommerce website or genres for a movie app. For example, in an online grocery shop, the query banana can be part of the category Food > Vegetables and fruits.
With Algolia’s Query Categorization feature, you have:
- A dedicated section in the Algolia dashboard to set up the AI model and explore its predictions
- Automatic filtering and boosting on predicted categories, without writing extra code, to help increase the relevance of your users’ results
- Analytics grouped by predicted category to learn how the categories perform and to detect underperforming queries
- Access to category predictions at query time (with the Search API) so that you can provide a Search and Discovery experience customized for your users.
How to set up Query Categorization
To set up Query Categorization, you must send click and conversion events and then configure the AI model. After setup, check the model output and the generated category tree.
You can also use the Query Categorization predictions in your frontend at query time.
Send click and conversion events
To use Query Categorization, you must send click or conversion events. Algolia AI uses this data to train its model to predict categories.
The Query Categorization model is retrained automatically every 24 hours. It always uses events from the last 30 days, meaning it’s based on a sliding window of the most recent analytics data.
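To make the sliding window concrete, here is a minimal sketch of which events would fall inside a 30-day window at a given training time. The event shape (a dict with `eventType` and `timestamp` keys) and the function name are hypothetical and chosen for illustration; this is not Algolia's implementation.

```python
from datetime import datetime, timedelta, timezone

# From the text: training always uses events from the last 30 days.
TRAINING_WINDOW = timedelta(days=30)


def events_in_training_window(events, now):
    """Keep only events whose timestamp falls within the 30 days before `now`."""
    cutoff = now - TRAINING_WINDOW
    return [event for event in events if cutoff <= event["timestamp"] <= now]


# Hypothetical events: one recent click, one conversion that is too old.
now = datetime(2024, 3, 31, tzinfo=timezone.utc)
events = [
    {"eventType": "click", "timestamp": datetime(2024, 3, 30, tzinfo=timezone.utc)},
    {"eventType": "conversion", "timestamp": datetime(2024, 2, 1, tzinfo=timezone.utc)},
]
recent = events_in_training_window(events, now)
```

Because the window slides with each daily retraining, an event that counts today stops contributing once it is more than 30 days old.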
Configure the model
To set up Query Categorization, you need to provide the facets on which the model makes predictions. The facets must represent the hierarchy of your categories (up to five levels deep).
Once you’ve entered your facets, click Save to start the model-building process. Depending on the number of categories and traffic, this can take a few minutes to half an hour.
Supported hierarchical facets formats
- Assuming your records are structured like this:
  {
    "name": "banana",
    "description": "...",
    "price": 3.45,
    "hierarchicalCategories": {
      "lvl0": "Food",
      "lvl1": "Fruits"
    }
  }
  Set hierarchicalCategories.lvl0 as the first level used by the model and hierarchicalCategories.lvl1 as the second level.
- If your records are structured like this:
  {
    "name": "banana",
    "description": "...",
    "price": 3.45,
    "group": "Food",
    "section": "Fruits"
  }
  Set group as the first level used by the model and section as the second level.
If your records belong to several categories simultaneously and you use arrays to represent each level of depth, the model expects shared prefixes (for example, use Food as the first level facet value and Food > Fruits as the second level).
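The shared-prefix convention can be sketched as a small helper that turns category paths into one value per depth level. The function names and the `lvl0`, `lvl1` keys are illustrative assumptions modeled on the record examples in this section, not part of any Algolia client.

```python
def hierarchical_levels(path, separator=" > "):
    """Turn ["Food", "Fruits"] into one shared-prefix value per depth level."""
    return [separator.join(path[: depth + 1]) for depth in range(len(path))]


def levels_for_record(paths):
    """Group the shared-prefix values of several category paths by depth."""
    levels = {}
    for path in paths:
        for depth, value in enumerate(hierarchical_levels(path)):
            values = levels.setdefault(f"lvl{depth}", [])
            if value not in values:  # avoid duplicates when paths share a parent
                values.append(value)
    return levels


# A record that belongs to two categories at once (hypothetical data).
record_levels = levels_for_record([["Food", "Fruits"], ["Snacks", "Dried fruits"]])
```

Each depth level gets its own attribute, and every value at a deeper level starts with the value of its parent.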
The model doesn’t support records structured with only one attribute for all depth levels. For example:
{
  "name": "banana",
  "description": "...",
  "price": 3.45,
  "categories": ["Food", "Food > Fruits"]
}
Model output
After configuration, the AI model will:
- Use the provided categories to build a “categories tree” (a hierarchical representation of your categories) based on the different facet values of items in your index.
- Extract the most likely categories for the most popular queries (by using the click and conversion events you sent to Algolia)
- Train an AI model to predict the categories associated with a query. Each prediction includes a confidence level from very low to certain and a type.
The confidence level can be:
- very low
- low
- high
- very high
- certain
The type can be:
- narrow: for queries tied to only one item in your categories tree, a category that’s a “leaf” in the tree (that is, without further sub-categories)
- broad: for queries tied to a category that’s at a higher level in the categories tree (a category with sub-categories)
- ambiguous: for queries tied to several unrelated categories
- none: for queries for which the model couldn’t predict any category
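The four types can be illustrated with a small function that applies these definitions to a toy categories tree. This is only a sketch of the definitions, not how Algolia's model assigns types: the nested-dict tree, the function name, and the simplification that any prediction with more than one path counts as ambiguous are all assumptions.

```python
def prediction_type(tree, predicted_paths):
    """Classify a prediction following the type definitions in the text.

    `tree` is a nested dict: each key is a category, each value its sub-categories.
    `predicted_paths` is a list of category paths, for example [["Food", "Fruits"]].
    """
    if not predicted_paths:
        return "none"
    if len(predicted_paths) > 1:
        # Simplification: several predicted paths are treated as unrelated.
        return "ambiguous"
    node = tree
    for name in predicted_paths[0]:
        node = node[name]
    # A category without sub-categories is a leaf.
    return "narrow" if not node else "broad"


# Hypothetical categories tree.
tree = {"Food": {"Fruits": {}, "Vegetables": {}}, "Drinks": {}}
```

With this tree, a prediction of Food > Fruits is narrow, while a prediction of Food alone is broad because Food has sub-categories.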
Managing events source
Using a different source index for events lets you predict categories from alternative events. For instance, you can use events from a production index on a test index (which won’t have had any user interactions). The current index must be a replica of the targeted source index. To use a different source index, go to the Categories Settings tab, find the Events source index field, and select the source index in the app from the drop-down menu. Saving the configuration triggers a new training, which regenerates the category tree and the predictions from the source index events.
Managing the categories tree
After training the model, you can look at the generated categories tree by selecting the Categories Tree View tab in Algolia’s dashboard (Search > Configure > Query Categorization). Here, you can browse the hierarchy and check that it was correctly generated.
You can choose to ban (exclude) some categories from the predictions. For example, you should exclude values that aren’t actual categories, like “Black Friday” or “On sale”. Removing these values from the categories tree increases the model’s performance.
Retrieve the Query Categorization predictions at query time
You can use the predictions directly in your frontend at query time to implement, for example:
- Query expansion. When you have limited results for a query, you can expand the results set with more items from the same category
- Disambiguation. When the query is broad, you can suggest different categories to help narrow down the search
- A tailored experience. Provide a custom experience based on the user intent by having a specific layout for some categories
Query Categorization populates your search results with the predicted categories for the search query. Predictions are made on the query as normalized by the engine, not on the raw query.
Enabling Query Categorization at query time
To retrieve Query Categorization results at query time, you need to activate this option from the dashboard or in query parameters.
- In the Query Categorization section of Algolia’s dashboard (Search > Configure > Query Categorization): you can find the Categories with Search API toggle in the Categories Settings tab.
- In query parameters as a JSON object or a URL encoded string, for example:
  extensions%3D%7B%22queryCategorization%22%3A%7B%22enableCategoriesRetrieval%22%3Atrue%7D%7D
  Query Categorization parameters are grouped under the extensions field:
  {
    /* Other standard query parameters... */
    "extensions": {
      "queryCategorization": {
        "enableCategoriesRetrieval": true
        /* Other options to control Automatic Filtering and Boosting are available */
      }
    }
  }
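As a sketch of how the URL encoded form relates to the JSON form, the following uses only the Python standard library to percent-encode the extensions object. It reproduces the encoded string shown in the example; how you attach it to a request depends on your client and isn't covered here.

```python
import json
from urllib.parse import quote

extensions = {"queryCategorization": {"enableCategoriesRetrieval": True}}

# Compact separators avoid spaces in the JSON before it is percent-encoded.
raw = "extensions=" + json.dumps(extensions, separators=(",", ":"))

# safe="" also encodes "=", ":" and the braces, matching the example string.
encoded = quote(raw, safe="")
print(encoded)
```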
Search response format
The search response has the usual format, with predictions in the extensions.queryCategorization attribute:
{
  /* Regular search answer (like hits) */
  "extensions": {
    "queryCategorization": {
      "normalizedQuery": "banana",
      "count": 2,
      "type": "narrow",
      "categories": [
        {
          "bin": "very high",
          "hierarchyPath": [
            {
              "facetName": "category.lvl0",
              "facetValue": "Food",
              "depth": 0
            },
            {
              "facetName": "category.lvl1",
              "facetValue": "Fruits",
              "depth": 1
            }
          ]
        }
      ]
    }
  }
}
On rare occasions, extensions.queryCategorization can be an empty object for queries that the Query Categorization model didn’t categorize.
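A minimal sketch of reading these predictions from a response that has already been parsed into a Python dict. The function name and return shape are illustrative choices; the field names (`type`, `categories`, `bin`, `hierarchyPath`, `facetValue`, `depth`) come from the response format above, and the empty-object case is handled explicitly.

```python
def predicted_categories(response):
    """Return (type, [(confidence, [facet values ordered by depth]), ...])."""
    qc = response.get("extensions", {}).get("queryCategorization", {})
    if not qc:
        # The object can be missing or empty for uncategorized queries.
        return None, []
    categories = []
    for category in qc.get("categories", []):
        levels = sorted(category["hierarchyPath"], key=lambda level: level["depth"])
        categories.append((category["bin"], [level["facetValue"] for level in levels]))
    return qc.get("type"), categories


# Sample response trimmed to the fields used here.
response = {
    "extensions": {
        "queryCategorization": {
            "normalizedQuery": "banana",
            "type": "narrow",
            "categories": [
                {
                    "bin": "very high",
                    "hierarchyPath": [
                        {"facetName": "category.lvl0", "facetValue": "Food", "depth": 0},
                        {"facetName": "category.lvl1", "facetValue": "Fruits", "depth": 1},
                    ],
                }
            ],
        }
    }
}
query_type, categories = predicted_categories(response)
```

A frontend could use the returned type to decide, for example, whether to show category suggestions for a broad query.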
How to override predictions
To override the Query Categorization predictions, head to the Predictions Explorer tab of the Query Categorization section.
- To modify an override or replace the predicted categories generated by the model, click the edit (pencil) icon.
- To revert an override to the predicted categories generated by the model, click the zap (lightning) icon.
- To remove a model prediction, click the trash icon.
Changes are displayed in the predictions list. To confirm the changes, click Save changes at the bottom of the page.
Modifying the index classification in the Categories Settings tab will delete any override affected by this change. For example, if you remove the second facet level from the index classification, overrides with two levels of depth like Food > Fruits will be deleted, and the query will be reset to automatic predictions.
Automatic Filtering and Boosting
Automatic Filtering and Boosting is a search experience that applies filters for user queries based on Query Categorization predictions.
- Automatic filtering applies a search query filter to remove items not matching the predicted category.
- Automatic boosting applies an optional filter to the query to boost items matching the predicted category to the top.
The Query Categorization model decides if these predictions should be used as filters, boosts, or not used at all, depending on their confidence levels.
By default, only automatic boosting is activated. See Configure Automatic Filtering and Boosting impact to learn how to enable automatic filtering as well.
Implement Automatic Filtering and Boosting
To use Automatic Filtering and Boosting, you must enable Query Categorization in the Algolia dashboard. You can preview the impact of Automatic Filtering and Boosting from the Query Categorization section.
You can turn on Automatic Filtering and Boosting for an index in the Automatic filtering & boosting Settings tab. Once activated, filters and boosts are automatically injected into your search parameters at query time without requiring any frontend changes. In the Query Categorization section of Algolia’s dashboard (Search > Configure > Query Categorization), you can exclude (ban) queries and categories that should never be automatically filtered or boosted. Anything specified here overrides your index’s configuration.
Override Automatic Filtering and Boosting at query time
You can override the default configuration for automatic filtering and boosting with query parameters:
{
  /* Other standard query parameters... */
  "extensions": {
    "queryCategorization": {
      "enableAutoFiltering": true|false
    }
  }
}
To let users remove filters applied by Automatic Filtering and Boosting, you must explicitly turn off automatic filters and boosts on the search query targeting your index (when the user clears the automatic filter). Create an InstantSearch widget to implement a UI for this behavior.
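One way to picture the "clear automatic filter" behavior is a helper that builds the search parameters with the override set according to the user's choice. The helper name and the `user_cleared_filter` flag are hypothetical; only the `enableAutoFiltering` option shown above is used, and how the parameters are sent depends on your client.

```python
def search_params(query, user_cleared_filter):
    """Build search parameters, turning automatic filtering off once the user cleared it."""
    return {
        "query": query,
        "extensions": {
            "queryCategorization": {
                # Hypothetical UI state: False after the user removed the automatic filter.
                "enableAutoFiltering": not user_cleared_filter,
            }
        },
    }


default_params = search_params("banana", user_cleared_filter=False)
cleared_params = search_params("banana", user_cleared_filter=True)
```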
Detect impact of Automatic Filtering and Boosting at query time
When Automatic Filtering and Boosting is active for a query, the extensions.queryCategorization.autofiltering section has the following content:
{
  /* Regular search answer (like hits...) */
  "extensions": {
    "queryCategorization": {
      "normalizedQuery": "banana",
      "count": 14870,
      "type": "narrow",
      "categories": [
        {
          "bin": "certain",
          "hierarchyPath": [
            {
              "facetName": "categories.lvl0",
              "facetValue": "Food",
              "depth": 0
            },
            {
              "facetName": "categories.lvl1",
              "facetValue": "Food > Fruits",
              "depth": 1
            }
          ]
        }
      ],
      "autofiltering": {
        "enabled": true,
        "maxDepth": 5,
        "facetFilters": [
          [
            "categories.lvl0:Food"
          ],
          [
            "categories.lvl1:Food > Fruits"
          ]
        ],
        "optionalFilters": []
      }
    }
  }
}
You can activate Automatic Filtering and Boosting without it having an impact. In this case, you won’t see the additional fields in your search response.
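A minimal sketch of detecting whether automatic filters or boosts were applied, given a response parsed into a Python dict. The function name is an illustrative choice; the field names come from the response shown above, and a missing `autofiltering` section is treated as "no impact".

```python
def applied_auto_filters(response):
    """Return (facet filters, optional filters) applied automatically, if any."""
    qc = response.get("extensions", {}).get("queryCategorization", {})
    auto = qc.get("autofiltering")
    if not auto or not auto.get("enabled"):
        # No additional fields: the feature had no impact on this query.
        return [], []
    # facetFilters is a list of groups; flatten it for display purposes.
    facet_filters = [f for group in auto.get("facetFilters", []) for f in group]
    return facet_filters, auto.get("optionalFilters", [])


# Sample response trimmed to the autofiltering section.
response = {
    "extensions": {
        "queryCategorization": {
            "autofiltering": {
                "enabled": True,
                "maxDepth": 5,
                "facetFilters": [["categories.lvl0:Food"], ["categories.lvl1:Food > Fruits"]],
                "optionalFilters": [],
            }
        }
    }
}
filters, boosts = applied_auto_filters(response)
```

The returned filters could drive a UI element that shows the applied category and lets the user clear it.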
Configure Automatic Filtering and Boosting impact
You can fine-tune the impact of Automatic Filtering and Boosting by changing two values in the Automatic filtering & boosting Settings tab:
- The minimum expected confidence level for filtering
- The minimum expected confidence level for boosting.
These values allow you to configure when filters or boosts are applied based on the predictions’ confidence levels.
The feature boosts predictions with a confidence level equal to or above the confidence level for boosting, but below the confidence level for filtering. The feature filters on predictions with a confidence level equal to or above the confidence level for filtering.
For instance, setting the confidence level for boosting to high and the confidence level for filtering to certain boosts high and very high predictions and filters on certain predictions.
The confidence level for boosting must be less than the confidence level for filtering.
You can turn off filtering or boosting using their respective disable option.
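The threshold rule can be sketched as a small decision function over the ordered confidence levels. The function and constant names are illustrative, and the sketch doesn't model the disable options; it only encodes the rule described above.

```python
# Confidence levels from lowest to highest, as listed in the Model output section.
CONFIDENCE_LEVELS = ["very low", "low", "high", "very high", "certain"]


def action_for(confidence, boosting_level, filtering_level):
    """Return "filter", "boost", or "none" for a prediction's confidence level."""
    rank = CONFIDENCE_LEVELS.index
    if rank(boosting_level) >= rank(filtering_level):
        raise ValueError("the boosting level must be lower than the filtering level")
    if rank(confidence) >= rank(filtering_level):
        return "filter"
    if rank(confidence) >= rank(boosting_level):
        return "boost"
    return "none"


# The example from the text: boosting at "high", filtering at "certain".
actions = {level: action_for(level, "high", "certain") for level in CONFIDENCE_LEVELS}
```

With these settings, `actions` maps very low and low to "none", high and very high to "boost", and certain to "filter".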
Preview Automatic Filtering and Boosting
You can preview Automatic Filtering and Boosting for any index from the Automatic filtering & boosting Preview tab of the Query Categorization section in the dashboard. As long as the selected index has category predictions, this screen lets you preview results for any query with predicted categories and shows how Automatic Filtering and Boosting affects them (without activating Automatic Filtering and Boosting on your production traffic).
The Automatic Filtering and Boosting Preview also shows how Promotion Rules and Dynamic Re-Ranking impact the results. You can turn off the Rules and Dynamic Re-Ranking in the preview using the Rules and Dynamic Re-Ranking toggles.
A/B test Automatic Filtering and Boosting
You can use A/B testing to evaluate Automatic Filtering and Boosting on an index and accurately measure its impact on your search. To do this, click the Launch an A/B test button from the Automatic filtering & boosting Settings tab of the Query Categorization section in the dashboard.
Analytics grouped by categories
Once the Query Categorization model is set up, all search queries are grouped under their predicted categories in the Grouped Searches tab of Algolia’s dashboard (under Observe > Analytics). This view doesn’t include browsing queries (the empty query filtered on the category).
You can compare categories or click them to inspect their queries. Inside a category, the queries with a significantly lower click-through or conversion rate are automatically flagged as “underperforming”.
For instance, the two queries blue jeans and denim are flagged as belonging to the same category (pants). Grouped analytics displays the performance of the category pants (aggregating data for blue jeans, denim, and other queries belonging to the pants category). You can then compare the performance of a query with that of its category. For example, the pants category’s click-through rate is 10%, but the click-through rate for blue jeans is only 4%, so that query is identified as underperforming. You can improve the performance of the query by, for example, adding a synonym or a Rule.
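The comparison between a query and its category can be sketched as follows. The text only says that flagged queries have a "significantly lower" rate, so the cutoff used here (less than half the category's click-through rate) is a hypothetical threshold for illustration, as are the function name and the denim figure.

```python
def underperforming_queries(category_ctr, query_ctrs, ratio=0.5):
    """Flag queries whose click-through rate is below `ratio` times the category's.

    `ratio` is a hypothetical cutoff; the criterion actually used is not documented here.
    """
    return sorted(
        query for query, ctr in query_ctrs.items() if ctr < category_ctr * ratio
    )


# Figures from the example: pants at 10%, blue jeans at 4%. The denim rate is made up.
flagged = underperforming_queries(0.10, {"blue jeans": 0.04, "denim": 0.12})
```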
With grouped analytics, you can aggregate your search analytics to gain new insights and optimize your Search and Discovery experience. It simplifies search analysis and helps manage the long tail of search queries.