Query Categorization
Query Categorization is a beta feature according to the Algolia Terms of Service (“Beta Services”).
Query Categorization lets you predict the categories to which a search query belongs. To do this, it uses an AI model to create categories from the records in your index. Every index can use its own category hierarchy, depending on your needs, such as product categories for an ecommerce website or genres for a movie app. For example, in an online grocery shop, the query banana can be part of the category Food > Vegetables and fruits.
With Algolia’s Query Categorization feature, you have:
- A dedicated section in the Algolia dashboard to set up the AI model and explore its predictions
- Automatic filtering and boosting on predicted categories, without writing extra code, to help increase the relevance of your users’ results
- Analytics grouped by predicted category to learn how the categories perform and to detect underperforming queries
- Access to category predictions at query time (with the Search API) so that you can provide a Search and Discovery experience customized for your users.
How to set up Query Categorization
To set up Query Categorization, you must send click and conversion events and then configure the AI model. After setup, check the model output and the generated category tree.
You can also use the Query Categorization predictions in your frontend at query time.
Send click and conversion events
To use Query Categorization, you must send click or conversion events. Algolia AI uses this data to train its model to predict categories.
The Query Categorization model is retrained automatically every 24 hours. It always uses events from the last 30 days, meaning it’s based on a sliding window of the most recent analytics data.
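Events are typically sent from your frontend (for example, through InstantSearch) or directly to the Insights API. As a minimal sketch that doesn’t send anything over the network, the following Python builds a click-after-search event in the JSON shape the Insights API accepts (POST https://insights.algolia.io/1/events) and illustrates which events fall inside the 30-day training window. The index name, user token, query ID, and default event name are placeholders, and the window check only illustrates the sliding window described above; it isn’t Algolia’s internal code.

```python
import json
import time

DAY_MS = 24 * 60 * 60 * 1000
TRAINING_WINDOW_DAYS = 30  # the model uses events from the last 30 days


def build_click_event(index_name, user_token, query_id, object_ids, positions,
                      event_name="Product Clicked"):
    """Build a click-after-search event shaped like an Insights API event."""
    if len(object_ids) != len(positions):
        raise ValueError("each clicked objectID needs a matching position")
    return {
        "eventType": "click",
        "eventName": event_name,       # free-form label; this default is hypothetical
        "index": index_name,
        "userToken": user_token,
        "queryID": query_id,           # links the click to the search that produced it
        "objectIDs": list(object_ids),
        "positions": list(positions),  # 1-based positions of the clicked hits
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
    }


def build_events_body(events):
    """Serialize events into the {"events": [...]} request body."""
    return json.dumps({"events": events})


def in_training_window(event_ts_ms, now_ms):
    """Illustration only: True if the event is at most 30 days old."""
    return 0 <= now_ms - event_ts_ms <= TRAINING_WINDOW_DAYS * DAY_MS
```

Conversion events follow the same pattern with "eventType": "conversion". The resulting body is what you would POST to the Insights API together with your application ID and an API key.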
Configure the model
To set up Query Categorization, you need to provide the facets on which the model makes predictions. The facets must represent the hierarchy of your categories (up to five levels deep).
Once you’ve entered your facets, click Save to start the model-building process. Depending on the number of categories and traffic, this can take a few minutes to half an hour.
Supported hierarchical facet formats
- If your records are structured like this:
{
  "name": "banana",
  "description": "...",
  "price": 3.45,
  "hierarchicalCategories": {
    "lvl0": "Food",
    "lvl1": "Fruits"
  }
}
Set hierarchicalCategories.lvl0 as the first level used by the model and hierarchicalCategories.lvl1 as the second level.
- If your records are structured like this:
{
  "name": "banana",
  "description": "...",
  "price": 3.45,
  "group": "Food",
  "section": "Fruits"
}
Set group as the first level used by the model and section as the second level.
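To check which values the model would read from each configured facet, a small local helper can resolve dot-notation attribute names (such as hierarchicalCategories.lvl0) against a record. This is a hypothetical Python check, not part of any Algolia client; the five-level limit comes from the configuration rule above.

```python
from typing import Any, List, Optional

MAX_LEVELS = 5  # the model supports hierarchies up to five levels deep


def get_path(record: dict, path: str) -> Optional[Any]:
    """Resolve a dot-notation attribute such as 'hierarchicalCategories.lvl0'."""
    value: Any = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # attribute missing from this record
        value = value[key]
    return value


def level_values(record: dict, level_attributes: List[str]) -> List[Optional[Any]]:
    """Return the value read for each configured level, first level first."""
    if not 1 <= len(level_attributes) <= MAX_LEVELS:
        raise ValueError(f"configure between 1 and {MAX_LEVELS} levels")
    return [get_path(record, attribute) for attribute in level_attributes]
```

With the two record shapes above, level_values(record, ["hierarchicalCategories.lvl0", "hierarchicalCategories.lvl1"]) and level_values(record, ["group", "section"]) both return ["Food", "Fruits"]. A None in the result points to a record that lacks one of the configured attributes.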
If your records belong to several categories at once and you use arrays to represent each depth level, the model expects shared prefixes (for example, use Food as the first-level facet value and Food > Fruits as the second-level value).
The model doesn’t support records structured with only one attribute for all depth levels. For example:
{
  "name": "banana",
  "description": "...",
  "price": 3.45,
  "categories": ["Food", "Food > Fruits"]
}
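One way to migrate such records is to split the single attribute into one attribute per depth level, adding every ancestor so that each level shares its prefix with the level above, as described earlier. The Python sketch below assumes values use " > " as the level separator (a category name that itself contains ">" would be split incorrectly) and a target attribute named hierarchicalCategories; the function name is hypothetical.

```python
from typing import Dict, List

SEPARATOR = " > "  # assumed separator between levels
MAX_LEVELS = 5


def to_level_attributes(categories: List[str]) -> Dict[str, List[str]]:
    """Split ["Food", "Food > Fruits"] into {"lvl0": [...], "lvl1": [...]}.

    Every ancestor of a value is added too, so each level shares its
    prefix with the level above it.
    """
    levels: Dict[str, List[str]] = {}
    for value in categories:
        parts = [part.strip() for part in value.split(">")]
        if len(parts) > MAX_LEVELS:
            raise ValueError(f"{value!r} is deeper than {MAX_LEVELS} levels")
        for depth in range(len(parts)):
            prefix = SEPARATOR.join(parts[: depth + 1])
            bucket = levels.setdefault(f"lvl{depth}", [])
            if prefix not in bucket:  # keep first-seen order, no duplicates
                bucket.append(prefix)
    return levels


# Migrate the unsupported record shown above.
record = {"name": "banana", "categories": ["Food", "Food > Fruits"]}
record["hierarchicalCategories"] = to_level_attributes(record.pop("categories"))
```

After this conversion, you would set hierarchicalCategories.lvl0 and hierarchicalCategories.lvl1 as the model’s first and second levels.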
Model output
After configuration, the AI model will:
- Use the provided categories to build a “categories tree” (a hierarchical representation of your categories) based on the different facet values of items in your index.
- Extract the most likely categories for the most popular queries (by using the click and conversion events you sent to Algolia).
- Train an AI model to predict the categories associated with a query. Each prediction includes a confidence score from 0 to 100 and a type. (In the search response examples on this page, each predicted category carries a probability between 0 and 1, such as 0.87.)
  The type can be:
  - narrow for queries tied to only one item in your categories tree: a category that’s a “leaf” in the tree (that is, without further sub-categories).
  - broad for queries tied to a category at a higher level in the categories tree (a category with sub-categories).
  - ambiguous for queries tied to several unrelated categories.
  - none for queries for which the model couldn’t predict any category.
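To make the tree vocabulary concrete, the Python sketch below builds a parent-to-children map from category paths, tests whether a category is a leaf, and assigns a type by following the definitions above literally. It is an illustration of the definitions only, not how the AI model decides; the function names are hypothetical, and classify assumes its input already lists only unrelated categories.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Node = Tuple[str, ...]  # a category identified by its full path, e.g. ("Food", "Fruits")


def build_tree(paths: List[Sequence[str]]) -> Dict[Node, Set[Node]]:
    """Map each category (and the root, ()) to its direct sub-categories."""
    children: Dict[Node, Set[Node]] = defaultdict(set)
    for path in paths:
        for depth in range(len(path)):
            children[tuple(path[:depth])].add(tuple(path[: depth + 1]))
    return children


def is_leaf(children: Dict[Node, Set[Node]], path: Sequence[str]) -> bool:
    """A leaf category has no sub-categories (.get doesn't create entries)."""
    return not children.get(tuple(path))


def classify(children: Dict[Node, Set[Node]], predicted: List[Sequence[str]]) -> str:
    """Apply the type definitions above to a list of predicted categories."""
    if not predicted:
        return "none"
    if len(predicted) > 1:
        return "ambiguous"  # assumes the categories are unrelated
    return "narrow" if is_leaf(children, predicted[0]) else "broad"
```

For a tree built from ["Food", "Fruits"], ["Food", "Vegetables"], and ["Drinks"], a single prediction of Food > Fruits is narrow, Food alone is broad, and Food > Fruits together with Drinks is ambiguous.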
Managing the categories tree
After training the model, you can look at the generated categories tree by selecting the Categories Tree View tab in Algolia’s dashboard (Search > Configure > Query Categorization). Here, you can browse the hierarchy and check that it was correctly generated.
You can choose to ban (exclude) some categories from the predictions. For example, you should exclude values that aren’t actual categories, like “Black Friday” or “On sale”. Removing these values from the categories tree increases the model’s performance.
Retrieve the Query Categorization predictions at query time
You can use the predictions directly in your frontend at query time to implement, for example:
- Query expansion. When you have limited results for a query, you can expand the results set with more items from the same category.
- Disambiguation. When the query is broad, you can suggest different categories to help narrow down the search.
- A tailored experience. Provide a custom experience based on the user intent by having a specific layout for some categories.
Query Categorization populates your search results with the predicted categories for the search query. The prediction is made for the query as normalized by the engine, not for the raw query.
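The use cases above can be wired to the prediction with a simple dispatch. The Python sketch below reads the type and categories fields of extensions.queryCategorization and returns a label for the frontend behavior. The labels, the MIN_HITS threshold for “limited results”, and treating ambiguous queries like broad ones are all assumptions for illustration.

```python
from typing import Any, Dict

MIN_HITS = 5  # hypothetical threshold for "limited results"


def ui_strategy(prediction: Dict[str, Any], hit_count: int, min_hits: int = MIN_HITS) -> str:
    """Choose a frontend behavior from extensions.queryCategorization.

    The returned strategy names are hypothetical labels for the use cases above.
    """
    kind = prediction.get("type")
    if not prediction.get("categories") or kind in (None, "none"):
        return "default"               # nothing predicted: regular results page
    if kind in ("broad", "ambiguous"):
        return "suggest-categories"    # disambiguation (ambiguous is an assumption)
    if hit_count < min_hits:
        return "expand-with-category"  # query expansion
    return "category-layout"           # tailored experience for this category
```

Keeping the decision in one pure function makes it easy to unit-test and to adjust thresholds without touching UI code.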
Enabling Query Categorization at query time
To retrieve Query Categorization results at query time, activate this option in one of these places:
- The Query Categorization section of Algolia’s dashboard (Search > Configure > Query Categorization).
- Query parameters, as a JSON object or a URL-encoded string, for example:
extensions%3D%7B%22queryCategorization%22%3A%7B%22enableCategoriesRetrieval%22%3Atrue%7D%7D
Query Categorization parameters are grouped under the extensions field:
{
  /* Other standard query parameters... */
  "extensions": {
    "queryCategorization": {
      "enableCategoriesRetrieval": true
      /* Other options to control Automatic Filtering and Boosting are available */
    }
  }
}
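The Python sketch below builds both forms of the parameter using only the standard library: a JSON-ready object of query parameters and the fully URL-encoded string. The helper names are hypothetical; the parameter names (extensions, queryCategorization, enableCategoriesRetrieval) come from this page.

```python
import json
from typing import Any, Dict
from urllib.parse import quote


def categorization_extensions(enable_retrieval: bool = True) -> Dict[str, Any]:
    """The extensions object that turns on category retrieval."""
    return {"queryCategorization": {"enableCategoriesRetrieval": enable_retrieval}}


def search_params(query: str) -> Dict[str, Any]:
    """Query parameters as a JSON-ready object."""
    return {"query": query, "extensions": categorization_extensions()}


def encoded_extensions_param(enable_retrieval: bool = True) -> str:
    """The same option as a fully URL-encoded extensions=... string."""
    compact = json.dumps(categorization_extensions(enable_retrieval), separators=(",", ":"))
    return quote(f"extensions={compact}", safe="")
```

Compact JSON separators and quote(..., safe="") (which also percent-encodes = as %3D) reproduce the encoded example string above exactly.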
Search response format
The search response has the usual format, with predictions in the extensions.queryCategorization attribute:
{
  /* Regular search answer (like hits) */
  "extensions": {
    "queryCategorization": {
      "normalizedQuery": "banana",
      "count": 2,
      "type": "narrow",
      "categories": [
        {
          "probability": 0.87,
          "hierarchyPath": [
            {
              "facetName": "category.lvl0",
              "facetValue": "Food",
              "depth": 0
            },
            {
              "facetName": "category.lvl1",
              "facetValue": "Fruits",
              "depth": 1
            }
          ]
        }
      ]
    }
  }
}
On rare occasions, extensions.queryCategorization can be an empty object for queries that the Query Categorization model didn’t categorize.
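Because extensions.queryCategorization can be empty, frontend code should read it defensively. This Python sketch, with a hypothetical function name, returns the (facetName, facetValue) pairs of the most probable category, ordered by depth, or None when there is no prediction.

```python
from typing import Any, Dict, List, Optional, Tuple


def top_category(response: Dict[str, Any]) -> Optional[List[Tuple[str, str]]]:
    """Return (facetName, facetValue) pairs for the most probable category.

    Returns None when extensions.queryCategorization is missing or empty.
    """
    prediction = response.get("extensions", {}).get("queryCategorization") or {}
    categories = prediction.get("categories") or []
    if not categories:
        return None
    best = max(categories, key=lambda c: c.get("probability", 0.0))
    steps = sorted(best.get("hierarchyPath", []), key=lambda step: step["depth"])
    return [(step["facetName"], step["facetValue"]) for step in steps]
```

For the response above, this returns [("category.lvl0", "Food"), ("category.lvl1", "Fruits")]. For query expansion, the deepest pair could be turned into a facet filter string such as category.lvl1:Fruits.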
Automatic Filtering and Boosting
Automatic Filtering and Boosting is a search experience that applies filters for user queries based on Query Categorization predictions.
- Automatic filtering applies a search query filter to remove items not matching the predicted category.
- Automatic boosting applies an optional filter to the query to boost items matching the predicted category to the top.
The Query Categorization model decides if these predictions should be used as filters, boosts, or not used at all, depending on their confidence scores.
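The exact decision logic isn’t described on this page. As an illustration only, assume that the boostThreshold and filterThreshold values in the autofiltering section of the search response (0.5 and 0.9 in the example in “Detect impact of Automatic Filtering and Boosting at query time”) act as cut-offs on a prediction’s probability. A decision under that assumption could look like this Python sketch; the function name and the use of >= are also assumptions.

```python
def autofilter_action(probability: float,
                      boost_threshold: float = 0.5,
                      filter_threshold: float = 0.9) -> str:
    """Illustrative decision rule; Algolia's actual logic may differ."""
    if not 0.0 <= boost_threshold <= filter_threshold <= 1.0:
        raise ValueError("expected 0 <= boost_threshold <= filter_threshold <= 1")
    if probability >= filter_threshold:
        return "filter"  # would be applied as a facet filter
    if probability >= boost_threshold:
        return "boost"   # would be applied as an optional filter
    return "none"        # prediction not used
```

Under these assumptions, a prediction with probability 0.97 would filter, 0.87 would only boost, and 0.3 would be ignored.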
Implement Automatic Filtering and Boosting
To use Automatic Filtering and Boosting, you must enable Query Categorization in the Algolia dashboard. You can preview the impact of Automatic Filtering and Boosting from the Query Categorization section.
You can turn on Automatic Filtering and Boosting for an index in the Automatic Filtering and Boosting tab. Once it’s activated, filters and boosts are automatically injected into your search parameters at query time, without requiring any frontend changes. In the Query Categorization section of Algolia’s dashboard (Search > Configure > Query Categorization), you can exclude (ban) queries and categories that should never be automatically filtered or boosted. Anything specified here overrides your index’s configuration.
Override Automatic Filtering and Boosting at query time
You can override the default configuration for automatic filtering and boosting with query parameters:
{
  /* Other standard query parameters... */
  "extensions": {
    "queryCategorization": {
      "enableAutoFiltering": false /* or true */
    }
  }
}
To let users remove the filters applied by Automatic Filtering and Boosting, explicitly turn off automatic filters and boosts (with "enableAutoFiltering": false) on the search query targeting your index when the user clears the automatic filter. You can create an InstantSearch widget to implement the UI for this behavior.
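In a real frontend, this logic would live in an InstantSearch widget. The Python sketch below only shows the parameter change, assuming your search parameters are held in a dict shaped like the examples above. Both function names are hypothetical.

```python
import copy
from typing import Any, Dict


def with_auto_filtering(params: Dict[str, Any], enabled: bool) -> Dict[str, Any]:
    """Return a copy of the search parameters with enableAutoFiltering set."""
    updated = copy.deepcopy(params)  # leave the caller's parameters untouched
    extensions = updated.setdefault("extensions", {})
    extensions.setdefault("queryCategorization", {})["enableAutoFiltering"] = enabled
    return updated


def next_search_params(params: Dict[str, Any], user_cleared_auto_filter: bool) -> Dict[str, Any]:
    """Turn automatic filters and boosts off once the user has cleared them."""
    if user_cleared_auto_filter:
        return with_auto_filtering(params, False)
    return params
```

Copying instead of mutating keeps the original parameters available, so the widget can restore automatic filtering if the user starts a new search.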
Detect impact of Automatic Filtering and Boosting at query time
When Automatic Filtering and Boosting is active for a query, the extensions.queryCategorization.autofiltering section has the following content:
{
  /* Regular search answer (like hits...) */
  "extensions": {
    "queryCategorization": {
      "normalizedQuery": "banana",
      "count": 14870,
      "type": "narrow",
      "categories": [
        {
          "probability": 0.97,
          "hierarchyPath": [
            {
              "facetName": "categories.lvl0",
              "facetValue": "Food",
              "depth": 0
            },
            {
              "facetName": "categories.lvl1",
              "facetValue": "Food > Fruits",
              "depth": 1
            }
          ]
        }
      ],
      "autofiltering": {
        "enabled": true,
        "maxDepth": 5,
        "boostThreshold": 0.5,
        "filterThreshold": 0.9,
        "facetFilters": [
          [
            "categories.lvl0:Food"
          ],
          [
            "categories.lvl1:Food > Fruits"
          ]
        ],
        "optionalFilters": []
      }
    }
  }
}
Automatic Filtering and Boosting can be active without having an impact on a particular query. In that case, the autofiltering section doesn’t appear in the search response.
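To show users which automatic filters were applied (for example, as a removable badge), a frontend can summarize the autofiltering section and treat its absence as “no impact”. This Python sketch uses hypothetical function names. It flattens both plain-string filters and nested lists, because Algolia filter arrays can contain either form.

```python
from typing import Any, Dict, List


def _flatten(filters: List[Any]) -> List[str]:
    """Filters can be plain strings or nested lists (OR groups); flatten both."""
    flat: List[str] = []
    for item in filters:
        flat.extend(item if isinstance(item, list) else [item])
    return flat


def applied_auto_filters(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """List the filters and boosts that Automatic Filtering and Boosting applied."""
    prediction = response.get("extensions", {}).get("queryCategorization") or {}
    auto = prediction.get("autofiltering")
    if not auto:
        return {"filters": [], "boosts": []}  # no impact on this query
    return {
        "filters": _flatten(auto.get("facetFilters", [])),
        "boosts": _flatten(auto.get("optionalFilters", [])),
    }
```

For the example response above, filters is ["categories.lvl0:Food", "categories.lvl1:Food > Fruits"] and boosts is empty.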
Analytics grouped by categories
Once the Query Categorization model is set up, all search queries are grouped under their predicted categories in the Grouped Searches tab of Algolia’s dashboard (under Observe > Analytics). This view doesn’t include browsing queries (empty queries filtered on a category).
You can compare categories or click them to inspect their queries. Inside a category, the queries with a significantly lower click-through or conversion rate are automatically flagged as “underperforming”.
For instance, the two queries blue jeans and denim are flagged as belonging to the same category (pants). Grouped analytics displays the performance of the pants category (aggregating data for blue jeans, denim, and other queries belonging to the pants category). You can then compare the performance of the two. For example, the pants category’s click-through rate is 10%, but the click-through rate for blue jeans is only 4% (and it’s identified as underperforming). You can improve the performance of the query by, for example, adding a synonym or a Rule.
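This page doesn’t state the exact criterion for “significantly lower”. As a worked sketch of the arithmetic, the category click-through rate is aggregated over all of its queries, and a query is flagged here when its own rate falls below a hypothetical fraction (half) of the category’s. The search and click counts are invented so that they reproduce the 10% and 4% figures of the pants example; the 0.5 ratio and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryStats:
    query: str
    searches: int          # tracked searches for this query
    clicked_searches: int  # tracked searches with at least one click


def rate(clicked: int, searches: int) -> float:
    return clicked / searches if searches else 0.0


def category_ctr(stats: List[QueryStats]) -> float:
    """Aggregate click-through rate over every query in the category."""
    return rate(sum(s.clicked_searches for s in stats), sum(s.searches for s in stats))


def underperforming(stats: List[QueryStats], ratio: float = 0.5) -> List[str]:
    """Flag queries whose CTR is below `ratio` times the category CTR (hypothetical rule)."""
    threshold = ratio * category_ctr(stats)
    return [s.query for s in stats if rate(s.clicked_searches, s.searches) < threshold]


# Illustrative counts: 8/200 = 4% and 42/300 = 14%, so the category is 50/500 = 10%.
pants = [QueryStats("blue jeans", 200, 8), QueryStats("denim", 300, 42)]
```

Here blue jeans (4%) falls below half of the category’s 10%, so underperforming(pants) returns ["blue jeans"], while denim (14%) isn’t flagged.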
With grouped analytics, you can aggregate your search analytics to gain new insights and optimize your Search and Discovery experience. It simplifies search analysis and helps manage the long tail of search queries.