Configuring Query Suggestions

To ensure the Query Suggestions feature fits your use case and needs, we made it highly configurable. You can choose your data sources, adjust your suggestion settings, and generate initial suggestions from facets.

Linking data sources

Although Algolia Analytics is the default data source for your Query Suggestions index, it is not the only possible source.

If you haven’t collected Algolia Analytics data, you have two options: you can upload your own suggestions or generate suggestions from your main index’s filters and facets.

Uploading your own suggestions

To make your own suggestions, create a JSON file and upload the content to an Algolia index. In selecting which queries to suggest, you can either add what you consider “ideal” queries, or use analytics data from other sources (for example Google Analytics or Segment).

The only requirement is for the data to follow the same format as Algolia’s own analytics data.

Here is an example of the required format for your JSON data:

    [
      { "query": "iphone", "count": 10031 },
      { "query": "samsung", "count": 731 }
    ]

External data should complement or initialize Query Suggestions. Using Algolia Analytics as the primary data source for your Query Suggestions will provide the best results.

If a suggestion appears in both your Analytics and external data, then the record in your external data index is ignored.
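As a minimal sketch of preparing such external data, the Python snippet below turns a mapping of queries to counts into records with the "query" and "count" fields shown above. The input mapping is hard-coded here; in practice it might come from an export of another analytics tool. The function name `build_suggestion_records` is an illustrative assumption, not part of any Algolia API, and uploading the resulting JSON to an index is left to the dashboard or an API client.

```python
import json


def build_suggestion_records(query_counts):
    """Turn {query: count} into a list of {"query", "count"} records."""
    records = []
    for query, count in query_counts.items():
        cleaned = query.strip()
        if cleaned and count > 0:  # skip empty queries and non-positive counts
            records.append({"query": cleaned, "count": int(count)})
    # Most popular first, purely to make the file easier to read.
    records.sort(key=lambda record: record["count"], reverse=True)
    return records


# Hypothetical counts, for example exported from another analytics provider.
records = build_suggestion_records({"samsung": 731, "iphone": 10031, "  ": 12})

# JSON text ready to be saved to a file and uploaded to an index.
payload = json.dumps(records, indent=2)
print(payload)
```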

Using your searchable facets

You can use your main index to generate query suggestions by using facets. You send the Query Suggestions engine attributes that are set up as searchable facets. These will be resolved into very precise suggestions. For example, if you send the engine “brand + color”, you can generate a query for every brand + color combination:

  • “nike red”
  • “nike blue”
  • “adidas red”
  • “adidas blue”

These queries are sent along with their respective scores, which are based on the number of items that match each query.
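To illustrate the idea, here is a small Python sketch that derives "brand + color" suggestions from a handful of records and scores each one by the number of matching items. The sample records, the attribute names and the `facet_suggestions` helper are assumptions made for this example; the actual engine works on your index's searchable facets. Note that this sketch only emits combinations that match at least one item.

```python
from collections import Counter

# Hypothetical records from a main index; attribute names are assumptions.
items = [
    {"name": "Air Max", "brand": "nike", "color": "red"},
    {"name": "Pegasus", "brand": "nike", "color": "red"},
    {"name": "Cortez", "brand": "nike", "color": "blue"},
    {"name": "Superstar", "brand": "adidas", "color": "red"},
    {"name": "Gazelle", "brand": "adidas", "color": "blue"},
]


def facet_suggestions(records, facets):
    """Count how many records match each combination of facet values."""
    counts = Counter()
    for record in records:
        # Ignore records that are missing one of the requested facets.
        if all(facet in record for facet in facets):
            query = " ".join(str(record[facet]) for facet in facets)
            counts[query] += 1
    # The score of a suggestion is the number of items that match it.
    return [{"query": q, "count": c} for q, c in counts.most_common()]


suggestions = facet_suggestions(items, ["brand", "color"])
for suggestion in suggestions:
    print(suggestion)
```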

This method should primarily be used when you have no initial data, that is, neither analytics nor your own suggestions. In the long run, these facet-based suggestions become less relevant (that is, less popular) than your analytics-based ones.

You’ll need a substantial volume of search activity to justify using Query Suggestions. Algolia Analytics generates up to 30,000 suggestions; larger customers might need to supplement the data with another analytics provider.

Note: the analytics data of your replicas will be taken into account when generating suggestions.

Configuring the format and settings of your suggestions

You need to be careful: by default, every search is captured and uploaded. Query Suggestions can therefore be misled: the data can be mistaken, irrelevant, or even completely embarrassing. One company not powered by Algolia, for example, once received a flood of phony, inappropriate searches which, due to their apparent popularity, scored higher than other searches. As a result, other users began to see these phony searches. It’s a real concern that you want to address.

We have 4 systems in place to limit inappropriate suggestions and to prevent fraud.

Minimum number of results required

By default, the suggestions index only includes queries that return at least five results in the source index. This limit is configurable through the dashboard.

Minimum number of letters required

We do not suggest queries that contain fewer than the configured minimum number of letters. The default is 4. This limit is configurable through the dashboard.
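The two thresholds above can be pictured as a simple filter. The following Python sketch is only an illustration under stated assumptions: the `passes_thresholds` helper is hypothetical, and because this page does not specify exactly how letters are counted, the sketch counts every non-whitespace character.

```python
MIN_HITS = 5     # default minimum number of results, as described above
MIN_LETTERS = 4  # default minimum number of letters, as described above


def passes_thresholds(query, hits, min_hits=MIN_HITS, min_letters=MIN_LETTERS):
    """Return True if a query meets both minimums."""
    # Assumption: every non-whitespace character counts as a letter.
    letters = sum(1 for ch in query if not ch.isspace())
    return hits >= min_hits and letters >= min_letters


# Hypothetical (query, number of results in the source index) pairs.
candidates = [("iphone", 120), ("tv", 300), ("iphone 6 case", 3), ("ipad", 5)]

kept = [query for query, hits in candidates if passes_thresholds(query, hits)]
print(kept)
```

Here "tv" is dropped for being too short and "iphone 6 case" for returning too few results.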

IP restrictions

We apply a distinct-IP-per-query policy. When an IP sends numerous similar queries, only one of those queries counts towards the popularity of its related suggestion. This creates a barrier to an accidental or abusive flood of requests from the same IP. This restriction cannot be removed or configured.
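One way to picture this policy is to count, for each query, the number of distinct IPs that sent it rather than the raw number of requests. The Python sketch below does that on a made-up search log. Treating "similar" queries as identical after trimming and lowercasing is an assumption of this example, as is the `popularity_by_distinct_ip` helper; the engine's real similarity rule is not described here.

```python
from collections import defaultdict

# Hypothetical search log of (ip, query) pairs.
search_log = [
    ("203.0.113.7", "iphone"),
    ("203.0.113.7", "iphone"),
    ("203.0.113.7", "iPhone "),
    ("198.51.100.2", "iphone"),
    ("198.51.100.2", "samsung"),
]


def popularity_by_distinct_ip(log):
    """Count each query at most once per IP."""
    ips_per_query = defaultdict(set)
    for ip, query in log:
        # Assumption: queries are "similar" if equal after normalization.
        ips_per_query[query.strip().lower()].add(ip)
    return {query: len(ips) for query, ips in ips_per_query.items()}


popularity = popularity_by_distinct_ip(search_log)
print(popularity)
```

The three "iphone" requests from the first IP contribute only one point to the suggestion's popularity.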

Blacklisted words

Perhaps the most powerful way to tell the engine what not to do is to use our banned list functionality. With blacklisting, you can remove any unwanted queries. There are two ways to do this:

  • A list of exact matches: you can eliminate full queries by creating a list of partial words, full words, and phrases that the engine should ignore. The query must find an exact match in the list to be ignored.
  • A list of expressions: you can create one or more expressions that look for certain inappropriate or unwanted patterns in queries. Here, you can implement partial matching: any query that contains a blacklisted expression is ignored.

Both lists are configurable through the dashboard.
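The difference between the two kinds of banned entries can be sketched in a few lines of Python. The banned words, the patterns and the `is_banned` helper below are invented for illustration, and Python regular expressions stand in for whatever expression syntax the dashboard accepts.

```python
import re

# Hypothetical banned entries, for illustration only.
BANNED_EXACT = {"spam", "free stuff"}
BANNED_PATTERNS = [
    re.compile(r"casino"),            # matches anywhere in the query
    re.compile(r"\bfree\s+money\b"),  # matches the phrase as whole words
]


def is_banned(query):
    """Return True if the query should be ignored."""
    normalized = query.strip().lower()
    # Exact matching: the whole query must appear in the list.
    if normalized in BANNED_EXACT:
        return True
    # Partial matching: any query containing a banned expression is ignored.
    return any(pattern.search(normalized) for pattern in BANNED_PATTERNS)


for query in ["spam", "spam filter", "best online casino bonus", "iphone"]:
    print(query, is_banned(query))
```

Note that "spam filter" is kept: the exact-match list only removes queries that match an entry in full.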

Ranking suggestions

Query Suggestions is entirely data-driven, suggesting the best-matched searches based on a comparison between the letters and words being typed and a collection of stored queries.

The best-matching algorithm is straightforward: the best suggestions are based on popularity, which is the number of times the query has previously been typed in. Essentially, queries that have the highest counts (that is, those made most often by users in the past) are the ones that appear at the top of the list of query suggestions. Here’s an example:

  {
    "query": "iphone X",
    "popularity": 255
  }

Whatever the source of data, the suggestion engine uses a popularity score to determine which queries to suggest. Every input uses the same format for its suggestions: the query + the number of times it has been used. Note that if the same query comes in from more than one source, we add the values together and use the total as the score for that query.
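As a rough illustration of the summing rule and of popularity-based ordering, the Python sketch below merges counts from two sources and returns the most popular queries for what the user has typed. The sample counts and the `merge_sources` and `suggest` helpers are assumptions, and the simple prefix test is a stand-in for the engine's actual matching of letters and words.

```python
from collections import Counter

# Hypothetical counts from two sources, using the same query + count format.
analytics = {"iphone": 900, "iphone 8": 400, "iphone 6": 250}
external = {"iphone x": 255, "iphone 8": 50, "ipad": 300}


def merge_sources(*sources):
    """Add up the counts of queries that appear in several sources."""
    total = Counter()
    for source in sources:
        total.update(source)  # Counter.update adds counts for existing keys
    return total


def suggest(typed, popularity, limit=4):
    """Return the most popular stored queries starting with the typed text."""
    typed = typed.lower()
    # Assumption: simple prefix matching instead of the engine's own matching.
    matches = [(q, count) for q, count in popularity.items() if q.startswith(typed)]
    matches.sort(key=lambda item: item[1], reverse=True)
    return [q for q, _ in matches[:limit]]


popularity = merge_sources(analytics, external)
top = suggest("iph", popularity)
print(top)
```

With these numbers, "iphone 8" ends up with a score of 450 (400 + 50) and ranks second, just behind "iphone".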

Regenerating your Suggestions Index

There are 2 important aspects to keep in mind with query suggestions:

  • The content of suggestions: which suggestions appear
  • The order of the suggestions: which appear at the top and which appear below

Take a look at these suggestions for the query “iph”:

  • iphone
  • iphone 6
  • iphone 8
  • iphone charger

These suggestions seem appropriate and relevant: “iph” suggests “iphone”, and the most popular queries for iphone are either models or accessories. Good so far. However, iphone X is missing and 6 is higher than 8. This is most likely due to the current analytics data, which relies on popularity.

  • From a content point of view, X is a more recent release, so it is not yet popular enough to be included as a suggestion.
  • From an order point of view, 6 has been on the market longer than 8, so it appears above 8, even though 8 might have been the better, more up-to-date / relevant suggestion.

To solve this, we regenerate your Query Suggestions index every 24 hours and take into account only the last 30 days of your analytics data. So if more people searched for “iphone 8” than “iphone 6” during the last month, the former ranks higher even if “iphone 6” has more all-time searches. This is a good solution, but by no means perfect. That’s why we let our customers upload external analytics data to improve and fine-tune the popularity scores of their suggestions.
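The effect of the 30-day window can be seen in a small Python sketch. The events, the dates and the `recent_popularity` helper are made up for this example; the real regeneration runs on Algolia's side and is not something you implement yourself.

```python
from collections import Counter
from datetime import date, timedelta

WINDOW_DAYS = 30  # only the last 30 days of analytics are considered


def recent_popularity(events, today, window_days=WINDOW_DAYS):
    """Count queries whose search date falls inside the window."""
    cutoff = today - timedelta(days=window_days)
    return Counter(query for day, query in events if cutoff < day <= today)


today = date(2024, 3, 31)  # hypothetical regeneration date
events = (
    [(date(2023, 11, 15), "iphone 6")] * 500  # old searches, outside the window
    + [(date(2024, 3, 10), "iphone 6")] * 40
    + [(date(2024, 3, 20), "iphone 8")] * 90
)

popularity = recent_popularity(events, today)
ranked = [query for query, _ in popularity.most_common()]
print(ranked)
```

Although "iphone 6" has 540 all-time searches in this sample, only 40 fall inside the window, so "iphone 8" ranks first.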

To maximize the relevance of your suggestions, you can also add Rules to your suggestions index.
