Case study for an online clothing company

This case study helps you understand how to configure Algolia using the dashboard: it’s equally valuable for business users with no technical background and developers. The main goal is to help you follow the proper steps and get consistently great search results.

The real-world case study gives a better picture of the configuration process.

Selecting searchable attributes

Attributes are the key-value pairs that compose records, and the records compose an index.

You need to select the most relevant attributes a user can search for. For example, on an ecommerce search, searchable attributes could be name or description.

Initial list of searchable attributes

  • name
  • description
  • price
  • image_url

From the initial list, you wouldn’t select price or image_url. These attributes are helpful for sorting and display. A user wouldn’t search for those.

Selecting your searchable attributes

  • name
  • description

Add color and brand as searchable attributes. These attributes are often used for filtering and faceting, but people also use them in search queries.

Adding more attributes

  • description
  • name
  • color
  • brand

To further improve relevance, you need to move the most relevant attributes to the top of the list. In this case, users tend to search for a color or a brand as well as an item’s name or description. It then makes more sense to have these attributes higher in the list.

Users tend to search with descriptive terms instead of exact item names. To account for it, put description first on the list before name.

Listing the most relevant attributes first

  • description
  • name
  • brand
  • color

You could find that brand and color are just as relevant to your users. You would then put them both on the same line, separated by a comma.

Listing two equal ranking attributes

  • description
  • name
  • brand, color

Long attributes like description can be too “noisy” and generate false positives in relevance. In such cases, you can create a derived, shorter attribute and pick just the search-relevant terms. For example, create a short_description attribute by selecting the applicable search terms from description. To create this shorter attribute, your engineering team needs to configure your data so that it’s available in Algolia’s dashboard.

Replace a “noisy” attribute with a more efficient one

  • short_description
  • name
  • brand, color

You might want all the words in short_description to have the same importance, so that a match at the beginning, middle, or end of the attribute counts equally. You can do this with the unordered modifier. You can do the same for any multi-word attribute, like name.

Final list of searchable attributes

  • unordered(short_description)
  • unordered(name)
  • brand, color
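
As a rough sketch, the final list above could be expressed as index settings. The snippet assumes Algolia’s searchableAttributes setting, where list order sets priority, a comma-separated entry puts attributes at the same priority, and unordered(...) is the modifier named above. The build_searchable_attributes helper is hypothetical: it only builds the value and doesn’t call any API.

```python
def build_searchable_attributes(tiers, unordered=()):
    """Build a searchableAttributes list from priority tiers.

    tiers: list of lists; earlier tiers rank higher, and attributes
           inside one tier share the same priority.
    unordered: attribute names to wrap in the unordered() modifier.
    """
    entries = []
    for tier in tiers:
        names = [f"unordered({a})" if a in unordered else a for a in tier]
        # Attributes with equal priority go in one comma-separated entry.
        entries.append(",".join(names))
    return entries


settings = {
    "searchableAttributes": build_searchable_attributes(
        tiers=[["short_description"], ["name"], ["brand", "color"]],
        unordered={"short_description", "name"},
    )
}
print(settings["searchableAttributes"])
```

Keeping the tiers as nested lists makes it easy to reorder attributes later without editing modifier strings by hand.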

Setting custom ranking and sorting

You start with Algolia’s out-of-the-box ranking formula.

Default ranking

  • Default ranking formula

You shouldn’t change or remove this ranking formula. It works out of the box for 99% of use cases.

Setting custom ranking

You can now customize your ranking by adding some business metrics. It’s typical to add popularity attributes, such as the number of likes or best sellers. It’s also worth using Click and Conversion Analytics to rank the products with the most successful conversion rate.

Custom ranking (main index)

  • Default ranking formula
  • number_of_sells
  • popularity
  • conversion_rate
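
A minimal sketch of the same custom ranking as a settings value. It assumes Algolia’s customRanking setting with desc(...) and asc(...) modifiers, and that a higher value is better for each metric. The custom_ranking helper is hypothetical.

```python
def custom_ranking(attributes, order="desc"):
    """Return customRanking entries, most important attribute first."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return [f"{order}({attribute})" for attribute in attributes]


main_index_settings = {
    # Applied after the default ranking formula, as tie-breakers.
    "customRanking": custom_ranking(
        ["number_of_sells", "popularity", "conversion_rate"]
    )
}
print(main_index_settings["customRanking"])
```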

Sorting by a specific attribute

Consider allowing your users to sort by a specific attribute. To do so, you can use Algolia’s sorting capability, which requires you to create a replica index for each sort.

For example, you can sort by price, from highest-priced items to lowest. To do this, you can create a new replica index called products_sorted_by_price_descending.

Sort by price, highest to lowest (replica one)

  • price (sort-by, descending)
  • Default ranking formula

To reverse the order and sort by ascending price, you need to add another replica index, products_sorted_by_price_ascending.

Sort by price, lowest to highest (replica two)

  • price (sort-by, ascending)
  • Default ranking formula

Do the same for the date with a third replica and sort from newest to oldest.

Sort by date, newest to oldest (replica three)

  • date (sort-by, descending)
  • Default ranking formula

You now have four indices (a main index and three replicas):

  • Your main index with custom ranking (by best-sellers, popularity, and conversion rate)
  • Your three replicas are:
    • Sorted by price, descending
    • Sorted by price, ascending
    • Sorted by date, descending
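
The four indices above could be described with settings like the following sketch. It assumes Algolia’s replicas and ranking settings and the default ranking criteria listed in DEFAULT_RANKING. The two price replica names come from the text; the date replica name and the sorted_replica_settings helper are hypothetical.

```python
# Assumed default ranking criteria; the sort attribute goes in front of them.
DEFAULT_RANKING = [
    "typo", "geo", "words", "filters",
    "proximity", "attribute", "exact", "custom",
]


def sorted_replica_settings(attribute, order):
    """Ranking for a replica that sorts on `attribute` before anything else."""
    return {"ranking": [f"{order}({attribute})"] + DEFAULT_RANKING}


replicas = {
    "products_sorted_by_price_descending": sorted_replica_settings("price", "desc"),
    "products_sorted_by_price_ascending": sorted_replica_settings("price", "asc"),
    # Hypothetical name: the text doesn't name the date replica.
    "products_sorted_by_date_descending": sorted_replica_settings("date", "desc"),
}

# The main index declares its replicas by name.
main_index_settings = {"replicas": list(replicas)}
print(main_index_settings)
```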

If your plan includes it, Relevant sorting offers the best user experience for most ecommerce use cases. This sort provides the most relevant results instead of sorting on attributes like price and date. Relevant sorting also doesn’t require data duplication, keeping your app leaner.

For example, for an ascending price sort, the query “red skirt” wouldn’t return items containing “red shirt” because they’re less relevant. Relevant sorting removes noise for users.

Creating buckets to combine sorts with custom ranking

Consider adding a field like featured, a true or false value that forces all featured items to show up first.

  • featured (sort-by, descending)
  • Default ranking formula
  • number_of_sells
  • popularity
  • conversion_rate

By doing this, you’re creating two buckets of results, where each bucket is individually ranked. If you have 100 results, 50 of which are featured, the first bucket contains all featured items. The featured items rank by textual relevance and custom ranking. The second bucket (the 50 non-featured items) ranks the same way, by textual relevance and custom ranking.

One consequence of buckets is that the most textually relevant record may appear in the 51st position because it isn’t a featured item and so isn’t in the first bucket.
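
The bucket effect can be illustrated with a small local simulation. This is only a sketch: text_score is a hypothetical stand-in for textual relevance (the real ranking formula isn’t a single number), and the records are made up for illustration.

```python
records = [
    {"name": "A", "featured": True, "text_score": 2, "popularity": 10},
    {"name": "B", "featured": False, "text_score": 9, "popularity": 50},
    {"name": "C", "featured": True, "text_score": 5, "popularity": 1},
    {"name": "D", "featured": False, "text_score": 3, "popularity": 99},
]


def bucketed_rank(items):
    """Featured items first; inside each bucket, relevance then popularity."""
    return sorted(
        items,
        key=lambda r: (not r["featured"], -r["text_score"], -r["popularity"]),
    )


ranked = [r["name"] for r in bucketed_rank(records)]
print(ranked)  # ['C', 'A', 'B', 'D']
```

Record B has the highest text_score overall but lands third, behind both featured records.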

Promoting content with good user engagement

Consider prioritizing content that has more likes and comments. Depending on how much prioritization you want the content to have, there are two ways to approach this: Rules and custom ranking.

Sorting on likes or the number of comments isn’t recommended since all other relevancy criteria are lost.

Rules

Use Rules to move items with more likes or comments to the top of your results. You can enhance this further by using optional filters with filter scoring.

Custom ranking

Use custom ranking for likes, the number of comments, or both. For example, assuming you had these attributes in your records, use the post date (postDate) and the number of comments or likes (engagement) as sortable attributes. To do this on the dashboard, go to Index > Ranking and Sorting and click the Add sort-by attribute button (set the attribute to Descending sort order).

It’s essential to consider postDate because, otherwise, old posts have the same relevance as newer ones. For example, consider these four posts:

  1. One from last year with enormous engagement (100K likes and comments)
  2. One from yesterday with great engagement (50K likes and comments)
  3. One from today with good engagement (10K likes and comments)
  4. One from today with decent engagement (5K likes and comments)

If postDate appears before engagement in your custom ranking order, post 3 and then post 4 will appear at the top of these results.

However, post 2 is relatively recent and has much higher engagement, so it should sit before 3 and 4. The problem is that a single day is too precise for this case.

Reducing precision

You can reduce precision by creating buckets. For instance, instead of ranking by postDate, you could assign a “recency score”:

  • 0 if it’s a week old or less
  • 1 if it’s between one and two weeks old
  • Up to a score where recency no longer matters (it may not matter if a post is over two years old).

You could also reduce precision for engagement: instead of ranking on a specific number of likes or comments, you could round to the nearest dozen, hundred, or thousand.

In such a case, your records would look something like this:

[
  {
    "title": "Post #1",
    "postDate": 1569226869, // last year
    "likes_and_comments": 109023, // exact computation
    "recency": 5, // a recency score based on an arbitrary scale of your choosing
    "engagement": 109000 // rounded engagement
  },
  {
    "title": "Post #2",
    "postDate": 1601281269, // yesterday
    "likes_and_comments": 57986,
    "recency": 0, // a week old or less
    "engagement": 58000
  },
  {
    "title": "Post #3",
    "postDate": 1601367669, // today
    "likes_and_comments": 11356,
    "recency": 0,
    "engagement": 11000
  },
  {
    "title": "Post #4",
    "postDate": 1601367669, // today
    "likes_and_comments": 5439,
    "recency": 0,
    "engagement": 5000
  }
]

Your custom ranking would be:

  1. recency
  2. engagement
  3. postDate
  4. likes_and_comments
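
The sketch below shows one way to compute the two reduced-precision attributes and check the resulting order locally. The weekly scale capped at 5 and the rounding to the nearest thousand are assumptions chosen to match the example records, and the sort assumes recency ranks ascending (0 is the most recent) while the other attributes rank descending.

```python
import math

WEEK_SECONDS = 7 * 24 * 60 * 60
NOW = 1601367669  # "today" in the example records


def recency_score(post_ts, now_ts, max_score=5):
    """0 for a week old or less, 1 for one to two weeks, capped at max_score."""
    age_weeks = math.ceil((now_ts - post_ts) / WEEK_SECONDS)
    return min(max(age_weeks - 1, 0), max_score)


def rounded_engagement(likes_and_comments, precision=1000):
    """Round the exact count to the nearest multiple of `precision`."""
    return round(likes_and_comments / precision) * precision


posts = [
    ("Post #1", 1569226869, 109023),
    ("Post #2", 1601281269, 57986),
    ("Post #3", 1601367669, 11356),
    ("Post #4", 1601367669, 5439),
]
records = [
    {
        "title": title,
        "postDate": ts,
        "likes_and_comments": count,
        "recency": recency_score(ts, NOW),
        "engagement": rounded_engagement(count),
    }
    for title, ts, count in posts
]

# Same order as the custom ranking list: recency, engagement, postDate, exact count.
ranked = sorted(
    records,
    key=lambda r: (r["recency"], -r["engagement"], -r["postDate"], -r["likes_and_comments"]),
)
print([r["title"] for r in ranked])  # ['Post #2', 'Post #3', 'Post #4', 'Post #1']
```

With the coarser buckets, post 2 now sits ahead of posts 3 and 4, and last year’s post drops to the end despite its higher engagement.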

Self-optimizing content

You might want to ensure that items with the most clicks get boosted toward the top of search results.

To do this:

  1. Create a numberOfClicks attribute in your data
  2. Collect click events from your users
  3. Update the numberOfClicks values in your records with the collected data
  4. Use custom ranking on numberOfClicks to boost items with a higher number of clicks (after relevance has been taken into account).
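
Steps 2 to 4 above can be sketched as follows. The event shape (a dictionary with an objectID key) and the click_updates helper are hypothetical; the output is a list of partial records you could send to your index with whichever indexing client you use, plus an assumed customRanking value.

```python
from collections import Counter


def click_updates(click_events):
    """Turn raw click events into one numberOfClicks update per record."""
    counts = Counter(event["objectID"] for event in click_events)
    return [
        {"objectID": object_id, "numberOfClicks": clicks}
        for object_id, clicks in sorted(counts.items())
    ]


events = [
    {"objectID": "12345"},
    {"objectID": "67890"},
    {"objectID": "12345"},
]
updates = click_updates(events)
print(updates)

# Assumed setting: rank by clicks once textual relevance is accounted for.
settings = {"customRanking": ["desc(numberOfClicks)"]}
```

Note that this sketch computes totals for the batch of events it receives, so each update replaces the stored value rather than incrementing it.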

Something to watch out for, though, is that this is a self-reinforcing change. The higher these items are in the results, the more likely they are to be clicked, meaning that they will gradually rank higher and higher. You may want to use Algolia’s A/B testing feature to check the impact of a change like this.

Using unique objectIDs to preserve your data

Some of your records may have duplicate objectIDs. This causes problems when updating your data.

Duplicate objectIDs

  • objectID=12345 (red t-shirt)
  • objectID=12345 (Nike shoes)
  • objectID=67890 (Levi jeans, slim)
  • objectID=67890 (Levi jeans, slimmed)

Here, it looks like two objects share the same objectID of 12345. If you try to index them both, the index retains just one. Make sure that each item has a unique objectID.

The items with objectID:67890 look like duplicate records. You should remove one of them.

Fixed, no duplicate objectIDs

  • objectID=12345 (red t-shirt)
  • objectID=23456 (Nike shoes)
  • objectID=67890 (Levi jeans, slim)
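
Before indexing, a quick local check can catch duplicate objectIDs. This is a minimal sketch: the find_duplicate_object_ids helper and the name field are hypothetical, and the sample records mirror the list above.

```python
from collections import defaultdict


def find_duplicate_object_ids(records):
    """Return {objectID: [records]} for every objectID used more than once."""
    by_id = defaultdict(list)
    for record in records:
        by_id[record["objectID"]].append(record)
    return {oid: recs for oid, recs in by_id.items() if len(recs) > 1}


records = [
    {"objectID": "12345", "name": "red t-shirt"},
    {"objectID": "12345", "name": "Nike shoes"},
    {"objectID": "67890", "name": "Levi jeans, slim"},
    {"objectID": "67890", "name": "Levi jeans, slimmed"},
]
duplicates = find_duplicate_object_ids(records)
for object_id, clashing in duplicates.items():
    print(object_id, [r["name"] for r in clashing])
```

The check only reports clashes. Deciding whether two records are different items that need new objectIDs or true duplicates to remove still takes a human look.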

Language settings

If you haven’t set your index language to match the language of your users, you should do so. Also, make sure that you’ve set ignorePlurals and removeStopWords to true.
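
As a sketch, these language settings could look like the following for an English-language store. The choice of "en" is an assumption, as is the use of the indexLanguages and queryLanguages settings to declare the language; the language_settings helper is hypothetical.

```python
def language_settings(language_code):
    """Settings for an index whose records and queries share one language."""
    return {
        "indexLanguages": [language_code],
        "queryLanguages": [language_code],
        # Treat singular and plural forms as matching.
        "ignorePlurals": True,
        # Ignore common words such as "the" and "a" in queries.
        "removeStopWords": True,
    }


settings = language_settings("en")
print(settings)
```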

Defining synonyms

If you’re selling coats, you may notice that some people search for “coats” and others for “jackets”. In your store, these are the same.

Your synonyms

  • coat=jacket

You may also want to add a synonym for shoes and boots so that users looking for shoes can find boots as well. To do this, create a synonym for “shoes” and “boots.”

Your synonyms

  • coat=jacket
  • shoes=boots

Keep this list up to date with as many synonyms as necessary, but not too many. A long list of synonyms can become unmanageable and create false positives.
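
The two synonyms above could be represented as synonym objects, as in this sketch. It assumes Algolia’s regular synonym format (objectID, type, and synonyms fields). The synonym helper and its objectID naming scheme are hypothetical.

```python
def synonym(*words):
    """Build a two-way synonym object where all words are interchangeable."""
    return {
        "objectID": "-".join(words),  # hypothetical naming scheme
        "type": "synonym",
        "synonyms": list(words),
    }


synonyms = [
    synonym("coat", "jacket"),
    synonym("shoes", "boots"),
]
print(synonyms)
```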

Frontend UI concerns

Highlighting

Using highlighting lets your users instantly see why a record is present in the results.

Without highlighting

Query: nike

Results

  • Nike Air is the best
  • Magic Nike is built to last

With highlighting

Query: nike

Results

  • <em>Nike</em> Air is the best
  • Magic <em>Nike</em> is built to last

Instant search results and as-you-type search experience

Implement instant search results with the InstantSearch libraries.

Setting up facets

Do you have helpful categories in your index, like colors or brands? Add them as facets and display them on your UI to let your users filter their results.
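
A minimal sketch of declaring facets and of the counts a facet list typically displays. It assumes Algolia’s attributesForFaceting setting; the facet_counts helper and the sample records are hypothetical and only show locally what a user would see next to each facet value.

```python
from collections import Counter

settings = {"attributesForFaceting": ["color", "brand"]}


def facet_counts(records, attribute):
    """Count how many records carry each value of a facet attribute."""
    return dict(Counter(r[attribute] for r in records if attribute in r))


records = [
    {"name": "t-shirt", "color": "red", "brand": "Nike"},
    {"name": "shoes", "color": "black", "brand": "Nike"},
    {"name": "jeans", "color": "black", "brand": "Levi"},
]
for attribute in settings["attributesForFaceting"]:
    print(attribute, facet_counts(records, attribute))
```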

Staying up to date

Ensure that you’re using the latest version of Algolia’s libraries:

  • Frontend library
  • Search client
  • Indexing client