Relevance – it’s what we’re all going for with our search implementations, but it’s so subjective that it’s nearly impossible to nail down. How in the world are we supposed to optimize our search index to get the most relevant (read: converting and revenue-creating) results to our users?
Way back in 2016, we wrote an article with 10 tips to achieve highly relevant search results. We already had a lot of experience and 1500 customers loving our search toolset, but we’ve grown a lot since then: our satisfied customer base has multiplied by 11 and we’ve become industry leaders. All that experience comes with new lessons, so we wanted to rewrite this guide to bring you four simpler, more straightforward questions to ask yourself to improve the relevance of your search results and rake in all the benefits that come with it.
The biggest key to relevance in search is to remove the subjectivity. It’s impossible for us to optimize the algorithm for every single use case, but we give you the tools to optimize it for yours. To make the best use of those tools, though, you have to figure out what exactly makes a result relevant in your application.
For example, if you’re working with a database of movies, it’d definitely make sense to have simple searchable terms split out as their own attributes in the dataset (like the movie’s title, director, year, an array of lead actors, genre, and other stuff people might search for). But you could get away with leaving a lot of the data unformatted in a “description” text field:
{
  "title": "Star Wars: Episode IV - A New Hope",
  "alternative_titles": [
    "Star Wars",
    "Star Wars: Episódio IV - Uma Nova Esperança",
    "Star Wars: Épisode IV - Un nouvel espoir",
    …
  ],
  "genre": [
    "Adventure",
    "Fantasy",
    "Science Fiction"
  ],
  "objectID": "440309800",
  "actors": [
    "Mark Hamill",
    "Carrie Fisher",
    "Harrison Ford",
    "Alec Guinness",
    …
  ],
  "director": "George Lucas",
  "year": 1977,
  "description": "Luke Skywalker joins forces with a Jedi Knight, a cocky pilot, a Wookiee and two droids to save the galaxy from the Empire's world-destroying battle station, while also attempting to rescue Princess Leia from the mysterious Darth Vader."
}
It’s not totally necessary to break up your description into a list of characters or tags. You’re still going to need to have the description around somewhere to eventually display with the search result, so adding super fine-grained attributes is just going to duplicate that data, take up more storage space, and not really help you much. Plus, it could actually decrease relevance: Luke Skywalker would be a far more popular result than any actor named Luke, so if I’m actually searching for an actor, I’d get results from Star Wars films crowding out what I’m probably looking for.
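To make that crowding-out effect concrete, here’s a minimal Python sketch. It’s a toy substring search, not Algolia’s engine, and the `characters` attribute, the second record, and the `popularity` numbers are all made up for illustration.

```python
# Toy records: "characters" and "popularity" are hypothetical fields,
# and the popularity scores are invented for this example.
MOVIES = [
    {
        "title": "Star Wars: Episode IV - A New Hope",
        "actors": ["Mark Hamill", "Carrie Fisher"],
        "characters": ["Luke Skywalker", "Princess Leia"],
        "popularity": 100,
    },
    {
        "title": "The Royal Tenenbaums",
        "actors": ["Luke Wilson", "Gene Hackman"],
        "characters": ["Richie Tenenbaum", "Royal Tenenbaum"],
        "popularity": 40,
    },
]


def search(query, records, searchable_attributes):
    """Return titles whose searchable attributes contain the query,
    most popular first. A deliberately naive substring match."""
    needle = query.lower()
    hits = [
        record
        for record in records
        if any(
            needle in value.lower()
            for attribute in searchable_attributes
            for value in record.get(attribute, [])
        )
    ]
    hits.sort(key=lambda record: record["popularity"], reverse=True)
    return [record["title"] for record in hits]


actors_only = search("luke", MOVIES, ["actors"])
with_characters = search("luke", MOVIES, ["actors", "characters"])
```

With only `actors` searchable, a query for “luke” returns just the film starring Luke Wilson. Once `characters` is searchable too, the more popular Star Wars record jumps ahead of it, which is exactly the kind of noise the paragraph above warns about.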
That logic doesn’t hold up in e-commerce, though. If you’ve got a product database where users could search for nearly any product attribute, your search index has to be very fine-grained. Every piece of data that can be split out as a new searchable or facetable attribute gets its own spot, and then you can order them by importance to your application.
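As a rough sketch of what “ordering by importance” can look like, here’s an index settings object written as a Python dictionary. `searchableAttributes`, `attributesForFaceting`, and `customRanking` are real Algolia index settings, but every attribute name inside them (`name`, `brand`, `sales_count`, and so on) is a hypothetical product schema, so swap in your own.

```python
# Hypothetical e-commerce schema: the attribute names are assumptions,
# only the three setting keys come from Algolia's index settings.
ecommerce_settings = {
    # Earlier entries count as more important when a query matches them.
    "searchableAttributes": [
        "name",
        "brand",
        "categories",
        "color",
        "description",
    ],
    # Attributes users can filter on in the UI.
    "attributesForFaceting": ["brand", "categories", "color", "size"],
    # Business metric used to break ties between equally good text matches.
    "customRanking": ["desc(sales_count)"],
}

most_important_attribute = ecommerce_settings["searchableAttributes"][0]
```

The point of the sketch is the shape, not the specific values: every attribute a shopper might search or filter on gets its own entry, and the list order encodes which matches matter most.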
Takeaway: Figure out exactly what makes a result “relevant” in your application, and structure your search index data around that.
The answer is no, not unless you really know what you’re doing. Lots of research and development (and validation by 17,000 customers with production implementations) has gone into those defaults, so you’d really only need to mess with them in very specific use cases. In years past, before these values were optimized, instructional articles and general advice commonly suggested changing these defaults. Since much of that content still exists on the Internet today, let’s look at a few pieces of Algolia’s algorithm to set you off on the right foot if you do actually have to change some things.
If your record contains “iphone”, you should be able to find it via “ipjone” or “iphoen”. This will be handled automatically by the Algolia engine. If you add in more typo-checking, you’ll get less relevant results popping up, since a couple of substitutions, additions, and deletions could get you all the way to a completely different query (just think of how often your phone’s hyperactive spellcheck changes a correctly spelled word to something else entirely). On the other hand, turning this down could mean penalizing your users for not spelling their queries perfectly, and that’s no fun for anyone (especially if they’re searching for brand names or other words we don’t need to spell often).
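Here’s a small Python sketch of why loosening typo tolerance lets unrelated words in. It uses the optimal string alignment distance (a Damerau-Levenshtein variant where swapping two adjacent letters counts as one edit). That’s a textbook technique chosen for illustration, not a description of how Algolia counts typos internally, and the three-word vocabulary is made up.

```python
def typo_distance(a, b):
    """Optimal string alignment distance: substitutions, insertions,
    deletions, and adjacent transpositions each cost one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            swapped = a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]
            if i > 1 and j > 1 and swapped:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def fuzzy_matches(query, vocabulary, max_typos):
    """Return every vocabulary word within max_typos edits of the query."""
    return [word for word in vocabulary if typo_distance(query, word) <= max_typos]


VOCABULARY = ["iphone", "phone", "ozone"]  # hypothetical indexed words
strict = fuzzy_matches("ipjone", VOCABULARY, max_typos=1)
loose = fuzzy_matches("ipjone", VOCABULARY, max_typos=2)
```

Both “ipjone” and “iphoen” sit one edit away from “iphone”, so a tight budget still finds the right record. Raise the budget to two edits and “phone” starts matching as well, and on a real index with thousands of words that kind of spillover adds up quickly.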
Stop words are the most commonly used words in a given language like “the”, “of”, “to”, “be”, “or”, etc. We’ve come across a lot of developers who think that they need to remove these words to leave more space for the more meaningful words in a query that’ll contribute to finding a more relevant result. But that’s just not true. If your search engine treats them right, these words can be very useful, and removing them could make finding some results almost impossible. For example, Google “To Beta or not to Beta”. It’ll pull up detailed scientific articles on software development, astronomy, ornithology, and the economy. Strip out the stop words and duplicates, and you’re left with just “beta”. When I Google that, I get results on the Greek letter, the motorcycle company, and Apple’s beta-testing program. Those stop words have significant value! While this might be an exaggerated example specifically to highlight the phenomenon, it shows up on a smaller scale in much smaller datasets and in much less obvious situations.
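You can reproduce that collapse in a few lines of Python. This is a minimal sketch with a tiny, hand-picked stop-word list (an assumption for illustration, real lists are much longer), and it only shows what a naive preprocessing step throws away.

```python
# A tiny illustrative stop-word list; real lists are far longer.
STOP_WORDS = {"to", "or", "not", "the", "of", "be"}


def strip_stop_words_and_duplicates(query):
    """Lowercase the query, drop stop words, and keep each remaining
    word only once, in order of first appearance."""
    seen, kept = set(), []
    for word in query.lower().split():
        if word in STOP_WORDS or word in seen:
            continue
        seen.add(word)
        kept.append(word)
    return kept


beta_query = strip_stop_words_and_duplicates("To Beta or not to Beta")
hamlet_query = strip_stop_words_and_duplicates("To be or not to be")
```

The first query shrinks to the single word “beta”, and the famous Hamlet line it riffs on disappears entirely, leaving the engine nothing to search with.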
Algolia’s libraries automatically start searching the database from the very first letter that the user types in the search box. So if you’re using InstantSearch, you get this functionality out-of-the-box. But if you’re rolling your own UI, you might be thinking that this is a waste of time to implement, or a waste of HTTP requests. There have been some creative ways of going about it (one of the most interesting I’ve seen is setting a timer after each new letter typed, and only sending the search request if the user hasn’t typed anything new after some fraction of a second), but as “elegant” as those solutions might seem, the data just doesn’t back that approach. We’ve tested it many times, and it’s become clear that any heuristic that launches the query after more than a single letter is typed leads to a poor user experience and suboptimal conversion rates. It’s worth the cost of the HTTP requests.
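To see what the timer trick costs the user, here’s a small Python simulation of the two strategies. It doesn’t touch any real UI or network code. The keystroke timings and the 300 ms delay are invented, and the function names are hypothetical.

```python
def requests_on_every_keystroke(keystrokes):
    """Send a query as soon as each character is typed.
    keystrokes is a list of (time_ms, character) pairs."""
    sent, query = [], ""
    for time_ms, char in keystrokes:
        query += char
        sent.append((time_ms, query))
    return sent


def requests_with_debounce(keystrokes, delay_ms):
    """Send a query only once the user has paused for delay_ms."""
    sent, query = [], ""
    for index, (time_ms, char) in enumerate(keystrokes):
        query += char
        is_last = index == len(keystrokes) - 1
        # The timer fires only if no new keystroke arrives before it expires.
        if is_last or keystrokes[index + 1][0] - time_ms >= delay_ms:
            sent.append((time_ms + delay_ms, query))
    return sent


# Someone typing "milk" at one letter every 120 ms (made-up timings).
TYPING_MILK = [(0, "m"), (120, "i"), (240, "l"), (360, "k")]

instant = requests_on_every_keystroke(TYPING_MILK)
debounced = requests_with_debounce(TYPING_MILK, delay_ms=300)
```

In this run the debounced version saves three requests, but its first query doesn’t go out until 660 ms after the first keystroke, while the instant version has already asked for results at 0 ms. That delay before anything shows up is the trade-off the paragraph above is describing.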
Natural languages have a ton of variety in them. Just think about how many different ways there are to make plurals in English! Here’s one article with 8 helpful rules for plural nouns: rule 7 is that some words are both singular and plural (think sheep), and rule 8 is that there are actually no rules! So how can we match queries to results that use different forms of the same words? There are a lot of techniques out there that factor into the algorithm, like stemming, lemmatization, and phonetization, but Algolia’s algorithm already handles this for you. Rolling your own would really only be necessary if you’re working in a language that we don’t support (and we support 68 of the world’s most common languages as of July 2023, so that’d be a really niche use case). To add onto that, rolling your own runs the risk of bungling situations where data in your index isn’t from the language you’d expect. Take last names, for example, which very often don’t line up with the spelling and grammatical rules you’d expect from the context language.
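Here’s a quick Python sketch of how a home-grown approach goes wrong. The function below is a deliberately naive “strip the trailing s” rule, written only to show the failure modes, and it isn’t how any real stemmer (or Algolia) works.

```python
def naive_singularize(word):
    """Deliberately naive: treat any trailing 's' as a plural marker."""
    if word.lower().endswith("s"):
        return word[:-1]
    return word


regular = naive_singularize("phones")      # works for the easy case
surname = naive_singularize("Jones")       # mangles a last name
irregular = naive_singularize("children")  # misses an irregular plural
```

The easy case works, but the last name “Jones” gets chopped down to “Jone” and “children” never gets mapped back to “child”. Each patch you add for cases like these is another rule to maintain, which is why leaning on the built-in language handling is usually the safer bet.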
Takeaway: Unless you really know what you’re doing, let Algolia’s years of experience guide the best configuration for your search engine.
Answering this question starts by creating an index that’s mostly returning relevant results to everyone. If you’ve gotten to this point in the article and implemented the suggestions above, you’re already doing a good job thinking from a wide point of view.
But once you get to that point, is there still room to improve? Of course! You don’t have to settle for returning the set of results that would be most likely to contain what some average user is looking for. You can use Algolia’s personalization tools to return what that specific user is looking for! Imagine you’re shopping for grocery delivery online. After buying the same brand of milk several times, wouldn’t you expect that brand to show up first in the search results for “milk”, regardless of its popularity with other customers? From the user’s point of view, that seems like common sense, but until recent years, it wasn’t that common in large-scale search implementations. Remember, users don’t think about relevance from such a wide point of view. They judge your search engine’s accuracy not by how close it gets to returning a result that most users would respond to, but by how close it gets to returning what they, specifically, are searching for. They likely have something specific in mind when they’re searching, and Algolia’s personalization tools can help you serve it to them. We go way deeper in this documentation guide, if you’d like to learn more about how you can implement it quickly.
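The milk example boils down to a re-ranking step, which this toy Python sketch illustrates. It isn’t Algolia’s personalization feature, just the underlying idea. The brand names and purchase counts are invented, and `personalize` is a hypothetical helper.

```python
def personalize(results, purchase_counts):
    """Move items this user has bought before to the top, most-bought
    first. Items with equal counts keep their original (global) order
    because Python's sort is stable."""
    return sorted(results, key=lambda item: -purchase_counts.get(item, 0))


# Results for "milk" as ranked for the average user (made-up brands).
GLOBAL_RESULTS = ["Brand A Whole Milk", "Brand B Oat Milk", "Brand C Skim Milk"]

# This shopper keeps buying the least popular brand.
shopper_history = {"Brand C Skim Milk": 5}

for_this_shopper = personalize(GLOBAL_RESULTS, shopper_history)
for_new_visitor = personalize(GLOBAL_RESULTS, {})
```

The repeat buyer sees their usual brand first, while a visitor with no history still gets the globally ranked list untouched, so the wide-view relevance work from earlier in the article isn’t wasted.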
Takeaway: Personalize the results to each individual user so that even niche queries get relevant results.
We’re strong believers in artificial intelligence’s power to augment and improve the human experience. So in that vein, we’ve created AI tools that can help you squeeze more relevance (and by extension, revenue) out of your search experience by spotting and making the most of patterns that would be nearly impossible to spot by eye:
Takeaway: Use Algolia’s AI-powered tools to get more out of your search index.
That’s it! Here’s a recap of our lessons learned today:
If you’ve got experiences to share on how you’ve achieved highly relevant search results, we’d love to hear about them! Drop us a line on Discord here.
Jaden Baptista
Freelance Writer at Authors Collective