We use marketplaces every day. Whether we’re looking up new apps for our phones and computers or doing a little shopping on Etsy or Amazon, we’re always looking for something.
By design, marketplaces are built to deal with millions of objects, which makes search the critical element of the shopper’s experience. Navigating that massive number of products should be as easy and intuitive as possible.
But search is complex, and relevant search is far more complex. On top of that, you need to make the most of the few dozen words that describe each object, so that the most relevant results (the ones your demanding users are looking for) always come first.
Relevance is more than just relevant vs. irrelevant: it’s the whole gray zone in between. Without adequate search engine technology, search results are not always as relevant as you would like them to be.
How do you implement the best marketplace search ever? We’re giving you the whole recipe, along with an open-source project to implement it.
The Psychology of Product Search
What your shoppers expect
Your end-users are used to the Google search bar. They want to find what they are looking for however they phrase their queries and however many typos they make. They also want to find it on the first results page, in the top 3 results.
It turns out that searching this kind of unstructured (free-text) content is far from trivial. Most of the time, the objects your end-users are searching for are described only by a 3-4 word title and a short description.
What your publishers/sellers dream of
On the other side of the fence, publishers and sellers will do everything they can to appear on the first page of search results. Sometimes they will even resort to the same kinds of SEO techniques spammers use on Google to game its ranking algorithm.
Your search engine must be able to work around these hacks and keep returning relevant results, regardless of how publishers name their products.
What You Need to Do
1. Deal with user-generated content
Even if most marketplaces have strict rules and guidelines for object titles and descriptions, you will need to deal with edge-case submissions that respect the rules but use SEO hacks to trick your search algorithm.
One trick in the book is injecting trendy keywords into the object title and/or description. For instance, adding “Facebook” to an object title would make the search engine retrieve that object every time the word “facebook” is queried, even if the underlying object has nothing to do with it.
Many search engines also take into account how many times a query word matches. But what if a query word matches multiple times in the object description? Is that better than a single match? That’s exactly the kind of question you need an answer to. Imagine all the Apple accessory resellers rushing to add long lists of compatible devices (“iphone, iphone 3G, iphone 4, iphone 4S, iphone 5, iphone 5S, iphone 5C, iphone 6, iphone 6”) just to trigger a “very relevant” match on any “iphone” query. That’s probably not something you want, at least not ahead of the actual iPhones.
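The snippet below is a minimal sketch, assuming a naive engine that scores a listing by counting every occurrence of every query word. It shows how keyword stuffing wins under that scheme. The listings and the `term_frequency_score` helper are hypothetical and only score the descriptions.

```python
import re


def term_frequency_score(query: str, text: str) -> int:
    """Naive relevance: count every occurrence of every query word in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(tokens.count(word) for word in query.lower().split())


# Hypothetical listings: an actual phone and a keyword-stuffed accessory.
listings = {
    "iPhone 6": "Apple iPhone 6, 16GB, space gray",
    "Leather case": "Leather case for iphone, iphone 3G, iphone 4, iphone 4S, "
                    "iphone 5, iphone 5S, iphone 5C, iphone 6",
}

# Rank titles by how many times "iphone" appears in their description.
ranked = sorted(
    listings,
    key=lambda title: term_frequency_score("iphone", listings[title]),
    reverse=True,
)
print(ranked)  # ['Leather case', 'iPhone 6']
```

The stuffed description matches “iphone” 8 times and the real phone matches once, so the case outranks the phone.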
Algolia doesn’t rank objects that match a query word several times above objects that match it once. Instead:
- Algolia ranks hits based on the weight of the matching attribute (more important attributes first),
- Algolia (optionally) ranks hits based on the position of the matching word in the attribute (considering “iPhone 6” more important than “Leather case for iPhone 6” for the “iphone” query),
- Algolia considers all query words by default and falls back to treating all words as optional if there are no results. Hits matching more words are then ranked before the others.
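The rules listed above can be sketched locally. This is an illustration of the idea, not Algolia’s implementation. We assume two searchable attributes (`title` ranked above `description`) and hypothetical records. A word that repeats adds nothing to the score. Hits are ordered by the number of distinct query words matched, then by the best matching attribute, then by the position of the word in that attribute. If no record contains every word, all words become optional.

```python
import re

# Searchable attributes, most important first (an assumption for this sketch).
ATTRIBUTES = ["title", "description"]


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def relevance(record: dict, query_words: list[str]) -> tuple[int, tuple[int, int, int]]:
    """Return (words matched, sort key). Repeated matches of a word add nothing."""
    matched = set()
    best = (len(ATTRIBUTES), 10**9)  # (attribute index, word position) of the best match
    for attr_index, attr in enumerate(ATTRIBUTES):
        tokens = tokenize(record.get(attr, ""))
        for word in query_words:
            if word in tokens:
                matched.add(word)
                # Only the first occurrence counts.
                best = min(best, (attr_index, tokens.index(word)))
    # More words first, then more important attribute, then earlier position.
    return len(matched), (-len(matched), *best)


def search(records: list[dict], query: str) -> list[str]:
    query_words = tokenize(query)
    scored = [(relevance(r, query_words), r) for r in records]
    all_words = [(key, r) for (n, key), r in scored if n == len(query_words)]
    # Fallback: if no record contains every word, treat all words as optional.
    hits = all_words or [(key, r) for (n, key), r in scored if n > 0]
    return [r["title"] for key, r in sorted(hits, key=lambda hit: hit[0])]


# Hypothetical records.
records = [
    {"title": "Leather case for iPhone 6",
     "description": "Fits iphone, iphone 3G, iphone 4, iphone 5, iphone 6"},
    {"title": "iPhone 6", "description": "Apple iPhone 6, 16GB"},
    {"title": "Screen protector", "description": "Tempered glass for iphone 6"},
]
print(search(records, "iphone"))       # ['iPhone 6', 'Leather case for iPhone 6', 'Screen protector']
print(search(records, "iphone case"))  # ['Leather case for iPhone 6']
```

A query such as “iphone dock” has no record containing both words. In that case the sketch falls back to the “iphone” matches, in the same order as the first query.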
2. Embrace typos
Your users will make typos. Lots of them, and more and more as the volume of mobile searches keeps growing. Handling typos is all the more challenging because it also involves find-as-you-type search, which retrieves objects before the query is even complete.
Algolia natively supports typos and as-you-type search. Hits with fewer typos are ranked before the others. Highlighting still works, so your users can see where the match occurred.
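As an illustration of typo-tolerant, as-you-type matching, here is a minimal sketch. It is not Algolia’s engine. Typos are counted with an optimal-string-alignment edit distance, where swapping two adjacent letters counts as one typo. The last query word is matched as a prefix because the user may still be typing it. The typo budget per word length (none under 4 letters, 1 from 4, 2 from 8) and the titles are assumptions for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and
    transpositions of adjacent letters (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def allowed_typos(word: str) -> int:
    # Assumed thresholds: no typo under 4 letters, 1 from 4 letters, 2 from 8.
    return 2 if len(word) >= 8 else 1 if len(word) >= 4 else 0


def word_typos(query_word: str, record_word: str, as_prefix: bool):
    """Typos needed to match, or None if over the allowed budget."""
    target = record_word[: len(query_word)] if as_prefix else record_word
    distance = osa_distance(query_word, target)
    return distance if distance <= allowed_typos(query_word) else None


def search(titles: list[str], query: str) -> list[str]:
    query_words = query.lower().split()
    results = []
    for title in titles:
        words = title.lower().split()
        total = 0
        for i, q in enumerate(query_words):
            # As-you-type: the last query word may be incomplete, so match it as a prefix.
            as_prefix = i == len(query_words) - 1
            matches = (word_typos(q, w, as_prefix) for w in words)
            typos = [t for t in matches if t is not None]
            if not typos:
                break  # this query word matches nothing: drop the title
            total += min(typos)
        else:
            results.append((total, title))
    # Fewer typos first; sorted() is stable, so ties keep their input order.
    return [title for total, title in sorted(results, key=lambda r: r[0])]


titles = ["Rihanna - Diamonds", "Rihana tribute band", "Adele - Hello"]
print(search(titles, "riha"))    # ['Rihanna - Diamonds', 'Rihana tribute band']
print(search(titles, "rihana"))  # ['Rihana tribute band', 'Rihanna - Diamonds']
```

Ranking purely by typo count puts the exact but obscure “Rihana” match ahead of the real artist.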
That said, when I search for “rihana” (misspelled, with a single ‘n’), I still want to see the popular “rihanna” objects first. That’s tricky: even if some objects match the misspelled “rihana” exactly, with zero typos, you want the real “rihanna” objects to come first.
To implement such a ranking strategy, your ranking formula must be able to consider the popular objects separately and apply the actual sorting twice:
- first, display the popular objects, compared against each other so that the most relevant popular hit comes first,
- then display the other objects, compared against each other so that the most relevant non-popular hit comes first.
An eCommerce website could apply the same kind of formula to discounted and featured products.
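The two-pass strategy can be sketched in a few lines. The `popular` attribute, the hits, and the use of the typo count as a stand-in for full text relevance are all assumptions made for this illustration. Swapping `popular` for a “featured” or “discounted” flag works the same way.

```python
def two_pass_rank(hits, relevance_key):
    """Sort popular hits among themselves, then the others, and concatenate."""
    popular = sorted((h for h in hits if h["popular"]), key=relevance_key)
    others = sorted((h for h in hits if not h["popular"]), key=relevance_key)
    return popular + others


# Hypothetical hits for the query "rihana"; "typos" stands in for full text relevance.
hits = [
    {"title": "Rihana tribute band", "popular": False, "typos": 0},
    {"title": "Rihanna - Diamonds", "popular": True, "typos": 1},
    {"title": "Rihanna fan covers", "popular": False, "typos": 1},
    {"title": "Rihanna - Umbrella", "popular": True, "typos": 1},
]
ranked = two_pass_rank(hits, relevance_key=lambda h: h["typos"])
print([h["title"] for h in ranked])
# ['Rihanna - Diamonds', 'Rihanna - Umbrella', 'Rihana tribute band', 'Rihanna fan covers']
```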
Algolia’s tie-breaking ranking algorithm, described in “Algolia’s Ranking Algorithm Unveiled”, is well suited to such a ranking strategy. It compares hits one ranking criterion at a time and moves on to the next criterion only when they are tied.
To deal with such “uber popular” objects, tag them with a “popular” flag and inject it into Algolia’s ranking formula. When that flag is the most important ranking criterion, the “uber popular” hits always come first, whatever their number of typos, and the non-popular ones follow.
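The following sketch shows tie-breaking ranking with a popular flag as the first criterion. It compares two hits criterion by criterion and only moves to the next one when the current criterion is tied. The attribute names (`popular`, `typos`, `words`) and the hits for a query like “rihana live” are hypothetical.

```python
from functools import cmp_to_key

# Ranking criteria, most important first. Each maps a hit to a value where
# smaller ranks higher. Attribute names are assumptions for this sketch.
CRITERIA = [
    lambda hit: -hit["popular"],  # popular flag (1 or 0): popular hits first
    lambda hit: hit["typos"],     # fewer typos first
    lambda hit: -hit["words"],    # more matched query words first
]


def compare(a: dict, b: dict) -> int:
    for criterion in CRITERIA:
        ka, kb = criterion(a), criterion(b)
        if ka != kb:
            return -1 if ka < kb else 1  # decided by this criterion
    return 0  # tied on every criterion


def rank(hits: list[dict]) -> list[dict]:
    return sorted(hits, key=cmp_to_key(compare))


# Hypothetical hits for the query "rihana live".
hits = [
    {"title": "Rihana tribute band live", "popular": 0, "typos": 0, "words": 2},
    {"title": "Rihanna - Umbrella", "popular": 1, "typos": 1, "words": 1},
    {"title": "Rihanna - Diamonds (live)", "popular": 1, "typos": 1, "words": 2},
    {"title": "Rihanna covers", "popular": 0, "typos": 1, "words": 1},
]
print([h["title"] for h in rank(hits)])
# ['Rihanna - Diamonds (live)', 'Rihanna - Umbrella', 'Rihana tribute band live', 'Rihanna covers']
```

In idiomatic Python you could sort on the tuple `(-popular, typos, -words)` directly and get the same order. The explicit comparator just makes the “next criterion only on a tie” logic visible.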
3. Redefine Popularity
There may be several business metrics you want to use to refine your ranking algorithm. Those metrics probably include:
- the number of rating stars (out of 5),
- the number of reviews, downloads or installations,
- the best-selling products ($ales).
The more, the better. But eventually you need to deal with the mathematical formula ruling them all \o/, not to mention that you also need to take text relevance into account.
So what if one object has an average rating of 5/5 based on 3 reviews, while another has an average of 4.3/5 based on 1,000 reviews? Which one should rank first?
Algolia doesn’t combine your ranking criteria automatically because that is too business-specific: there is no generic way to mix those business metrics.
In Algolia’s default ranking formula, business metrics are only used to compare hits that match equally well from a text-relevance point of view. The goal is first to display results that match the user’s query words and then, if several hits tie, to sort them based on that business data.
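Here is a minimal sketch of that “text first, business metric last” ordering. The textual criteria are reduced to a typo count and the number of matched query words, and the hits for a query like “backup scheduler” are hypothetical. Even a huge download count cannot lift a hit above one that matches the query better.

```python
def rank(hits):
    # Text relevance first (fewer typos, then more matched words); the business
    # metric only decides between hits that tie on every textual criterion.
    return sorted(hits, key=lambda h: (h["typos"], -h["words"], -h["downloads"]))


# Hypothetical hits for the query "backup scheduler".
hits = [
    {"name": "Backup Pro", "typos": 0, "words": 1, "downloads": 900_000},
    {"name": "Simple Backup Scheduler", "typos": 0, "words": 2, "downloads": 1_200},
    {"name": "Backup Scheduler Lite", "typos": 0, "words": 2, "downloads": 45_000},
]
print([h["name"] for h in rank(hits)])
# ['Backup Scheduler Lite', 'Simple Backup Scheduler', 'Backup Pro']
```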
If you’re dealing with ratings and numbers of reviews, you should take a look at the Bayesian average.
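A Bayesian average pulls each object’s rating toward a prior mean m, weighted as if it came from C “virtual” reviews: score = (C·m + n·r̄) / (C + n), where n is the number of reviews and r̄ the object’s average rating. The fewer the reviews, the stronger the pull toward m. In the sketch below, m = 3.5 stars and C = 10 are assumed values. In practice m is often the catalog-wide average rating, and C is a tuning choice.

```python
def bayesian_average(average: float, count: int, prior_mean: float, prior_weight: float) -> float:
    """Shrink an object's average rating toward prior_mean; the fewer reviews,
    the stronger the pull. prior_weight acts like a number of virtual reviews."""
    return (prior_weight * prior_mean + average * count) / (prior_weight + count)


# Assumed prior: a catalog-wide mean of 3.5 stars, weighted like 10 reviews.
few_reviews = bayesian_average(5.0, 3, prior_mean=3.5, prior_weight=10)
many_reviews = bayesian_average(4.3, 1000, prior_mean=3.5, prior_weight=10)
print(f"{few_reviews:.2f} vs {many_reviews:.2f}")  # 3.85 vs 4.29
```

With these assumptions, the 4.3/5 object with 1,000 reviews outranks the 5/5 object with 3 reviews. The resulting score could then be stored on each record and used as a business metric for sorting.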
Here is an implementation we built that combines all of these best practices. It’s a search of WordPress plugins:
- 38K plugins indexed,
- Default typo-tolerance settings,
- Popular flag: set to “1” if downloaded at least 10K times,
- Business metric used to customize the ranking: the number of downloads of each plugin.
Building such an experience literally took a few minutes!
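As a hedged sketch of how the plugin setup above might be expressed, the code below builds search records with a precomputed `popular` flag (1 when a plugin has at least 10K downloads). It also builds index settings that put `desc(popular)` ahead of the default textual criteria and use downloads as the custom ranking. The attribute names, the sample plugins, and the default criteria order we prepend to are assumptions. Check them against your own index and the current Algolia documentation. Typo tolerance keeps its default settings, so it isn’t set here. The code only builds the payloads. Sending them is up to your Algolia API client, whose method names vary by version.

```python
import json

POPULAR_THRESHOLD = 10_000  # "popular" means downloaded at least 10K times


def to_record(plugin: dict) -> dict:
    """Turn a plugin into a search record with a precomputed popular flag."""
    return {
        "objectID": plugin["slug"],
        "name": plugin["name"],
        "downloads": plugin["downloads"],
        "popular": 1 if plugin["downloads"] >= POPULAR_THRESHOLD else 0,
    }


# Index settings: the popular flag ranks above the default textual criteria,
# and the download count is only used by the trailing "custom" step.
SETTINGS = {
    "ranking": ["desc(popular)", "typo", "geo", "words", "filters",
                "proximity", "attribute", "exact", "custom"],
    "customRanking": ["desc(downloads)"],
}

# Hypothetical plugins, not real download figures.
plugins = [
    {"slug": "example-seo", "name": "Example SEO", "downloads": 250_000},
    {"slug": "example-tiny", "name": "Example Tiny Widget", "downloads": 800},
]
records = [to_record(p) for p in plugins]
print(json.dumps({"records": records, "settings": SETTINGS}, indent=2))
```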