Algolia’s Ranking Strategy
Algolia’s ranking strategy leverages our tie-breaking algorithm, handling challenging search problems such as typos, geo located results, exact matches and more, in combination with both your data’s textual and business relevant attributes.
Using our proprietary algorithm in combination with your data attributes, Algolia creates a unique relevancy and ranking that returns relevant results from the first keystroke of a user’s query.
Most search engines use a coefficient-based approach and rank results based on a unique float value that is hard, if not impossible, to decipher.
To improve relevance, Algolia built a tie-breaking algorithm, here’s how it works:
- All the matching records are sorted according to the first criterion.
- If any records are tied, those records are then sorted according to the second criterion.
- If there are still records that are tied, those are then sorted according to the third criterion and so on, until each record in the search results has a distinct position.
The default order of ranking criteria
Algolia pre-defines the order of its criteria. For example, Typos comes first, Geolocation is next, Exact word-matching is last. There are 8 in all.
We recommend using this out-of-the-box ranking order as it works well for the vast majority of use cases. You can, however, modify the order of rules if necessary.
These 8 ranking criterion of our tie-breaking algorithm help us define what is both textually and business relevant.
Algolia can retrieve the records searched by the user even if a typing mistake was made. By default, we’ll match words that have 0, 1 or 2 typos per word. This is called typo-tolerance.
The Typo criterion in the ranking formula makes sure that a record without typos will be ranked higher than one with 1 typo, themselves being ranked higher than ones with 2 typos, and so on.
Geo (if applicable)
If you’re using the geo-search feature of our engine, we will rank results by distance, from the closest to the farthest.
The precision of this ranking is set by the parameter aroundPrecision. For example, with
aroundPrecision=100, two results up to 100 meters apart will be considered equal.
Words (if applicable)
This criterion is only applicable if you are using the optionalWords setting.
By default, Algolia discards all results that don’t contain all the words of the query. But with
optionalWords, where you declare some words as optional, the Words criterion will rank them by the number of words typed by the user that matched. Keep in mind that this is not counting the number of times the word appears in the record, but rather counting the number of words typed by the user that matched.
For example, if the user typed 2 words, the maximal score for this criterion is 2 - even if a record contains this word 10 times.
If a query has used filters or optional filters, the filters criterion will rank records according to a filtering score. All filters default to a score of 1 - so, records that match a single filter will have a score of 1 and will therefore score higher than records that do not match any filter (1 > 0). Equally, records that match more than one filter will score higher than records with less matches - because Algolia counts each match.
For purposes of tie-breaking, all records with the same score are ranked the same, and so the ranking formula will drop to the next criterion to break the tie.
You can adjust the scoring in 2 significant ways:
- With filter scoring, you can use variable scores, scoring some filters higher than 1. By setting a filter with a score = 2, or score=3, you can favor that filter over others.
- With sumOrFiltersScores, you can accumulate the scores of disjunctive (OR) matches to come up with a total score, ranking records higher than records with a lesser total score.
The Filter criterion can be quite powerful in defining relevance, as seen in the personalization example.
For a query that contains two or more words, Proximity calculates how physically near those words are to each other in the matching record. This criterion will rank higher the objects that have the words closer to each other.
George Clooney is a better proximity match than
George word Clooney.
The Attribute criterion only considers attributes you have placed in the searchableAttributes (also referred to as AttributesToIndex). Additionally, attributes at the top of the
searchableAttribute list rank higher than lower ones.
There is also an importance to the order of the matches within the attribute itself. If you have selected
ordered, records whose matched words are closer to the beginning of a given attribute will be ranked higher. For example, words in position 2 of an attribute are ranked higher than words in position 5. If not
ordered, the position of the word is not taken into account.
Records with words (not just prefixes) that exactly match the query terms are ranked higher.
This criterion takes into account the settings that you have selected using Custom Ranking.
If you have multiple attributes in your Custom Ranking, the behavior will be the same as for the rest of the Ranking Formula: we’ll only look at a criterion to refine the ranking when there is a tie on all the previous criteria.
For example, if you have the following Custom Ranking:
featured being either
number_of_likes being a numerical value, then the tie-breaker for objects with the same ranking after the 6 first criteria will be as follows:
- Featured objects, ranked from the most to the least liked
- Not featured objects, ranked from the most to the least liked
For the most part, our ranking formula follows the rules outlined above to score records. However, some combinations of criteria can score differently depending on their relative position to each other.
We will describe one such combination - attribute and proximity.
Attribute and Proximity
When proximity appears before attribute in the ranking, the calculation of the attribute ranking will be different than if proximity had appeared after attribute. We call this duality the best-matched attribute. For tie-breaking purposes, the ranking formula looks to the best-matched attribute.
Computing the “best-matched attribute”
As you will see below, Algolia uses two computation methods: closest in proximity and best position. The closest in proximity method refers to scoring based on how close 2 or more query terms are to each other. Best position considers words near the beginning of an attribute better than those towards the end.
As seen in the ordering of the 8 criteria, the default ranking formula puts proximity before attribute, which has a subtle but important affect on computing the best-matched attribute: attributes whose matched terms are closest in proximity to each other will be ranked highest.
On the other hand, if you change the default by putting proximity after attribute, or removing proximity altogether, the best-matched attributes will be those whose matched terms are in the best position.
Let’s look at an example. Imagine an index with 2 searchable attributes - profession, full-name.
If we use the default - proximity before attribute, which uses the “closest in proximity” rule - then the attribute that contains the 2 words “jerry” and “singer” in closest proximity will be ranked higher. So here, full-name will be ranked higher than profession.
On the other hand, if you put proximity after attribute, the ranking will now be based on the “best position” of the matched terms within the searchable attributes (profession, full-name). Consequently, with the query “jerry singer”, the term “singer” shows up in profession before full-name:
Subtle. We recommend keeping the proximity criterion before the attribute criterion. Proximity usually leads to a better identification of the best-matched attribute.
Did you find this page helpful?
We're always looking for advice to help improve our documentation!
Please let us know what's working (or what's not!).
We're constantly iterating thanks to the feedback we receive.