Typo-Tolerance

Last updated 29 March 2017

Introduction

Algolia’s typo-tolerance algorithm differs from that of most search engines in two ways. First, it is based on different rules. Second, and more importantly, typos are natively taken into account in the ranking of the results.

How typos are calculated

Algolia’s typo-tolerance algorithm is based on the Damerau–Levenshtein distance, which provides more relevant results in an as-you-type search experience than more traditional techniques like tokenization, lemmatization or stemming.

When we compare two words, we count as one typo every time a character is missing, superfluous, substituted, or if two characters are transposed. Since it’s very rare that typing mistakes happen on the first letter, we’ll count two typos if the first letter is involved.

// If the word in the object is "Michael"
michael  // 0 typos
mickael  // 1 typo (substituted letter)
micael   // 1 typo (missing letter)
mickhael // 1 typo (added letter)
micheal  // 1 typo (transposed letters)
tichael  // 2 typos (substituted letter concerning the first character)
mickaell // 2 typos (one substituted letter, and one added letter)

Uppercase/lowercase, accents and other special characters are always ignored, and never counted as a typo.
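
The counting rules above can be sketched in a few lines of Python. This is only an illustration, not Algolia's implementation: the function names normalize and count_typos are hypothetical, the distance is the restricted (adjacent transposition) variant of Damerau–Levenshtein, and the way the first-letter penalty is applied (any edit touching the first character costs 2) is our assumption.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip accents, since neither is counted as a typo."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def count_typos(query_word: str, record_word: str) -> int:
    """Count typos between two words (illustrative sketch only)."""
    a, b = normalize(query_word), normalize(record_word)
    # d[i][j] = typos needed to turn a[:i] into b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = d[i - 1][0] + (2 if i == 1 else 1)
    for j in range(1, len(b) + 1):
        d[0][j] = d[0][j - 1] + (2 if j == 1 else 1)

    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            # Assumption: an edit that touches a first letter costs 2.
            substituted = d[i - 1][j - 1] + (
                0 if same else (2 if i == 1 or j == 1 else 1)
            )
            superfluous = d[i - 1][j] + (2 if i == 1 else 1)
            missing = d[i][j - 1] + (2 if j == 1 else 1)
            best = min(substituted, superfluous, missing)
            # Two adjacent characters swapped count as a single typo.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                transposed = d[i - 2][j - 2] + (2 if i == 2 or j == 2 else 1)
                best = min(best, transposed)
            d[i][j] = best
    return d[len(a)][len(b)]


for candidate in ["michael", "mickael", "micael", "mickhael",
                  "micheal", "tichael", "mickaell"]:
    print(candidate, count_typos(candidate, "Michael"))
```

Running the loop reproduces the typo counts listed in the example above for the word "Michael".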

Impact of typos in the Ranking Formula

Typos are taken into account in the Typo criterion of the Ranking Formula. By default, this criterion is at the first position of the Formula, which makes it the most impactful on the ranking.

Figure: The default Ranking Formula

If Typo is in first position of the Ranking Formula, you’ll never see an object with more typos ranked higher than an object with fewer typos. We highly recommend keeping this default configuration, which works well in the vast majority of use cases.
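
To see why the first criterion dominates, here is a minimal sketch that sorts hits by a tuple. The Hit type, the rank function and the popularity field are hypothetical; popularity simply stands in for all the lower criteria of the real Ranking Formula, which has more of them.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    name: str
    typos: int
    popularity: int  # hypothetical stand-in for the lower ranking criteria


def rank(hits: List[Hit]) -> List[Hit]:
    # Tuples compare element by element, so typos always wins over popularity.
    return sorted(hits, key=lambda hit: (hit.typos, -hit.popularity))


hits = [
    Hit("mickael", typos=1, popularity=900),
    Hit("michael", typos=0, popularity=10),
    Hit("micheal", typos=1, popularity=50),
]
ranked = rank(hits)
print([hit.name for hit in ranked])
```

The hit with 0 typos comes first even though it is the least popular; popularity only decides the order among hits with the same number of typos.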

Split / Concatenation

In addition to typos, Algolia handles the split and concatenation of words when a space (or a punctuation character) was inserted in a word or removed between two words.

This way, entert ainment will match with entertainment and jamesbrown will match with James Brown.

Split & Concatenation don’t count as a typo, and work only when there isn’t another typing mistake.

Split & Concatenation will only be performed when matching whole words, not prefixes. This means that entert ainm will not match entertainment, and that jamesbro will not match james brown.
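
The whole-word restriction can be illustrated with a small sketch. The function name split_or_concat_match is hypothetical, words are split on spaces only, and no typo or prefix matching is attempted; the engine's actual behavior is more involved.

```python
def split_or_concat_match(query: str, record: str) -> bool:
    """Return True if query and record differ only by one space (sketch)."""
    query_words = query.lower().split()
    record_words = record.lower().split()

    # A space was inserted in the query: two adjacent query words,
    # glued together, form one whole record word.
    for left, right in zip(query_words, query_words[1:]):
        if left + right in record_words:
            return True

    # A space was removed in the query: one whole query word equals
    # two adjacent record words glued together.
    for left, right in zip(record_words, record_words[1:]):
        if left + right in query_words:
            return True

    return False


print(split_or_concat_match("entert ainment", "entertainment"))
print(split_or_concat_match("jamesbrown", "James Brown"))
print(split_or_concat_match("entert ainm", "entertainment"))
print(split_or_concat_match("jamesbro", "james brown"))
```

The first two calls match; the last two do not, because only whole words are compared.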

Basic settings

minWordSizefor1Typo

The typo-tolerance isn’t active on all queries. It is only enabled when the considered word reaches a certain size defined in the setting minWordSizefor1Typo (default: 4). As soon as the word contains at least minWordSizefor1Typo characters, we’ll authorize one typo.

minWordSizefor2Typos

As soon as the word considered contains at least minWordSizefor2Typos characters (default: 8), we’ll authorize two typos.
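
As a rough sketch, the two thresholds combine as follows. The function allowed_typos and its Python parameter names are hypothetical; only the default values 4 and 8 come from the settings described above.

```python
def allowed_typos(word: str,
                  min_word_size_for_1_typo: int = 4,
                  min_word_size_for_2_typos: int = 8) -> int:
    """Number of typos tolerated for a query word, based on its length."""
    if len(word) >= min_word_size_for_2_typos:
        return 2
    if len(word) >= min_word_size_for_1_typo:
        return 1
    return 0


for word in ["php", "java", "michael", "michaels"]:
    print(word, len(word), allowed_typos(word))
```

With the defaults, a 3-letter word tolerates no typo, words of 4 to 7 letters tolerate one, and words of 8 letters or more tolerate two.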

typoTolerance

This setting can have 4 different values:

  • true: activate the typo-tolerance (default value).
  • false: totally disable the typo-tolerance.
  • min: keep only results with the lowest number of typos. For example, if one result has 0 typos, all results with typos are hidden.
  • strict: if there is a match without typo, then all results with 2 typos or more will be removed.

Note: The number of typos is counted per word. In a multi-word query, each query word can have up to 2 typos, depending on the minWordSizefor1Typo and minWordSizefor2Typos settings.

Note: When typoTolerance is set to strict, we force the Typo criterion to be first in the ranking formula.
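
The four values can be compared with a simplified filter over a list of hits. This is a sketch under assumptions: apply_typo_tolerance is a hypothetical function, each hit is reduced to an (object id, typo count) pair, and the real engine applies these rules while searching rather than as a post-filter.

```python
from typing import List, Tuple, Union

ScoredHit = Tuple[str, int]  # (object id, number of typos)


def apply_typo_tolerance(hits: List[ScoredHit],
                         mode: Union[bool, str] = True) -> List[ScoredHit]:
    """Filter hits the way each typoTolerance value is described (sketch)."""
    if mode is True:
        return list(hits)
    if mode is False:
        return [hit for hit in hits if hit[1] == 0]
    if mode == "min":
        if not hits:
            return []
        fewest = min(typos for _, typos in hits)
        return [hit for hit in hits if hit[1] == fewest]
    if mode == "strict":
        if any(typos == 0 for _, typos in hits):
            return [hit for hit in hits if hit[1] < 2]
        return list(hits)
    raise ValueError(f"unknown typoTolerance value: {mode!r}")


hits = [("a", 0), ("b", 1), ("c", 2)]
for mode in (True, False, "min", "strict"):
    print(mode, apply_typo_tolerance(hits, mode))
```

With these three hits, min keeps only the exact match, while strict keeps the exact match and the hit with 1 typo.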

ignorePlurals

This feature is designed to match words written in the plural form if the query is in the singular form, and vice-versa. It is built on top of a dictionary of singular and plural forms of words in 80+ languages. You can disable or enable it if you want. It works for simple plurals (hand, hands) and complex ones (feet, foot).

Advanced settings

allowTyposOnNumericTokens

If this setting is enabled, our typo-tolerance algorithm also applies to numbers. You may want to disable it in situations like a postal code in an address: with typos enabled, typing a postal code also matches similar postal codes, which produces a lot of false positives.

disableTypoToleranceOnAttributes

You can declare a list of attributes for which you want to disable typo-tolerance. This is useful, for example, if you want to search on SKUs and disable typos on this attribute only.

disableTypoToleranceOnWords

You can declare a list of words for which you want to disable the typoTolerance. This is for example useful for acronyms like mysql, php…
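
As an illustration, the three settings above could be grouped in a single settings payload. The setting names come from this page; the attribute name sku and the chosen values are hypothetical, and how the payload is sent depends on the API client you use.

```python
import json

# Hypothetical values; only the setting names come from this guide.
settings = {
    "allowTyposOnNumericTokens": False,           # e.g. for postal codes
    "disableTypoToleranceOnAttributes": ["sku"],  # "sku" is an assumed attribute
    "disableTypoToleranceOnWords": ["mysql", "php"],
}

print(json.dumps(settings, indent=2))
```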

altCorrections

When the default typoTolerance matching is not enough, you may want to specify additional alternative corrections. Each alternative correction is described by an object containing three attributes:

  • word: the word to correct
  • correction: the corrected word
  • nbTypos: the number of typos (1 or 2) that will be considered for the ranking algorithm (1 typo is better than 2 typos)

For example, if you set "altCorrections": [ { "word" : "foot", "correction": "feet", "nbTypos": 1}, { "word": "feet", "correction": "foot", "nbTypos": 1}], the query “foot” will match with records containing “feet”, but this will be considered as 1 typo.
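
The lookup this implies can be sketched as follows. The function alt_correction_typos is hypothetical and only shows how an entry maps a query word to a record word with a given typo count; it does not reflect how the engine stores or applies corrections.

```python
from typing import Dict, List, Optional, Union

Correction = Dict[str, Union[str, int]]

alt_corrections: List[Correction] = [
    {"word": "foot", "correction": "feet", "nbTypos": 1},
    {"word": "feet", "correction": "foot", "nbTypos": 1},
]


def alt_correction_typos(query_word: str, record_word: str,
                         corrections: List[Correction]) -> Optional[int]:
    """Return the typo count declared for this pair, or None if none applies."""
    for entry in corrections:
        if entry["word"] == query_word and entry["correction"] == record_word:
            return int(entry["nbTypos"])
    return None


print(alt_correction_typos("foot", "feet", alt_corrections))
print(alt_correction_typos("foot", "hand", alt_corrections))
```

The first call finds the declared correction and reports 1 typo; the second finds none.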

advancedSyntax (enables the double-quote notation for exact-matching)

There is a way to disable the typo-tolerance on only certain words of the query. If you enable the advanced syntax, you can use double quotes " in your query to disable the typo-tolerance between the double quotes.

For example, the query "word1 word2" word3 would match objects that contain exactly the expression “word1 word2”, without typo tolerance, but the typo-tolerance would be applied on word3. This would also disable the prefixing on the words inside the double quotes.
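
To make the notation concrete, here is a minimal sketch that separates the quoted parts of a query from the rest. The function parse_query is hypothetical and only handles well-formed double quotes; the engine's own parser may treat edge cases differently.

```python
import re
from typing import List, Tuple


def parse_query(query: str) -> List[Tuple[str, bool]]:
    """Split a query into (text, is_exact) pairs.

    is_exact is True for text between double quotes, where typo-tolerance
    and prefix matching would be disabled.
    """
    tokens = []
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        if phrase:
            tokens.append((phrase, True))
        elif word:
            tokens.append((word, False))
    return tokens


print(parse_query('"word1 word2" word3'))
```

Here "word1 word2" is flagged as exact, while word3 remains typo-tolerant.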
