21 Nov 2018


Typo-Tolerance Overview

Typo-tolerance is extremely important in modern search experiences for two reasons. First of all, with more and more usage occurring on mobile devices, typos are inevitable. Furthermore, for products with increasingly global userbases, not everyone will know the exact right way to spell a word — typo-tolerance is a powerful tool to support this growth.

Algolia provides robust typo-tolerance out-of-the-box, along with easy ways to customize just how tolerant a search experience should be.

How Typos Are Calculated

Algolia’s typo-tolerance algorithm is based on distance - the minimum number of operations (character additions, deletions, substitutions, or transpositions) required to change one word into another. This is known as the Damerau–Levenshtein distance.

Below are a few examples of how Algolia counts the operations needed to transform a word.

michael  // 0 typos
mickael  // 1 typo (substitution: h → k)
micael   // 1 typo (deletion: h)
mickhael // 1 typo (addition: k)
micheal  // 1 typo (transposition: a ⇄ e)
mickaell // 2 typos (substitution: h → k, addition: l)
Tichael  // 2 typos (substitution: m → T, first letter)
Tickael  // 3 typos (substitution: m → T, first letter; substitution: h → k)
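The counts above can be reproduced with a short sketch. It uses the optimal string alignment distance, a restricted form of Damerau–Levenshtein, and is not Algolia's actual implementation. The names editDistance and typoCount are hypothetical, and the first-letter handling is a simplification that assumes the first letter was substituted.

```javascript
// Optimal string alignment distance: additions, deletions, substitutions,
// and transpositions of two adjacent characters each cost 1.
function editDistance(a, b) {
  const rows = a.length + 1;
  const cols = b.length + 1;
  // d[i][j] = distance between the first i chars of a and the first j chars of b
  const d = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 0; i < rows; i++) d[i][0] = i;
  for (let j = 0; j < cols; j++) d[0][j] = j;

  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,       // deletion
        d[i][j - 1] + 1,       // addition
        d[i - 1][j - 1] + cost // substitution (or exact match)
      );
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // transposition
      }
    }
  }
  return d[a.length][b.length];
}

// Hypothetical helper: a differing first letter is charged 2 typos,
// then the rest of the word is compared normally (a simplification).
function typoCount(indexedWord, typedWord) {
  const a = indexedWord.toLowerCase();
  const b = typedWord.toLowerCase();
  if (a.length > 0 && b.length > 0 && a[0] !== b[0]) {
    return 2 + editDistance(a.slice(1), b.slice(1));
  }
  return editDistance(a, b);
}
```

For example, typoCount("michael", "micheal") counts a single transposition, while typoCount("michael", "Tickael") adds the first-letter charge to one substitution.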

Keep In Mind

The engine does not transform query words to correct their spelling; the transformations are part of the search itself. If a word is found as typed (that is, with no transformation), its typo count is 0. If the engine finds the word only after applying one or more operations, each operation costs 1. As described below, the typo count affects ranking.

Additionally, keep in mind:

  • Typo-tolerance is not case-sensitive.
  • Accented letters and other special characters are ignored.
  • Because typos on the first letter are relatively uncommon, such typos are counted as two typos.
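As an illustration of the first two points, here is a minimal sketch of how text could be normalized before comparison. The function name normalizeForMatching is hypothetical, and the sketch only covers case and combining accents, not other special characters.

```javascript
// Lowercase the text and strip accents so that "Café" and "cafe" compare equal.
function normalizeForMatching(text) {
  return text
    .normalize("NFD")                // split letters from their combining accents
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase();
}
```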

Impact of Typos in Ranking Formula

Typo count is the very first criterion considered in Algolia’s default ranking formula. Because Algolia uses a tie-breaking algorithm to determine ranking, this means that records containing exact query matches are considered more relevant than any others, regardless of other ranking criteria. See how typos impact ranking.

We recommend the Ranking Formula’s default configuration for the vast majority of use cases.
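The tie-breaking idea can be sketched as a comparator that only consults the next criterion when the previous one is equal. The typos and popularity fields are hypothetical stand-ins; the real ranking formula has more criteria than shown here.

```javascript
// Compare two hits criterion by criterion; later criteria only break ties.
function compareHits(a, b) {
  if (a.typos !== b.typos) return a.typos - b.typos; // fewer typos ranks first
  return b.popularity - a.popularity;                // hypothetical later criterion
}

const hits = [
  { name: "mickael", typos: 1, popularity: 900 },
  { name: "michael", typos: 0, popularity: 10 },
  { name: "michael jr", typos: 0, popularity: 50 },
];

// Copy before sorting so the original array keeps its order.
const ranked = [...hits].sort(compareHits).map((hit) => hit.name);
```

Here the hit with one typo ranks last even though its popularity is by far the highest, because popularity is only consulted between hits with the same typo count.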

Splitting & Concatenation

In addition to typos, Algolia handles splitting and concatenation — the insertion or removal of spaces or punctuation between two words. This way, “entert ainment” will match with “entertainment”, and “jamesbrown” will match with “James Brown.”

These splits and concatenations are not considered typos, and they only work when there are no other typing mistakes.

Splitting and concatenation will only be performed when matching whole words, not prefixes. This means that “entert ainm” will not match “entertainment”, and that “jamesbro” will not match “james brown.”
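A rough sketch of the whole-word rule follows. All function names are hypothetical, and the sketch ignores details such as whether the split parts are adjacent in the record; it only shows why prefixes do not match.

```javascript
// Concatenation: join each pair of adjacent query words.
function concatenations(words) {
  const out = [];
  for (let i = 0; i + 1 < words.length; i++) {
    out.push(words[i] + words[i + 1]);
  }
  return out;
}

// Splitting: every way to cut one word into two non-empty parts.
function splits(word) {
  const out = [];
  for (let i = 1; i < word.length; i++) {
    out.push([word.slice(0, i), word.slice(i)]);
  }
  return out;
}

// Whole words only: a candidate counts if it equals an indexed word exactly.
function matchesBySplitOrConcat(queryWords, indexedWords) {
  const vocabulary = new Set(indexedWords);
  const joined = concatenations(queryWords).some((w) => vocabulary.has(w));
  const split = queryWords.some((w) =>
    splits(w).some(([left, right]) => vocabulary.has(left) && vocabulary.has(right))
  );
  return joined || split;
}
```

With this sketch, matchesBySplitOrConcat(["jamesbrown"], ["james", "brown"]) succeeds, while ["jamesbro"] fails because "bro" is not a whole indexed word.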

Configuration Options

Every search experience and userbase is different; that’s why Algolia makes it easy to configure exactly how typos are handled.

Configure Word Length Necessary to Accept Typos

minWordSizefor1Typo (default: 4)

Typo-tolerance is only enabled once a query word reaches a certain length, defined by minWordSizefor1Typo. As soon as a word contains at least minWordSizefor1Typo characters, the engine allows one typo in matches.

minWordSizefor2Typos (default: 8)

As soon as a query word contains at least minWordSizefor2Typos characters, the engine allows up to two typos in matches.
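The two thresholds can be read as a simple lookup on word length. The function allowedTypos below is a hypothetical illustration of that rule, using the documented defaults.

```javascript
// Maximum number of typos tolerated for a single query word.
function allowedTypos(word, settings = {}) {
  const { minWordSizefor1Typo = 4, minWordSizefor2Typos = 8 } = settings;
  if (word.length >= minWordSizefor2Typos) return 2;
  if (word.length >= minWordSizefor1Typo) return 1;
  return 0;
}
```

With the defaults, a 3-character word tolerates no typos, a 7-character word tolerates one, and an 8-character word tolerates two.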

Enable or disable typo-tolerance

typoTolerance (default: true)

This setting can have 4 different values:

  • true: activate typo-tolerance

  • false: disable typo-tolerance

  • min: keep only results with the lowest number of typos.

  • strict: like min, but one step more lenient: keep only results with the two lowest typo counts.
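To make the difference between the four values concrete, here is a sketch that filters a list of hits by typo count as described above. The function applyTypoTolerance and the typos field are hypothetical; the engine applies these rules internally.

```javascript
// Keep or drop hits according to the typoTolerance value.
function applyTypoTolerance(hits, mode) {
  if (mode === true) return hits;
  if (mode === false) return hits.filter((hit) => hit.typos === 0);

  let keepCount;
  if (mode === "min") keepCount = 1;         // lowest typo count only
  else if (mode === "strict") keepCount = 2; // two lowest typo counts
  else throw new RangeError(`unsupported typoTolerance value: ${mode}`);

  // Distinct typo counts present in the result set, in ascending order.
  const counts = [...new Set(hits.map((hit) => hit.typos))].sort((a, b) => a - b);
  const kept = new Set(counts.slice(0, keepCount));
  return hits.filter((hit) => kept.has(hit.typos));
}
```

Note that min and strict are relative to the result set: if the best hit has one typo, min keeps the one-typo hits rather than returning nothing.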

Setting typo-tolerance at index time

index.setSettings({ typoTolerance: "strict" });

Setting typo-tolerance at query time

index.search("query", {
  typoTolerance: "strict"
});

Typos are counted per word. In a multi-word query, each query word can have up to 2 typos, subject to minWordSizefor1Typo and minWordSizefor2Typos.

When typoTolerance is set to strict, we force the Typo criterion to be first in the ranking formula.

When using a sort-by attribute, we recommend setting typo-tolerance to min to reduce the number of potentially irrelevant search results.

Consider Singular and Plural Forms Equivalent

By default, Algolia does not consider singulars and plurals as matches. You can override this default behaviour by setting ignorePlurals to true.

If activated, this feature is designed to match words written in the plural form if the query is in the singular form, and vice-versa. It’s built on a dictionary of singular and plural forms of words in over eighty languages. It works for simple plurals like hand ⇄ hands as well as more complex ones like feet ⇄ foot.

This parameter accepts a boolean or array value. We recommend passing an array of the specific ISO codes of languages you target. For example:

  ignorePlurals: ['en', 'fr']

Granular Targeting of Typo-Tolerance

allowTyposOnNumericTokens (default: true)

When enabled, allowTyposOnNumericTokens tells the engine to also allow typos on numeric tokens (numbers in the query). It makes sense to disable it in specific situations such as postal codes: with typos enabled, a postal code query returns many false positives.

disableTypoToleranceOnAttributes (default: [])

This parameter accepts an array of attributes for which typo-tolerance should be disabled. This is useful, for example, with products that might require SKU search without typo-tolerance.

disableTypoToleranceOnWords (default: [])

This parameter accepts an array of words for which typo-tolerance should be disabled. This is useful, for example, with acronyms like “mysql”, “php”, or “mamp”.
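The three parameters in this section can be pictured as a series of checks made before a typo is considered. The sketch below is a simplified reading of them: the function typosAllowedFor is hypothetical, the attribute names "zip", "sku", and "title" are made up, and treating a numeric token as "digits only" is an assumption.

```javascript
// Decide whether typos may be considered for one query word against one attribute.
function typosAllowedFor(queryWord, attribute, settings = {}) {
  const {
    allowTyposOnNumericTokens = true,
    disableTypoToleranceOnAttributes = [],
    disableTypoToleranceOnWords = [],
  } = settings;

  // Assumption: a numeric token is a word made only of digits.
  if (!allowTyposOnNumericTokens && /^\d+$/.test(queryWord)) return false;
  if (disableTypoToleranceOnAttributes.includes(attribute)) return false;
  if (disableTypoToleranceOnWords.includes(queryWord.toLowerCase())) return false;
  return true;
}
```

For instance, with allowTyposOnNumericTokens set to false, the query word "75011" must match exactly, while "michael" is still typo-tolerant.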

Specify Your Own Alternative Corrections

Alternative Corrections (altCorrection1, altCorrection2)

When the default typo-tolerance is not enough, additional alternative corrections can be specified. Each alternative correction is described by an object containing three attributes:

  • word: the word to correct
  • corrections: the list of corrected words
  • type: either altCorrection1 or altCorrection2. The number tells the engine how many typos (1 or 2) the match counts as in the ranking algorithm; 1 typo ranks better than 2.

For example, consider the following synonym definitions:

     {
       "objectID": "a-unique-identifier",
       "type": "altCorrection1",
       "word": "foot",
       "corrections": ["feet"]
     },
     {
       "objectID": "another-unique-identifier",
       "type": "altCorrection1",
       "word": "feet",
       "corrections": ["foot"]
     }

With these two alternative corrections defined, the query “foot” will match records containing “feet”, and vice versa, but the match will count as 1 typo. Without these corrections, “foot” and “feet” would have been considered synonymous, with zero typos, if you had set ignorePlurals to true (where feet ⇄ foot).
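The effect on the typo count can be sketched as a lookup. The function altCorrectionTypos is hypothetical, and the corrections field name is an assumption based on the format of Algolia's synonym objects; the engine resolves these definitions internally.

```javascript
// Typo cost attached to each alternative-correction type.
const typosByType = { altCorrection1: 1, altCorrection2: 2 };

const definitions = [
  { objectID: "a-unique-identifier", type: "altCorrection1", word: "foot", corrections: ["feet"] },
  { objectID: "another-unique-identifier", type: "altCorrection1", word: "feet", corrections: ["foot"] },
];

// Returns the typo count charged for matching queryWord to recordWord,
// or null when no alternative correction applies.
function altCorrectionTypos(queryWord, recordWord, defs) {
  for (const def of defs) {
    const cost = typosByType[def.type];
    if (cost !== undefined && def.word === queryWord && def.corrections.includes(recordWord)) {
      return cost;
    }
  }
  return null;
}
```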

Disabling typoTolerance on specific words

advancedSyntax (default: false)

This boolean parameter provides a way to disable the typo-tolerance on only specific words in a query by using double quotes: foot problems would be typo-tolerant on both query words, while "foot" problems would only be typo-tolerant on problems. This usage also disables prefixing on words inside the double quotes.
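The quoting rule can be illustrated with a small parser that marks which query words remain typo-tolerant. The function parseQuery and its output shape are hypothetical, and the sketch does not handle unbalanced quotes.

```javascript
// Split a query into words, flagging words that appear inside double quotes.
function parseQuery(query) {
  const terms = [];
  const pattern = /"([^"]+)"|(\S+)/g; // a quoted phrase, or a bare word
  for (const match of query.matchAll(pattern)) {
    const quoted = match[1] !== undefined;
    const text = quoted ? match[1] : match[2];
    for (const word of text.split(/\s+/).filter(Boolean)) {
      // Quoted words lose both typo-tolerance and prefix matching.
      terms.push({ word, typoTolerant: !quoted, prefixAllowed: !quoted });
    }
  }
  return terms;
}
```

Parsing the query "foot" problems (with the quotes) yields foot with typo-tolerance off and problems with it on, while the unquoted query leaves both words typo-tolerant.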
