Typo-Tolerance

Last updated 02 October 2017

Typo-Tolerance Overview

Typo-tolerance is extremely important in modern search experiences for two reasons. First of all, with more and more usage occurring on mobile devices, typos are inevitable. Furthermore, for products with increasingly global userbases, not everyone will know the exact right way to spell a word — typo-tolerance is a powerful tool to support this growth.

Algolia provides robust typo-tolerance out-of-the-box, along with easy ways to customize just how tolerant a search experience should be.

How Typos Are Calculated (based on Damerau-Levenshtein Distance)

Algolia’s typo-tolerance algorithm is based on the Damerau–Levenshtein distance, which is the minimum number of operations (character additions, deletions, substitutions, or transpositions) required to change one word into another.

Keep In Mind

  • Typo-tolerance is not case-sensitive.
  • Accented letters and other special characters are ignored.
  • Because typos on the first letter are relatively uncommon, such typos are counted as two typos.

Below are a few examples of typo counts for various queries against a record containing the text “michael.”

michael  // 0 typos
mickael  // 1 typo (substitution: h → k)
micael   // 1 typo (deletion: h)
mickhael // 1 typo (addition: k)
micheal  // 1 typo (transposition: a ⇄ e)
Tichael  // 2 typos (substitution: T, first letter)
mickaell // 2 typos (substitution: h → k, addition: l)
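The counts above can be reproduced with a short sketch. This is not Algolia's implementation: it uses the optimal string alignment variant of the Damerau-Levenshtein distance, and it approximates the first-letter rule by adding one typo whenever the first letters differ. The names osa_distance and typo_count are made up for this illustration.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # addition
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def typo_count(query, word):
    """Case-insensitive typo count; a differing first letter costs one extra typo.

    The first-letter penalty is an assumption made for this sketch, not the
    engine's exact rule.
    """
    q, w = query.lower(), word.lower()
    distance = osa_distance(q, w)
    if distance > 0 and q[:1] != w[:1]:
        distance += 1
    return distance


for query in ["michael", "mickael", "micael", "mickhael", "micheal", "Tichael", "mickaell"]:
    print(query, typo_count(query, "michael"))
```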

Impact of Typos in Ranking Formula

Typo count is the very first criterion considered in Algolia’s default ranking formula. Because Algolia uses a tie-breaking algorithm to determine ranking, this means that records containing exact query matches are considered more relevant than any others, regardless of other ranking criteria.

We recommend the Ranking Formula’s default configuration for the vast majority of use cases.

Splitting & Concatenation

In addition to typos, Algolia handles splitting and concatenation — the insertion or removal of spaces or punctuation between two words. This way, “entert ainment” will match with “entertainment” and “jamesbrown” will match with “James Brown.”

These splits and concatenations are not considered typos, and they only work when there are no other typing mistakes.

Splitting and concatenation will only be performed when matching whole words, not prefixes. This means that “entert ainm” will not match “entertainment”, and that “jamesbro” will not match “james brown.”
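As a rough illustration of the whole-word rule, the sketch below checks whether a query matches a record through a single split or concatenation. It only handles spaces (not punctuation) and exact words, and the function name split_or_concat_match is hypothetical; the engine's actual matching is more involved.

```python
def split_or_concat_match(query, text):
    """Return True if the query matches the text via one split or concatenation."""
    query_words = query.lower().split()
    record_words = text.lower().split()

    # Concatenation: two adjacent query words, joined, equal one whole record word.
    for i in range(len(query_words) - 1):
        if query_words[i] + query_words[i + 1] in record_words:
            return True

    # Splitting: one query word equals two adjacent record words joined together.
    joined_pairs = {
        record_words[i] + record_words[i + 1]
        for i in range(len(record_words) - 1)
    }
    return any(word in joined_pairs for word in query_words)


print(split_or_concat_match("entert ainment", "entertainment"))
print(split_or_concat_match("jamesbrown", "James Brown"))
print(split_or_concat_match("entert ainm", "entertainment"))
print(split_or_concat_match("jamesbro", "james brown"))
```

Because the comparison is made on whole words, the two prefix queries at the end do not match.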

Configuration Options

Every search experience and userbase is different; that’s why Algolia makes it easy to configure exactly how typos are handled.

Configure Word Length Necessary to Accept Typos

minWordSizefor1Typo (default: 4)

Typo-tolerance is only enabled once a query word reaches a certain length, defined in minWordSizefor1Typo. As soon as a word contains at least minWordSizefor1Typo characters, the engine will allow one typo in matches for that word.

minWordSizefor2Typos (default: 8)

As soon as a query word contains at least minWordSizefor2Typos characters, the engine will allow up to two typos in matches for that word.
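The two thresholds can be read as a simple lookup on word length. The sketch below assumes the default values of 4 and 8; the function name allowed_typos and its parameter names are illustrative only.

```python
def allowed_typos(word, min_word_size_for_1_typo=4, min_word_size_for_2_typos=8):
    """Return how many typos would be tolerated for a single query word."""
    if len(word) >= min_word_size_for_2_typos:
        return 2
    if len(word) >= min_word_size_for_1_typo:
        return 1
    return 0


for word in ["cat", "foot", "michael", "entertainment"]:
    print(word, allowed_typos(word))
```

With the defaults, a 3-letter word must match exactly, words of 4 to 7 letters tolerate one typo, and words of 8 letters or more tolerate two.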

Enable or Disable Typo-Tolerance

typoTolerance (default: true)

This setting accepts four values:

  • true: activate typo-tolerance

  • false: disable typo-tolerance

  • min: keep only results with the lowest number of typos. For example, if the smallest number of typos found is 0, then no results with typos will be returned. If the smallest number of typos found is 1, then only results with exactly 1 typo will be returned.

  • strict: like min, but with one extra typo allowed. Results are kept if they have the lowest number of typos found or one more than that.
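The four values can be compared side by side with a small filter over hits that already carry a typo count. This is a sketch of the behavior as described above, not the engine's code: the hit format, the function name filter_by_typo_tolerance, and the reading of strict as "lowest count plus one" are assumptions.

```python
def filter_by_typo_tolerance(hits, mode):
    """Filter (text, typo_count) pairs according to a typoTolerance value."""
    if mode is True:
        return list(hits)                       # keep every tolerated match
    if mode is False:
        return [h for h in hits if h[1] == 0]   # exact matches only
    if not hits:
        return []
    fewest = min(typos for _, typos in hits)
    if mode == "min":
        limit = fewest          # only the lowest typo count
    elif mode == "strict":
        limit = fewest + 1      # lowest typo count, plus one more typo
    else:
        raise ValueError("unknown typoTolerance value: %r" % (mode,))
    return [h for h in hits if h[1] <= limit]


hits = [("michael", 0), ("mickael", 1), ("mickaell", 2)]
for mode in (True, False, "min", "strict"):
    print(mode, filter_by_typo_tolerance(hits, mode))
```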

Setting typo-tolerance at index time

index.set_settings({
  typoTolerance: "strict"
});

Setting typo-tolerance at query time

index.search("query", {
  typoTolerance: "strict"
});

Typos are counted per-word. In the case of multi-word queries, it’s possible for each query word to have up to 2 typos, according to minWordSizefor1Typo and minWordSizefor2Typos.

When typoTolerance is set to strict, we force the Typo criterion to be first in the ranking formula.

When using a sort-by attribute, we recommend setting typo-tolerance to min to reduce the number of potentially irrelevant search results.

Consider Singular and Plural Forms Equivalent

ignorePlurals (default: false)

This feature is designed to match words written in the plural form if the query is in the singular form, and vice-versa. It’s built on a dictionary of singular and plural forms of words in over eighty languages. It works for simple plurals like hand ⇄ hands as well as more complex ones like feet ⇄ foot.

This parameter accepts a boolean or array value. We recommend passing an array of the specific ISO codes of languages you target. For example:

index.set_settings({
  ignorePlurals: ['en', 'fr']
});

Granular Targeting of Typo-Tolerance

allowTyposOnNumericTokens (default: true)

When enabled, allowTyposOnNumericTokens tells the engine to also allow typos on numeric tokens, that is, query words made up only of digits. It makes sense to disable it in specific situations such as postal codes: if typos are enabled, any postal code query will return many false positives.

disableTypoToleranceOnAttributes (default: [])

This parameter accepts an array of attributes for which typo-tolerance should be disabled. This is useful, for example, with products that might require SKU search without typo-tolerance.

disableTypoToleranceOnWords (default: [])

This parameter accepts an array of words for which typo-tolerance should be disabled. This is useful, for example, with acronyms like “mysql,” “php”, or “mamp.”
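Put together, the three granular settings might look like the following settings object. The parameter names come from this page, but the attribute name "sku" is a hypothetical example, and the dictionary is only a sketch of what would be passed to a settings call such as the index.set_settings examples shown earlier.

```python
# Hypothetical settings for an index where exact matching matters on some fields.
settings = {
    # Do not tolerate typos on digit-only query words (e.g. postal codes).
    "allowTyposOnNumericTokens": False,
    # "sku" is an assumed attribute name used only for illustration.
    "disableTypoToleranceOnAttributes": ["sku"],
    # Words that must be matched exactly.
    "disableTypoToleranceOnWords": ["mysql", "php", "mamp"],
}

for name, value in settings.items():
    print(name, "=", value)
```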

advancedSyntax (default: false)

This parameter provides a way to disable the typo-tolerance on only specific words in a query by using double quotes: foot problems would be typo-tolerant on both query words, while "foot" problems would only be typo-tolerant on problems. This usage also disables prefixing on words inside the double quotes.
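To make the quoting rule concrete, the sketch below splits a query into words and flags which ones would remain typo-tolerant. It is only an illustration of the syntax described above; the function name split_exact_terms is invented, and the real query parser may treat edge cases differently.

```python
import re


def split_exact_terms(query):
    """Return (word, typo_tolerant) pairs; words inside double quotes are exact."""
    terms = []
    # Each match is either a double-quoted phrase or a bare run of non-spaces.
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query):
        if quoted:
            terms.extend((word, False) for word in quoted.split())
        elif bare:
            terms.append((bare, True))
    return terms


print(split_exact_terms('foot problems'))
print(split_exact_terms('"foot" problems'))
```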

Specify Your Own Alternative Corrections

altCorrections

When the default typo-tolerance is not enough, additional alternative corrections can be specified. Each alternative correction is described by an object containing three attributes:

  • word: the word to correct
  • correction: the corrected word
  • nbTypos: the number of typos (1 or 2) that will be considered for the ranking algorithm (1 typo is better than 2 typos)

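For example, consider the following definitions, written as synonym records. In this format, the type altCorrection1 marks each correction as one typo (altCorrection2 would mark it as two), and corrections lists the corrected words: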

[
  {
     "objectID": "a-unique-identifier",
     "type": "altCorrection1",
     "word": "foot",
     "corrections":[
        "feet"
     ]
  },
  {
     "objectID": "another-unique-identifier",
     "type": "altCorrection1",
     "word": "feet",
     "corrections":[
        "foot"
     ]
  }
]

With these two alternative corrections defined, the query “foot” will match with records containing “feet”, and vice versa, but this will be considered as 1 typo. Without these corrections, “foot” and “feet” would have been considered synonymous, with zero typos, because of Algolia’s default plural handling.
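The effect of the two records can be pictured as a lookup that returns a typo count for a query word and a record word. The sketch below is a simplified model under that assumption: the function name alt_correction_typos is hypothetical, and the mapping from record type to typo count follows the altCorrection1 and altCorrection2 naming.

```python
import json

DEFINITIONS = json.loads("""
[
  {"objectID": "a-unique-identifier", "type": "altCorrection1",
   "word": "foot", "corrections": ["feet"]},
  {"objectID": "another-unique-identifier", "type": "altCorrection1",
   "word": "feet", "corrections": ["foot"]}
]
""")

# Number of typos counted for each alternative-correction type.
TYPOS_BY_TYPE = {"altCorrection1": 1, "altCorrection2": 2}


def alt_correction_typos(query_word, record_word, definitions):
    """Return the typo count if a definition links the two words, else None."""
    for definition in definitions:
        if (definition["word"] == query_word
                and record_word in definition["corrections"]):
            return TYPOS_BY_TYPE[definition["type"]]
    return None


print(alt_correction_typos("foot", "feet", DEFINITIONS))
print(alt_correction_typos("feet", "foot", DEFINITIONS))
print(alt_correction_typos("foot", "hand", DEFINITIONS))
```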
