Jan. 07, 2019

Typo tolerance

Typos and Spelling Errors

What is a typo?

  • A missing letter: “hllo” → “hello”
  • An extraneous letter: “heello” → “hello”
  • Inverted letters: “hlelo” → “hello”

What is Typo Tolerance?

Typo tolerance allows users to make mistakes while typing and still find the records they are looking for. This is done by matching words that are close in spelling.

Tolerating typos is extremely important in modern search experiences for two reasons. First of all, typos are inevitable on mobile devices. Secondly, because products and services are growing in both complexity and global reach, not everyone knows the right way to spell a word.

Algolia provides typo tolerance out of the box, along with several ways to customize how tolerant a search experience should be. This page covers both.

How Typos Are Calculated

Algolia’s typo tolerance algorithm is based on distance. Distance refers to the difference in spelling between a typed word and its exact match in the index. A perfect match is distance = 0. When there is a perfect match, or the distance is low (one or two letters mistakenly typed), then a match is made, and the record is added to the results.

For example, if the engine receives a word like “strm”, this can mean “storm” or “strum” (distance=1 / one letter incorrect), or “star” or “warm” (distance=2 / two letters incorrect).

Distance establishes a threshold of tolerance. The threshold is 2: when a typed word differs from an indexed word by 3 or more typos, the match is not tolerated (i.e., not included in the results).

Calculating Distance

Distance has a precise definition: it is the minimum number of operations (character additions, deletions, substitutions, or transpositions) required to change one word into another. This is known as the Damerau–Levenshtein distance.

Below are a few examples of how Algolia counts the operations needed to transform a word.

michael  // 0 typos
mickael  // 1 typo (substitution: h → k)
micael   // 1 typo (deletion: h)
mickhael // 1 typo (addition: k)
micheal  // 1 typo (transposition: a ⇄ e)
mickaell // 2 typos (substitution: h → k, addition: l)
Tichael  // 2 typos (substitution: m → T, first letter)
Tickael  // 3 typos (substitution: m → T, first letter, substitution: h → k)
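The counts above can be reproduced with a short Python sketch. It is an illustration only, not Algolia's implementation: `osa_distance` computes the optimal string alignment distance (a common restricted form of Damerau–Levenshtein), and `typo_count` is a hypothetical helper that lowercases both words and adds one extra typo when the first letters differ, which is a simplified reading of the first-letter rule described on this page.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts additions, deletions, substitutions, and transpositions of
    two adjacent characters, each at a cost of 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # add every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # addition
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def typo_count(query: str, indexed: str) -> int:
    """Hypothetical helper: case-insensitive distance plus first-letter rule.

    Assumption: a mismatch on the first letter counts as 2 typos, modeled
    here by adding 1 to the plain distance.
    """
    q, w = query.lower(), indexed.lower()
    distance = osa_distance(q, w)
    if distance and q and w and q[0] != w[0]:
        distance += 1
    return distance


if __name__ == "__main__":
    for word in ["michael", "mickael", "micael", "mickhael",
                 "micheal", "mickaell", "Tichael", "Tickael"]:
        print(f"{word:<9} {typo_count(word, 'michael')} typo(s)")
```

Lowercasing before comparing reflects the case-insensitivity of typo tolerance; accent handling is left out to keep the sketch short.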

Important Considerations

  • The engine does not transform the words of a query in order to correct their spelling. The transformations are part of the search itself: if a word is found as typed (that is, with no transformation), its typo count is 0. If the engine finds the word only after applying one or more operations, each operation costs 1. Ultimately, the typo count impacts ranking.
  • The last two examples (“Tichael” and “Tickael”) represent an exception to the distance=2 threshold. Because typos on the first letter are relatively uncommon, such typos are counted as 2, and in this case only, the threshold is distance=3.
  • Typo tolerance is not case-sensitive.
  • Accented letters and other special characters are ignored.
  • Typo tolerance only works with phonemic languages, which use individual characters to represent the sounds that form a word. Spelling errors are possible in these languages. Logogram-based languages (e.g., Chinese) do not use single letters to represent sounds; instead, each character represents a word or part of a word. Thus, typo logic is not used for these languages.
  • In addition to typos, Algolia handles splitting and concatenation, that is, the insertion or removal of spaces or punctuation between two words. This way, “entert ainment” will match “entertainment”, and “jamesbrown” will match “James Brown”. These splits and concatenations are not considered typos, and they only work when there are no other typing mistakes.
  • When a user places words within quotes, and you have set the advancedSyntax setting to true, then typo tolerance will not apply to the text within the quotes. Quotation marks require exact matching.
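The splitting and concatenation behavior described in the list above can be pictured with a minimal Python sketch. It is a simplification under stated assumptions, not the engine's logic: `normalize` and `matches_by_split_or_concat` are hypothetical names, and the check simply compares both strings after removing spaces and punctuation, so any other typing mistake makes the comparison fail.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and drop spaces, punctuation, and underscores."""
    return re.sub(r"[\W_]+", "", text.lower())


def matches_by_split_or_concat(query: str, indexed: str) -> bool:
    """Return True when query and indexed text differ only by separators.

    Assumption: an exact match (ignoring case) is not a split or a
    concatenation, so it is reported as False here.
    """
    if query.lower() == indexed.lower():
        return False
    return normalize(query) == normalize(indexed)


if __name__ == "__main__":
    print(matches_by_split_or_concat("entert ainment", "entertainment"))
    print(matches_by_split_or_concat("jamesbrown", "James Brown"))
    print(matches_by_split_or_concat("jamsbrown", "James Brown"))
```

The third call contains a missing letter in addition to the missing space, so it does not qualify as a concatenation match in this sketch.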

Impact of Typos in Ranking Formula

Typo count is the very first criterion in Algolia’s default ranking formula. Therefore, records that match with no typos are ranked higher than records that match with 1 typo, which in turn are ranked higher than records that match with 2 typos.
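As a rough illustration of what "first criterion" means, the Python sketch below sorts a few hits by typo count before anything else. The `Hit` type, its `popularity` field, and the `rank` function are hypothetical and stand in for whatever later criteria a real ranking formula applies.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    name: str
    typos: int       # typo count of the match
    popularity: int  # hypothetical tie-breaking criterion


def rank(hits: List[Hit]) -> List[Hit]:
    """Sort by typo count first; popularity only breaks ties."""
    return sorted(hits, key=lambda h: (h.typos, -h.popularity))


if __name__ == "__main__":
    hits = [
        Hit("mickaell", typos=2, popularity=90),
        Hit("michael", typos=0, popularity=10),
        Hit("mickael", typos=1, popularity=50),
    ]
    for hit in rank(hits):
        print(hit.name, hit.typos)
```

Even though "mickaell" has the highest popularity in this made-up data, it ranks last because typo count is compared first.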

Configuring Typo Tolerance

Every search experience and user base is different. That’s why Algolia makes it easy to configure exactly how typos are handled.

For alphabet-based and phonemic languages (English, French, Russian, and so on), we offer many ways to configure the engine to improve typo tolerance. Usually, just enabling it is sufficient. However, for some queries or data sets, you’ll need to fine-tune its settings.
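To give an idea of what such fine-tuning can look like, here is a hedged sketch of a settings payload written as a Python dictionary. The parameter names are the ones we believe Algolia's settings reference uses for typo tolerance; the values and the `sku` attribute are purely illustrative, and the code deliberately stops short of calling any API client. Check the current API reference before relying on any of them.

```python
import json

# Illustrative values only; "sku" is a hypothetical attribute name.
typo_settings = {
    "typoTolerance": True,                       # turn typo tolerance on or off
    "minWordSizefor1Typo": 4,                    # shortest word that may contain 1 typo
    "minWordSizefor2Typos": 8,                   # shortest word that may contain 2 typos
    "allowTyposOnNumericTokens": False,          # treat numbers as exact-match only
    "disableTypoToleranceOnAttributes": ["sku"], # attributes that require exact spelling
}

if __name__ == "__main__":
    # Print the payload as JSON, the form in which settings are usually sent.
    print(json.dumps(typo_settings, indent=2))
```

Disabling typos on numeric tokens or on identifier-like attributes is a common choice when near matches would be misleading, for example with product codes.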
