Typos and spelling errors

What’s a typo?

  • A missing letter: “hllo” → “hello”
  • An extraneous letter: “heello” → “hello”
  • Inverted letters: “hlelo” → “hello”
  • A substituted letter: “heilo” → “hello”

Other spelling errors

Extra or missing spaces and punctuation don’t count as typos. Algolia handles them if typoTolerance is enabled (set to true, min or strict). For example:

  • A word with a missing space is split in two: “helloworld” → “hello world”
  • Words separated by an extraneous space or punctuation mark are concatenated: “hel lo” → “hello”
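
The two cases above can be sketched in Python. This is an illustration only, not Algolia's implementation: the `vocabulary` set and both helper names are hypothetical, standing in for the words stored in an index.

```python
def split_candidates(word, vocabulary):
    """Return every way to split `word` into two known words."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in vocabulary and word[i:] in vocabulary
    ]


def concat_candidates(tokens, vocabulary):
    """Return every known word formed by joining two adjacent query tokens."""
    return [
        tokens[i] + tokens[i + 1]
        for i in range(len(tokens) - 1)
        if tokens[i] + tokens[i + 1] in vocabulary
    ]


# Hypothetical index vocabulary, for illustration only.
vocabulary = {"hello", "world"}

print(split_candidates("helloworld", vocabulary))    # [('hello', 'world')]
print(concat_candidates(["hel", "lo"], vocabulary))  # ['hello']
```

Neither helper counts as a typo correction: each candidate is an exact match once the space is added or removed.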

What’s typo tolerance?

Typo tolerance allows users to make mistakes while typing and still find the records they’re looking for. This is done by matching words that are close in spelling.

Tolerating typos matters in modern search experiences for two reasons. First, typos are hard to avoid on mobile devices. Second, as products and services grow in both complexity and global reach, not everyone knows the right way to spell a word.

Algolia provides typo tolerance out-of-the-box, along with some important ways to customize just how tolerant a search experience should be.

How typos are calculated

Algolia’s typo tolerance algorithm is based on distance. Distance refers to the difference in spelling between a typed word and its exact match in the index. A perfect match is distance = 0. When there is a perfect match, or the distance is low (one or two letters mistakenly typed), then a match is made, and the record is added to the results.

For example, if the engine receives a word like “strm”, this can mean “storm” or “strum” (distance=1 / one letter incorrect), or “star” or “warm” (distance=2 / two letters incorrect).

Distance essentially establishes a threshold of tolerance. The threshold is at 2: when a word is distant by 3 or more mistakes, it’s ‘not tolerated’ (not included in the results).

Calculating distance

Distance involves a precise logic: it’s the minimum number of operations (character additions, deletions, substitutions, or transpositions) required to change one word into another. This is known as the Damerau–Levenshtein distance.

Below are a few examples of how Algolia counts the operations needed to transform a word.

michael  // 0 typos
mickael  // 1 typo (substitution: h → k)
micael   // 1 typo (deletion: h)
mickhael // 1 typo (addition: k)
micheal  // 1 typo (transposition: a ⇄ e)
mickaell // 2 typos (substitution: h → k, addition: l)
Tichael  // 2 typos (substitution: m → T, first letter)
Tickael  // 3 typos (substitution: m → T, first letter, substitution: h → k)
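
As a rough illustration, most of the counts above can be reproduced with a textbook dynamic-programming routine. The sketch below uses the optimal string alignment variant of Damerau–Levenshtein distance. The function name `osa_distance` is made up for this example, and nothing here claims to be the engine's actual implementation. It lowercases both words and doesn't add the extra cost for a first-letter typo, so it reports 1 for “Tichael” rather than 2.

```python
def osa_distance(a: str, b: str) -> int:
    """Minimum number of additions, deletions, substitutions, and
    adjacent transpositions needed to turn `a` into `b`."""
    a, b = a.lower(), b.lower()
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and the first j of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # add all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # addition
                d[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for typed in ["michael", "mickael", "micael", "mickhael", "micheal", "mickaell"]:
    print(typed, osa_distance(typed, "michael"))
```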

Important considerations

  • The engine doesn’t transform the words of a query to correct their spelling; the transformations are part of the search itself. If a word is found as typed (that is, with no transformation), its typo count is 0. If the engine finds the word only after applying one or more operations, each operation costs 1. Ultimately, the typo count affects ranking.
  • The last two examples (“Tichael” and “Tickael”) represent an exception to the distance=2 threshold. Because typos on the first letter are relatively uncommon, such typos are counted as 2, and in this case only, the threshold is distance=3.
  • Typo tolerance isn’t case-sensitive.
  • Accented letters and other special characters are ignored.
  • Typo tolerance only works with phonemic languages, which use individual characters to represent the sounds that form a word, so spelling errors are possible. Logogram-based languages (such as Chinese, Japanese, Korean, and Vietnamese) don’t use single letters to represent sound; they rely on characters that represent words (or partial words) instead. Therefore, typo tolerance doesn’t work for these languages.
  • In addition to typos, Algolia handles splitting and concatenation: the insertion or removal of spaces or punctuation between two words. This way, “entert ainment” matches “entertainment”, and “jamesbrown” matches “James Brown”. Algolia doesn’t count splits and concatenations as typos. They’re only handled if typoTolerance is enabled (set to true, min or strict), and only when there are no other typing mistakes.
  • When a user places words within quotes, and you have set the advancedSyntax setting to true, then typo tolerance doesn’t apply to the text within the quotes. Quotation marks require exact matching.
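
The first-letter exception can be sketched as a thin layer on top of a raw edit distance. This is a simplified model built on assumptions: the helper names are hypothetical, the raw distance is passed in rather than computed, and any mismatch between the first letters is treated as a first-letter typo, which may not match how the engine detects it.

```python
def counted_typos(query_word: str, indexed_word: str, raw_distance: int) -> int:
    """Typo count after the first-letter rule: a first-letter typo costs 2."""
    first_letter_differs = query_word[:1].lower() != indexed_word[:1].lower()
    if raw_distance > 0 and first_letter_differs:
        return raw_distance + 1  # one operation is charged 2 instead of 1
    return raw_distance


def is_tolerated(query_word: str, indexed_word: str, raw_distance: int) -> bool:
    """Threshold is 2, raised to 3 only when the first letter is mistyped."""
    first_letter_differs = query_word[:1].lower() != indexed_word[:1].lower()
    threshold = 3 if first_letter_differs else 2
    return counted_typos(query_word, indexed_word, raw_distance) <= threshold


print(counted_typos("Tichael", "michael", 1))  # 2
print(counted_typos("Tickael", "michael", 2))  # 3
print(is_tolerated("Tickael", "michael", 2))   # True
```

Under this model, a word is tolerated whenever its raw distance is at most 2, and the first-letter rule only changes the typo count used for ranking.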

Impact of typos in the ranking formula

Typo count is the very first criterion in Algolia’s default ranking formula. Therefore, perfect matches rank higher than matches with 1 typo, and matches with 1 typo rank higher than matches with 2 typos.
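
A minimal sketch of what "first criterion" means in practice, assuming each hit already carries its typo count. The `hits` structure and the `rank_by_typos` helper are hypothetical; in the real formula, later criteria break the ties, which this sketch stands in for by keeping tied hits in their incoming order.

```python
def rank_by_typos(hits):
    """Order hits by typo count; ties keep their incoming order (stable sort)."""
    return sorted(hits, key=lambda hit: hit["typos"])


# Hypothetical hits for the query "michael".
hits = [
    {"name": "mickaell", "typos": 2},
    {"name": "michael", "typos": 0},
    {"name": "micheal", "typos": 1},
    {"name": "mickael", "typos": 1},
]

for hit in rank_by_typos(hits):
    print(hit["typos"], hit["name"])
# 0 michael
# 1 micheal
# 1 mickael
# 2 mickaell
```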

Configuring typo tolerance

Every search experience and user base is different. That’s why Algolia makes it easy to configure exactly how typos are handled.

For alphabet-based and phonemic languages (such as English, French, and Russian), you can configure the engine in many ways to improve typo tolerance. Usually, it’s sufficient just to enable it. However, for some queries or data sets, you need to fine-tune its settings.
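
As a starting point, the two settings named on this page can be collected in a plain Python dictionary. The `build_settings` helper is hypothetical, and sending the settings to an index depends on the API client and version you use, so that step is deliberately left out. The values true, min, and strict come from this page; accepting False to turn typo tolerance off is an assumption of this sketch.

```python
def build_settings(typo_tolerance=True, advanced_syntax=False):
    """Build a settings dictionary after checking the typoTolerance value."""
    # Accept booleans or one of the two named modes; reject anything else.
    valid = isinstance(typo_tolerance, bool) or typo_tolerance in ("min", "strict")
    if not valid:
        raise ValueError(f"unsupported typoTolerance value: {typo_tolerance!r}")
    return {
        "typoTolerance": typo_tolerance,
        # With advancedSyntax on, quoted text requires exact matching.
        "advancedSyntax": bool(advanced_syntax),
    }


print(build_settings("min", advanced_syntax=True))
# {'typoTolerance': 'min', 'advancedSyntax': True}
```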