Typos and Spelling Errors
What is a typo?
- Missing letter: “hllo” → “hello”
- Extraneous letter: “heello” → “hello”
- Inverted letters: “hlelo” → “hello”
- Substituted letter: “heilo” → “hello”
Other spelling errors
We do not count extra or missing spaces and punctuation as typos, but we only handle them if typoTolerance is enabled (set to strict). For example:
- A missing space between two words, which we handle by splitting: “helloworld” → “hello world”
- An extraneous space or punctuation mark, which we handle by concatenation: “hel lo” → “hello”
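As a rough illustration of splitting and concatenation, the sketch below looks for ways to split one token into two known words, or to join two adjacent tokens into one known word. The `vocabulary` set is a hypothetical stand-in for the words present in an index, and the function names are made up for this example; this is not how the engine is implemented internally.

```python
def split_candidates(word, vocabulary):
    """Return every way to cut `word` into two words that both exist in `vocabulary`."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in vocabulary and word[i:] in vocabulary
    ]


def concat_candidates(tokens, vocabulary):
    """Return each pair of adjacent tokens whose concatenation exists in `vocabulary`."""
    return [
        left + right
        for left, right in zip(tokens, tokens[1:])
        if left + right in vocabulary
    ]


# Hypothetical vocabulary standing in for indexed words.
vocabulary = {"hello", "world"}
print(split_candidates("helloworld", vocabulary))
print(concat_candidates(["hel", "lo"], vocabulary))
```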
What is Typo Tolerance?
Typo tolerance allows users to make mistakes while typing and still find the records they are looking for. This is done by matching words that are close in spelling.
Tolerating typos is important in modern search experiences for two reasons. First, typos are inevitable on mobile devices. Second, because products and services are growing in both complexity and global reach, not everyone knows the right way to spell a word.
Algolia provides typo tolerance out-of-the-box, along with some important ways to customize just how tolerant a search experience should be. This is what we’ll discuss on this page.
How Typos Are Calculated
Algolia’s typo tolerance algorithm is based on distance. Distance refers to the difference in spelling between a typed word and its exact match in the index. A perfect match is distance = 0. When there is a perfect match, or the distance is low (one or two letters mistakenly typed), then a match is made, and the record is added to the results.
For example, if the engine receives a word like “strm”, this can mean “storm” or “strum” (distance=1 / one letter incorrect), or “star” or “warm” (distance=2 / two letters incorrect).
Distance essentially establishes a threshold of tolerance. The threshold is at 2: when a word is distant by 3 or more mistakes, it will not be tolerated (i.e., not included in the results).
Distance involves a precise logic: it is the minimum number of operations (character additions, deletions, substitutions, or transpositions) required to change one word into another. This is known as the Damerau–Levenshtein distance.
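To make the definition concrete, here is a minimal sketch of the restricted Damerau–Levenshtein distance (also called optimal string alignment), the common textbook variant that counts additions, deletions, substitutions, and transpositions of adjacent characters. The function name `osa_distance` is chosen for this example, and nothing here is taken from Algolia's engine, which may differ in its details.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # add every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # addition
                d[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for typed in ("storm", "strum", "star", "warm"):
    print("strm", "->", typed, osa_distance("strm", typed))
```

The table costs O(len(a) × len(b)) time and memory, which is fine for single words but would need pruning or a smarter index structure to run against a full index.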
Below are a few examples of how Algolia counts the operations needed to transform a word.
    michael   // 0 typos
    mickael   // 1 typo (substitution: h → k)
    micael    // 1 typo (deletion: h)
    mickhael  // 1 typo (addition: k)
    micheal   // 1 typo (transposition: a ⇄ e)
    mickaell  // 2 typos (substitution: h → k, addition: l)
    Tichael   // 2 typos (substitution: m → T, first letter)
    Tickael   // 3 typos (substitution: m → T, first letter, substitution h → k)
- The engine does not transform the words of a query to correct their spelling. The transformations are part of the search itself: if a word is found as typed (that is, with no transformation), its typo count is 0. If the engine finds the word only after applying one or more operations, each operation costs 1. Ultimately, the typo count impacts ranking.
- The last two examples (“Tichael” and “Tickael”) represent an exception to the distance=2 threshold. Because typos on the first letter are relatively uncommon, such typos are counted as 2, and in this case only, the threshold is distance=3.
- Typo tolerance is not case-sensitive.
- Accented letters and other special characters are ignored.
- Typo tolerance only works with phonemic languages, which use individual characters to represent the sounds that form a word, so spelling errors are possible. Logogram-based languages (such as Chinese, Japanese, Korean, and Vietnamese) do not use single letters to represent sound; they rely on pictures to represent words (or partial words) instead. Therefore, typo tolerance doesn’t work for these languages.
- In addition to typos, Algolia handles splitting and concatenation: the insertion or removal of spaces or punctuation between two words. This way, “entert ainment” matches “entertainment”, and “jamesbrown” matches “James Brown”. We do not consider splits and concatenations as typos, but we only handle them if typoTolerance is enabled (set to strict), and when there are no other typing mistakes.
- When a user places words within quotes, and you have set the advancedSyntax setting to true, then typo tolerance will not apply to the text within the quotes. Quotation marks require exact matching.
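The counting rules in these notes can be approximated in a few lines. The sketch below is a simplified model and not Algolia's implementation: it lowercases and strips accents before comparing, adds one extra typo when the first letters differ (so a first-letter typo counts as 2), and raises the threshold from 2 to 3 in that case only. All function names are hypothetical, and a compact distance helper is included so the block runs on its own.

```python
import unicodedata
from functools import lru_cache


def normalize(word):
    """Lowercase the word and drop accents (combining marks)."""
    decomposed = unicodedata.normalize("NFKD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def distance(a, b):
    """Restricted Damerau-Levenshtein distance between a and b."""
    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0 or j == 0:
            return i + j
        best = min(
            d(i - 1, j) + 1,                           # deletion
            d(i, j - 1) + 1,                           # addition
            d(i - 1, j - 1) + (a[i - 1] != b[j - 1]),  # substitution
        )
        if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
            best = min(best, d(i - 2, j - 2) + 1)      # transposition
        return best
    return d(len(a), len(b))


def typo_count(query, indexed):
    """Typo count under the simplified first-letter rule described above."""
    q, w = normalize(query), normalize(indexed)
    count = distance(q, w)
    if q and w and q[0] != w[0]:
        count += 1  # assumption: a first-letter typo costs 2 instead of 1
    return count


def is_tolerated(query, indexed):
    """Apply the threshold: 2 in general, 3 when the first letter is mistyped."""
    q, w = normalize(query), normalize(indexed)
    limit = 3 if q and w and q[0] != w[0] else 2
    return typo_count(query, indexed) <= limit


for typed in ("MICHAEL", "michaël", "Tichael", "Tickael"):
    print(typed, typo_count(typed, "michael"), is_tolerated(typed, "michael"))
```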
Impact of Typos in Ranking Formula
Typo count is the very first criterion in Algolia’s default ranking formula. Therefore, records with perfect matches are ranked higher than records matched with 1 typo, which in turn are ranked higher than records matched with 2 typos.
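To picture what "first criterion" means, the sketch below sorts a handful of hits by typo count before anything else, and only uses a second signal to break ties. The `Hit` class and its `popularity` field are hypothetical, and the real ranking formula contains more criteria than this two-level sort.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    typos: int       # typo count of the match
    popularity: int  # hypothetical tie-breaking signal


def rank(hits):
    """Order hits by fewest typos first, then by highest popularity."""
    return sorted(hits, key=lambda h: (h.typos, -h.popularity))


hits = [
    Hit("Mickaell", typos=2, popularity=900),
    Hit("Michael", typos=0, popularity=10),
    Hit("Micheal", typos=1, popularity=500),
    Hit("Michael B.", typos=0, popularity=50),
]
for hit in rank(hits):
    print(hit.typos, hit.name)
```

Note that the very popular "Mickaell" record still ranks last: under this ordering, no amount of popularity outweighs an extra typo.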
Configuring Typo Tolerance
Every search experience and user base is different. That’s why Algolia makes it easy to configure exactly how typos are handled.
For alphabet-based and phonemic languages (English, French, Russian, and so on), we offer many ways to configure the engine to improve typo tolerance. Usually, just enabling it is sufficient. However, for some queries or data sets, you’ll need to fine-tune its settings.
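As a hedged illustration of what such fine-tuning might look like, the sketch below builds a settings payload as a plain Python dictionary. Apart from typoTolerance and advancedSyntax, the parameter names are not mentioned on this page: they are written here from memory of Algolia's settings reference and should be verified there before use. The values and the `sku` attribute are purely illustrative, and the block deliberately stops short of calling any API client.

```python
import json

# Illustrative index settings; verify each name against the API reference.
typo_settings = {
    "typoTolerance": True,               # enable typo tolerance
    "minWordSizefor1Typo": 4,            # shortest word allowed 1 typo
    "minWordSizefor2Typos": 8,           # shortest word allowed 2 typos
    "allowTyposOnNumericTokens": False,  # keep numbers exact
    "disableTypoToleranceOnAttributes": ["sku"],  # hypothetical attribute
    "advancedSyntax": True,              # quoted text requires exact matching
}

payload = json.dumps(typo_settings, indent=2)
print(payload)
```

Keeping the settings in one dictionary makes it easy to review them, store them in version control, and send them with whichever API client version you use.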