The term fuzzy search comes with several meanings, all of which turn on the idea of approximate matching.
The most common understanding of the term involves fuzzy matching, where a search engine matches words that do not match exactly. The classic example is misspellings, where incorrectly spelled queries turn up correctly spelled results.
Beyond typos, fuzzy search can also correct poorly formulated queries, recognize colloquial vocabulary, expand prefixes, or build loose category relationships between a query and the content being searched.
But there’s more to fuzziness than correcting typos, as you’ll see. So buckle up as we de-fuzz search to show you how it works, its business and consumer benefits, and a couple of success stories.
What is Fuzzy Search?
In essence, fuzzy search — also known as approximate string matching — is a method that seeks an approximate match rather than an exact one, allowing for a more flexible and forgiving search experience.
Fuzzy search takes a human-centric approach. Rather than a binary true or false, it introduces a spectrum of similarity, evaluating how closely what you typed matches what you’re searching for.
Unlike exact string matching, fuzzy search excels at accommodating typos and misspellings from clumsy fingers, hurried inputs, and the tiny buttons on mobile devices. It can even handle variations of spelling across different languages.
Fuzzy search also proves invaluable in processing user-generated data, which is notorious for its inconsistencies (like misspellings, alternative spellings, and localized variations). Depending on the implementation, it can even extend to phonetic and sound-based matching, giving you a robust and comprehensive search.
Fuzzy matching extends the concept of fuzzy search by identifying information based on similarities, not just an exact match. This broad term encompasses various techniques, so let’s zero in on language-based similarities:
Synonyms: When a user outside the United States searches for “football”, the search engine can also return results that use the American term, “soccer”. Advanced search engines dynamically create synonym lists based on current events or trends.
Grammatical variations: “Run”, “running”, and “ran” are all forms of the verb “to run”. Search engines use stemming and lemmatization techniques to reduce words to their base or root form, enabling comprehensive search results.
Dictionary-based techniques: Specialized dictionaries for various fields give us precise matching for technical terms, enhancing search accuracy and relevance. For example, a medical dictionary can help us match “hypertension” with “high blood pressure”.
Heuristic and NLP (Natural Language Processing) methods: NLP algorithms can break down text into individual words or tokens, identify the grammatical role of each word, recognize the names of people, places, and organizations, and generate context-based synonyms with machine learning. For example, they could tell that “COVID” and “coronavirus” should be treated as interchangeable during the recent pandemic.
Partial word matching: Typing “app” might return results for “apple”, “application”, and “appetite”. This is particularly useful for search-as-you-type features, where the search engine provides suggestions based on partial input. Techniques like prefix matching and n-grams (breaking words into smaller chunks) are commonly used here.
Phrase matching: Searching for “machine learning algorithms” should return results about “algorithms used in machine learning”. This is achieved through phrase indexing and n-gram analysis, which let the search engine understand and match longer phrases within the content.
Query matching: Typing “how to bake a cake” should return results about cake recipes and baking tips, not just pages containing the exact phrase. This involves query rewriting, contextual analysis, and user behavior modeling.
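To make partial word matching concrete, here’s a minimal Python sketch of prefix matching over a sorted vocabulary. It assumes an in-memory, alphabetically sorted word list (the `vocabulary` below is made up for illustration). It uses binary search to jump to the first candidate. Production engines more often rely on tries or dedicated prefix indexes.

```python
from bisect import bisect_left


def prefix_matches(prefix: str, sorted_terms: list[str]) -> list[str]:
    """Return every term in sorted_terms that starts with prefix."""
    # Binary search finds the first term that sorts at or after the prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    # All terms sharing the prefix sit in one contiguous run from there.
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


# Hypothetical vocabulary, sorted once up front.
vocabulary = sorted(["apple", "application", "appetite", "banana", "apply", "apt"])
print(prefix_matches("app", vocabulary))
# prints ['appetite', 'apple', 'application', 'apply']
```

Sorting once and searching many times is what makes this cheap enough for search-as-you-type, where a new lookup fires on every keystroke.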
How Does It Work?
The starting point for the search is textual matching, which matches a query’s exact characters, letters, and words to the records in the dataset. This boolean logic delivers binary results: the search string either matches or it doesn’t. This can be powerful for precise searches but may miss relevant results if the input isn’t perfect.
For instance, if you search for “boat”, the search engine will return all records containing the word “boat”. It won’t return “canoe” or “cruise ship” results, because those strings don’t match the query. Another example: imagine someone who is trying to search for “canoe” but misspells it as “canou”. Without typo tolerance, the search engine would fail to return any results.
Fuzzy search algorithms are designed to handle input errors and variations, ensuring that users still receive relevant results even if their queries are imperfect.
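Here’s a rough Python illustration of the difference, using a made-up four-item catalog. Exact matching is a plain word lookup. The fuzzy side uses the standard library’s `difflib.get_close_matches`, which scores candidates with a sequence-similarity ratio rather than an edit distance. Treat it as a stand-in for a real search engine’s typo tolerance, not a model of one.

```python
import difflib

# Hypothetical product catalog.
catalog = ["boat", "canoe", "cruise ship", "kayak"]


def exact_search(query: str, records: list[str]) -> list[str]:
    """Boolean matching: a record either contains the query word or it doesn't."""
    q = query.lower()
    return [r for r in records if q in r.lower().split()]


def fuzzy_search(query: str, records: list[str], cutoff: float = 0.6) -> list[str]:
    """Approximate matching via difflib's similarity ratio (0.0 to 1.0)."""
    return difflib.get_close_matches(query.lower(), records, n=3, cutoff=cutoff)


print(exact_search("canou", catalog))  # prints []
print(fuzzy_search("canou", catalog))  # prints ['canoe']
```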
Here’s a breakdown of several fuzzy search applications:
Word Size Threshold for Typos
Adjusting the word size threshold for typos (insertions, deletions, or substitutions) helps balance relevance and tolerance. Typically, shorter words can tolerate fewer typos than longer words.
Example:
One Typo: Activated for words with at least 4 characters. Typing “helo” might return “hello”.
Two Typos: Activated for words with at least 8 characters. Typing “acomodation” might return “accommodation”.
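These thresholds can be sketched as a typo budget that grows with word length. This minimal Python sketch makes two assumptions: typos are counted as Levenshtein edits (covered in the algorithms section below), and the 4- and 8-character cutoffs are the only rules. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Compact Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def allowed_typos(word: str, one_typo_min: int = 4, two_typos_min: int = 8) -> int:
    """Typo budget that grows with the length of the query word."""
    if len(word) >= two_typos_min:
        return 2
    if len(word) >= one_typo_min:
        return 1
    return 0


def typo_tolerant_match(query_word: str, candidate: str) -> bool:
    return levenshtein(query_word, candidate) <= allowed_typos(query_word)


print(typo_tolerant_match("helo", "hello"))                 # True: 1 edit, budget 1
print(typo_tolerant_match("acomodation", "accommodation"))  # True: 2 edits, budget 2
print(typo_tolerant_match("cat", "car"))                    # False: budget 0
```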
Singular and Plural Words
Fuzzy search can be configured to treat singular and plural forms like “foot” and “feet” as equivalent. This keeps search results comprehensive, since this kind of difference rarely makes a result irrelevant.
Example:
Searching for “cat” will also return results for “cats”.
Searching for “child” will return results for “children”.
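A naive way to get this behavior is to normalize both the query and the indexed words to a singular form before comparing them. The sketch below assumes English only. It uses a tiny hypothetical table of irregular plurals and crude suffix rules that will get some words wrong. A real engine would rely on per-language dictionaries.

```python
# Hypothetical, deliberately tiny table of English irregular plurals.
IRREGULAR_PLURALS = {"feet": "foot", "children": "child", "mice": "mouse", "people": "person"}


def singularize(word: str) -> str:
    """Crude English singularization; real engines use per-language dictionaries."""
    w = word.lower()
    if w in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[w]
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"  # berries -> berry
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]  # cats -> cat, but leaves "glass" alone
    return w


def same_word(a: str, b: str) -> bool:
    return singularize(a) == singularize(b)


print(same_word("cat", "cats"))        # True
print(same_word("child", "children"))  # True
print(same_word("foot", "feet"))       # True
```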
Typo Tolerance with Numbers
Handling numbers correctly in fuzzy search is important for contexts like phone numbers and postal codes.
Typo tolerance can be enabled or disabled for numeric tokens. This ensures accurate search results for specific use cases like product codes or contact information.
Example:
Enabled: Typing “123-456-789” might still match “123-456-7890”.
Disabled: Typing “12345” will only match exact matches, ensuring accuracy for postal codes.
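Here’s one way the enabled/disabled switch could behave, as a minimal Python sketch. It assumes separators such as hyphens are stripped before comparing. It also assumes “enabled” means tolerating a single inserted, deleted, or substituted digit. `allow_typos_on_numbers` is a hypothetical flag name, not a specific product’s setting.

```python
import re


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one insertion, deletion, or substitution."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = j = 0
    edited = False
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        if edited:
            return False  # a second difference means more than one edit
        edited = True
        if len(a) == len(b):
            i += 1  # substitution: advance both strings
        j += 1  # insertion into the shorter string: advance only the longer one
    return True


def numeric_match(query: str, value: str, allow_typos_on_numbers: bool) -> bool:
    # Compare digits only, so "123-456-789" and "123456789" are the same token.
    q = re.sub(r"\D", "", query)
    v = re.sub(r"\D", "", value)
    if not allow_typos_on_numbers:
        return q == v  # exact match only, e.g. postal codes
    return within_one_edit(q, v)


print(numeric_match("123-456-789", "123-456-7890", allow_typos_on_numbers=True))   # True
print(numeric_match("123-456-789", "123-456-7890", allow_typos_on_numbers=False))  # False
```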
Specific Attributes and Words
Fine-tuning typo tolerance for specific attributes or words can enhance the accuracy of critical data.
To ensure exact matches, certain attributes (like product SKUs) or precise terms (like acronyms) can have typo tolerance disabled.
Example:
Product SKUs: Searching for “SKU1234” will only match the exact SKU, not “SKU12345”.
Acronyms: Searching for “NASA” will not match “NASAA”.
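Per-attribute and per-word rules can be modeled as overrides that force the typo budget to zero. In this sketch, `EXACT_ATTRIBUTES` and `EXACT_WORDS` are hypothetical configuration sets. The length thresholds reuse the 4- and 8-character cutoffs described earlier.

```python
# Hypothetical configuration: attributes and words where typos are never tolerated.
EXACT_ATTRIBUTES = {"sku"}
EXACT_WORDS = {"nasa"}


def typo_budget(attribute: str, word: str) -> int:
    """How many typos a query word may contain when matched against `attribute`."""
    if attribute in EXACT_ATTRIBUTES or word.lower() in EXACT_WORDS:
        return 0  # exact match required
    if len(word) >= 8:
        return 2
    if len(word) >= 4:
        return 1
    return 0


print(typo_budget("sku", "SKU1234"))        # 0, so "SKU12345" will not match
print(typo_budget("description", "NASA"))   # 0, so "NASAA" will not match
print(typo_budget("description", "hello"))  # 1
```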
A fuzzy matching algorithm is designed to find approximate matches rather than exact ones, allowing for variations in the input data. These algorithms enhance search capabilities, especially in handling user errors, variations, and complex queries.
Let’s take a look at the most commonly used algorithms:
Levenshtein Distance (Edit Distance)
How it works: Measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into another.
Use case: Useful for correcting spelling errors and typos. For example, changing “kitten” to “sitting” requires three edits: replace “k” with “s”, replace “e” with “i”, and add “g” at the end. The more changes needed, the less similar the words are.
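The classic dynamic-programming formulation fits in a few lines of Python. This sketch keeps only two rows of the edit-distance table in memory. It treats every insertion, deletion, and substitution as costing 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the letters match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("canou", "canoe"))     # 1
```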
Damerau-Levenshtein Distance
How it works: Extends Levenshtein distance by also counting a transposition (swapping two adjacent letters) as a single edit. Like Levenshtein, it takes the position of letters into account to determine the similarity between the query and a candidate word. You’ve likely encountered transpositions when typing hastily and accidentally switching two letters, like “teh” instead of “the”. Damerau-Levenshtein would count this as one change instead of two.
Use case: Handles common typing errors, such as typing “acress” instead of “caress”.
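A common way to implement this is the restricted variant known as optimal string alignment (OSA), which adds a transposition check to the Levenshtein table. The sketch below is that restricted variant, not the unrestricted Damerau-Levenshtein algorithm. The two differ only in edge cases where a transposed pair is edited again.

```python
def osa_distance(a: str, b: str) -> int:
    """Levenshtein distance plus adjacent transpositions (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Two adjacent letters swapped ("eh" vs "he") count as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("teh", "the"))        # 1 (plain Levenshtein gives 2)
print(osa_distance("acress", "caress"))  # 1
```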
Jaro-Winkler Distance
How it works: Measures similarity between two strings, giving more weight to strings that match from the beginning. It improves on the Jaro distance by adding a prefix scale, so it assigns greater importance to first impressions. Take “Martha” and “Martin” as an example. They’d be deemed more alike than “Martha” and “Bertha”, even though both pairs differ by two letters, because “Martha” and “Martin” share the prefix “Mart”.
Use case: Effective for short strings such as names, where the start of the string is more likely to be correct.
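Here’s a minimal Python sketch of Jaro similarity with the Winkler prefix adjustment. It uses the commonly cited defaults of a 0.1 scaling factor and a prefix of at most four characters. This version is case-sensitive, and implementations differ in details like that, so scores from other libraries may not match exactly.

```python
def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only "match" if they are equal and not too far apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    # Boost the score in proportion to the shared prefix length.
    return j + prefix * p * (1 - j)


print(round(jaro_winkler("Martha", "Martin"), 3), round(jaro_winkler("Martha", "Bertha"), 3))
# prints 0.867 0.778
```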
Soundex Algorithm
How it works: Encodes words by their sound, converting them into a four-character code (a letter followed by three digits). Words that sound similar but are spelled differently get the same code, which makes Soundex a tool for matching names with similar pronunciations regardless of spelling variations. For example, it recognizes that “Smith” and “Smythe” likely refer to the same surname despite their different spellings.
Use case: Used for matching words that sound alike, particularly useful in name matching and genealogical research.
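The sketch below implements the American Soundex rules in Python. It keeps the first letter and maps the remaining consonants to digits. It collapses adjacent letters with the same digit, including across “h” and “w”, and drops vowels. Finally, it pads or truncates the result to four characters. Dropping non-ASCII characters is a simplifying assumption.

```python
# American Soundex digit groups: letters in the same group sound alike.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")  # a repeat of the first letter's digit is skipped
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w don't separate letters with the same digit
        digit = SOUNDEX_CODES.get(c, "")  # vowels (and y) map to ""
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated digit after it is kept
    return (code + "000")[:4]


print(soundex("Smith"), soundex("Smythe"))  # S530 S530
```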
Metaphone and Double Metaphone
How it works: Metaphone encodes words based on their pronunciation, and Double Metaphone goes further by generating two codes (a primary and a secondary) for words with ambiguous pronunciations. Both are more accurate successors to Soundex. Think of them as language interpreters specializing in pronunciation: their goal is to match words based on phonetic similarity rather than spelling alone. For instance, they’d recognize the phonetic equivalence of “photo” and “foto”, or the likelihood that “Kathy” and “Cathy” represent the same name.
Use case: Used for matching words based on their sound, especially in multilingual contexts. Examples of rules include dropping silent letters and converting letters based on their position and surrounding context (e.g., “ph” -> “f”, “kn” -> “n”).
Cosine Similarity
How it works: Measures the cosine of the angle between two non-zero vectors of an inner product space. This metric is useful for comparing the similarity of documents. Let’s say you’re a librarian trying to organize books by their topics. Each book can be about multiple subjects, like “cooking”, “Italian culture”, and “travel”, so you can represent each book as a vector (think of a compass direction) with one dimension per subject. To find out how similar two books are, you compare the directions of their vectors. If they point the same way, the books are very similar; if they sit at right angles, the books have no subjects in common.
Use case: Used in document comparison, information retrieval, and text analysis.
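To make the librarian analogy concrete, this Python sketch turns each text into a bag-of-words vector (word counts) and computes the cosine of the angle between two such vectors. The tokenization, a lowercase whitespace split, is a simplifying assumption. Real systems usually add weighting schemes such as TF-IDF.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between two bag-of-words count vectors."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical "books", each described by a few topic words.
book_a = "cooking italian travel culture"
book_b = "cooking italian travel"
book_c = "astronomy physics"
print(round(cosine_similarity(book_a, book_b), 3))  # 0.866
print(cosine_similarity(book_a, book_c))            # 0.0
```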
N-gram Similarity
How it works: Splits strings into contiguous sequences of n characters (n-grams) and compares these sequences between two strings. With n = 2, for example, you break down each phrase into 2-letter pieces and compare them to see how much they overlap, so “I love pasta” would come out pretty similar to “I love pizza”. It’s like playing a matching game where you don’t need all the pieces to be the same to see that two puzzles are similar. This method is great for catching small differences or typos: it would recognize that “color” and “colour” are very similar, even though they’re spelled differently.
Use case: Effective for spell-checking, finding similar words in different languages, and matching names that might be spelled slightly differently.
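Here’s a minimal Python sketch that uses character bigrams (n = 2) and the Jaccard index: the number of shared n-grams divided by the number of distinct n-grams across both strings. Other scoring choices, such as the Dice coefficient, are just as common. This one was picked for simplicity.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """All distinct contiguous n-character slices of the lowercased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard index of the two strings' n-gram sets (0.0 to 1.0)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0  # two strings too short to have n-grams are treated as identical
    return len(ga & gb) / len(ga | gb)


# "color" -> {co, ol, lo, or}; "colour" -> {co, ol, lo, ou, ur}
print(ngram_similarity("color", "colour"))  # 0.5
print(ngram_similarity("color", "banana"))  # 0.0
```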
Enhancing User Experience and Business Performance With Fuzzy Search
Fuzzy search offers multiple advantages that improve both the user experience and business performance. By intelligently handling variations in search queries, fuzzy search bridges the gap between user intent and available information, leading to more satisfying interactions and improved efficiency.
Let’s explore the specific benefits that fuzzy search brings to consumers and businesses:
Benefits of Fuzzy Search for Consumers
Handles Typos and Misspellings: Corrects typing errors and misspellings, ensuring consumers find relevant results.
Provides Better Experience: Creates a forgiving and intuitive search experience that doesn’t require perfect input.
Matches Similar Terms: Understands and matches synonyms, allowing natural use of language.
Matches Prefixes and Infixes: Lets you find results by typing only part of a word.
Provides Sound-Based Results: Matches searches for names and brands that have unusual or ambiguous spellings.
Benefits of Fuzzy Search for Businesses
Captures Missed Opportunities: Ensures potential sales aren’t lost due to input errors.
Enhances User Experience: Leads to higher satisfaction and retention since users find what they need easily.
Unveils Customer Intent: Provides insights into customer behavior and search patterns.
Manages Variations: Handles inconsistencies in user-generated content.
Simplifies Search: Reduces the need for support, since users can intuitively find what they want.
Makes Products Discoverable: Improves inventory turnover because users can easily get to the product they’re most likely to buy.
Increases Engagement: Leads to longer sessions and higher conversion rates.
Fuzzy Search In The Wild: Real-World Examples
Intrend: Increasing Average Order Value by 19% at Intrend with agile search
Intrend is one of the most influential Italian factory outlets in the women’s clothing market. It combines a wide range of products you would typically find in an outlet with the quality of service provided by the most prestigious ready-to-wear clothing stores.
Intrend’s strategy and implementation of Algolia, including fuzzy search features such as typo tolerance and synonym matching, improved the shopper experience. The proportion of users using Intrend’s search jumped from 8% to 10%, thanks to enhanced search capabilities that account for minor errors and variations in user queries.
This improved experience has a direct business impact on Intrend, as search users represent an important source of their online revenue.
Decathlon Singapore: Driving a 50% higher conversion rate with omnichannel, personalized search
For 40 years, Decathlon has delivered the best value in the retail sports industry by offering high-quality, sustainable, and cost-effective products. Today, Decathlon has 90,000 employees and over 1,500 stores in more than 50 countries, including Singapore.
After identifying that many shoppers bounced when they saw products out of stock, Decathlon used Algolia’s custom ranking to stop displaying those products. This increased their conversion rate by 0.6 points.
By using Algolia’s Visual Editor to optimally merchandise search results, Decathlon’s teams saw a “no result rate” drop from 5% to 1.8%. Then, after implementing search personalization, which includes fuzzy search capabilities like handling typos and understanding user intent, Decathlon’s team saw a 36% increase in click-through rate and a 50% increase in conversion rate.
Final Words: The Dangers of Getting Too Many Unexpected Results
The best fuzzy search systems are like having a mind-reading assistant. You might type “grapes rath jhon stenbeck”, and they’ll know exactly what book you want, even with the typos and missing words. They work behind the scenes to understand what you mean, not just what you type, so you don’t get unexpected results that undermine relevance.
With so many people looking for products, goods, and services online — and on increasingly smaller devices — enabling search engines to understand and retrieve relevant information even when user queries are not perfect is more important than ever.
Thanks to search-as-a-service platforms like Algolia, fuzzy search has become user-friendly, affordable, and reliable. If you’d like to get your own personal, free demo, just click the chat button at the bottom right and ask for one; we’ll set something up right away.