Broadly speaking, a search index is like the index at the end of a book, where a small, non-exhaustive list of words and subjects is given with page numbers. More precisely, it’s the mapping of a query to the content in a corpus (a large set of online books and documents, a product or film catalog). In computer jargon, it’s an inverted list (index) of words that a search engine uses to find every word in every document within a corpus.
But is the metaphor of the book index actually correct? As in all matters related to technology, it’s hard to find a good balance between providing an overview of a subject and diving in deep – without losing meaning or your audience.
In the past, we’ve answered the question What is a search index? in different ways:
This article covers a middle ground between the functional and the technical, defining the capabilities of the powerful search indexes we often see in Google, Amazon, and Netflix, and providing an introduction to how these indexes can perform at such fast speeds.
A book index for a biography looks like this:
A search index can be represented in a very similar manner:
The book metaphor is useful because it underscores the general idea that an index is a separate object from the underlying content, and that it is used to find specific parts of that content easily and quickly (pages in a book, documents in a collection of documents).
To use another metaphor, an index helps us navigate a book like a compass to a map, where the compass replaces the need to scan the map. In the same way, an index at the end of a book is far more efficient than scanning the whole book for one phrase: it obviously saves you time and is more reliable. In the example above, the index directs you reliably to the exact sections in a biography that discuss the “early life” of the subject.
A metaphor only goes so far. The book metaphor doesn’t fully capture the capabilities, purposes, and mechanisms, nor our expectations, of a search engine index.
For example:
Let’s just say that the metaphor of a book index gets you in the door to understanding what an index does, but details like the above (and there are many more) help you understand the full potential of what a search engine index can accomplish and how it has transformed our lives.
A search index can be used in two different contexts:
Now, you can also search for books by tagging them with subjects, themes, authors, etc., but if the underlying goal of the search is to find content, the expectation is that every word and sentence in the book is searchable.
A successful object-based search (as we’ve defined it here) relies on a set of attributes that describe objects sufficiently so that a searcher can find what they are looking for using a reasonably small set of well-chosen keywords. A keyword can be one or more words, or even the first few characters of the first word. For example, while looking for the film Star Wars, a user might only need to type in “star”; but if the search engine bases its search algorithm on popularity (that is, it favors popular films in the first results), then “st” should be sufficient to find the blockbuster Star Wars.
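The “st” example can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine works: the `Film` class, the catalog, and the popularity scores are all hypothetical. It matches the query as a prefix of any word in the title, then puts the most popular matches first.

```python
from dataclasses import dataclass


@dataclass
class Film:
    title: str
    popularity: int  # hypothetical score; higher means more popular


# Hypothetical catalog; titles and scores are illustrative only.
FILMS = [
    Film("Stand by Me", 70),
    Film("Star Wars", 98),
    Film("A Star Is Born", 80),
    Film("Stalker", 60),
]


def search(query: str, films: list[Film]) -> list[str]:
    """Return titles with a word starting with the query, most popular first."""
    prefix = query.lower()
    matches = [
        film for film in films
        if any(word.startswith(prefix) for word in film.title.lower().split())
    ]
    # Ranking step: popularity decides the order of the matches.
    matches.sort(key=lambda film: film.popularity, reverse=True)
    return [film.title for film in matches]
```

With this toy catalog, all four titles match “st”, and Star Wars comes first only because it has the highest assumed popularity score.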
If you want to find a movie, you most likely need only a few attributes, such as title, description, cast, crew, year, and a few others. If you want to perform more general research, you’ll add attributes like themes, dialogues, cross-references, and additional background information. However, the list of attributes can get quite large. For example, cars have thousands of attributes: the material used; the name, type, and year of each part; owner history; factories and repair history; speed; and so on.
What all objects have in common is the notion of keywords. Keywords are the words the owners of the content use as they build an object’s attributes, such as title, brand, author, year, and price. Or from another point of view: keywords are the “words” that a search engine uses to match the words in an index with the query of the searcher.
As we’ve outlined above, a search engine identifies documents (books, web pages, products) that match a user’s query (keywords). To do this, it cannot scan every document. So it uses an index, either an exhaustive index of every word, or an attribute-based index with a subset of the most important descriptions.
An index is created before a user searches. It is a pre-scan of the underlying content. It’s also in a separate part of the server. For example, in a content-based scenario, the search engine pre-scans every document and saves all the unique words in an index. Many search engines structure their index in an “inverted index”, as we describe in the last (fun) section.
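As a rough sketch of the content-based scenario, the Python below pre-scans a handful of documents once and stores every unique word with the IDs of the documents that contain it. The documents, the `build_index` and `lookup` names, and the simple tokenization rule are assumptions made for illustration. Real engines do far more (stemming, typo tolerance, ranking data).

```python
import re
from collections import defaultdict

# Hypothetical documents keyed by ID.
DOCUMENTS = {
    1: "The aardvark is a nocturnal mammal.",
    2: "A biography of an early naturalist.",
    3: "The naturalist sketched an aardvark.",
}


def build_index(documents: dict[int, str]) -> dict[str, set[int]]:
    """Pre-scan every document once; map each unique word to document IDs."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        # Assumed tokenization: lowercase runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return dict(index)


# Built ahead of time, before any query arrives.
INDEX = build_index(DOCUMENTS)


def lookup(word: str) -> set[int]:
    """Answer a query from the index alone, without rescanning the documents."""
    return INDEX.get(word.lower(), set())
```

The point of the design is that the expensive work (reading every document) happens once at indexing time, so each query is a single dictionary lookup.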
Search indexes come with an order. For online searches like Google and Amazon, search results are usually ordered by the “best” matches, not the most “accurate” ones.
In those contexts, it’s not only about accurate results. If a user types in “brad” and Brad Pitt comes up, that doesn’t mean it’s accurate. Other results will include Brad Davis or the Brady theater. They are all relevant in different ways, but none of them can be considered “accurate”. One user who types in “brad” might choose to go to Brad Pitt’s Wikipedia page, another might go to Brad Pitt’s IMDB page. Accuracy doesn’t really capture the meaning of these choices.
It’s all about how right the result feels to a given user, or how the result matches the intent of the searcher. To return to the compass metaphor, a compass helps us navigate by combining accuracy and relevance. It gives us accuracy in terms of north and south, but it also gives us relevance by pointing in the general direction of our destination and helping us match our intentions with our knowledge of the physical world to reach that destination. On the other hand, we expect a GPS system to be accurate, not relevant.
Consider the bank employee who looks up your records and finds out that you owe the bank some money. The bank employee’s search results had better be completely accurate. Likewise, when store employees or customers look for precise products, they are not interested in relevance: they rely on accurate, exact product identifiers.
This is not to say that searching by relevance does not contain an aspect of “accuracy”. For example, if someone types in “ball point pen”, the accuracy is to find all products that have an attribute with the words “ball point pen”. However, accuracy gives way to relevance: the relevance is which ball point pen to show first.
A more technical way to explain the difference is to consider the difference between a database and a search index. Database-like indexing (the bank example) is centered around accuracy – ensuring that exact matches are properly sorted and exhaustive. A search-index-based search like Google is more flexible, where the textual matching is a mix between textual accuracy and relevance (optimizing your content for what we call SEO (search engine optimization)).
Similar to Google, the site search we see on Amazon and Netflix, and on websites where search is provided by Algolia, relies on a combination of structured sets of attributes and a ranking system that bases relevance on popularity, trends, likes, and a business’s product-promotional needs.
Okay, so let’s open up the hood. A search engine index is saved in a structure that enables fast retrieval. We call this structure an inverted index. One thing to note: the index is saved separately on the server, in a different location from the data.
While there are many types of inverted indexes, with many nuances, the following diagram sums up the idea:
As you can see, with an inverted index, the search engine inverts the logic. So, instead of reading (scanning) a document looking for words, it reverses that process and uses the words to find the documents. Here’s an example of an inverted index:
… and so on. Let’s say there are tens of thousands of unique words in 999 documents.
In the above diagram, the search engine’s logic to search an inverted index followed this process to find “aardvark”:
That’s how every word in a set of documents is stored in an index. It gets more complicated for non-prefix, middle-of-the-word queries, but you get the idea.
And that’s all … Well, there are a lot more details. If you’re interested in more, check out the article by Algolia’s CTO on the inside story of indexing.
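To make the lookup concrete, here is a minimal Python sketch of a prefix search over an inverted index. It is an assumption-laden illustration, not Algolia’s implementation: the words, the document IDs, and the `find_documents` helper are hypothetical. It relies on keeping the words in sorted order so that a binary search can jump straight to the first candidate.

```python
from bisect import bisect_left

# Hypothetical inverted index: each word maps to the IDs of documents containing it.
INVERTED_INDEX = {
    "aardvark": [3, 17, 999],
    "aaron": [5],
    "abacus": [17, 42],
    "zebra": [8],
}

# Sorted once at indexing time so that queries can binary-search the words.
SORTED_WORDS = sorted(INVERTED_INDEX)


def find_documents(prefix: str) -> list[int]:
    """Return sorted IDs of documents containing a word that starts with prefix."""
    # Binary search for the first word that is >= the prefix.
    start = bisect_left(SORTED_WORDS, prefix)
    doc_ids: set[int] = set()
    for word in SORTED_WORDS[start:]:
        if not word.startswith(prefix):
            break  # words are sorted, so no later word can match
        doc_ids.update(INVERTED_INDEX[word])
    return sorted(doc_ids)
```

Because matching words sit next to each other in sorted order, the search never reads the documents and only touches the words that share the prefix. A query for the middle of a word cannot use this shortcut, which is one reason those queries are harder.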