Co-founder & former CTO at Algolia
Sorry, there is no results for this query
Search has come a long way.
We used to Press Enter in blind faith, hoping we’d find our treasure. If not, we’d add or remove a word, or rephrase the query entirely, and click Enter again. And again. And don’t talk about typos – for those of us who were not native speakers or with bad typing or spelling skills, we’d just give up.
Well, search changed a few years back when SaaS (Search-as-a-Service) companies hit the market and showed us how search ought to be: fast and relevant; as you type instant results; vibrant UI/UX; facets and filtering.
Gone was the Enter key.
And SaaS technologies continued to advance, adding more product detail, customer reviews and ratings, multiple photos, recommendations, and merchandising – with even more blazing speeds and relevance, framed within a far easier navigation.
Search has now become essential to internet shopping, and profitable for competitive online businesses.
But again, search has come a long way, reaching a new milestone: question answering and knowledge. Take a look at this:
With knowledge, we no longer have to leave the search screen. We can stay and ask more questions, receive multiple answers, click on related questions, and read a summary of the topic with links to go further.
All with a single query. Or with voice, or images. (Not yet thought.)
And this is only the beginning. GPT is right around the corner. Search is no longer the same.
Business is no longer the same. All companies, large or small, in any field – ecommerce, media, travel — can implement question answering and knowledge panels as a part of their knowledge management and digital platforms.
Why has presenting knowledge alongside search results changed everything?
In the good old days, alchemists worked in secrecy, greedily concealing their knowledge from others. What they failed to understand was that their knowledge (right or wrong) was more golden than the gold they failed to create.
They didn’t appreciate that sharing knowledge advances human understanding and performance.
Knowledge is an asset. Companies that understand the monetary and competitive value of knowledge learn to free up their data and personalize it for their employees, business partners, and customers.
Hence, the importance of knowledge management.
The starting point for any successful and truly useful knowledge management system is to understand what people need while using such a system. And there’s no better place to start than at the beginning, where the knowledge is first obtained: the search interface.
Consider three search use cases that knowledge management can address:
*Note that semantic search is missing from this list. That’s because semantic search, like any AI search, such as NLP, NLU, LMs, and GPT, are not use cases but search technologies that can be used in all three of the above use cases. We will discuss semantic search as we go along.
Let’s combine the intentions behind all these use case into one phrase:
Users need access to all available, necessary, and useful information, and in the right format, to satisfy their immediate intentions.
Quite dense, this phrase. How’s this:
People need a search interface that provides them with the information they need with only a few search terms.
When we think of such a “system”, we often jump to the “information” part and ask: How can we combine our many separate data sources and different formats and processes? This takes us into system architecture, data science, and application programming. And rightly so. But it skips the essential point – the why – Why do we need knowledge management?
By starting with the end-user experience – access to information by searching – we place the focus on finding and using knowledge.
In other words, it’s all about being able to find the best and most useful information. And digital finding starts with a search bar or a voice assistant — or with what we call federated search.
Federated search, or multi-dimensional search, is the display of distinct pieces of information carefully placed on a single search interface. It gives the user a variety of information and options with every query. Take a look at the following schema:
The general idea is that a single query displays a variety of information, such as standard results, hyperlinks, facets and filters, menus, merchandising and recommendations, and a larger “knowledge panel” that provides a fairly complete answer to a question or a summary of some related topic.
We can see federated search in action on Birchbox’s search page. Note the product listing, facets, categories, and an “Encyclopedia” of FAQs and articles on the right:
The essence of a sort of knowledge management ecommerce is that a variety of relevant information can be found so easily and quickly.
In the remaining two sections, we’ll look closely at:
As a reminder, there are three use cases:
Classic ecommerce search returns a list of products with short descriptions, images, price, etc. Using that information, users can choose to view or buy an item, or change their search. Most ecommerce websites provide faceting and filtering and recommendations of related items. Amazon provides the model for most ecommerce businesses.
This is what classic search looks like.
But you can also add knowledge to the classic search experience. The right side was empty, so we added some additional information:
We’ve added an informative snapshot of the first result. It contains a summary of the phone’s details, ratings, shipping information, options, and more. You can also click on the item’s options or see the reviews.
And as you scroll down the search results, you can see previews of each item:
You’ll want to design the experience well, because too much knowledge can start to clutter up your screen. For example, adding item-based accessories and recommendations is important; but it needs to be designed well:
What about creating a more natural, semantic search?
Google has changed our expectations with its knowledge panel, and its ability to answer a question and propose related questions:
There’s a lot going on here: Google answers the question (“how large is the Antarctica?”), and it proposes other questions. It also compares the size of Antarctica to other countries and provides a summary of Antarctica with images and links to get more information about size, population, and history.
There’s really no reason to leave this screen, unless you want more information.
And Google shifts its design based on the topic. Artists get a different treatment. For example, a question (“who is jeff beck?”) returns photos, easy access to his art, and some biographical information:
Oddly, Google doesn’t provide the same knowledge for news, where you’d think it would be even more important to get at least the lead paragraph:
An information-driven website like a news journal, blog, or software documentation can add a snapshot of the content as the user scrolls down the results:
Finally, documentation websites often use the same technique – displaying a snapshot of the page. This saves the developer time by not having to sift through multiple pages until they find their answer:
In sum, knowledge-based search is all about presenting information in multiple ways, not just providing links.
Now let’s dig a bit into how the search engine makes this possible.
For decades there has been a heated debate about the fact that a functioning knowledge management system is not something that can be installed in an intranet like any software system, and that knowledge cannot be stored in documents or databases. With the rise of Knowledge Graphs (KGs), many knowledge management practitioners have asked themselves the question whether KGs are just another database, or whether this is ultimately the missing link between the knowledge level and the information and data levels in the DIKW pyramid as depicted here.
— Andreas Blumauer, Founder & CEO at Semantic Web Company
We can add that Wisdom, in the context of enterprise and ecommerce knowledge management, is related to such intentions as making informed decisions, performing well, creating better marketing campaigns, and increasing customer satisfaction and online revenue.
Data needs context to give it meaning. Otherwise, it’s simply raw information, such as strings of characters (“A”, “a”, “B”, ..), numbers, and binary 0s and 1s representing true and false. A customer’s name is a string of data. It only becomes a meaningful string when someone searches for a customer’s name or wants to view their sales activity.
Back-end databases and applications provide the data; front-end search interfaces display that data as information and knowledge.
To get data from the back end to the front end, engineers and data scientists need to build a data pipeline that feeds the middle layer. The middle layer provides API access to the front end, enabling the front-end engineers to build a federated search interface.
We’ll look at two middle-layer datasets:
Most online companies use a cloud-based search index for their search functionality. A search index is made up of a subset of relevant information, such as searchable keywords and metadata (title, brand, color, price, etc.), and longer free-text descriptions. It also contains information used for displaying, sorting, and filtering the information on the search interface.
When dealing with multiple data sources (e.g., product catalog, customer data), the engineers need to create a “data pipeline” that merges the various data sources into one search index.
Here’s a picture of the flow between the back end applications to a single search index:
Knowledge management, which includes question answering, comes from a second middle layer of data:
And there you have it – the perfect example of transforming company data into relevant user knowledge. Of course, the devil is in the details, and there are multiple ways to architect this, but this should give a general idea of what’s involved.
There are many ways to store knowledge. It can be stored in one search index, or in formats such as vectors and knowledge graphs. We’ll discuss the latter two.
For many years search engines have predominantly relied on keywords, much like the indexes you find in the back of books. Unless a query matches a keyword in your index, the search engine can come up empty-handed. While the concept of “matching” has traditionally powered search engines, a major shift from “matching” to “understanding” is currently underway. This is being driven by AI, which is used to represent text mathematically such that it can be conceptually understood by machines. Concepts are taking over from keywords and it’s great news for everyone.
– Hamish Ogilvy, VP, Artificial Intelligence at Algolia
When machines “read” a lot (a very lot) of data, they self-teach themselves to understand, or give meaning to, the content. With automated self-learning, they can accomplish the following:
Most ML technologies (like neural networks) create some form of a vectorized dataset. Vectorization is the process of converting words into vectors (numbers) which allows their meaning to be encoded and processed mathematically. You can think of vectors as groups of numbers that represent words. In practice, vectors are used for automating synonyms, clustering documents, detecting specific meanings and intents in queries, and ranking results. Vectorization is very versatile, and other objects — like entire documents, images, video, audio, and more — can be vectorized, too.
We can visualize vectors using a simple 3-dimensional diagram:
Image via Medium showing vector space dimensions.
You and I can understand the meaning and relationship of terms such as “king,” “queen,” “ruler,” “monarchy,” and “royalty.” With vectors, computers can make sense of these terms by clustering them together in n-dimensional space. In the 3-dimensional examples above, each term can be located with coordinates (x, y, z), and similarity can be calculated using distance and angles.
Machine learning models can then be applied to understand that words which are close together in vector space — like “king” and “queen” — are related, and words that are even closer — “queen” and “ruler” — may be synonymous.
Vectors can also be added, subtracted, or multiplied to find meaning and build relationships. One of the most popular examples is king – man + woman = queen. Machines might use this kind of relationship to determine gender or understand gender relationship. Search engines could use this capability to determine the largest mountain ranges in an area, find “the best” vacation itinerary, or identify diet cola alternatives. Those are just three examples, but there are thousands more!
By finding word relationships and building factual and conceptual inferences, machine learning enables a system to synthesize a company’s unique knowledge. Google has invested millions of engineering hours for more than a decade to design its deep learning BERT language models and search infrastructure for internet search.
The same can be done in ecommerce, media, and other online services.
In the history of search, there are many different approaches to search, for example, classic keyword search, which matches individual words or phrases based on textual matching or synonyms. Ontologies and knowledge graphs took that a step forward by adding topic matching, where items and documents that fall within the same topic or contain the same entities (not necessarily the same words) are also considered relevant to a query.
– Julien Lemoine, Co-founder & former CTO at Algolia
Let’s start with the “ontologies and hierarchies”. A knowledge graph contains “nodes” such as concepts, subject topics, hierarchies, and other such ways to classify information. It also contains “edges” that show relationships between the nodes.
Usually, the nodes are nouns and edges are verbs. We’ll use a simple example: “cows” are “animals”, “animals” are “living things”, and “cows” eat “herbs”:
You can see that living things are plants that eat. Even more importantly, this schema enables us to make inferences, such as what kinds of food living things need to eat.
Once categories are in place, a company can distribute its real objects (products, finances, customers, and other facts) into the graph. For example, a specific “living thing” (Daisy) can be understood by its classifications and relationships to other facts and objects.
We can say that, as a living thing, Daisy needs to eat, and therefore, we can infer that some of its behaviors relate to eating. Thus, a search for “cow food” might not only show different brands of cow food, but may show knowledge about how cows find food, nutritional information, and more.
On a high level, that’s how a knowledge graph represents knowledge, with facts (nodes) and factual relationships (edges) and chains of inferences that provide a summary or tell a full story.
It’s important to understand some of the challenges with knowledge graphs:
Essentially, for a knowledge graph to be useful, and not incomplete or inaccurate, it needs a lot of data and ongoing maintenance. But one way to contain that is to focus on smaller goals. For example, an ecommerce company can limit the breadth of its content by creating a small knowledge graph for its online services, including knowledge about products, sales, marketing, and customers. On the other hand, the same ecommerce company can create a second larger knowledge graph for its employees. But the latter might be best served by semantics and vectorization.
Worth mentioning: Semantics and knowledge graphs can work together. A knowledge graph can feed a vectorized classification, so that the middle layer can combine the classic search index with a robust, automated AI index based on powerful language models like LLMs and GPT.
Multiple results, deep knowledge, answers to questions – all on one search interface. Chat with us if you want to know what’s happening now, what’s coming next, and how to build your own knowledgeable federated search experience.
Sr. SEO Web Digital Marketing Manager