A challenge that documentation writers face is how to encourage developer readers to stay in the documentation and not reach out to support until absolutely necessary. To meet this challenge, writers of course focus on the text — its clarity, coherence, and exhaustivity. They also focus on form, devising ways to guide developers to the relevant text — structuring the information, highlighting key words, hyperlinking, creating menus and submenus, thinking up cool titles. Algolia adds search to that effort.
Developers come to documentation seeking answers, hoping to find what they are looking for. They browse, read, and browse a bit more. Often they find what they need, but not always. Some will eventually contact support for more personalized guidance, which may send them back to the documentation, but this time to the exact paragraph or code sample they were looking for.
Algolia’s documentation is about search; to wit, how to use our search API. So we thought: if our users can’t always find what they need using our own search bar and, worse, learn later that what they were looking for was actually present in the documentation, what sort of message are we sending about our API?
So we’ve faced this challenge head-on with an example of Algolia’s powerful search engine: we expanded our current search bar into a fully integrated search panel.
The new search panel is designed to be prominent and conversational, so that whenever our developers ask themselves “What is”, “How to”, or “Why”, they simply type the remaining part of their question in the search bar, and our new panel lights up with the answers they are looking for.
Overall, our UI/UX model was Google. We adopted a Google-like feel, but used our own search engine + the knowledge of our own documentation to drive the whole user journey from question(s) to answer(s).
We also believe that search is a bridge between customer support and documentation. That’s why we included support discussions from our public forum in the new search panel. Now, when you search our documentation you’ll also be looking into our support history. That way, you get a side-by-side view of all relevant information about our API — relevant texts, code snippets, and all support tickets.
This time our model was Google + Stack Overflow, that well-known dynamic duo that has saved every developer from the great unknown. Stack Overflow, and more generally community-driven support, have become essential to the developer experience. By integrating our own developer community into our documentation, we will be giving our developers that same standard — and maybe even more, given that we know our own support data and can therefore fine-tune the search results.
Finally, taking this Google/Stack Overflow model a bit further, we decided to display code samples in the search results. Many developers come to our docs with a very specific question in mind; for them, finding a well-written line of code is often the best, most direct answer. So we added a toggle button to switch between text and code, allowing developers to search only for code.
With these features in place (a prominent search panel, integrated support, and code searching), we hope to strengthen our readers’ trust, so that they keep coming to our documentation expecting a useful experience.
We are also backing up our efforts with analytics: real metrics that will help us follow developers from query to query, page to page, and even from support to documentation. That kind of feedback loop will tell us how we can shorten the reading process and make it more pleasant, and it can also indicate how we can encourage our doc readers to use more advanced features, to push our API to its limits, which benefits everybody.
And we won’t stop at analytics. Because the challenges — to write clear, coherent, exhaustive, and easy-to-find information — will never go away, we will need to keep improving by focusing on different kinds of search strategies that work particularly well for natural language documentation.
…or more specifically — what strategies did we use to ensure that our readers find what they are looking for?
In a nutshell: a successful document-based search relies in large part on how you organize your content. Global sections, individual pages, headers and sub-headers, and paragraphs: these are only some of the textual elements that matter a lot when they are written consistently and ordered logically. In our case, with a well-thought-out and cohesive collection of texts, Algolia’s speed and relevance work out of the box.
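To make the point about structure concrete, here is a minimal sketch, in Python, of how a consistently structured page could be cut into small searchable records, one per paragraph, each carrying the headers above it. The Markdown-style input, the function name `split_into_records`, and the record fields are assumptions made for illustration; they are not Algolia’s or DocSearch’s actual schema.

```python
def split_into_records(page_title, markdown_text):
    """Return one record per paragraph, tagged with the headers above it.

    Hypothetical helper: assumes '## ' and '### ' headers and blank-line
    separated blocks. Field names are illustrative only.
    """
    records = []
    hierarchy = {"page": page_title, "h2": None, "h3": None}
    for block in markdown_text.strip().split("\n\n"):
        block = block.strip()
        if block.startswith("### "):
            hierarchy["h3"] = block[4:].strip()
        elif block.startswith("## "):
            hierarchy["h2"] = block[3:].strip()
            hierarchy["h3"] = None  # a new section resets the sub-section
        elif block:
            # Copy the hierarchy so later header changes don't alter this record.
            records.append({"hierarchy": dict(hierarchy), "content": block})
    return records


page = (
    "## Indexing\n\nIntro to indexing.\n\n"
    "### Filters\n\nHow to filter.\n\n"
    "## Search\n\nHow to search."
)
records = split_into_records("Guide", page)
```

Because every record knows its page, section, and sub-section, a match on a single paragraph can still be displayed and ranked in the context of the document it came from.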
Another focus is on the query itself. The search engine can, for example, behave differently depending on the specificity of the query. For simple, general queries (like “index” or “install”), the results can stay high-level. For longer or more precise queries (like method names, or full sentences), we can switch the results to lower-level API references and concepts.
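One way to picture this switch is a small classifier that looks at the shape of the query before deciding which kind of content to favor. The sketch below is only an assumption about how such a rule could look: the three-word threshold, the "looks like code" pattern, and the content labels `guides` and `api-reference` are invented for illustration.

```python
import re

# Hypothetical signal: camelCase, underscores, dots, or parentheses inside a word.
CODE_LIKE = re.compile(r"[a-z][A-Z]|[_.()]")


def query_specificity(query):
    """Classify a query as 'general' or 'specific' (illustrative heuristic)."""
    words = [w.strip(".,;:?!") for w in query.split()]
    words = [w for w in words if w]
    looks_like_code = any(CODE_LIKE.search(w) for w in words)
    if looks_like_code or len(words) >= 3:
        return "specific"
    return "general"


def preferred_content(query):
    """Pick which kind of content to surface first for this query."""
    if query_specificity(query) == "specific":
        return "api-reference"
    return "guides"
```

With this heuristic, `preferred_content("install")` favors guides, while a method name such as `setSettings` or a full question favors the API reference.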
Let’s look at what Algolia does best — searching structured data. Here is an example of a T-shirt inventory. If the user is looking for a “slim red T-shirt with nothing on it”, you can help them find it by filtering:
Item: T-shirt
Color: red
Design: blank
Type: slim
If the user types in “T-shirt”, they get the whole inventory (1M records). If they add “red”, you divide the 1M T-shirts by 5 (let’s say there are 5 colors), leaving 200,000. If they add “slim”, you divide by 3 (there are 3 types: slim, wide, and stretch), leaving roughly 67,000. If they start adding other criteria, like “midriff”, “sleeveless”, “multi-colored”, and so on, you could conceivably reduce the relevant stock to 25 T-shirts. Not bad, from 1M to 25! And a good UI would make this process as easy as possible for the user.
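The narrowing effect is easy to reproduce on a toy inventory. The sketch below uses plain Python rather than the Algolia API, and the attribute names and values are made up for illustration: 5 colors, 3 types, and 2 designs give 30 records, and each added filter divides what is left.

```python
from itertools import product

# Illustrative attribute values; a real catalog would have many more.
COLORS = ["red", "blue", "green", "black", "white"]
TYPES = ["slim", "wide", "stretch"]
DESIGNS = ["blank", "printed"]

inventory = [
    {"item": "T-shirt", "color": c, "type": t, "design": d}
    for c, t, d in product(COLORS, TYPES, DESIGNS)
]


def apply_filters(records, **filters):
    """Keep only the records whose attributes equal every given filter."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


def narrowing_steps(records, steps):
    """Apply filters one at a time and report how many records remain."""
    remaining = records
    counts = [len(remaining)]
    for key, value in steps:
        remaining = apply_filters(remaining, **{key: value})
        counts.append(len(remaining))
    return counts


counts = narrowing_steps(
    inventory, [("color", "red"), ("type", "slim"), ("design", "blank")]
)
print(counts)  # [30, 6, 2, 1]
```

The same division happens at any scale, which is why a handful of well-chosen attributes can take a million records down to a few dozen.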
All this works as described when the content in which you are looking contains very clearly defined items. The discrete clarity of commercial products is what lies behind the success of structured data searches.
But not everything is so discrete. The English language has an unlimited number of ambiguities, so creating categories for natural language is not a scalable solution.
Let’s now take a look at two queries that are difficult to structure in the way described above.
Let’s switch subjects to better illustrate the point. Let’s say a lawyer types “out of order” in a legal database that contains court cases, laws, and legal journals. For this query, there are at least 4 relevant categories of documents, with each category containing 1000s of documents:
The lawyer clearly needs to signal to the search engine which of these categories is relevant.
It would be the same if a developer were to come to Algolia’s documentation and search for “indexing filters” and find two categories of documents:
and four formats:
The developer will want to have control over both the subject and format of the documents retrieved. I’ll use the term “researchers” for our confused lawyers and developers above.
Let’s go back to the T-shirt example to see if that can help here. That example was about one item: the consumer is searching for one thing, and the quicker they find it the better.
At the other extreme are researchers, who are often not looking for one thing. Their goal is to think about a subject, to get a better understanding, and to construct and support an argument. They have to be patient. If they are searching a site with 1M documents, they are ready to scan through thousands of results (in, say, one or two hours, or days, or longer), and to read hundreds of documents. We are clearly not talking about consumers.
Developers fall somewhere between these extremes. Sometimes they know more or less what to look for and so are searching for one thing — for example, a method or a specific setting. Other times they don’t really know what they are looking for: they might be onboarding, or trying to solve a difficult problem, or looking to enhance their current solution. In this case, they are more researcher than searcher.
But even here, we don’t want to waste a researcher’s time with irrelevant results. And we certainly don’t want them to fail because we did not show them important results (this is the difficult balance between precision and recall).
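For readers who want the two terms pinned down: precision is the share of returned results that are relevant, and recall is the share of relevant documents that were returned. A minimal sketch in Python, with document ids invented for the example:

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Guard against empty sets to avoid dividing by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# The engine returned documents 1-4; documents 2, 4, and 6 were the useful ones.
precision, recall = precision_and_recall([1, 2, 3, 4], [2, 4, 6])
```

Here half of the returned results are useful (precision 0.5) and two of the three useful documents were found (recall of about 0.67). Returning more documents tends to raise recall and lower precision, which is the balance mentioned above.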
Essentially, we want researchers to have the same quality results — and the same confidence in Algolia — that our consumer clients have.
And so the challenge is clear. How do we structure our “unstructured” documentation to come up with consumer-grade results?
Algolia’s power lies in structuring the data before searching. To put this into action, we focused on four key areas:
Our indexing and relevance strategies follow our DocSearch methodology, which has been well documented by our CTO in a previous blog post on The Laravel Example. There he describes:
A recent feature not mentioned in the post is our extensive use of Query Rules to handle special use cases like specific products or coding language queries.
There is, of course, the matter of documentation tools. We have written about that in a separate post.
Searching through thousands of documents is not an exact science. There are many pitfalls, and though we’ve solved many of them, it’s hard not to wonder: what happens when there are 1,000,000+ documents? Here are some interesting features not yet implemented.
Algolia offers complete filtering out of the box, but we rely on our users to define the most effective filters for their information. One way to do that is to use word clouds. Word clouds, in this context, are a set of filters that act as one global filter. For document-intensive searching, word clouds can be quite powerful.
For example, we can help resolve the above lawyer-researcher’s “out of order” ambiguity by using word-cloud filtering:
As you can see, the four word clouds above match the four distinct areas of law mentioned in the “out of order” example. Normally, a filter is one word: by presenting a set of related keywords within a single frame/word cloud, we offer the user more information to help choose the best filter. And by making these word clouds clickable (as seen below), the user can think-by-clicking, to test which set of words most closely matches his or her train of thought.
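As a rough sketch of the idea that a word cloud is a set of filters acting as one global filter, the Python below treats each cloud as a set of keywords and keeps any document that shares at least one keyword with the selected cloud. The cloud names, their keywords, and the `keywords` field on each document are hypothetical; they illustrate the mechanism on documentation-style data and are not taken from the legal example.

```python
# Hypothetical word clouds: each name stands for a whole set of related keywords.
WORD_CLOUDS = {
    "indexing": {"records", "batch", "import", "push"},
    "filtering": {"facets", "numeric", "tags", "refine"},
}


def filter_by_cloud(documents, cloud_name):
    """Keep documents that share at least one keyword with the chosen cloud."""
    keywords = WORD_CLOUDS[cloud_name]
    return [doc for doc in documents if keywords & set(doc["keywords"])]


documents = [
    {"id": "a", "keywords": ["batch", "records"]},
    {"id": "b", "keywords": ["facets", "refine"]},
    {"id": "c", "keywords": ["push", "tags"]},
]

selected = [doc["id"] for doc in filter_by_cloud(documents, "filtering")]
print(selected)  # ['b', 'c']
```

Clicking a cloud in the UI would then amount to one call with that cloud’s name, so the user picks a train of thought rather than a single word.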
There are many ways to build word clouds, one of which is to scan each document using specialized dictionaries, to pick out keywords that make the document unique — and to do this before indexing them. For the example above, you would use different specialized legal dictionaries. For our API docs, we would use our own API reference pages as dictionaries for each new document added to our documentation.
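The dictionary-scanning approach could be sketched as follows. This is an assumption about one simple way to do it, not a description of an existing Algolia feature: the dictionary is a plain set of single-word terms, a frequency count stands in for a real measure of what makes a document unique, and the `word_cloud` field name is invented.

```python
import re
from collections import Counter


def build_word_cloud(text, dictionary, top_n=5):
    """Return the dictionary terms that occur most often in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in dictionary)
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical dictionary, e.g. terms harvested from API reference pages.
api_terms = {"filter", "indexing", "facet", "ranking"}

text = (
    "A filter narrows results. Combine one filter with another filter "
    "at indexing time; indexing is fast."
)

# Attach the keywords to the record before it is sent for indexing.
record = {"content": text, "word_cloud": build_word_cloud(text, api_terms)}
print(record["word_cloud"])  # ['filter', 'indexing']
```

Running this step before indexing means the keywords are stored with each record, so they can later be used as filters without rescanning the text.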
Some documents are so similar in terms of relevance that it is impossible to know which should be presented first. At this point, the engine needs help. With structured data, such as shopping items, this is achieved through custom ranking, using metrics like “most popular” or “most sold”. However, using metrics is not always relevant for texts. For example, we can use “most cited” or “most read”, but these metrics are often irrelevant to a researcher.
So, why not create front-end tools that help researchers (the documentation readers themselves) choose between different ways to break ties?
Below is one such tool, which implements thematic frequency, a shorthand for the classification of documents by theme. Each document can be placed in one or more themes based on how close its content is to the theme. The themes are represented by word clouds. Documents can be scored against the thematic word clouds by matching each document’s content with the keywords contained in the word clouds. Later, filtering can use that scoring to both find and order the documents.
For example, here’s a subset of results for the theme “server-side indexing and filtering”, in the order returned by Algolia:
The UI can offer the ability to switch between rankings:
By choosing the last option – thematic frequency – the researcher could reorder the results from (1, 5, 20) to (20, 1, 5), because record 20 contains the largest number of thematic keywords. In other words, document 20 goes to the top because it is more consistent with the theme of “server-side indexing and filtering” than documents 1 and 5.
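The reordering can be sketched in a few lines of Python. Everything specific here is assumed for the sake of the example: the theme keywords, the document texts, and the choice to score a document by the number of distinct theme keywords it contains. The sort is stable, so documents with equal scores keep the order in which the engine returned them.

```python
import re

# Hypothetical word cloud for the theme "server-side indexing and filtering".
THEME = {"server", "backend", "batch", "indexing", "filtering"}


def thematic_score(text, theme_keywords):
    """Count how many distinct theme keywords appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return len(tokens & theme_keywords)


def rerank_by_theme(results, theme_keywords):
    """Order results by thematic score, highest first; ties keep engine order."""
    return sorted(
        results,
        key=lambda r: thematic_score(r["content"], theme_keywords),
        reverse=True,
    )


# Results in the order returned by the engine (ids follow the example above).
results = [
    {"id": 1, "content": "Filtering results in the browser"},
    {"id": 5, "content": "Indexing from the dashboard"},
    {"id": 20, "content": "Batch indexing and filtering on your server backend"},
]

reordered = [r["id"] for r in rerank_by_theme(results, THEME)]
print(reordered)  # [20, 1, 5]
```

Document 20 matches five theme keywords while documents 1 and 5 match one each, so it moves to the top and the other two keep their original relative order.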
These bonus strategies, as well as many others, will keep us – and hopefully our readers – confidently within the conversational search powered by Algolia.
We look forward to your feedback on the effort we’ve put in so far, and on future ideas: @algolia, @codeharmonics.
Peter Villani
Sr. Tech & Business Writer