Search by Algolia
Haystack EU 2023: Learnings and reflections from our team
ai

Haystack EU 2023: Learnings and reflections from our team

If you have built search experiences, you know creating a great search experience is a never ending process: the data ...

Paul-Louis Nech

Senior ML Engineer

What is k-means clustering? An introduction
product

What is k-means clustering? An introduction

Just as with a school kid who’s left unsupervised when their teacher steps outside to deal with a distraction ...

Catherine Dee

Search and Discovery writer

Feature Spotlight: Synonyms
product

Feature Spotlight: Synonyms

Back in May 2014, we added support for synonyms inside Algolia. We took our time to really nail the details ...

Jaden Baptista

Technical Writer

Feature Spotlight: Query Rules
product

Feature Spotlight: Query Rules

You’re running an ecommerce site for an electronics retailer, and you’re seeing in your analytics that users keep ...

Jaden Baptista

Technical Writer

An introduction to transformer models in neural networks and machine learning
ai

An introduction to transformer models in neural networks and machine learning

What do OpenAI and DeepMind have in common? Give up? These innovative organizations both utilize technology known as transformer models ...

Vincent Caruana

Sr. SEO Web Digital Marketing Manager

What’s the secret of online merchandise management? Giving store merchandisers the right tools
e-commerce

What’s the secret of online merchandise management? Giving store merchandisers the right tools

As a successful in-store boutique manager in 1994, you might have had your merchandisers adorn your street-facing storefront ...

Catherine Dee

Search and Discovery writer

New features and capabilities in Algolia InstantSearch
engineering

New features and capabilities in Algolia InstantSearch

At Algolia, our business is more than search and discovery, it’s the continuous improvement of site search. If you ...

Haroen Viaene

JavaScript Library Developer

Feature Spotlight: Analytics
product

Feature Spotlight: Analytics

Analytics brings math and data into the otherwise very subjective world of ecommerce. It helps companies quantify how well their ...

Jaden Baptista

Technical Writer

What is clustering?
ai

What is clustering?

Amid all the momentous developments in the generative AI data space, are you a data scientist struggling to make sense ...

Vincent Caruana

Sr. SEO Web Digital Marketing Manager

What is a vector database?
product

What is a vector database?

Fashion ideas for guest aunt informal summer wedding Funny movie to get my bored high-schoolers off their addictive gaming ...

Vincent Caruana

Sr. SEO Web Digital Marketing Manager

Unlock the power of image-based recommendation with Algolia’s LookingSimilar
engineering

Unlock the power of image-based recommendation with Algolia’s LookingSimilar

Imagine you're visiting an online art gallery and a specific painting catches your eye. You'd like to find ...

Raed Chammam

Senior Software Engineer

Empowering Change: Algolia's Global Giving Days Impact Report
algolia

Empowering Change: Algolia's Global Giving Days Impact Report

At Algolia, our commitment to making a positive impact extends far beyond the digital landscape. We believe in the power ...

Amy Ciba

Senior Manager, People Success

Retail personalization: Give your ecommerce customers the tailored shopping experiences they expect and deserve
e-commerce

Retail personalization: Give your ecommerce customers the tailored shopping experiences they expect and deserve

In today’s post-pandemic-yet-still-super-competitive retail landscape, gaining, keeping, and converting ecommerce customers is no easy ...

Vincent Caruana

Sr. SEO Web Digital Marketing Manager

Algolia x eTail | A busy few days in Boston
algolia

Algolia x eTail | A busy few days in Boston

There are few atmospheres as unique as that of a conference exhibit hall: the air always filled with an indescribable ...

Marissa Wharton

Marketing Content Manager

What are vectors and how do they apply to machine learning?
ai

What are vectors and how do they apply to machine learning?

To consider the question of what vectors are, it helps to be a mathematician, or at least someone who’s ...

Catherine Dee

Search and Discovery writer

Why imports are important in JS
engineering

Why imports are important in JS

My first foray into programming was writing Python on a Raspberry Pi to flicker some LED lights — it wasn’t ...

Jaden Baptista

Technical Writer

What is ecommerce? The complete guide
e-commerce

What is ecommerce? The complete guide

How well do you know the world of modern ecommerce?  With retail ecommerce sales having exceeded $5.7 trillion worldwide ...

Vincent Caruana

Sr. SEO Web Digital Marketing Manager

Data is king: The role of data capture and integrity in embracing AI
ai

Data is king: The role of data capture and integrity in embracing AI

In a world of artificial intelligence (AI), data serves as the foundation for machine learning (ML) models to identify trends ...

Alexandra Anghel

Director of AI Engineering

Looking for something?

facebookfacebooklinkedinlinkedintwittertwittermailmail

A challenge that documentation writers face is how to encourage developer readers to stay in the documentation and not reach out to support until absolutely necessary. To meet this challenge, writers of course focus on the text  its clarity, coherence, and exhaustivity. They also focus on form, devising ways to guide developers to the relevant text structuring the information, highlighting key words, hyperlinking, creating menus and submenus, thinking up cool titles. Algolia adds search to that effort.

Algolia’s new integrated search panel

From browse to search

Developers come to documentation seeking answers, hoping to find what they are looking for. They browse, read, and browse a bit more. Often they find what they need, but not always. Some will eventually contact support for more personalized guidance, which may send them back to the documentation, but this time to the exact paragraph or code sample they were looking for.

Algolia’s documentation is about search — to wit: how to use our search API. So, we thought: if our users can’t always find what they need using our own search bar — and worse, to learn later that what they were looking for was actually present in the documentation— what sort of message were we sending about our API?

So we’ve faced this challenge head-on with an example of Algolia’s powerful search engine — by expanding our current search bar into a fully-integrated search panel.

documentation search panel

The new search panel is designed to be prominent and conversational, so that whenever our developers ask themselves What is or How to or Why, they simply type the remaining part of their question in the search bar, and our new panel lights up with the answers they are looking for.

Adopting best practices

Overall, our UI/UX model was Google. We adopted a Google-like feel, but used our own search engine + the knowledge of our own documentation to drive the whole user journey from question(s) to answer(s).

We also believe that search is a bridge between customer support and documentation. That’s why we included support discussions from our public forum in the new search panel. Now, when you search our documentation you’ll also be looking into our support history. That way, you get a side-by-side view of all relevant information about our API — relevant texts, code snippets, and all support tickets.

This time our model was Google + Stack Overflow, that well-known dynamic duo that has saved every developer from the great unknown. Stack Overflow, and more generally community-driven support, have become essential to the developer experience. By integrating our own developer community into our documentation, we will be giving our developers that same standard — and maybe even more, given that we know our own support data and can therefore fine-tune the search results.

Finally, taking this Google/Stack Overflow model a bit further, we decided to display code samples in the search results. Many developers come to our docs with a very specific question in mind; for them, finding a well-written line of code is often the best, most direct answer. So we added a toggle button to switch between text and code, allowing developers to search only for code.

With these features in place — a prominent search panel, integrated support, and code searching — we hope to extend the trust with our readers, so that they keep coming to our documentation expecting a useful experience.

We are also backing up our efforts with analytics: real metrics that will help us follow developers from query to query, page to page, and even from support to documentation. That kind of feedback loop will tell us how we can shorten the reading process and make it more pleasant, and it can also indicate how we can encourage our doc readers to use more advanced features, to push our API to its limits, which benefits everybody.

And we won’t stop at analytics. Because the challenges — to write clear, coherent, exhaustive, and easy-to-find information — will never go away, we will need to keep improving by focusing on different kinds of search strategies that work particularly well for natural language documentation.

Search in more detail

…or more specifically — what strategies did we use to ensure that our readers find what they are looking for?

In a nutshell: a successful document-based search relies in large part on how you organize your content. Global sections, individual pages, headers / sub-headers, and paragraphs — these are only some of the textual elements that, when done consistently and ordered logically, matter a lot. In our case, with a well-thought and cohesive collection of texts, Algolia’s speed and relevance work out of the box.

Another focus is on the query itself. The search engine can, for example, behave differently depending on the specificity of the query: for simple, general queries (like “index” or “install”), the content can stay high-level. For longer or more precise queries (like method names, or longer sentences), we can switch the results into more low-level API references and concepts.

Searching structured data

Let’s look at what Algolia does best — searching structured data. Here is an example of a T-shirt inventory. If the user is looking for a “slim red T-shirt with nothing on it”, you can help them find it by filtering:

Type: T-shirt
Color: red
Design: blank
Type: slim

If the user types in “T-shirt”, they get the whole inventory (1M records). If they add “red”, you divide the 1M t-shirts by 5 (let’s say there are 5 colors). If you add “slim”, you divide by 3 (there are 3 types: slim, wide, and stretch). If you start adding other criteria – like “midriff”, “sleeveless”, “multi-colored”, and so on, you could conceivably reduce the relevant stock to 25 t-shirts. Not bad, from 1M to 25! And a good UI would make this process as easy as possible for the user.

All this works as described when the content in which you are looking contains very clearly defined items. The discrete clarity of commercial products is what lies behind the success of structured data searches.

But not everything is so discrete. English language has an unlimited number of ambiguities, so creating categories for natural language is not a scaleable solution.

Unstructured text — from search to research?

Let’s now take a look at two queries which make for a difficult search structuring as described above.

Example 1— legal text

Let’s switch subjects to better illustrate the point. Let’s say a lawyer types “out of order” in a legal database that contains court cases, laws, and legal journals. For this query, there are at least 4 relevant categories of documents, with each category containing 1000s of documents:

  • A database can be “out of order”, causing system failure (computer law)
  • An elevator is “out of order”, causing monetary loss (commercial law) or personal injury or death (torts law)
  • A lawyer is “out of order” in a courtroom (procedural law)
  • A factory produces widgets “out of order”, breaking contractual terms (contract law)

The lawyer clearly needs to signal to the search engine which of these categories is relevant.

Example 2 — API documentation

It would be the same if a developer were to come to Algolia’s documentation and search for “indexing filters” and find two categories of documents: :

  • the back end (creating filters in your data)
  • the front end (filtering at query time)

and four formats:

  • concepts, tutorials, code snippets, API reference

The developer will want to have control over both the subject and format of the documents retrieved. I’ll use the term “researchers” for our confused lawyers and developers above.

As-you-think

Let’s go back to the T-shirt example to see if that can help here. That example was about one item: the consumer is searching for one thing, and the quicker they find it the better.

The other extreme are researchers: researchers are often not looking for one thing. Their query is to think about a subject, to get a better understanding and to construct and support an argument. They have to be patient. If they are searching a site with 1M documents, they are ready to scan through 1000s of results (in say one or two hours, or days, or longer), and to read 100s of documents. We are clearly not talking about consumers.

Developers fall somewhere between these extremes. Sometimes they know more or less what to look for and so are searching for one thing — for example, a method or a specific setting. Other times they don’t really know what they are looking for: they might be onboarding, or trying to solve a difficult problem, or looking to enhance their current solution. In this case, they are more researcher than searcher.

But even here, we don’t want to waste a researcher’s time with irrelevant results. And we surely don’t want them to fail by not presenting them with important results (this is the difficult balance of precision and recall).

Essentially, we want researchers to have the same quality results — and the same confidence in Algolia — that our consumer clients have. 

And so the challenge is clear. How do we structure our “unstructured” documentation to come up with consumer-grade results?

Search strategies for documents

Behind our new search panel — what we’ve done

Algolia’s power lies in structuring the data before searching. To put this into action, we focused on four key areas:

  • Organizing content — the way that our documentation is organized (within the page as well as the combination of all the pages) is probably the most important step
  • Indexing — structuring the text for search and relevance
  • Relevance — testing out numerous queries and engine configurations, to ensure a consistently good relevance
  • UI/UX — the developer experience: how to encourage our readers to use and to keep using our integrated search panel. (Although this is of equal importance, we do not describe how to implement the InstantSearch UI)

Our indexing and relevance strategies follow our DocSearch methodology, which has been well documented by our CTO in a previous blog post on The Laravel Example. There he describes:

  • How to structure your texts with sections, headers / subheaders, and small paragraphs
  • How to order the overall content so that that some information is more important than other
  • How to tweak our tie-breaking algorithm using customer ranking (again, relevance)
  • How to configure Algolia with filters, facets, synonyms, and many other settings.

A recent feature not mentioned in the post is our extensive use of Query Rules to handle special use cases like specific products or coding language queries.

There is, of course, the matter of documentation tools. We have written about that in a separate post.

Exploring what’s next

Searching through thousands of documents is not an exact science. There are many pitfalls, and though we’ve solved many of them, it’s hard not to wonder: what happens when there are 1,000,000+ documents? Here are some interesting features not yet implemented.

A word cloud filter

Algolia offers complete filtering out of the box, but we rely on our users to define the most effective filters for their information. One way to do that is to use word clouds. Word clouds, in this context, are a set of filters that act as one global filter. For document-intensive searching, word clouds can be quite powerful.

For example, we can help resolve the above lawyer-researcher’s “out of order” ambiguity by using word-cloud filtering:

word cloud filtering

As you can see, the four word clouds above match the four distinct areas of law mentioned in the “out of order” example. Normally, a filter is one word: by presenting a set of related keywords within a single frame/word cloud, we offer the user more information to help choose the best filter. And by making these word clouds clickable (as seen below), the user can think-by-clicking, to test which set of words most closely matches his or her train of thought.

There are many ways to build word clouds, one of which is to scan each document using specialized dictionaries, to pick out keywords that make the document unique — and to do this before indexing them. For the example above, you would use different specialized legal dictionaries. For our API docs, we would use our own API reference pages as dictionaries for each new document added to our documentation.

Thematic frequency

Some documents are so similar in terms of relevance that it is impossible to know which should be presented first. At this point, the engine needs help. With structured data, such as shopping items, this is achieved through custom ranking, using metrics like “most popular” or “most sold”. However, using metrics is not always relevant for texts. For example, we can use “most cited” or “most read”, but these metrics are often irrelevant to a researcher.

So, why not create front end tools that help researchers — documentation readers themselves — choose between different ways to break ties?

Below is one such tool, which implements thematic frequency a shortcut term to refer to the classification of documents by theme. Each document can be placed in one or more themes based on how close (or far) its content is from the theme. The themes are represented by word clouds. Documents can be scored using the thematic word clouds by matching the document’s content with the keywords contained in the word clouds. Later, filtering can use that scoring to both find and order the documents.

For example, here’s a subset of results for the theme “server-side indexing and filtering”, in the order returned by Algolia:

a subset of results for the theme “server-side indexing and filtering”

The UI can offer the ability to switch between rankings:

  • Keep the ranking returned by Algolia’s tie-breaking algorithm.
  • Adjust the ranking with Algolia’s custom ranking feature, ordering the list by “most read” or “most cited”.
  • Add thematic frequency on top of Algolia’s ranking algorithm, reordering the results according to the strength of the document’s relationship to the active theme.

By choosing the last option – thematic frequency – the researcher could reorder the results from (1, 5, 20) to (20, 1, 5), because record 20 contains the largest number of thematic keywords. In other words, document 20 goes to the top because it is more consistent with the theme of “server-side indexing and filtering” than documents 1 and 5.

These bonus strategies, as well as many others, will keep us – and hopefully our readers – confidently within the conversational search powered by Algolia.

We look forward to your feedback on the effort we’ve put in so far, and on future ideas: @algolia, @codeharmonics.

About the author
Peter Villani

Sr. Tech & Business Writer

linkedinmediumtwitter

See Algolia in action

Tailoring search to maximize click-through and conversion rates

See Algolia in actionView demo

Recommended Articles

Powered byAlgolia Algolia Recommend

Good API Documentation Is Not About Choosing the Right Tool
engineering

Maxime Locqueville

DX Engineering Manager

How Algolia uses AI to deliver smarter search
ai

Julien Lemoine

Co-founder & former CTO at Algolia

What is search relevance?
product

Jon Silvers

Director, Digital Marketing