As you may already know, we love having great search on the documentation sites we use daily. That’s the main reason we built DocSearch.

DocSearch is live on hundreds of documentation sites, including our own. We believe DocSearch is the easiest and fastest way to navigate technical documentation. Because of that, we invest a lot of time in making it better and better by improving it on a daily basis.

Improving your DocSearch

Once you set up DocSearch, there are three main areas for improvement:

  • The structure
  • The content
  • The search itself (indexing and/or querying)

The list is ordered by importance, meaning that if you find a relevance issue on a DocSearch implementation it’s usually due to either the structure or the content. Then, in very few cases, it’s due to the search itself.

The camelCase issue

We just came across one of those search issues: camelCased words.
Camel case is the practice of writing compound words or phrases such that each word or abbreviation in the middle of the phrase begins with a capital letter.

If you are an Algolia user, you know that all our API parameters are camel cased: see our list of parameters.

Searching for parameters works, but we found it far from perfect.
Let me explain why.

Let’s take, for example, one of the parameters from our doc: snippetEllipsisText

It’s one word, but you read it as three different words: “snippet Ellipsis Text”.

Looking at it split up, it makes sense to expect the search engine to be able to return results for the following queries:

  • “snippetEllipsisText” (original name)
  • “snippet Ellipsis Text” (split name)

But also:

  • “Ellipsis” (middle word only)
  • “EllipsisText” (two last words, not split)
  • “EllipsisTex” (prefix query of “EllipsisText”)
  • “Ellipsis Text” (two last words split up)
  • “Ellipsis snippet” (split up, inverted first and second word)

There are a few queries for which you would not expect results:

  • “EllipsisSnippet” (not split, first and second words inverted)
  • “TextEllipsis” (not split, second and third words inverted)

In plain words we want to match:

  • The exact parameter name (because people might copy/paste it from their code to learn more)
  • Any combination of the parameter name’s sub-words, split up
  • The exact parameter name with one or more starting sub-words omitted
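
The rules above can be written down as a small executable check. This is only a sketch of our reading of the requirements, not part of DocSearch or the Algolia engine: the function name `should_match` is hypothetical, and it simplifies by letting every query word be a prefix.

```python
def should_match(query, parts):
    """Return True if `query` is one we would like to match.

    `parts` holds the sub-words of a camelCase name,
    e.g. ["snippet", "Ellipsis", "Text"].
    """
    lowered = [p.lower() for p in parts]
    # Full name, then the name with leading sub-words dropped:
    # "snippetellipsistext", "ellipsistext", "text".
    suffixes = ["".join(lowered[i:]) for i in range(len(lowered))]
    candidates = lowered + suffixes
    tokens = query.lower().split()
    # Every query word must be a prefix of a sub-word or of a suffix.
    return bool(tokens) and all(
        any(c.startswith(t) for c in candidates) for t in tokens
    )
```

With the sub-words of snippetEllipsisText, this accepts the seven queries listed earlier and rejects “EllipsisSnippet” and “TextEllipsis”.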

One of the great features of Algolia is the highlighting. We describe in detail how it works in a previous blog post.

So we also expect highlighting to work correctly when searching for camel case, meaning that if I search for “ellip” I expect to see “snippetEllipsisText” in the result.

Until now, we were handling only:

  • “snippetEllipsisText” (the basic one)
  • “snippet Ellipsis Text” because the engine tries to concatenate the query.

There will be a few search inputs like the one just below throughout this post for you to try, to help you understand the process. These inputs search all Algolia parameters (at the time of writing).

Working queries: “snippetEllipsisText”, “snippet Ellipsis Text”,
Not working queries: “Ellipsis”, “EllipsisText”, “EllipsisTex”, “Ellipsis Text”, “Ellipsis snippet”

As you can see from the examples above, that’s 2 out of 7 working, which we can agree is bad.

Why we get those results

Understanding why we are handling so few queries out of the box is the key to fixing it properly – let’s dive in. Algolia is doing prefix matches only (more details in this article). It’s one of the reasons Algolia is able to search so fast, but for our camel case use case it’s preventing us from searching in the middle of the word. So we had to find a way around that.
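
To make the limitation concrete, here is a toy model of prefix-only matching. It is a deliberately simplified illustration and not how the Algolia engine is implemented; `prefix_match` is a hypothetical helper.

```python
def prefix_match(query, indexed_words):
    """Toy model: a query matches only if it is a prefix of an indexed word."""
    q = query.lower()
    return any(word.lower().startswith(q) for word in indexed_words)


# The parameter is indexed as a single word.
single_word_index = ["snippetEllipsisText"]
# The same parameter indexed as three separate words.
split_index = ["snippet", "Ellipsis", "Text"]
```

In this model, `prefix_match("snippet", single_word_index)` succeeds, while `prefix_match("Ellipsis", single_word_index)` fails because “Ellipsis” starts in the middle of the indexed word. Against `split_index`, the same query succeeds.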

The iterative process to fix it

Indexing the split content

Since we want to be able to search the middle of our camelCaseWords, we knew we had to index them as “camel Case Words”, basically “uncamelizing” the content.

So we started looking for existing libraries that do this (in Python, because the DocSearch scraper is built with Python).

We found the stringcase library, which has a sentencecase function that does the job of “uncamelizing”, but there are two issues with such a library:

  • It works too well :). It uncamelizes everything, so “API client” becomes “A P I client”. We don’t want that, as the brain reads and understands it as “API client”, not “A P I client”
  • A camelCasedWord in documentation is usually surrounded by text, and the library doesn’t tell us which words were uncamelized in the process (more on why we need that information below)

So we had to write our own:

def _uncamelize_word(word):
  s = ""
  for i in range(len(word)):
    # Add a space before a capital letter that follows
    # a non-capital alphanumeric character.
    if i > 0 and word[i].isupper() and \
        word[i - 1].isalnum() and \
        not word[i - 1].isupper():
      s += " "
    s += word[i]
  if s != word:
    pass # the word was uncamelized
  return s

def uncamelize_string(string):
  return ' '.join([_uncamelize_word(word) for word in string.split()])

If a capital letter is preceded by a non-capital alphanumeric character, we add a space. Fairly simple.
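
As an illustration of the same rule, here is a regex-based sketch that also reports which words were split. It is not the scraper’s code: `uncamelize_with_tracking` is a hypothetical name, and the pattern only covers ASCII letters and digits.

```python
import re

# Zero-width boundary: a lowercase letter or digit followed by a capital.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def uncamelize_with_tracking(text):
    """Split camelCase words in `text` and list the words that changed."""
    out, changed = [], []
    for word in text.split():
        split = _BOUNDARY.sub(" ", word)
        if split != word:
            changed.append(word)  # remember the original camelCase word
        out.append(split)
    return " ".join(out), changed


result = uncamelize_with_tracking("Use snippetEllipsisText with the API client")
# result == ("Use snippet Ellipsis Text with the API client",
#            ["snippetEllipsisText"])
```

“API” is left alone because none of its capitals follows a lowercase letter or digit.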

With this in place:

  • “snippet Ellipsis Text” gives the expected results
  • we can now search in the middle of the camelCasedWord
  • we can know exactly which word in a sentence was camel cased

But:

  • we now have a display issue when looking for “snippet Ellipsis Text”
  • “snippetEllipsisText” is not returning results anymore
  • we still don’t get results for “EllipsisTex”

Working queries: “Ellipsis”, “EllipsisText”, “Ellipsis Text”, “Ellipsis snippet”, “snippet Ellipsis Text”
Not working queries: “snippetEllipsisText”, “EllipsisTex”

That’s 5 out of 7 working: better, but still not perfect.

Fixing the remaining issues

The display issue

As mentioned, we now have a display issue. The content shown in the search result for the query “snippet Ellipsis Text” is the split form, not “snippetEllipsisText”, which is what appears in the documentation and what you would expect to see in the search result.

We came up with a nice trick. We looked for an invisible Unicode character to use in place of the space: \u2063 (there are others, but this one does the job). The engine still considers the indexed text to be several words, while the browser displays “snippetEllipsisText” because the separator is not visible.
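
A minimal sketch of what the separator does to the stored string. The helper name `join_invisibly` is hypothetical, and whether a search engine treats U+2063 as a word boundary depends on the engine’s tokenizer.

```python
SEP = u"\u2063"  # Unicode INVISIBLE SEPARATOR


def join_invisibly(parts):
    """Join sub-words with a separator that browsers do not render."""
    return SEP.join(parts)


indexed = join_invisibly(["snippet", "Ellipsis", "Text"])

# Two extra characters are stored, but nothing extra is displayed.
extra_chars = len(indexed) - len("snippetEllipsisText")
# Removing the separator gives back the original name.
restored = indexed.replace(SEP, "")
# Splitting on the separator recovers the sub-words.
sub_words = indexed.split(SEP)
```

Here `extra_chars` is 2, `restored` equals the original parameter name, and `sub_words` is the list of three sub-words.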

The _uncamelize_word function code now looks like:

def _uncamelize_word(word):
  s = ""
  for i in range(len(word)):
    if i > 0 and word[i].isupper() \
        and not word[i - 1].isupper() \
        and word[i - 1].isalnum():
      # invisible separator instead of a visible space
      s += u"\u2063"
    s += word[i]
  if s != word:
    pass # the word was uncamelized
  return s

Last but not least: the no-result issue for “snippetEllipsisText” and “EllipsisTex”

Searching for “snippetEllipsisText” does not return any results anymore, since the index no longer contains the word snippetEllipsisText.

Searching for “EllipsisTex” does not work because the word “EllipsisText” is not indexed: we indexed “Ellipsis” and “Text”, but not “EllipsisText”.

Note that “EllipsisText” returns the expected result because it’s one typo away from “Ellipsis Text”; the same goes for “EllipsisT”. That’s better, but we would rather have the engine treat it as zero typos.

Fortunately the Algolia engine has a handy synonym feature.

First things first: “snippetEllipsisText”
We can just add a one-way synonym:
snippetEllipsisText => snippet Ellipsis Text

Then for “EllipsisT”, what we want in the end is another one-way synonym:
EllipsisText => Ellipsis Text

But we need this to be generic. To summarize, we want to:

  • create a synonym for the complete name,
  • remove the first sub-word and create a new synonym,
  • iterate until only one sub-word remains.

The following schema should help you understand:

Let’s consider “snippetEllipsisText” as “A B C”. We are going to create the following one-way synonyms:
ABC => A B C
BC => B C
C => C (we don’t actually need this one, as it’s already handled by the initial splitting)
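
The iteration can be sketched in a few lines. This is an illustration of the scheme above rather than the final DocSearch code: `one_way_synonyms` is a hypothetical helper that returns plain (input, replacement) pairs, and turning those pairs into actual Algolia synonym records is left out.

```python
def one_way_synonyms(parts):
    """Build (input, replacement) pairs for a camelCase name.

    `parts` holds the sub-words, e.g. ["snippet", "Ellipsis", "Text"].
    """
    pairs = []
    # Stop before the last sub-word: "C => C" is not needed.
    for i in range(len(parts) - 1):
        rest = parts[i:]
        pairs.append(("".join(rest), " ".join(rest)))
    return pairs


pairs = one_way_synonyms(["snippet", "Ellipsis", "Text"])
# pairs == [("snippetEllipsisText", "snippet Ellipsis Text"),
#           ("EllipsisText", "Ellipsis Text")]
```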

You can have a look at the final code here.

Final result:

Handling camel case seemed like an easy thing, but after having to handle it I can fairly say it’s not that simple after all, because it involves a lot of edge cases. The work we did here greatly improves the search for parameters in our docs, and the search for all DocSearch implementations that are already live.

One area where DocSearch doesn’t shine yet is searching API documentation generated from code, such as JavaDoc, where camel case is omnipresent. This work is a big step forward in making that possible.



About the author
Maxime Locqueville

DX Engineering Manager
