Search by Algolia
8 ways to use merchandising data to boost your online store ROI
e-commerce

8 ways to use merchandising data to boost your online store ROI

New year, new goals. Sounds positive, but looking at your sales data, your revenue and profit aren’t so hot ...

John Stewart

VP, Corporate Communications and Brand

Algolia DocSearch + Astro Starlight
engineering

Algolia DocSearch + Astro Starlight

What is Astro Starlight? If you're building a documentation site, your content needs to be easy to write and ...

Jaden Baptista

Technical Writer

What role does AI play in recommendation systems and engines?
ai

What role does AI play in recommendation systems and engines?

You put that in your cart. How about this cool thing to go with it? You liked that? Here are ...

Catherine Dee

Search and Discovery writer

How AI can help improve your user experience
ux

How AI can help improve your user experience

They say you get one chance to make a great first impression. With visual design on ecommerce web pages, this ...

Jon Silvers

Director, Digital Marketing

Keeping your Algolia search index up to date
product

Keeping your Algolia search index up to date

When creating your initial Algolia index, you may seed the index with an initial set of data. This is convenient ...

Jaden Baptista

Technical Writer

Merchandising in the AI era
e-commerce

Merchandising in the AI era

For merchandisers, every website visit is an opportunity to promote products to potential buyers. In the era of AI, incorporating ...

Tariq Khan

Director of Content Marketing

Debunking the most common AI myths
ai

Debunking the most common AI myths

ARTIFICIAL INTELLIGENCE CAN’T BE TRUSTED, shouts the headline on your social media newsfeed. Is that really true, or is ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

How AI can benefit the retail industry
ai

How AI can benefit the retail industry

Artificial intelligence is on a roll. It’s strengthening healthcare diagnostics, taking on office grunt work, helping banks combat fraud ...

Catherine Dee

Search and Discovery writer

How ecommerce AI is reshaping business
e-commerce

How ecommerce AI is reshaping business

Like other modern phenomena such as social media, artificial intelligence has landed on the ecommerce industry scene with a giant ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

AI-driven smart merchandising: what it is and why your ecommerce store needs it
ai

AI-driven smart merchandising: what it is and why your ecommerce store needs it

Do you dream of having your own personal online shopper? Someone familiar and fun who pops up every time you ...

Catherine Dee

Search and Discovery writer

NRF 2024: A cocktail of inspiration and innovation
e-commerce

NRF 2024: A cocktail of inspiration and innovation

Retail’s big show, NRF 2024, once again brought together a wide spectrum of practitioners focused on innovation and transformation ...

Reshma Iyer

Director of Product Marketing, Ecommerce

How AI-powered personalization is transforming the user and customer experience
ai

How AI-powered personalization is transforming the user and customer experience

In a world of so many overwhelming choices for consumers, how can you best engage with the shoppers who visit ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

Unveiling the future: Algolia’s AI revolution at NRF Retail Big Show
algolia

Unveiling the future: Algolia’s AI revolution at NRF Retail Big Show

Get ready for an exhilarating journey into the future of retail as Algolia takes center stage at the NRF Retail ...

John Stewart

VP Corporate Marketing

How to master personalization with AI
ai

How to master personalization with AI

Picture ecommerce in its early days: businesses were just beginning to discover the power of personalized marketing. They’d divide ...

Ciprian Borodescu

AI Product Manager | On a mission to help people succeed through the use of AI

5 best practices for nailing the ecommerce virtual assistant user experience
ai

5 best practices for nailing the ecommerce virtual assistant user experience

“Hello there, how can I help you today?”, asks the virtual shopping assistant in the lower right-hand corner ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

Add InstantSearch and Autocomplete to your search experience in just 5 minutes
product

Add InstantSearch and Autocomplete to your search experience in just 5 minutes

A good starting point for building a comprehensive search experience is a straightforward app template. When crafting your application’s ...

Imogen Lovera

Senior Product Manager

Best practices of conversion-focused ecommerce website design
e-commerce

Best practices of conversion-focused ecommerce website design

The inviting ecommerce website template that balances bright colors with plenty of white space. The stylized fonts for the headers ...

Catherine Dee

Search and Discovery writer

Ecommerce product listing pages: what they are and how to optimize them for maximum conversion
e-commerce

Ecommerce product listing pages: what they are and how to optimize them for maximum conversion

Imagine an online shopping experience designed to reflect your unique consumer needs and preferences — a digital world shaped completely around ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

Looking for something?

facebookfacebooklinkedinlinkedintwittertwittermailmail

Here at Algolia, we store a lot of data. And at such large scales, that efficiency has become our number one concern — but how do we manage to fit so much input data into an index that we can search through in milliseconds?

We tackle this incredibly difficult problem with an unusually effective solution called tree search. It can be a little tough to understand at first, but it’ll be helpful if we use some more familiar analogies first before diving in.

The algorithm of euros and electrons

Imagine with me a manual coin sorter (here’s an advertisement for one, if you’d like a visual), where you can drop coins down onto a guided track. According to a variation of the dimensions of the track that the coins ride on, the coins are dropped into different containers by their size. This simple mechanism doesn’t require sophisticated electronics to read and understand what the coins represent, but rather it notes a simple distinction between them — their small size differences — and exploits it by forcing a sort of “decision” at several key points along the track. In effect, you can imagine the sorter “asking” the coins, “are you big enough to continue along this track, or should you be split off onto a new path?”

Lightning is another good example of this algorithm in practice, so let’s skim the surface of the science behind why lightning takes the shape it does. Take a look at this image of lightning below, keeping in mind that lightning is essentially a bunch of very excited electrons trying to get to a place where there are fewer electrons:

image of a lightning strike

While many of us see lightning quite often, it may have never crossed our minds just how bizarre this pattern is. Really, if lightning is just electrons trying to get from point A to point B as fast as possible, surely they’d just go in a straight line, right? Why does lightning take such sharp corners? Why does it split into branches?

To be honest, this is a developing science (and our science focuses on a different kind of cloud), but the general explanation is that lightning is not made all at once. The electrons jump only a few dozen meters downwards from the cloud, pause for a moment, and make a decision of sorts. Depending on the conductivity of the air around them (which is very irregular since “air is not a perfect mixture”), the electrons will “choose” to take a different direction for the next jump. It’s by this process that they leapfrog to the ground, twisting and even branching with each step. While it seems random, it’s for this reason that lightning is actually deterministic — if we could recreate the same air mixture on a scale roughly equivalent to the length of each step, we could theoretically recreate a lightning bolt that has already existed.

Coin sorters and lightning seem like wildly different phenomena, but if we strip away the details, they actually follow a similar recursive process:

  1. Some items travel down a track for some length.
  2. They analyze each item on the track by some specific facet, and decide whether it should branch onto a new path.
  3. For all of the new branches of the track, repeat step 1.

It’s a fairly simplistic algorithm, but that makes it extremely versatile. When it comes to lightning, the strength of the various bolts that hit the ground map out the conductivity of the air above them (the mystery “factor” of step 2). Coin sorters are actually built to exploit this effect, as the whole point of a coin sorter is that the number of coins in each bucket reflects the difference in the sizes of the coins that first entered the track.

How tree search works similarly

What if we had a sort of branching structure that worked like the air that the lightning travels through, or the track that the coins slide down? At every step down this node-based structure, we’d decide where to branch to in the next level by matching the nodes to a search query. Essentially, our analogue to air conductivity or coin size would be query relevancy. By reverse-engineering how we expect to traverse this tree, we can build it!

Let’s imagine our dataset is seven strings: romane, romanus, romulus, rubens, ruber, rubicon, and rubicundus. Our first attempt at building a tree out of this data would start with an r at the top rung of the tree, and then each branch down would spell out one of our searchable items letter-by-letter, as shown in the image below:

Tree search branch example image

Now we can designate each string as a sequence of moves at each junction! For example, our original string romulus can be described as the third option at the first branch, and the first option at the rest of the junctions — 311111. That’s only saving us one character per string though, as there’s one less junction than there are characters. Let’s see if we can optimize this by combining identical characters laterally, like what we did with the r in the first row:

Tree search branch example image with characters combined laterally

This hasn’t improved the query time as it takes the same amount of steps to traverse the tree, but it’s so much cleaner now! It’s much easier to comprehend, and it only takes up about 65% of the storage space as the previous version. It also opens up the possibility of combining nodes vertically:

Tree search branch example image with nodes combined vertically

This new, shorter tree contains the same amount of data as the previous one, but it’s far quicker to traverse — that same query romulus only requires us to go through two junctions as opposed to the previous six. Of course, this was a fairly contrived example, and there exist situations with multiple optimization routes, but the gist is here.

What tree search is actually used for

And this isn’t just a theoretical thing either! Tree search is a very well-known and well-used technology, and you probably use it every day without noticing. Formats like JPEG and MP3 include versions of a tree search (specifically a related technology called Huffman coding) for its compression benefits, and it’s probably what’s powering the search through your phone’s contact list. Another common example is Abstract Syntax Trees, a form of these tree structures that hold tokens of a programming language. A simple line of code like let x = 100; won’t just be run directly by an interpreter or compiler, since it doesn’t really know what to make of the string in its full form. Instead, it’ll break it into something like this:

let x = 100 visualized

Syntax details like the = and the ; are helpful for resolving ambiguity, but they don’t actually help with the code’s functionality. Once the code is converted to a clearly understood tree structure, those bits of syntax just aren’t necessary for the compiler or interpreter to traverse the tree and run or compile it one command at a time.

Here at Algolia, we get asked a lot how we make our search tools so incredibly fast. Really, doesn’t it seem a little unreasonable at first glance that we could sort through millions of data points — JSON objects, string content, numbers, URLs, booleans — and still find what the user is looking for in a matter of mere milliseconds? Well, we’re not magicians; we’re using tree search!

As mentioned earlier, our example with just seven similar strings was chosen specifically to demonstrate the power of this algorithm. In reality, a typical dataset that small is barely going to see any improvement from tree search, so it was difficult to create a practical dataset for this article. The real advantage comes as the dataset gets bigger, and it becomes likelier that information in new entries to the dataset is already contained in part within the tree. This actually happens perhaps sooner than you’d expect, for a couple of reasons, the most significant of which is that certain letters just don’t follow each other in sentence case. Characters that sit alongside each other as different branches, for example, don’t follow the expected equal distribution, and character sequences that can clump into one node vertically happen far more often than they would for randomly generated text.

As an aside, that actually allows us to correct typos fairly easily. It would be unlikely for a node containing W to be the parent of a node containing g, so the user likely meant either the most common character that appears after W in our existing tree, or a character that happens to be nearby on a keyboard, criteria which will often converge on h. And this more confident the larger the tree becomes.

The average branching off point of new additions to the tree gets further and further down the structure, and we’ll start to need less new storage space per character of new entries. That means Algolia’s search indexes (which are essentially just massive tree structures) become more and more performant as the dataset grows. Some of the most well-known Algolia customers are building incredibly large indexes for this reason (think the guides in GitLab’s docs, or the recipes published by King Arthur Baking, or GoFundMe’s fundraisers, or details about Gucci products). This is all to say that the search speed and the size of those massive datasets don’t just scale linearly as they add more information, but actually tend to somewhere — if you had the time and the patience, you could theoretically calculate the limit of the storage space and search speed of Gucci’s product database as the record count approaches infinity!

Perhaps we should leave it at this for now, since we’d probably end up down a very deep rabbit-hole if we continued with that train of thought. But if you found this explanation interesting, here are some takeaways, links to further research, and questions to ponder:

  • You use tree search daily. They’re probably most often implemented as Huffman trees, or if you’re a programmer, as ASTs.
  • Just like electrons in lightning step through the sky and take the most conductive path available, our tree search algorithm is essentially just our search query stepping down the tree and finding the closest matches available. What other natural phenomena work this way?
  • Optimizing the tree’s size and optimizing the speed of searching through it are two different things: to reduce tree size, we’ll combine equivalent nodes laterally, but to reduce search time, we’ll combine sequential, single-child nodes vertically. How might those two processes interact with each other?
  • This general algorithm is at the heart of Algolia’s unmatched performance with large datasets. If you’re interested in testing it out on a smaller scale, our free tier is large enough to start feeling the benefits of tree search’s optimizations.

We hope that you enjoyed this deep-dive into tree search, and as always, feel free to reach out on Twitter if you want to keep the conversation going.

About the authors
Paul-Louis Nech

Senior ML Engineer

githublinkedintwitter
Xavier Roche

Recommended Articles

Powered byAlgolia Algolia Recommend

Inside the Algolia Engine Part 2 — The Indexing Challenge of Instant Search
engineering

Julien Lemoine

Co-founder & former CTO at Algolia

Algolia's top 10 tips to achieve highly relevant search results
product

Julien Lemoine

Co-founder & former CTO at Algolia

4 questions to ask for relevant search results
product

Jaden Baptista

Technical Writer