
Software architecture is a hard problem, and software architecture that scales is even harder. We’d all like to live in a world where problems of scale don’t creep into our code and can be kept in the realm of infrastructure, and to a certain extent they can. But eventually as developers we do have to think about these problems and find creative architectural ways to solve them.

The problem of scale that I’d like to focus on in this post is query response time. It’s a problem that many teams just accept as a painful reality of a growing database and the increasingly complex queries that drive rich UI functionality. In reality, this is a solvable problem, and one that can be solved elegantly using an existing architectural pattern: CQRS.

As an example, we’re going to create a fictional user forum named “SlackerNews”. We’re going to fill it with auto-generated post data, benchmark some common queries, and then look at replacing the heaviest of those queries with an Algolia search index to show how we can both improve search UX and reduce database load and cost across the board.

Introducing CQRS

Command Query Responsibility Segregation (CQRS) is a data modeling and system design pattern which basically says that you can use different models for writing data than you do for reading data. Instead of reading and writing to the same database model, you write to one database, then copy a version of that data into a separate and different type of data storage system, and use that to serve your queries instead. Usually this second data store holds a denormalized version of the data that is more read-performant than a relational database.

CQRS will increase the complexity of your system; however, if done right and used strategically for the right types of reads, the benefits can be huge. We’re going to look at how we can apply CQRS to a forum database to drastically improve text search query response times, improve UX, and decrease load on your source-of-truth database.

You may already be familiar with this pattern from the way we generally handle object storage. We usually don’t store large binary files such as images, music, or videos in a database, opting instead to put them in an object storage medium where the cost of storing that data is lower and the impact on the main database is minimal. The only difference here is that we store the data in both the relational database and the search index, and we treat the relational database as the source of truth.
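
To make the pattern concrete, here is a minimal sketch of what the write path could look like in Go: the post is written to Postgres first (the source of truth), and a denormalized copy is then pushed to the Algolia index. The PostRecord type, the savePost helper, and the connection details are illustrative assumptions, not code from the SlackerNews project.

package main

import (
	"database/sql"
	"strconv"

	"github.com/algolia/algoliasearch-client-go/v3/algolia/search"
	_ "github.com/lib/pq"
)

// PostRecord is the denormalized read model sent to the search index.
type PostRecord struct {
	ObjectID string   `json:"objectID"`
	Body     string   `json:"body"`
	Tags     []string `json:"_tags"` // parent forum IDs plus forum tags
}

// savePost writes the normalized row to Postgres (the command side), then
// pushes the denormalized document to Algolia (the query side).
func savePost(db *sql.DB, index *search.Index, forumID int, body string, tags []string) error {
	var id int
	err := db.QueryRow(
		`insert into posts (forum_id, body, created_at) values ($1, $2, now()) returning id`,
		forumID, body,
	).Scan(&id)
	if err != nil {
		return err
	}

	_, err = index.SaveObject(PostRecord{ObjectID: strconv.Itoa(id), Body: body, Tags: tags})
	return err
}

func main() {
	db, _ := sql.Open("postgres", "postgres://user:pass@localhost/slackernews?sslmode=disable")
	client := search.NewClient("YOUR_APP_ID", "YOUR_API_KEY")
	index := client.InitIndex("posts")

	_ = savePost(db, index, 42, "example post body about git", []string{"42", "golang"})
}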

Building the SlackerNews Database

Before we get to the query benchmarking and optimization, we have to generate some dummy data in our database. For this example, I’m going to use PostgreSQL as the database, as it’s one of the most performant and popular relational databases in use today, and also because it has some great out-of-the-box text search optimization features.

The project is broken down into two parts:

1. Data generation
2. Query benchmarking

both of which can be found in this Query Performance GitLab project.

Data Model

Here’s a class diagram for the very simplified forum database we’re building. It should be relatively easy to follow. The main takeaways for our queries are:

1. Forums can have sub-forums via the ParentID foreign key
2. Tags and Posts have a many-to-many relationship, which we achieve through a bridging table, PostTag
3. Through a recursive query (which we’ll show later on), we can fetch all posts in all child forums underneath any given parent forum.
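
As a rough illustration of that model, the entities could be expressed as Go structs like the ones below. The field names are assumptions based on the description above, not the project’s actual types.

package model

import "time"

// Forum can nest under another forum via ParentID (nil for top-level forums).
type Forum struct {
	ID       int
	Name     string
	ParentID *int
}

// Post belongs to a single forum and a single author.
type Post struct {
	ID        int
	ForumID   int
	UserID    int
	Body      string
	CreatedAt time.Time
}

// Tag is a label that can be applied to many posts.
type Tag struct {
	ID   int
	Name string
}

// PostTag is the bridging table that gives Posts and Tags their
// many-to-many relationship.
type PostTag struct {
	PostID int
	TagID  int
}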

Data Generation

The goal is to generate 1 million records, which will be saved into Postgres in a normalized format using the tables and relationships of the data model above. The same data will also be denormalized and sent to an Algolia index.

To generate the data itself we’ll be using some sample posts from HackerNews, and then using Markov chains to generate entirely new random posts from that sample data. The code for this can be found in this Data Generator GitLab project.
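
The generator’s details live in that project, but the underlying idea is simple enough to sketch: build a table of which words follow which in the sample HackerNews posts, then walk that table at random to produce new bodies. This toy version (the function names are mine, not the project’s) shows the technique.

package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// buildChain maps each word in the corpus to the words that follow it.
func buildChain(corpus []string) map[string][]string {
	chain := make(map[string][]string)
	for _, doc := range corpus {
		words := strings.Fields(doc)
		for i := 0; i < len(words)-1; i++ {
			chain[words[i]] = append(chain[words[i]], words[i+1])
		}
	}
	return chain
}

// generate walks the chain from a starting word, picking a random successor
// at each step, to produce a new, plausible-looking post body.
func generate(chain map[string][]string, start string, maxWords int) string {
	out := []string{start}
	word := start
	for i := 0; i < maxWords; i++ {
		next := chain[word]
		if len(next) == 0 {
			break
		}
		word = next[rand.Intn(len(next))]
		out = append(out, word)
	}
	return strings.Join(out, " ")
}

func main() {
	corpus := []string{
		"git is a distributed version control system",
		"git makes branching and merging cheap",
	}
	chain := buildChain(corpus)
	fmt.Println(generate(chain, "git", 12))
}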

Query Benchmarking

To test the performance of both our Postgres database and our Algolia index, I’ve written two AWS Lambda functions: Orchestrator and Runner.

The orchestrator takes some information about our test scenario and will delegate the execution to the runners using a fan-out pattern. The runner will take a batch of queries and execute them against either the database or Algolia index. Each runner will calculate the average query time for its batch and return that to the orchestrator. The orchestrator will collect all of the averages from the runners and calculate a total average query execution time for that benchmark.

The orchestrator is called with the following arguments:

– Batch Size
– Total Batches
– Runner Type (DB or Algolia)
– Complexity (Simple or Complex)
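
Roughly, the orchestrator’s fan-out/fan-in looks like the sketch below. In the real setup each runner is a separate Lambda invocation; here invokeRunner is a hypothetical stand-in that would execute one batch of queries and return that batch’s average response time.

package main

import (
	"fmt"
	"sync"
)

// BenchmarkInput mirrors the four orchestrator arguments listed above.
type BenchmarkInput struct {
	BatchSize    int
	TotalBatches int
	RunnerType   string // "db" or "algolia"
	Complexity   string // "simple" or "complex"
}

// invokeRunner is a placeholder for invoking one runner Lambda, which would
// execute BatchSize queries and return the batch's average time in ms.
func invokeRunner(in BenchmarkInput) float64 {
	return 0 // real implementation omitted
}

// orchestrate fans out one runner per batch, then averages their results.
func orchestrate(in BenchmarkInput) float64 {
	results := make([]float64, in.TotalBatches)
	var wg sync.WaitGroup
	for i := 0; i < in.TotalBatches; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = invokeRunner(in)
		}(i)
	}
	wg.Wait()

	var sum float64
	for _, r := range results {
		sum += r
	}
	return sum / float64(in.TotalBatches)
}

func main() {
	avg := orchestrate(BenchmarkInput{BatchSize: 100, TotalBatches: 10, RunnerType: "db", Complexity: "simple"})
	fmt.Printf("average query time: %.2f ms\n", avg)
}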

Benchmarking

We want to make sure we’re being fair to both data stores and playing to their strengths, so we’re going to do a few things to ensure that’s the case. For the Postgres database, we’re going to use Amazon RDS Aurora PostgreSQL with a master instance and a read replica instance, both using db.r5.xlarge instance types (4 vCPU, 32 GB memory).

When creating the records in the database, we also pre-calculate a tsvector (text search vector) for the body of each post, save it against the database record, and then search for specific text tokens using tsquery in our WHERE clause. We also create a GIN index to make this text search more efficient.
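
In Postgres terms, that setup amounts to something like the snippet below. The column and index names are assumptions for illustration, and the project precomputes the vector at insert time rather than backfilling it afterwards.

package main

import "database/sql"

// setupTextSearch adds a precomputed tsvector column to posts, populates it
// from the post body, and creates a GIN index so @@ to_tsquery() lookups
// don't have to re-parse every row at query time.
func setupTextSearch(db *sql.DB) error {
	stmts := []string{
		`alter table posts add column if not exists body_tokens tsvector`,
		`update posts set body_tokens = to_tsvector('english', body) where body_tokens is null`,
		`create index if not exists idx_posts_body_tokens on posts using gin (body_tokens)`,
	}
	for _, stmt := range stmts {
		if _, err := db.Exec(stmt); err != nil {
			return err
		}
	}
	return nil
}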

On the Algolia side, we also denormalize most of the related information onto the index records. For example, we will be searching a parent forum for all posts in all child forums that match the search text, so for each post record on the index we add all parent forum IDs to the _tags attribute. We also add the actual forum tags to the _tags attribute.
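
For context, an index record could be shaped roughly like this. The field names are illustrative; the important part is that ancestor forum IDs and forum tags all land in _tags, so a single tag filter can scope a search to a parent forum.

package model

// IndexedPost is an illustrative shape for the denormalized search record.
type IndexedPost struct {
	ObjectID  string   `json:"objectID"`
	Body      string   `json:"body"`
	ForumName string   `json:"forum_name"`
	CreatedAt int64    `json:"created_at"`
	Tags      []string `json:"_tags"`
}

// buildTags collects the post's own forum ID, its ancestor forum IDs, and the
// forum's tags into a single list for the _tags attribute.
func buildTags(forumID string, ancestorForumIDs, forumTags []string) []string {
	tags := append([]string{forumID}, ancestorForumIDs...)
	return append(tags, forumTags...)
}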

The Queries

We create two queries, one simple and one complex. The simple query doesn’t use any joins and just does a text search on the body of the posts.

In each complex query we randomly select a forum to filter by. Forums that have more sub-forums will be a bit slower to search through, but we will run this multiple times to get a good reading of the average query response times for each approach.

Simple Query

select * from posts
where body_tokens @@ to_tsquery('git')
order by created_at desc
limit 100

Complex Query

with recursive subforums as (
    select
        id,
        name,
        parent_id
    from forums
    where id = @id
    union
    select
        f.id,
        f.name,
        f.parent_id
    from forums f
    inner join subforums sf on sf.id = f.parent_id
)
select p.*
from subforums sf
join posts p on p.forum_id = sf.id
where p.body_tokens @@ to_tsquery('git')
order by p.created_at desc
limit 100

The Algolia search versions of these queries are much simpler, thanks to the nature of search and Algolia’s SDK.

Simple Search

params := []interface{}{
	opt.HitsPerPage(100),
}

postIndex.Search("git", params...)

Complex Search

params := []interface{}{
	opt.TagFilter(fmt.Sprintf("%v", (*forums)[forumToUse].ID)),
	opt.HitsPerPage(100),
}

postIndex.Search("git", params...)

Test Plan & Results

In this table you can see each benchmark and the four arguments to the orchestrator function. The average query time was established by taking the top 3 results from 10 runs. The reason for this approach was that both the Postgres database and the Algolia search would occasionally throw outlier results (longer response times) on the first run of any given query, but would quickly stabilize after that and return consistent results.

This chart shows a plot of each test for each data store, layered on top of each other. I used a logarithmic scale for the y-axis to demonstrate that both data stores do get slower as the concurrency goes up. As you can see, however, the response time quickly increases in the Postgres database, whereas the Algolia search doesn’t really experience much of a slowdown at all. I also included a version of the chart without the log scale so you can visualize the real difference in response time.

Interestingly, the complexity of the search didn’t have any impact in these particular benchmarks; however, I suspect this was because the cardinality of posts to forums, tags, and users was actually very low in our generated data. If I were to run these tests again, I would generate hundreds or thousands of each to get more realistic results from the increase in rows that needed to be joined.

Layered results with log scale for response time
Layered results with a linear scale for response time

In this second chart you can see that for Postgres we experience very large response times (~7.5 seconds) as concurrency increases to 1000, whereas Algolia scales easily to that level of search concurrency with response times orders of magnitude lower at ~15ms (0.015 seconds).

If anyone would like to sponsor me to run these benchmarks at a much larger scale, I’d be more than willing to oblige.

Takeaways

We as developers have a lot of power in the data storage choices we make as we build software. We can and should employ the CQRS pattern when it makes sense, and think outside the box when it comes to storing and querying our data, especially as our products scale and customers hit our traditional databases harder. By doing so we can drastically improve the user experience and reduce the underlying database costs associated with searching our data. The lift to incorporate a search data store like Algolia into your tech stack is relatively low for the extraordinary quality-of-life improvement it can introduce.

I hope this adventure into data storage and querying has been informative and helpful, and I look forward to reading your feedback on my approach and anything I may have missed in my benchmarking method.

About the author
Daniel Parker

Senior Integrations Engineer @ GitLab

