We invited our friends at Starschema to write about an example of using Algolia in combination with MongoDB. We hope that you enjoy this four-part series by Full Stack Engineer Soma Osvay.
If you’d like to look back or skip ahead, here are the other links:
Part 1 – Use-case, architecture, and current challenges
Part 3 – Data pipeline implementation
Part 4 – Frontend implementation and conclusion
When we discussed the challenges of integrating a third-party indexing system into the product, our engineers instantly brought up three potential problems:
So far, we have had a single source-of-truth database (the Listings database) where all listings are stored. When we introduce Algolia into the ecosystem, we have to prioritize keeping it up-to-date with that database. Any inconsistency between the two systems can seriously hurt our site’s UX. We wouldn’t want to end up with a situation where a search result:
All of these scenarios would erode confidence in our service and translate directly into lost revenue. It is essential both to perform an initial load of our existing dataset into Algolia and to keep Algolia up-to-date with all future changes to that dataset.
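One way the "keep Algolia up-to-date" requirement might be approached (we'll evaluate concrete options in part 3) is to translate MongoDB change-stream events into Algolia index operations. The sketch below is illustrative only: the function name and event shapes are hypothetical, and for `update` events it assumes the change stream was opened with `full_document="updateLookup"` so the full document is available.

```python
# Hypothetical sketch: map a MongoDB change-stream event to an Algolia
# index action, so deletions and edits in the Listings database are
# mirrored in the search index.

def to_algolia_operation(event):
    """Return an Algolia batch-style action for one change-stream event."""
    op = event["operationType"]
    doc_id = str(event["documentKey"]["_id"])
    if op == "delete":
        # A removed listing must disappear from search results too.
        return {"action": "deleteObject", "body": {"objectID": doc_id}}
    if op in ("insert", "update", "replace"):
        # Assumes the stream was opened with full_document="updateLookup",
        # otherwise update events carry no fullDocument.
        record = dict(event.get("fullDocument") or {})
        record.pop("_id", None)
        record["objectID"] = doc_id  # Algolia's required primary key
        return {"action": "updateObject", "body": record}
    return None  # ignore other event types (invalidate, drop, ...)

# Example: an update event on a listing
event = {
    "operationType": "update",
    "documentKey": {"_id": "10006546"},
    "fullDocument": {"_id": "10006546", "name": "Ribeira Charming Duplex", "price": 80},
}
print(to_algolia_operation(event))
```

The appeal of this shape is that delete handling falls out naturally, which is exactly the class of inconsistency described above.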
Our backend application is already under heavy load. It is scaled horizontally using Kubernetes, but we want to avoid a significant increase in operating costs from additional traffic to our servers. When designing a solution, we should offload as much search traffic as possible to Algolia.
We also want to make sure that we don’t compromise our application’s security and access control. Currently, our application does not require a logged-in session to query listings, so this is less of a concern; still, when a user does log in, it would be useful to pass the user’s identity to Algolia so it can personalize search results and refine our internal reports.
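Algolia accepts a `userToken` search parameter for exactly this purpose. As a rough sketch (the function and session fields here are hypothetical, not part of our actual backend), the frontend could attach a stable token per user, falling back to an anonymous session id when nobody is logged in:

```python
# Illustrative only: build the extra search parameters we would pass
# alongside the query string when calling Algolia.

def build_search_params(session):
    """Attach a stable userToken so Algolia can personalize and report."""
    params = {"hitsPerPage": 20}
    if session.get("user_id"):
        # Logged-in visitor: attribute searches to this user.
        params["userToken"] = f"user-{session['user_id']}"
    else:
        # Anonymous visitor: reuse a per-session identifier instead.
        params["userToken"] = f"anon-{session['anonymous_id']}"
    return params

print(build_search_params({"user_id": 42}))
print(build_search_params({"anonymous_id": "f3a1"}))
```

Keeping the token derivation in one helper makes it easy to change the identity scheme later without touching every search call.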
We can break this down into three tasks:
The updated architecture diagram would look like the following:
Let’s take a look at some of the advantages and disadvantages of the different paths we could take on task #2. Here are some of our options:
Whichever option we end up choosing, we’ll implement it in the third post of this series.
Lastly, part four will focus on creating a small web-based frontend to query the Algolia index. I want to be able to show our frontend developers a working solution with basic code so they can evaluate the time and effort required to integrate it into our existing frontend application.
To keep the implementation simple, I will use a dataset that is publicly available for MongoDB and is similar to my production data. There are multiple reasons behind this:
I decided to go with MongoDB’s official Sample AirBnB Listings Dataset, as it is fairly close to our existing data structure. I’m also going to use MongoDB Atlas to host my sample database, along with a free Algolia account to store the records. While I’m already comfortable in Python (which is why we’re using Jupyter notebooks), I’m no expert in the frontend languages of HTML, CSS, and JavaScript, so this will be a great opportunity to test whether Algolia’s SDKs are as simple as they’re made out to be.
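To give a feel for what the initial load might involve, here is a sketch of shaping one document from the `sample_airbnb` dataset into an Algolia-friendly record. The field selection is an assumption on my part: Algolia enforces a per-record size limit, so heavy sub-documents such as the full `reviews` array would be dropped rather than indexed wholesale, and MongoDB’s `_id` becomes Algolia’s required `objectID`.

```python
# Illustrative sketch: pick the searchable fields out of a sample_airbnb
# listing document and add the objectID Algolia requires.

SEARCHABLE_FIELDS = ("name", "summary", "property_type", "bedrooms", "beds", "price")

def shape_listing(doc):
    """Reduce a raw MongoDB listing to a compact Algolia record."""
    record = {field: doc[field] for field in SEARCHABLE_FIELDS if field in doc}
    record["objectID"] = str(doc["_id"])  # Algolia's required unique key
    return record

# A trimmed-down example document in the shape of the public dataset
listing = {
    "_id": "10006546",
    "name": "Ribeira Charming Duplex",
    "summary": "Fantastic duplex apartment with three bedrooms...",
    "property_type": "House",
    "price": 80,
    "reviews": [{"comment": "great stay"}] * 50,  # too large to index as-is
}
record = shape_listing(listing)
print(sorted(record))
```

In a real pipeline this transform would run over a cursor from the `listingsAndReviews` collection before batching the records to Algolia; that is what part 3 will flesh out.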
In the first article of this series, I talked about our use-case, architecture, and the search challenges we are facing.
In the third article of this series, I will implement the data ingestion into Algolia and figure out how to keep that data up-to-date.
In the fourth article of this series, I will implement a sample frontend so we can evaluate the product from the user’s perspective and give the developers a head-start if they choose to go with this option.
Soma Osvay
Full Stack Engineer, Starschema