
We invited our friends at Starschema to write about an example of using Algolia in combination with MongoDB. We hope that you enjoy this four-part series by Full Stack Engineer Soma Osvay.

If you’d like to look back or skip ahead, here are the other links:

Part 1 – Use-case, architecture, and current challenges

Part 2 – Proposed solution and design

Part 4 – Frontend implementation and conclusion


Just a note before I get started: you can follow along with the implementation here.

In the last article, we analyzed our data pipeline architecture and left an open question about the way that we will run our Python scripts to load the Algolia index. There were 3 options:

  1. Write Python scripts embedded in our ETL processes to update the Algolia index and MongoDB at the same time.
  2. Host Python scripts that pull data from Mongo to Algolia completely independently from our existing ETL workflow.
  3. Use MongoDB Triggers & Functions to update the Algolia index right after MongoDB updates.

After discussions with our engineering team, I decided to go with the first option, because we already have an established and sophisticated way of running our current data preparation pipeline with a lot of existing scripts to clean, aggregate, and format our data before we load it into our database. Adding an extra script here won’t take much effort, and all the maintenance and monitoring tools are readily available. After deciding on the architecture steps, I decided to make a single script that both performs the initial data load into Algolia and keeps the index up-to-date, instead of a script for each of those actions.

Thankfully, Algolia supports this kind of use-case by exposing a replace_all_objects method that actually creates a new temporary index first and then swaps it out with the live one once it’s done building. That makes for a near-instant transition between the old and the refreshed index without any downtime or data inconsistency.

Step 0. Planning

Before starting to implement my Python script, I had to register for a free Algolia account and set up a sample dataset in MongoDB Atlas that I could use to fill my index.

I chose to go with the default Airbnb dataset that comes with Atlas out of the box, because its format and use-case are very similar to my real-life data. I also made the sample dataset publicly hosted for anybody who is following along or would like to experiment:

  • Host: algolialistingstest.vswcm0y.mongodb.net
  • Username: ReadOnly
  • Password: AlgoliaTest
  • Database: sample_airbnb
  • Collection: listingsAndReviews

I decided to implement the script in a Jupyter Notebook, because it lets me test pieces of my code independently, annotate my code with Markdown, play around and model the data structure iteratively, and easily export the resulting Python code as a script file. It’s very versatile and interactive, and I generally love to use it. I’m hosting it on Google Colab so I can share the code easily without anybody having to install a local Jupyter environment. You can find the implemented script here. We’re using it to:

  1. Connect to Algolia using the Algolia Python API and validate the connection.
  2. Connect to the MongoDB instance and retrieve sample data.
  3. Prepare the Algolia index.
  4. Load the dataset into Algolia from the MongoDB instance and replace the existing index.

Step 1. Connect to Algolia

The first step is generating an API key:

  1. Register for a free Algolia account, or log in to your existing account.
  2. After signing in, an Algolia application will automatically be created for you. You can either use the default unnamed application or create a new application.
  3. Go to the API Keys section of your application and retrieve your Application ID and Admin API Key. You will need both of them when connecting to your Algolia application from the Python code below.

We’ll need to install the Algolia Python client first, but afterwards, here’s what our connection code looks like:

# The Application ID of your Algolia Application
algolia_app_id = "[your_algolia_app_id_here]"
# The Admin API Key of your Algolia Application
algolia_admin_key = "[your_algolia_admin_key_here]"

# Define the Algolia Client and Index that we will use for this test
from algoliasearch.search_client import SearchClient

algolia_client = SearchClient.create(algolia_app_id, algolia_admin_key)
algolia_index = algolia_client.init_index("test_index")

# Test the index that we just created. We wrap this in a function because these variables are not needed later
def test_algolia_index(index):
    # Clear the index, in case it contains any records
    index.clear_objects()
    # Create a sample record (Algolia stores objectIDs as strings)
    record = {"objectID": "1", "name": "test_record"}
    # Save it to the index and wait for the indexing task to finish
    index.save_object(record).wait()
    # Search the index for 'test_record'
    search = index.search("test_record")
    # Clear the index again to remove our test record
    index.clear_objects()
    # Verify that the one and only hit is our object
    if len(search["hits"]) == 1 and search["hits"][0]["objectID"] == "1":
        print("Algolia index test successful")
    else:
        raise Exception("Algolia index test failed")

# Call our test function
test_algolia_index(algolia_index)

Step 2. Connect to Mongo and get data

First, install PyMongo, a Python MongoDB client, and then use this code to connect to our sample MongoDB database and read the sample data. Note that we’re only getting 5000 items so that we don’t overwhelm our free tier usage:

# Define MongoDB connection parameters
# Change these values if you are running your own MongoDB instance
db_host = "algolialistingstest.vswcm0y.mongodb.net"
db_name = "sample_airbnb"
db_user = "ReadOnly"
db_password = "AlgoliaTest"
collection_name = "listingsAndReviews"

connection_string = f"mongodb+srv://{db_user}:{db_password}@{db_host}/{db_name}?retryWrites=true&w=majority"

# Connect to MongoDB and get the MongoDB Database and Collection instances
from pymongo import MongoClient

# Create MongoDB Client
mongo_client = MongoClient(connection_string)
# Get database instance
mongo_database = mongo_client[db_name]
# Get collection instance
mongo_collection = mongo_database[collection_name]
# Retrieve the first 5000 records from the collection
initial_items = list(mongo_collection.find().limit(5000))

Step 3. Transform our data into a form that suits Algolia

The objects in our MongoDB sample dataset contain many attributes, some of which are irrelevant to our Algolia index. We only keep those that are required either for searching or ranking.

  • The _id property will be kept, as it will be the Algolia object ID as well.
  • These properties will be kept either for searching, faceting, or displaying: name, space, description, neighborhood_overview, transit, property_type, address, accommodates, bedrooms, beds, number_of_reviews, bathrooms, price, weekly_price, security_deposit, cleaning_fee, images.
  • The review_scores attribute on the Airbnb entry will be transformed into a scores property containing the number of stars given to the listing.
  • A _geoloc property will be added to the object based on fields in the original address object. This will be used for geosearching.
  • The following properties will be stripped completely since Algolia doesn’t need them: summary, listings_url, notes, access, interaction, house_rules, room_type, bed_type, minimum_nights, maximum_nights, cancellation_policy, last_scraped, calendar_last_scraped, first_review, last_review, amenities, extra_people, guests_included, host, availability, review_scores, reviews.
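To make the target concrete, here’s an illustrative example of the shape one transformed record should have (the values here are made up for demonstration, not taken from a real listing):

```python
# Illustrative shape of one transformed Algolia record (values are made up)
algolia_record = {
    "objectID": "10006546",                # taken from the Mongo _id
    "name": "Charming Riverside Duplex",
    "property_type": "House",
    "accommodates": 8,
    "bedrooms": 3,
    "price": 80.0,                         # Decimal128 converted to float
    "scores": {"stars": 4.0, "has_one": True, "has_two": True,
               "has_three": True, "has_four": True, "has_five": False},
    "_geoloc": {"lng": -8.61308, "lat": 41.1413},  # used for geosearching
}
print(sorted(algolia_record))
```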

Here is this transformation code:

# We define a helper first that truncates long texts to roughly 350 characters, cutting at the last full stop. This is done because the sample data has some records with very long descriptions, which are irrelevant to our use-case and take up a lot of space to display
def strip_long_text(obj, trailWithDot):
    if isinstance(obj, str):
        # Take the first 350 characters, then cut back to the last full stop (.)
        ret = obj[:350].rsplit(".", 1)[0]
        if trailWithDot and len(ret) > 0 and not ret.endswith("."):
            ret += "."
        return ret
    else:
        return obj

# We define a function to validate number values coming from MongoDB. MongoDB stores numbers in Decimal128 format, which Algolia doesn't accept as a number (it would end up stored as a string). This function:
# 1. Converts numbers from Decimal128 to float
# 2. Caps them at a maximum value, because some values in MongoDB are outliers or incorrectly filled out, which would give range filters an unrealistic maximum
def validate_number(num, maxValue):
    if num is None:
        return num
    else:
        val = float(str(num))
        if val > maxValue:
            return maxValue
        return val

def prepare_algolia_object(mongo_object):
    # Create an instance of the Algolia object to index, and set its objectID based on the _id of the mongo object
    r = {}
    r["objectID"] = mongo_object["_id"]
    # prepare the string attributes
    for string_property in [
        ["name", True],
        ["space", True],
        ["description", True],
        ["neighborhood_overview", True],
        ["transit", True],
        ["address", False],
        ["property_type", False],
    ]:
        if string_property[0] in mongo_object:
            r[string_property[0]] = strip_long_text(
                mongo_object[string_property[0]], string_property[1]
            )

    # prepare the integer properties
    for num_property in [
        ["accommodates", 100],
        ["bedrooms", 20],
        ["beds", 100],
        ["number_of_reviews", 1000000],
        ["bathrooms", 100],
        ["price", 1000],
        ["weekly_price", 1000],
        ["security_deposit", 1000],
        ["cleaning_fee", 1000],
    ]:
        if num_property[0] in mongo_object:
            r[num_property[0]] = validate_number(
                mongo_object[num_property[0]], num_property[1]
            )

    # prepare the Sortable attributes (except for scores rating)

    # set rating if any
    if (
        "review_scores" in mongo_object
        and "review_scores_rating" in mongo_object["review_scores"]
    ):
        stars = round(mongo_object["review_scores"]["review_scores_rating"] / 20, 0)
        r["scores"] = {
            "stars": stars,
            "has_one": stars >= 1,
            "has_two": stars >= 2,
            "has_three": stars >= 3,
            "has_four": stars >= 4,
            "has_five": stars >= 5,
        }
    # set images
    if "images" in mongo_object:
        r["images"] = mongo_object["images"]
    # set GeoLocation if any
    if "address" in mongo_object:
        if "location" in mongo_object["address"]:
            if mongo_object["address"]["location"]["type"] == "Point":
                r["_geoloc"] = {
                    "lng": mongo_object["address"]["location"]["coordinates"][0],
                    "lat": mongo_object["address"]["location"]["coordinates"][1],
                }
    return r
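As a quick sanity check, here’s a standalone demonstration of the two helpers (re-declared here so the snippet runs on its own; the sample inputs are made up):

```python
# Re-declared copies of the helpers above, so this snippet runs standalone
def strip_long_text(obj, trailWithDot):
    if isinstance(obj, str):
        # Take the first 350 characters, then cut back to the last full stop
        ret = obj[:350].rsplit(".", 1)[0]
        if trailWithDot and len(ret) > 0 and not ret.endswith("."):
            ret += "."
        return ret
    return obj

def validate_number(num, maxValue):
    # None stays None; anything else is converted to float and capped
    if num is None:
        return num
    val = float(str(num))
    return min(val, maxValue)

long_description = "Cozy loft in the old town. Lovely river view. " + "Lorem ipsum. " * 40
print(strip_long_text(long_description, True))  # cut at the last full stop within 350 chars
print(validate_number("1250.50", 1000))         # outlier capped at the maximum
```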

Step 4. Define our index properties

Now let’s tell Algolia what to do with the properties we’ve given it. We’ll set [attributesToRetrieve](<https://www.algolia.com/doc/api-reference/api-parameters/attributesToRetrieve/>), the attributes that Algolia will return per search result for display in our UI, to an array of these properties: summary, description, space, neighborhood, transit, address, number_of_reviews, scores, price, cleaning_fee, property_type, accommodates, bedrooms, beds, bathrooms, security_deposit, images/picture_url, _geoloc. Our [attributesForFaceting](<https://www.algolia.com/doc/api-reference/api-parameters/attributesForFaceting/>) array will contain property_type, address/country, scores/stars, price, and cleaning_fee.

We’ll also set [searchableAttributes](<https://www.algolia.com/doc/api-reference/api-parameters/searchableAttributes/>), the attributes that are considered when a query is evaluated. Algolia won’t waste time looking outside this list for potential search matches, which speeds up the query, and it lets us set the priority order from highest to lowest:

  1. (top priority attributes) name, address/street, address/suburb
  2. address/market, address/country
  3. description (this will be an unordered attribute)
  4. space (another unordered attribute)
  5. neighborhood_overview (another unordered attribute)
  6. (least priority) transit

We will also update the default ranking logic for our index:

  1. (top priority) geo – providing search results close-by is the top priority for us
  2. typo
  3. words
  4. filters
  5. proximity
  6. attribute
  7. exact
  8. (least priority) custom

We’re also setting our index to treat singular and plural forms as equivalent (something you might not think about much, but your users definitely will notice when search doesn’t behave the way they expect). You can find other great settings and resources on the official Algolia API Reference page. Here’s what our code for this looks like:

algolia_index.set_settings(
    {
        "searchableAttributes": [
            "name,address.street,address.suburb",
            "address.market,address.country",
            "unordered(description)",
            "unordered(space)",
            "unordered(neighborhood_overview)",
            "transit",
        ],
        "attributesForFaceting": [
            "property_type",
            "searchable(address.country)",
            "scores.stars",
            "price",
            "cleaning_fee",
        ],
        "attributesToRetrieve": [
            "images.picture_url",
            "summary",
            "description",
            "space",
            "neighborhood",
            "transit",
            "address",
            "number_of_reviews",
            "scores",
            "price",
            "cleaning_fee",
            "property_type",
            "accommodates",
            "bedrooms",
            "beds",
            "bathrooms",
            "security_deposit",
            "_geoloc",
        ],
        "ranking": [
            "geo",
            "typo",
            "words",
            "filters",
            "proximity",
            "attribute",
            "exact",
            "custom",
        ],
        "ignorePlurals": True,
    }
)

Step 5. Load the dataset into Algolia from MongoDB

This short piece of code loads the dataset into the Algolia index, replacing the existing index so there are no out-of-date records.

# Prepare the Algolia objects
algolia_objects = list(map(prepare_algolia_object, initial_items))
algolia_index.replace_all_objects(algolia_objects, {"safe": True}).wait()

Script evaluation & performance

Overall, I found that loading an Algolia index from Python is quite a straightforward task, even though my Python skills are a little rusty. Most of my time actually went into preparing the Airbnb listing objects and transforming them into what I wanted inside Algolia. This would probably have been much simpler if I had been working with our own datasets, as there wouldn’t have been as much transformation needed.

I learned that Algolia exposes a wonderful Python API — it’s simpler to use than I expected, and its great documentation guided me through the entire process, step by step. The code required to prepare and load the index is minimal, and it felt intuitive to me. It also performed great when loading the index: it needed just under 5 seconds to load and replace the entire index of 5000 records, even when run from a resource-limited, cloud-hosted server. When I ran it on some of our high-speed servers with a fast Internet connection, it only took about 2 seconds. Our production dataset is much larger (about 40k records), but our standard pipelines that prepare the listings data already run for over an hour every day, so I am sure that our overall performance will not be affected by Algolia. So far, its simplicity and speed have far outweighed any drawbacks.
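For reference, the timings above were collected with a simple stopwatch wrapper along these lines (a sketch; the `timed` helper is my own, not part of the Algolia client):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"took {time.perf_counter() - start:.2f}s")
    return result

# In the notebook, this wrapped the index load, e.g.:
# timed(lambda: algolia_index.replace_all_objects(algolia_objects, {"safe": True}).wait())
```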

In the first article of this series, I talked about our use-case, architecture and the search challenges we are facing.

In the second article of this series, I covered the design specification of the PoC and talked about the implementation possibilities.

In the fourth article of this series, I will implement a sample frontend so we can evaluate the product from the user’s perspective and give the developers a head-start if they choose to go with this option.

About the author
Soma Osvay

Full Stack Engineer, Starschema
