Dec 17th 2021 engineering
Every search interface relies on a fast back-end data-indexing process that keeps its search results as up to date as possible. But indexing is only one side of the coin. The other side is the real-time speed and relevance of the search engine itself.
For all search engines, the search request is the highest priority, with indexing a (very) close second. There are several reasons for this, but the most important is a business argument: every search is a potential game changer, a path to a conversion. Any slow or dropped search request, or irrelevant result, is a potential financial or business loss.
To achieve maximum speed and relevance, a search engine must do much of its heavy lifting at indexing time, structuring and preparing the data before any query arrives. As a result, it takes a little extra time to update an index. But if you follow a few indexing best practices, you'll even things out.
“All well and good,” say the full-stack and back-end developers. “I understand the priority of search. But I want to know more about my data. How do I get my data onto your servers? Can it handle my use cases? Does it accept any kind of data? Is it simple, secure, fast?”
In a recent article on indexing, we explored a variety of advanced use cases, and focused on two search indexing essentials: fast updates and wide applicability. Now it’s time to dig into the code and explain some speed-enhancing algorithms and indexing best practices that ensure you get the highest indexing speed for any search use case.
There are two primary areas to focus on here: how your indexes are structured and used (the indexing scenarios below), and how you send your data to them (the best practices that follow).
To understand indexing on its own terms, we need to decouple it from search and outline the most popular indexing scenarios:
A well-structured index provides the foundation for a fast and fully-featured customer-facing search interface, with great relevance. In fact, indexing is so important to search & relevance that it needs to be designed and implemented with as much care and dedication as the front end.
Multiple indexes can form a single touch point for all back-office data. When put together in a certain way, your indexes can create a company-wide searchable data layer that lies between your back-office and all front ends used internally (employees) or externally (customers, partners).
The “matchmaker” scenario is when Company X builds an Algolia index and makes it available to external data providers. In this scenario, Company X builds a collaborative website, such as a marketplace or streaming platform, where it displays the products/media of multiple vendors, partners, and contributors. To accomplish this, Company X exposes its Algolia index to these external data providers, allowing them to send data once they understand the format.
Here’s the main difference between the first two scenarios: the first dedicates an index to a single customer-facing search experience, while the second combines multiple indexes into a shared, company-wide data layer that serves every front end, internal and external.
The wide applicability of our indexing wouldn't be possible, nor would it survive the competitive digital business environment, if it didn't perform well in every situation. While we offer high indexing speed out of the box, getting the most from it hinges on following indexing best practices. That's what this article is about.
Just a word about what we mean by “out-of-the-box high performance”. Our indexing comes with the following technologies:
The most important indexing practice is to run a batching algorithm that updates multiple records in one indexing operation, in a regular and timely manner. This is true for all use cases.
Why do we recommend batching? Because there's a small performance cost to every indexing request. Each request triggers a small “reindexing” of your entire index, which can take up to a second, or longer if the index is very large. Sending hundreds of indexing requests, one record at a time, therefore creates an indexing queue that slows down the entire indexing process. To mitigate this, limit the number of indexing requests hitting the server by packing many record updates into each one.
Taking all that into account, here are the three most important indexing best practices (pretty standard fare for data updates):
One common mistake is to send one record at a time. If your back-end data changes constantly, don't send each change as it occurs; as noted above, bottlenecks form when hundreds of indexing requests pile up waiting to be processed.
Instead, as a best practice, use batch indexing: send each change to a temporary cache, then flush that cache to Algolia on a regular schedule, for example every 5 minutes, or every 30 minutes for larger indexes. Never flush more often than once a minute, or you'll end up creating the same bottleneck.
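Here's a minimal sketch of that cache-and-flush pattern. It reuses the same placeholder credentials and index as the examples in this article; the buffer, the flush interval, and the helper functions are illustrative conventions, not part of Algolia's API.

import time
from algoliasearch import algoliasearch

client = algoliasearch.Client('YourApplicationID', 'YourAdminAPIKey')
algolia_index = client.init_index('bubble_gum')

FLUSH_INTERVAL_SECONDS = 5 * 60  # flush every 5 minutes; never more often than once a minute
record_buffer = []               # temporary cache of changed records

def queue_record_change(record):
    # Collect each back-end change locally instead of indexing it immediately
    record_buffer.append(record)

def flush_buffer():
    # Send all cached changes to Algolia in a single batched request
    global record_buffer
    if record_buffer:
        algolia_index.save_objects(record_buffer)
        record_buffer = []

# Simple scheduler: in a real application, your data-change events would call
# queue_record_change(), and this loop (or a cron job) would flush on a fixed interval
while True:
    time.sleep(FLUSH_INTERVAL_SECONDS)
    flush_buffer()

Flushing on a timer rather than per change keeps the number of indexing requests low and predictable, no matter how often your data changes.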
This code example builds a new index. It batch-saves records in chunks of 10,000 using the save_objects method of Algolia's Python API.
import json
from algoliasearch import algoliasearch

client = algoliasearch.Client('YourApplicationID', 'YourAdminAPIKey')
algolia_index = client.init_index('bubble_gum')

with open('bubble_gum.json') as f:
    records = json.load(f)

chunk_size = 10000
for i in range(0, len(records), chunk_size):
    algolia_index.save_objects(records[i:i + chunk_size])
See how our API has automated the batching process.
Improving on the previous suggestion, you also don't want to send too many records in a single batch. To reduce the size of your indexing requests, perform incremental updates: send only the records that are new or have changed.
This code adds a new Bubble Gum series.
algolia_index.save_objects([
    {"objectID": "myID1", "item": "Classic Bubble Gum", "price": 3.99},
    {"objectID": "myID2", "item": "Raspberry Bubble Gum", "price": 3.99},
    {"objectID": "myID3", "item": "Cherry Bubble Gum", "price": 3.99},
    {"objectID": "myID4", "item": "Blueberry Bubble Gum", "price": 3.99},
    {"objectID": "myID5", "item": "Mulberry Bubble Gum", "price": 3.99},
    {"objectID": "myID6", "item": "Lemon Bubble Gum", "price": 3.99}
])
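In practice, “only the new records” means you need a way to tell which records have changed since the last sync. Here's a minimal sketch of one way to do that; it assumes each record carries a hypothetical updated_at timestamp and that you persist the time of the last successful sync, neither of which is required by Algolia.

import time

def sync_incremental(algolia_index, all_records, last_sync_timestamp):
    # Keep only the records created or modified since the previous sync
    changed = [
        r for r in all_records
        if r.get('updated_at', 0) > last_sync_timestamp  # 'updated_at' is a hypothetical field
    ]
    if changed:
        algolia_index.save_objects(changed)
    # Return the new high-water mark to store for the next run
    return time.time()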
Note: It’s a good idea to do a full reindex of all records every night or at least weekly.
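One common way to do that full reindex without disrupting live searches is to rebuild everything in a temporary index and then swap it in with move_index. Here's a minimal sketch, assuming the same client setup as above; in practice you would also copy your settings, synonyms, and rules to the temporary index first.

import json
from algoliasearch import algoliasearch

client = algoliasearch.Client('YourApplicationID', 'YourAdminAPIKey')
temp_index = client.init_index('bubble_gum_tmp')

# Rebuild the full dataset in the temporary index, in batches
with open('bubble_gum.json') as f:
    records = json.load(f)

chunk_size = 10000
for i in range(0, len(records), chunk_size):
    temp_index.save_objects(records[i:i + chunk_size])

# Atomically replace the live index with the freshly built one
client.move_index('bubble_gum_tmp', 'bubble_gum')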
Check out our complete incremental updating solution.
To lower the indexing traffic even more, you'll want to send only the attributes that have changed, not the whole record. For this, you'll use a partial indexing strategy.
This code uses partial_update_objects to change only the price of some of the bubble gum records; no other attribute is touched.
algolia_index.partial_update_objects([
    {'objectID': 'myID1', 'price': 4.99},
    {'objectID': 'myID3', 'price': 4.99},
    {'objectID': 'myID6', 'price': 2.99}
])
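Partial updates can also adjust a numeric attribute relative to its current value using built-in operations such as Increment and Decrement. The snippet below is a sketch of that idea; confirm the exact syntax against the API reference for your client version.

# Raise the price of one record by 1 without reading its current value first
algolia_index.partial_update_objects([
    {'objectID': 'myID2', 'price': {'_operation': 'Increment', 'value': 1}}
])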
Check out our complete partial-indexing solution.
Our first article on indexing presented a high-level overview of standard and advanced indexing use cases. This article walked you through indexing best practices and the implementation details of a standard indexing process. Our next article discusses how to optimize indexing in advanced use cases.
Our remaining articles will provide front-end and back-end code for some of the advanced indexing use cases we discussed, starting with real-time pricing.
To get started with indexing, you can upload your data for free, or get a customized demo from our search experts today.