Every search interface relies on a fast back-end indexing process that keeps its search results as up to date as possible. But indexing is only one side of the coin. The other side is the real-time speed and relevance of the search engine itself.
For all search engines, the search request is the highest priority, with indexing a (very) close second. There are several reasons for this, but the most important is a business argument: every search is a potential game changer, a path to a conversion. Any slow or dropped search request, or irrelevant result, is a potential financial or business loss.
To achieve maximum speed and relevance, a search engine must give search requests precedence over indexing operations. As a result, it can take a little extra time to update an index. But if you follow a few indexing best practices, you can even things out.
“All well and good,” say the full stack and back-end developers. “I understand the priority of search. But I want to know more about my data. How do I get my data onto your servers? Can it handle my use cases? Does it accept any kind of data? Is it simple, secure, fast?”
In a recent article on indexing, we explored a variety of advanced use cases and focused on two search indexing essentials: fast updates and wide applicability. Now it’s time to dig into the code and explain the speed-enhancing algorithms and indexing best practices that ensure the highest indexing speed for any search use case.
There are two primary areas to focus on here: the indexing scenarios you need to support, and the speed of the indexing process itself.
To understand indexing on its own terms, we need to decouple it from search and outline the most popular indexing scenarios:
A well-structured index provides the foundation for a fast and fully-featured customer-facing search interface, with great relevance. In fact, indexing is so important to search & relevance that it needs to be designed and implemented with as much care and dedication as the front end.
Multiple indexes can form a single touch point for all back-office data. When put together in a certain way, your indexes can create a company-wide searchable data layer that lies between your back-office and all front ends used internally (employees) or externally (customers, partners).
The “matchmaker” scenario is when Company X builds an Algolia index and makes it available to external data providers. In this scenario, Company X builds a collaborative website, such as a marketplace or streaming platform, where it displays the products/media of multiple vendors, partners, and contributors. To accomplish this, Company X exposes its Algolia index to these external data providers, allowing them to send data once they understand the format.
Here’s the main difference between the first two scenarios: the first centers on a single index powering a customer-facing search interface, while the second combines multiple indexes into a company-wide searchable data layer that serves every internal and external front end.
The wide applicability of our indexing wouldn’t be possible, nor would it survive the competitive digital business environment, if it weren’t performant in all situations. While we offer high indexing speed out of the box, this hinges on implementing indexing best practices. That’s what this article is about.
Just a word about what we mean by “out-of-the-box high performance”: our indexing comes with a set of speed-enhancing technologies built in.
The most important indexing practice is to run a batching algorithm that updates multiple records in one indexing operation, in a regular and timely manner. This is true for all use cases.
Why do we recommend batching? Because every indexing request carries a small performance cost. Each request triggers a small “reindexing” of your entire index, which can take up to 1 second, or more if the index is very large. Sending hundreds of indexing requests, one record at a time, therefore creates an indexing queue that slows down the entire indexing process. To mitigate this, it’s important to limit the number of indexing requests hitting the server by sending fewer, larger requests.
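To make the contrast concrete, here’s a minimal before-and-after sketch using the same (v1) Python API client as the examples below; `records` and `algolia_index` stand in for the list of records and index object defined in the full example:

```python
# Anti-pattern: one indexing request per record.
# Each call triggers its own small reindex and joins the queue.
for record in records:
    algolia_index.save_object(record)

# Best practice: one indexing request for many records.
algolia_index.save_objects(records)
```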
Taking all that into account, here are the three most important indexing best practices (pretty standard fare for data updates):
One common mistake is to send one record at a time. If your back-end data changes constantly, it would be wrong to send each change as it occurs. As stated above, bottlenecks occur when you create a queue of hundreds of indexing requests waiting to be processed.
Instead, as a best practice, use batch indexing: send each change to a temporary cache, then flush that cache to Algolia on a regular schedule, for example, every 5 minutes, or every 30 minutes for larger indexes. Never batch more often than once a minute, or you’ll end up creating the very bottleneck you’re trying to avoid. A minimal sketch of this pattern follows.
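This sketch assumes the `algolia_index` client object created in the full example below; the cache, the helper names, and the 5-minute timer are illustrative, not part of Algolia’s API:

```python
import threading

# Local cache of pending changes, flushed to Algolia on a timer.
pending_records = []
cache_lock = threading.Lock()

def queue_record(record):
    """Cache a changed record locally instead of sending it immediately."""
    with cache_lock:
        pending_records.append(record)

def flush_cache():
    """Send all cached changes to Algolia in a single batched request."""
    global pending_records
    with cache_lock:
        batch, pending_records = pending_records, []
    if batch:
        algolia_index.save_objects(batch)
    # Schedule the next flush in 5 minutes (300 seconds).
    threading.Timer(300, flush_cache).start()

flush_cache()
```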
This code example builds a new index, batch-saving the records in chunks of 10,000 using the save_objects method of Algolia’s Python API.
```python
import json
from algoliasearch import algoliasearch

# Connect to Algolia and select (or create) the index.
client = algoliasearch.Client('YourApplicationID', 'YourAdminAPIKey')
algolia_index = client.init_index('bubble_gum')

# Load the records and send them in batches of 10,000.
with open('bubble_gum.json') as f:
    records = json.load(f)

chunk_size = 10000
for i in range(0, len(records), chunk_size):
    algolia_index.save_objects(records[i:i + chunk_size])
```
See how our API has automated the batching process.
Building on the previous suggestion: you also don’t want to send too many records in a single batch. To keep indexing requests small, perform incremental updates, sending only the records that are new or have changed.
This code adds a new Bubble Gum series.
```python
# Add the new series in a single batch. Prices are stored as numbers
# (not strings) so they can be used for numeric filtering and sorting.
algolia_index.save_objects([
    {"objectID": "myID1", "item": "Classic Bubble Gum", "price": 3.99},
    {"objectID": "myID2", "item": "Raspberry Bubble Gum", "price": 3.99},
    {"objectID": "myID3", "item": "Cherry Bubble Gum", "price": 3.99},
    {"objectID": "myID4", "item": "Blueberry Bubble Gum", "price": 3.99},
    {"objectID": "myID5", "item": "Mulberry Bubble Gum", "price": 3.99},
    {"objectID": "myID6", "item": "Lemon Bubble Gum", "price": 3.99}
])
```
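Deciding which records are “new or changed” is up to your back end. One hypothetical approach, assuming each record carries an `updated_at` field holding a timezone-aware ISO-8601 timestamp (a field we’re inventing for illustration), is to track the time of the last successful sync:

```python
from datetime import datetime, timezone

# Timestamp of the last successful sync (persist this in practice).
last_synced_at = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Send only the records modified since the last sync.
changed = [
    r for r in records
    if datetime.fromisoformat(r['updated_at']) > last_synced_at
]
if changed:
    algolia_index.save_objects(changed)

last_synced_at = datetime.now(timezone.utc)
```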
Note: It’s a good idea to do a full reindex of all records every night, or at least weekly, to keep the index and your source of truth in sync. One zero-downtime way to do this is sketched below.
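Here’s a sketch of that full reindex using the same v1 Python client (the temporary index name is illustrative): build the complete dataset in a temporary index, then atomically move it over the live one. Because the move also carries over settings, synonyms, and Rules, configure the temporary index the same way as the live one before moving it.

```python
# Build the full dataset in a temporary index.
tmp_index = client.init_index('bubble_gum_tmp')
for i in range(0, len(records), chunk_size):
    tmp_index.save_objects(records[i:i + chunk_size])

# Atomically replace the live index with the freshly built one.
client.move_index('bubble_gum_tmp', 'bubble_gum')
```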
Check out our complete incremental updating solution.
To lower the indexing traffic even more, you’ll want to send only the attributes that have changed, not the whole record. For this, you’ll use a partial indexing strategy.
This code changes only the price of some of the bubble gums and touches no other attribute.
```python
# Partially update only the price attribute; all other attributes
# on these records are left untouched.
algolia_index.partial_update_objects([
    {'objectID': 'myID1', 'price': 4.99},
    {'objectID': 'myID3', 'price': 4.99},
    {'objectID': 'myID6', 'price': 2.99}
])
```
Check out our complete partial-indexing solution.
Our first article on indexing presented a high-level overview of standard and advanced indexing use cases. This article walked you through indexing best practices and the implementation details of a standard indexing process. Our next article will discuss how to optimize indexing for advanced use cases.
Our remaining articles will provide front-end and back-end code for some of the advanced indexing use cases we discussed, starting with real-time pricing.
To get started with indexing, you can upload your data for free, or get a customized demo from our search experts today.
Peter Villani
Sr. Tech & Business Writer