Data design

In this sprint we will design our data pipeline, including what data we will sync to Algolia and how often we will update it. By the end of this sprint we will have all of our data live in Algolia. Once you have completed the tasks below, you can move on to the next sprint.

Team members

Depending on the size of your company, some of these roles may be filled by the same person. In this sprint it is important that we identify these roles and get in contact with the people in them.

Project Manager

Planning and project oversight

Systems Architects

Analyze and design IT components

Back End Engineers

Build application business logic, server scripts, and APIs

First we will review the mock-ups created in the previous sprint and identify all of the data types they include. For instance, have we included articles, products, or FAQs? We must create an index for every data type we want to be searchable.

Within the types of data that we want to upload, there are four types of attributes that should be uploaded: searchable data, filterable data, display data, and business data.

It’s important at this phase to identify any business data that can be included on the records. This could be any metric that is important to how you want the results to be ranked. Examples include: number of clicks, margin, date released, distance from the user, etc. It is possible that these metrics are handled in a different system.
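As an illustration, a single record can carry all four attribute types side by side. The sketch below uses entirely hypothetical field names for a product record; use whatever your own catalogue provides:

```javascript
// A hypothetical product record combining the four attribute types.
const product = {
  objectID: 'SKU-123',            // unique identifier required by Algolia

  // Searchable data: what user queries are matched against
  title: 'Trail Running Shoe',
  description: 'Lightweight shoe for off-road running',

  // Filterable data: used for facets and filters
  brand: 'Acme',
  category: 'Footwear',

  // Display data: shown in results but not searched
  imageUrl: 'https://www.example.com/img/sku-123.jpg',

  // Business data: metrics that can influence ranking
  clicks: 4521,
  margin: 12.5,
  releaseDate: 1704067200,        // Unix timestamp ranks more reliably than a date string
};
```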

Now that we know all of the data types we want to sync into Algolia, we need to think about how this data should be structured and how often it needs to be updated.

All of the best practices are covered in this webinar, including how to handle different ranking strategies such as ‘sort-bys’. It is likely that your data will require some transformation. Some use cases, such as handling multiple languages, will also require specific indexing strategies.
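For context, a ‘sort-by’ in Algolia is typically implemented as a replica index whose ranking formula puts the sort attribute first. A minimal sketch of the two settings objects involved (the index name and sort attribute here are hypothetical; you would apply each object to its index via the API clients or the dashboard):

```javascript
// Settings for the primary index: declare the replica.
const primarySettings = {
  replicas: ['products_price_desc'], // hypothetical replica index name
};

// Settings for the replica: rank by price first, then fall back to
// Algolia's default ranking criteria.
const replicaSettings = {
  ranking: [
    'desc(price)',
    'typo', 'geo', 'words', 'filters',
    'proximity', 'attribute', 'exact', 'custom',
  ],
};
```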

At this point it makes sense to create a system diagram with a view of what will pass between systems and how often.

The tools we use to build our pipeline will depend on the systems we are trying to pull data out of. For each type of data, identify the system you will need to access and check the relevant section below.

Out of the box connectors

Algolia supported:

Shopify


The first step of setting up a Shopify integration is setting up a full reindex. Once you have validated your Algolia account you can trigger this straightaway. This will create three indices: products, pages, and collections. If you want to enrich this data further with data from an API or third-party system, you can utilise metafields; if you want to enrich it with data managed directly in Shopify, use named tags. If you have the option, we recommend named tags, as metafields can slow the indexing process.

Adobe Commerce (Magento)

The first step of setting up an Adobe Commerce (Magento) integration is to install the extension, add your credentials, enable indexing, and push your initial data to Algolia. If you need to transform data, install the CustomAlgolia extension at the same time.

Salesforce Commerce Cloud

The first step of setting up the SFCC integration is to download, install, and set up the Algolia cartridge. Please note that you may need to customize the indexing scripts within the cartridge to get specific, non-default data indexed to Algolia.

Community supported:

If you are unable to utilise one of our out of the box integrations, you can check out connector options built by third parties and the community that suit your needs.

Do it yourself

If you are unable to utilise one of our out of the box connectors, we will need to utilise the API clients to ensure that we can sync the desired data into Algolia. Our API clients contain all the indexing methods you will need and are available in PHP, Ruby, JavaScript, Python, Swift, Kotlin, Android, .NET, Java, Go, and Scala.

If the system holding the required data has an integration point where we can utilise one of the API clients listed above, this is the optimal point from which to index to Algolia.

If we can access the changes to the data (deltas), we can utilise addObjects, partialUpdateObjects, and deleteObjects.

If we can only access the entire database, we can utilise replaceAllObjects.
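A common pattern for the delta approach is to group change events into the three batches those methods expect. The sketch below assumes a hypothetical event shape (`type`, `record`, `objectID`); adapt it to whatever your source system emits:

```javascript
// Group hypothetical change events (deltas) into batches for the three
// indexing methods: addObjects, partialUpdateObjects, and deleteObjects.
function groupDeltas(deltas) {
  const toAdd = [];     // new records, for addObjects
  const toUpdate = [];  // changed fields, for partialUpdateObjects
  const toDelete = [];  // objectIDs of removed records, for deleteObjects

  for (const delta of deltas) {
    if (delta.type === 'add') toAdd.push(delta.record);
    else if (delta.type === 'update') toUpdate.push(delta.record);
    else if (delta.type === 'delete') toDelete.push(delta.objectID);
  }
  return { toAdd, toUpdate, toDelete };
}

// If only a full export is available, skip the grouping entirely and
// send the whole dataset with replaceAllObjects instead.
```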

Algolia crawler

You are a good fit for the Algolia crawler if you have static HTML content you want to index; for instance, this is a great way to index data for a site search implementation. It works best if you are able to enrich the crawled records with ranking data from Google Analytics or Adobe Analytics.

All configuration is managed in the crawler editor as a JavaScript configuration file. Once you have set up your startUrls and sitemaps, you can run the crawler and use the path explorer and data analysis to figure out which URLs have and have not been crawled. Then you can change the configuration to ensure all required URLs are crawled.

Once all required URLs are being crawled, you can configure how records are extracted from them in a JavaScript function within the configuration.
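Putting those pieces together, the core of a crawler configuration object might look like the sketch below. The URLs, index name, and selectors are hypothetical placeholders:

```javascript
// Hypothetical crawler configuration object: startUrls and sitemaps tell
// the crawler where to begin, and recordExtractor turns each crawled page
// into one or more Algolia records.
const config = {
  startUrls: ['https://www.example.com/'],
  sitemaps: ['https://www.example.com/sitemap.xml'],
  actions: [
    {
      indexName: 'site_pages',
      pathsToMatch: ['https://www.example.com/**'],
      // url is the page's URL; $ is a jQuery-like selector over its HTML.
      recordExtractor: ({ url, $ }) => [
        {
          objectID: url.href,
          title: $('title').text(),
          content: $('main').text(),
        },
      ],
    },
  ],
};
```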

Now that we have our data in Algolia, the next step is to put in place some initial configuration so that we can test our relevance within the dashboard. The three key areas to set up are searchable attributes, attributes for faceting, and attributes to retrieve.
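These three areas can be expressed as a single settings object. A minimal sketch with hypothetical attribute names follows; you would apply it to your index via an API client's setSettings method or configure the same values in the dashboard:

```javascript
// Hypothetical initial relevance settings for a product index.
const initialSettings = {
  // Which attributes are searched, in order of importance
  searchableAttributes: ['title', 'description', 'brand'],

  // Which attributes can be used as facets and filters
  attributesForFaceting: ['category', 'brand'],

  // Which attributes come back with each hit (keep payloads lean)
  attributesToRetrieve: ['title', 'imageUrl', 'price'],
};
```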