In this sprint we will design our data pipeline: what data we will sync to Algolia and how often we will update it. By the end of this sprint, all of our data will be live in Algolia. Once you have completed the tasks below, you can move on to the next sprint.
Depending on the size of your company, some of these roles may be filled by the same person. In this sprint it is important that we identify these roles and get in contact with the people filling them.
Planning and project oversight
Analyzing and designing IT components
Building application business logic, server scripts, and APIs
First we will review the mock-ups created in the previous sprint and identify all of the data types they include. For instance, have we included articles, products, or FAQs? We must create an index for every data type we want to make searchable.
It's important at this phase to identify any business data that can be included on the records: any metrics that matter to how you want the results to be ranked, such as number of clicks, margin, release date, or distance from the user. It is possible that these metrics are managed in a different system.
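As a sketch, a record combining searchable content with business metrics might look like the following. All attribute names here are illustrative, not required by Algolia (except `objectID`; `_geoloc` is Algolia's reserved attribute for geo-ranking):

```javascript
// Hypothetical product record: searchable text plus business metrics
// pulled from other systems for use in custom ranking.
const record = {
  objectID: 'SKU-123',            // stable unique ID, required by Algolia
  name: 'Trail Running Shoe',
  description: 'Lightweight shoe for off-road running',
  // Business data for custom ranking; numeric so Algolia can rank on it.
  clicks_30d: 412,                // e.g. from an analytics system
  margin_pct: 38,                 // e.g. from an ERP
  released_at: 1697040000,        // dates as Unix timestamps
  _geoloc: { lat: 48.86, lng: 2.35 } // enables ranking by distance from the user
};
```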
Now that we know all of the data types we want to sync into Algolia, we need to think about how the records should be structured and how often they need to be updated.
All of the best practices are covered in this webinar, including how to handle different ranking strategies such as 'sort-bys'. The data you have will likely require some transformation, and some use cases, such as handling multiple languages, will also require specific indexing strategies.
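A minimal sketch of what such a transformation step might look like, assuming a hypothetical CMS article shape (`id`, `title`, `body`, `publishedAt`, `locale` are placeholder field names):

```javascript
// Transform a source record into a flat Algolia record containing only
// the attributes needed for search, display, and ranking.
function toAlgoliaRecord(cmsArticle) {
  return {
    objectID: cmsArticle.id,
    title: cmsArticle.title,
    body: cmsArticle.body.slice(0, 5000),   // trim very large fields to respect record size limits
    published_at: Date.parse(cmsArticle.publishedAt) / 1000, // dates as Unix timestamps to rank on
    language: cmsArticle.locale,            // multi-language setups often use one index per locale
  };
}
```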
At this point it makes sense to create a system diagram showing what data will pass between systems and how often.
The tools we use to build our pipeline will depend on the systems we are trying to pull data out of. For each type of data, identify the system you will need to access and check the relevant section below.
Out of the box connectors
The first step of setting up a Shopify integration is configuring a full reindex. Once you have validated your Algolia account, you can trigger this straight away. It will create three indices: products, pages, and collections. If you want to enrich this data with data from an API or a third-party system, use metafields; if you want to enrich it with data managed directly in Shopify, use named tags. Where you have the option, we recommend named tags, as metafields can slow down the indexing process.
If none of our out-of-the-box integrations fits, check whether one of the connectors built by third parties and the community suits your needs.
Do it yourself
If the system holding the required data has an integration point where we can use one of the API clients listed above, that is the best place from which to index to Algolia.
If we can only access the entire database, we can use replaceAllObjects.
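A sketch of a full reindex using replaceAllObjects, assuming the JavaScript API client (v4); `rows` stands in for whatever your database export returns:

```javascript
// replaceAllObjects builds a temporary index, pushes every record into
// it, then atomically swaps it with the live index, so searches never
// see a partially updated dataset. `safe: true` waits for every task
// to complete before resolving.
async function fullReindex(index, rows) {
  return index.replaceAllObjects(rows, { safe: true });
}

// With a real client (placeholder credentials):
// const algoliasearch = require('algoliasearch');
// const index = algoliasearch('APP_ID', 'ADMIN_API_KEY').initIndex('products');
// await fullReindex(index, rows);
```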
The Algolia crawler is a good fit if you have static HTML content you want to index; for instance, it is a great way to index data for a site search implementation. It works best, however, when you can enrich the crawled records with ranking data from Google Analytics or Adobe Analytics.
All configuration is managed as a configuration file in the crawler editor. Once you have set up your startUrls and sitemaps, you can run the crawler and use the path explorer and data analysis tools to see which URLs have and haven't been crawled, then adjust the configuration until all required URLs are covered.
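A minimal sketch of the shape a crawler configuration can take; the URLs, index name, and CSS selectors are placeholders for your own site:

```javascript
// Each action maps a set of URL paths to an index; recordExtractor
// returns the records to index for each crawled page ($ is a
// cheerio-style selector over the page's HTML).
const crawlerConfig = {
  startUrls: ['https://www.example.com/'],
  sitemaps: ['https://www.example.com/sitemap.xml'],
  actions: [
    {
      indexName: 'site_pages',
      pathsToMatch: ['https://www.example.com/**'],
      recordExtractor: ({ url, $ }) => [
        {
          objectID: url.href,
          title: $('title').text(),
          content: $('main').text().trim(),
        },
      ],
    },
  ],
};
```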