
Here at Algolia, we’re a bunch of hobbyists at heart. We’re not just building for the Fortune 500, we’re also building for the tinkerers. That’s why we’ve put so much effort into tools like DocSearch — we love anybody with the giving-back, open source software mindset behind tools like Astro, Home Assistant, and SASS just as much as React, Twilio, and Discord.

We’ve been cooking up an idea specifically about giving back to the discovery-driven devs that have been experimenting with Algolia since its inception: open indexes. There already exist many open datasets on the Internet, but they’re not exactly known for their quality. Even the high-quality ones aren’t always in the form we need, and there’s little incentive for the makers to give us a hand because they usually aren’t getting paid to maintain their datasets. This feels like the perfect problem for us to tackle here at Algolia, and so a few of us here have been working to design our first open index. Let’s walk through that process in this article; it’ll be fairly conceptual, so it’ll still be useful as a guide for anybody trudging through the occasionally-complex index creation process themselves.

Before we dive in though, we need to talk a bit about composability.

Composability — a big word for small things

Composability is the subjective metric of how useful the pieces of a system are, independently and in different combinations.

In the case of a song, for example, the producer has to think about not only whether the singer had a good take, whether the guitar was in tune, or whether the drummer is on-beat, but also how well those tracks all go together. At different points in the song, different combinations of those tracks are going to be playing at the same time, so they need to work both independently and within a smattering of different contexts. The same goes for pieces of our index.

When you’re creating an index for your company, it’s unlikely that a single application will consume all of the data for a given record at once; large-scale applications usually request dozens of different combinations of data for different pages and views. For our open index, that effect is far more pronounced: we have no idea which bits of data are going to be used by whom. So we need to make sure all of those bits are high-quality and independent from each other, but also easily combinable into high-quality super-structures, just as the music producer wants each recording to work on its own while also fitting neatly into the track as a whole, in any configuration.

Let’s try to model out a recipe, specifically this delicious-looking Greek Lemon Chicken and Potatoes recipe from Chef John on Allrecipes. Let’s see if we can start with Allrecipes’ model for this recipe and then improve it to be more specific and composable. As it stands, it seems like the recipe model has this structure:

  • title (string)
  • description (string)
  • times (array of objects)
    • name (string, of a couple predefined options)
    • number (number)
    • unit (string, of a couple predefined options)
  • ingredients (array of objects)
    • amount (number)
    • unit (string, of a couple predefined options)
    • ingredient_name (string)
    • is_header (boolean)
  • directions (array of objects)
    • step_content (string)
    • is_header (boolean)
  • notes (array of objects)
    • title (string)
    • note (string)
  • servings (number)
  • yield (string)
  • public (boolean)
  • media (array of strings, validated as URLs)
  • author (string)

This is already fairly comprehensive! They’re able to do a lot with it, like calculating the total time a recipe will take to complete, the nutrition information, and what the ingredient amounts would be for a different number of servings. We might make a few suggestions, though:

  1. Allrecipes uses an ingredient line to create a sort of header between ingredients in the list. Our chosen recipe doesn’t use this feature (it doesn’t look like many do), but you could conceive of a situation where you’re instructing the reader about how to make two separate items first before combining them (like the crust and filling of a pie or cheesecake). It’s definitely a useful feature that more recipe writers should take advantage of, but it feels a little strange to not formalize it in the structure of the recipe, choosing instead to use an ingredient line as a “header”, just making the break between the ingredient groups visually and not fundamentally. I might suggest creating a high-level array called ingredient_groups, which contains at least one object with a title (that’s the header text) and an array called ingredients. That header would only be shown if there are multiple groups that need to be delineated, and that internal array would house all of the ingredients for specifically the section under that header. Then we could do away with that pesky is_header boolean they’d otherwise need to render the section titles differently from the rest of the ingredients. This same change would apply to the directions, which similarly use header lines to mark off sections.
  2. Speaking of ingredients, Allrecipes doesn’t actually ask submitters to split their ingredients into amount, unit, and item name; they just ask for a simple string like this:

     [Image: Recipes ingredients input form example]

     I suspect that they’re storing it internally in three pieces because they do math with the number and unit (you can have it convert a 4-serving recipe into 3, which often requires changing units). So this is already somewhat composable, but we can do a little better. If we ask for all three of those components individually, we can do some interesting things with them. For example, we could keep a list of acceptable ingredients (which users could add to if their ingredient didn’t already exist in our database) and let users choose the ingredient instead of typing it as a string.

     In a production environment, we’d have those ingredients in our database, referenced here only by a UUID. Each ingredient would be stored with several form options inside its own database record. For example, I’d call “butter” a single ingredient, with “chilled”, “softened”, “melted”, “sliced”, and “cubed” as optional forms of the one “butter” ingredient. The ingredient could also have its preferred categories of units (flour should always be measured by weight or by dry volume, but never by length), which could optionally be custom (like “sticks” of butter) or even empty (you don’t need a special unit for “potatoes”). The forms used in this particular recipe would then be stored in the ingredient object in our recipes index, alongside a unit string matching the available units for that ingredient and a validated numerical amount.

     We might fully release this system in the future (if you’re at a database or CMS company, you know where to find us), but for now, we’re just going to stick that information in a JSON object representing all of the ingredients used in the recipes in the index. It’ll be a bit more manual work for us for now, but nobody consuming the index will have to worry about spinning up their own database instance.
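
To make the first suggestion concrete, here is a minimal sketch of how a flat ingredient list with is_header marker lines could be folded into groups. The function name toIngredientGroups is hypothetical, and the sketch assumes the header text lives in ingredient_name on the marker line, which is a guess about the original model rather than something Allrecipes documents.

```javascript
// Hypothetical migration helper: folds a flat list that uses is_header
// marker lines into an array of { title, ingredients } groups.
// Assumption: a header line stores its text in ingredient_name.
function toIngredientGroups(flatIngredients) {
  const groups = [];
  for (const line of flatIngredients) {
    if (line.is_header) {
      // A header line starts a new group and is not itself an ingredient.
      groups.push({ title: line.ingredient_name, ingredients: [] });
      continue;
    }
    if (groups.length === 0) {
      // Ingredients listed before any header go into one untitled group.
      groups.push({ title: "", ingredients: [] });
    }
    // Drop the is_header flag; the grouping now carries that information.
    const { is_header, ...ingredient } = line;
    groups[groups.length - 1].ingredients.push(ingredient);
  }
  return groups;
}

const flat = [
  { is_header: true, ingredient_name: "Crust" },
  { is_header: false, amount: 200, unit: "g", ingredient_name: "flour" },
  { is_header: true, ingredient_name: "Filling" },
  { is_header: false, amount: 3, unit: "", ingredient_name: "eggs" },
];
const grouped = toIngredientGroups(flat);
console.log(JSON.stringify(grouped, null, 2));
```

A recipe with no headers ends up with a single group whose title is empty, which matches the idea that the header is only shown when there are multiple groups to delineate.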

Here’s our new recipes index shape:

  • title (string)
  • description (string)
  • times (array of objects)
    • name (string, of a couple predefined options)
    • number (number)
    • unit (string, of a couple predefined options)
  • ingredient_groups (array of objects)
    • title (string)
    • ingredients (array of objects)
      • ingredient (UUID, matching an ingredient in the ingredients JSON)
      • forms (array of strings, matching the forms of the ingredient)
      • amount (number)
      • unit (string, matching one of the unit_categories for this ingredient)
  • direction_groups (array of objects)
    • title (string)
    • directions (array of strings)
  • notes (array of objects)
    • title (string)
    • note (string)
  • servings (number)
  • yield (string)
  • public (boolean)
  • media (array of strings, validated as URLs)
  • author (string)
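
Because every amount in this shape is a validated number, rescaling a recipe for a different number of servings becomes a simple map over the groups. The sketch below is illustrative only: scaleRecipe is a hypothetical name, the UUID is a placeholder, and it only multiplies amounts. It does not switch units the way a full implementation might.

```javascript
// Hypothetical helper: returns a copy of the recipe scaled to targetServings.
// It only multiplies amounts; unit changes are deliberately left out.
function scaleRecipe(recipe, targetServings) {
  if (!(recipe.servings > 0) || !(targetServings > 0)) {
    throw new RangeError("servings must be positive numbers");
  }
  const factor = targetServings / recipe.servings;
  return {
    ...recipe,
    servings: targetServings,
    ingredient_groups: recipe.ingredient_groups.map((group) => ({
      ...group,
      ingredients: group.ingredients.map((item) => ({
        ...item,
        amount: item.amount * factor,
      })),
    })),
  };
}

// A trimmed-down record in the new shape (placeholder UUID, made-up amounts).
const sampleRecipe = {
  title: "Sample recipe",
  servings: 4,
  ingredient_groups: [
    {
      title: "",
      ingredients: [
        {
          ingredient: "00000000-0000-4000-8000-000000000001",
          forms: [],
          amount: 8,
          unit: "",
        },
      ],
    },
  ],
};

const halved = scaleRecipe(sampleRecipe, 2);
console.log(halved.ingredient_groups[0].ingredients[0].amount);
```

The function returns a new object instead of mutating the record, so the same indexed record can be scaled differently for different views.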

And then our new ingredients JSON, where each record matches this shape:

  • uuid (UUID)
  • name (string)
  • plural_name (string)
  • unit_categories (array of strings)
  • custom_unit (array of strings, not present unless unit_categories contains “custom”)
  • forms (array of strings)
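
As a rough illustration of how the two shapes fit together, the sketch below checks one ingredient line from a recipe against its ingredient record. Everything beyond the field names listed above is an assumption: the category names ("weight", "volume"), the UNITS_BY_CATEGORY table, the placeholder UUID, and the function names are all hypothetical.

```javascript
// Assumed mapping from unit category to unit strings (not from the article).
const UNITS_BY_CATEGORY = {
  weight: ["g", "kg", "oz", "lb"],
  volume: ["ml", "l", "tsp", "tbsp", "cup"],
};

// Example ingredient record in the shape described above (placeholder UUID).
const butter = {
  uuid: "00000000-0000-4000-8000-000000000002",
  name: "butter",
  plural_name: "butter",
  unit_categories: ["weight", "volume", "custom"],
  custom_unit: ["stick"],
  forms: ["chilled", "softened", "melted", "sliced", "cubed"],
};

// Collects every unit string an ingredient accepts.
function allowedUnits(ingredient) {
  const units = [];
  for (const category of ingredient.unit_categories) {
    if (category === "custom") {
      units.push(...(ingredient.custom_unit ?? []));
    } else {
      units.push(...(UNITS_BY_CATEGORY[category] ?? []));
    }
  }
  return units;
}

// Returns a list of problems with one recipe ingredient line (empty if valid).
function validateLine(line, ingredientsByUuid) {
  const ingredient = ingredientsByUuid.get(line.ingredient);
  if (!ingredient) return [`unknown ingredient ${line.ingredient}`];
  const errors = [];
  for (const form of line.forms) {
    if (!ingredient.forms.includes(form)) errors.push(`unknown form "${form}"`);
  }
  const units = allowedUnits(ingredient);
  if (units.length === 0) {
    // Ingredients without unit categories (like potatoes) take an empty unit.
    if (line.unit !== "") errors.push(`"${ingredient.name}" takes no unit`);
  } else if (!units.includes(line.unit)) {
    errors.push(`unit "${line.unit}" is not allowed for "${ingredient.name}"`);
  }
  if (!Number.isFinite(line.amount) || line.amount <= 0) {
    errors.push("amount must be a positive number");
  }
  return errors;
}

const ingredientsByUuid = new Map([[butter.uuid, butter]]);
const goodLine = { ingredient: butter.uuid, forms: ["melted"], amount: 2, unit: "tbsp" };
const badLine = { ingredient: butter.uuid, forms: ["melted"], amount: 2, unit: "cm" };
console.log(validateLine(goodLine, ingredientsByUuid));
console.log(validateLine(badLine, ingredientsByUuid));
```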

The unit list is a reasonable enough object to just keep in memory as JSON as well, so here’s a little utility JavaScript program to keep track of these units and convert between them easily, made by my coworker Jaden.
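
As a minimal sketch of what such a utility could look like (this is not the original program; the table layout, unit names, and the convertUnit function are assumptions made for illustration), each unit can be stored with a factor to a base unit for its category. Volume units here use US customary definitions.

```javascript
// Assumed unit table: each unit maps to its category and a factor to the
// category's base unit (grams for weight, millilitres for volume).
const UNIT_TABLE = {
  g: { category: "weight", toBase: 1 },
  kg: { category: "weight", toBase: 1000 },
  oz: { category: "weight", toBase: 28.349523125 },
  lb: { category: "weight", toBase: 453.59237 },
  ml: { category: "volume", toBase: 1 },
  l: { category: "volume", toBase: 1000 },
  tsp: { category: "volume", toBase: 4.92892159375 },
  tbsp: { category: "volume", toBase: 14.78676478125 },
  cup: { category: "volume", toBase: 236.5882365 },
};

// Converts an amount between two units of the same category.
function convertUnit(amount, fromUnit, toUnit) {
  const from = UNIT_TABLE[fromUnit];
  const to = UNIT_TABLE[toUnit];
  if (!from || !to) {
    throw new TypeError(`unknown unit: ${!from ? fromUnit : toUnit}`);
  }
  if (from.category !== to.category) {
    // Weight <-> volume would need a per-ingredient density, which is out of scope.
    throw new TypeError(`cannot convert ${from.category} to ${to.category}`);
  }
  return (amount * from.toBase) / to.toBase;
}

console.log(convertUnit(2, "kg", "g"));
console.log(convertUnit(1, "cup", "tbsp"));
```

Refusing to convert across categories keeps the table honest: turning a cup of flour into grams depends on the ingredient, so that kind of conversion would belong on the ingredient record and not in a generic unit list.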

Here’s the recipes JSON we’re going to upload to Algolia. For the best results, you’d normally store only the things you’ll be searching through in Algolia and keep the rest in a quick database like Fauna. But because this is a test dataset, we’re making everything searchable, which is why the JSON is so big for one recipe. If you’d like to add to this dataset with your personal recipes, we’d love to include them! Just fill out the Google Form here for every recipe, and we’ll filter through and add the best ones manually.

And the part you’ve been waiting for: since this index is open and growing, it’ll be very useful for you to experiment with Algolia and test your own integrations before loading in production-ready data! If you want to give it a try, here are the credentials:

Read-only public key: ea2f27cfed9ddeed93f7532424a64480

Application ID: OKF83BFQS4

Index name: recipes
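
For a quick experiment without installing a client library, a sketch like the following builds a request against Algolia’s REST search endpoint using the credentials above. The helper names buildSearchRequest and searchRecipes are hypothetical, the sketch assumes a runtime with a global fetch (a browser or a recent Node.js), and whether the index still responds is not something this example can guarantee.

```javascript
const APP_ID = "OKF83BFQS4";
const SEARCH_KEY = "ea2f27cfed9ddeed93f7532424a64480";
const INDEX_NAME = "recipes";

// Builds the URL and fetch options for a search query (no network access here).
function buildSearchRequest(query, hitsPerPage = 5) {
  const params = `query=${encodeURIComponent(query)}&hitsPerPage=${hitsPerPage}`;
  return {
    url: `https://${APP_ID}-dsn.algolia.net/1/indexes/${encodeURIComponent(INDEX_NAME)}/query`,
    options: {
      method: "POST",
      headers: {
        "X-Algolia-Application-Id": APP_ID,
        "X-Algolia-API-Key": SEARCH_KEY,
        "Content-Type": "application/json",
      },
      // The search parameters travel as a URL-encoded string under "params".
      body: JSON.stringify({ params }),
    },
  };
}

// Sends the request and resolves with the array of matching records.
async function searchRecipes(query, hitsPerPage = 5) {
  const { url, options } = buildSearchRequest(query, hitsPerPage);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Search failed with status ${response.status}`);
  }
  const { hits } = await response.json();
  return hits;
}
```

Calling searchRecipes("lemon chicken") returns a promise for the matching records. For anything beyond a quick test, the official Algolia API clients are the better choice.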

We’re excited to see what you fill this dataset with (again, the Google Form is available here)! We’ll be improving it on our end with recipes from the hundreds of Algolia devs here on our team. Make sure to give us a shout too on Twitter if you make something fun with the dataset itself!

About the author
Natwar Maheshwari

Developer Marketing Lead
