What is a facet? What is faceted search? And what kind of data best represents facets?

Quick definition: A facet is an onscreen filter that allows end-users to narrow down their search, giving them more control over their search results’ relevance. A typical facet for an e-commerce product is an attribute like “brand” or “price”, and a facet’s values are the individual brands and prices. By clicking on a facet value, users can include and exclude whole categories of products. For example, by selecting “Apple” in the “Brand” category, a user can exclude every product except Apple products.

This is what we call a faceted search experience, where categories and common terms drive the search just as much as the text in a search bar. Every great online search experience offers a faceted search.
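As a rough illustration of the definition above, the following JavaScript sketch derives the values of a “brand” facet, with counts, from a handful of product records, then narrows the results to one value. The records, prices, and helper names (`facetCounts`, `selectFacetValue`) are invented for this example and do not come from any particular search engine’s API.

```javascript
// Hypothetical product records; only the attributes used for faceting are shown.
const products = [
  { name: "iPhone", brand: "Apple", price: 999 },
  { name: "MacBook Air", brand: "Apple", price: 1199 },
  { name: "Galaxy S", brand: "Samsung", price: 899 },
];

// Count how many records carry each value of the given facet attribute.
function facetCounts(records, attribute) {
  const counts = {};
  for (const record of records) {
    const value = record[attribute];
    if (value === undefined) continue; // a record may lack the attribute
    counts[value] = (counts[value] || 0) + 1;
  }
  return counts;
}

// Keep only the records whose facet attribute equals the selected value.
function selectFacetValue(records, attribute, value) {
  return records.filter((record) => record[attribute] === value);
}

const brandCounts = facetCounts(products, "brand");
const appleOnly = selectFacetValue(products, "brand", "Apple");
console.log(brandCounts, appleOnly.map((product) => product.name));
```

The counts are what a UI would typically display next to each facet value; the filtered list is what the user sees after clicking “Apple”.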

The data behind faceted search 

In this article, we look at facets from the ground up, from the data to the UI. Specifically, we describe how to use JSON to create a great faceted search experience. A well-thought-out data strategy for faceting ensures effective filtering and high-speed performance. It also makes it easy for front-end developers to code, a critical consideration in a fast-paced coding environment.

Facets clean up your data 

Good search always starts with clean data. The best search index is well-structured, well-written, and includes nothing misleading or extraneous. An index is a searchable set of data, and indexing is the process that creates an index. These are the standard terms in search technology to describe how a search engine structures its data. Even though every engine structures its data differently, the end result is always an index. We use the words data and index, not dataset, database, or any other such term.  

Every search index includes facets because they organize data and ensure simplicity and completeness. Consider the following: the first example contains no facets; the next contains facets.

Without facets

Title: Star Wars: Episode V – The Empire Strikes Back

Description: Popular science fiction movie from the 70s, an American space opera, where Luke Skywalker, along with Han Solo, Princess Leia, and Chewbacca, fight Darth Vader and the Galactic Empire on behalf of the Rebel Alliance. Created by George Lucas, the movie was produced by Lucasfilm and is now owned and distributed by Walt Disney Studios films. The ensemble cast includes Mark Hamill, Harrison Ford, Carrie Fisher, Billy Dee Williams, Anthony Daniels, David Prowse, Kenny Baker, Peter Mayhew, and Frank Oz. The film became the highest-grossing film of 1980 with $440 million.

With facets

Title: Star Wars: Episode V – The Empire Strikes Back

Description: An American space opera. Luke Skywalker, along with Han Solo, Princess Leia, and Chewbacca, fight Darth Vader and the Galactic Empire on behalf of the Rebel Alliance.

Genre: science fiction, space opera

Era: 70s, 80s

Story: George Lucas

Saga: Star Wars

Studio: Lucasfilm, Walt Disney Studios

Production facts: The film is produced by Lucasfilm, now owned by Walt Disney Studios films.

Actors: Mark Hamill, Harrison Ford, Carrie Fisher, Billy Dee Williams, Anthony Daniels, David Prowse, Kenny Baker, Peter Mayhew, Frank Oz.

Characters: Luke Skywalker, Han Solo, Princess Leia, Chewbacca, Darth Vader, Yoda, C-3PO, R2-D2

As you can see, we’ve shortened “description” considerably, getting instantly to the point. We’ve moved its content into several different attributes, like “production facts”, “genre”, “actors”, and “saga”. Some of these attributes will be displayed in the search results. However, most of “description” breaks down into useful chunks of information for searching and faceting. This analytical process is a crucial step in creating a superior searchable index and faceted search experience for the end user. 

Breaking up your data in this way, into small bite-size pieces of information, ensures that every attribute counts. This is one benefit of using facets.

The basic principle is: large amounts of text in one attribute are not as good for search as smaller chunks of text in multiple attributes, many of which can be used as facets. With these facet-chunks, users can filter their results. Some examples: 

  • Include all films produced and written by George Lucas.
  • Exclude all films except science fiction films, sub-genre space opera.
  • See all Star Wars films, thanks to a “Saga” facet.
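The three filters above can be sketched in a few lines of JavaScript. This is a minimal illustration, not an engine implementation: the first record mirrors the “With facets” example, the other two films are made up, and `hasFacetValue` and `filterByFacets` are hypothetical helper names.

```javascript
// The first record mirrors the "With facets" example; the other two are invented.
const films = [
  {
    title: "Star Wars: Episode V - The Empire Strikes Back",
    genre: ["science fiction", "space opera"],
    story: "George Lucas",
    saga: "Star Wars",
  },
  { title: "Hypothetical Sci-Fi Thriller", genre: ["science fiction"], story: "Jane Doe" },
  { title: "Hypothetical Drama", genre: ["drama"], story: "Jane Doe" },
];

// True when the record's facet attribute contains the value.
// A facet attribute may hold a single value or an array of values.
function hasFacetValue(record, attribute, value) {
  const raw = record[attribute];
  if (raw === undefined) return false;
  return Array.isArray(raw) ? raw.includes(value) : raw === value;
}

// Keep the records that carry every requested [attribute, value] pair.
function filterByFacets(records, filters) {
  return records.filter((record) =>
    filters.every(([attribute, value]) => hasFacetValue(record, attribute, value))
  );
}

const byLucas = filterByFacets(films, [["story", "George Lucas"]]);
const spaceOperas = filterByFacets(films, [
  ["genre", "science fiction"],
  ["genre", "space opera"],
]);
const starWarsSaga = filterByFacets(films, [["saga", "Star Wars"]]);
```

Note that none of these filters is possible against the single long “description” of the “Without facets” version without parsing free text.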

Most companies already have structured data, so it’s just a matter of transferring that structure when building the searchable index. On the other hand, some companies have multiple data sources, so they have to design new structures while merging these data sources. 

Faceted search — facets are not only for filtering

Setting up your data with bite-size pieces of information not only enables your search engine to filter records. It also focuses attention on the information, making unfiltered searches more precise. This is only possible if you define your facet attributes as searchable, meaning that you tell your search engine to look into facet attributes such as “era” and “genre” before looking into “title” and “description”. Now, a user can type in “70s sci-fi movies” and find Star Wars without using a filter.
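To give a feel for what “searchable facets” means, here is a small sketch in which facet attributes are searched before the title. Everything in it is an assumption made for illustration: the two records, the attribute order, and the scoring rule (real engines use far more elaborate ranking, plus synonyms so that “sci-fi” can match “science fiction”).

```javascript
// Hypothetical records; the second one mentions the query words only in its title.
const records = [
  {
    title: "Star Wars: Episode V - The Empire Strikes Back",
    era: ["70s", "80s"],
    genre: ["science fiction", "space opera"],
  },
  {
    title: "A documentary about 70s space opera fandom",
    era: ["2010s"],
    genre: ["documentary"],
  },
];

// Attributes listed first are searched first and rank higher.
const searchableAttributes = ["era", "genre", "title"];

// Lowercased text of one attribute, whether it holds a string or an array.
function attributeText(record, attribute) {
  const raw = record[attribute];
  if (raw === undefined) return "";
  return (Array.isArray(raw) ? raw.join(" ") : String(raw)).toLowerCase();
}

// For each query word, find the highest-priority attribute that contains it.
// Returns the sum of those positions (lower is better), or null if a word is missing.
function score(record, query) {
  let total = 0;
  for (const word of query.toLowerCase().split(/\s+/)) {
    const position = searchableAttributes.findIndex((attribute) =>
      attributeText(record, attribute).includes(word)
    );
    if (position === -1) return null;
    total += position;
  }
  return total;
}

// Return matching records, best score first.
function search(query) {
  return records
    .map((record) => ({ record, score: score(record, query) }))
    .filter((hit) => hit.score !== null)
    .sort((a, b) => a.score - b.score)
    .map((hit) => hit.record);
}

const hits = search("70s space opera");
```

Both records match, but the movie ranks first because its matches come from the facet attributes rather than from free text in the title.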

Let’s get technical — engine-level faceting & JSON

There’s a lot to consider technically when building facets into your index:

  • Representing facets as input into your index.
  • Representing facets as input into a search request.
  • Receiving facets in response to a search request.
  • Combining facets with AND, OR, NOT.
  • How the engine represents facets.

In this article, we lay down the foundations of facet filtering and faceted search by discussing only the first item: how to break down and represent your data as facets. Though we touch on the other aspects, we’ll save more detailed discussions for future articles.
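As a brief preview of one of the other items, combining facets with AND, OR, and NOT, here is one possible way to express such combinations as plain JavaScript functions. The combinator names (`facet`, `and`, `or`, `not`) and the third record are assumptions made for this sketch; an actual engine would define its own filter syntax.

```javascript
// A filter is a function from record to boolean; combinators build larger filters.
const facet = (attribute, value) => (record) => {
  const raw = record[attribute];
  return Array.isArray(raw) ? raw.includes(value) : raw === value;
};
const and = (...filters) => (record) => filters.every((f) => f(record));
const or = (...filters) => (record) => filters.some((f) => f(record));
const not = (filter) => (record) => !filter(record);

// Records 42 and 43 follow this article's examples; record 44 is invented.
const catalog = [
  { objectID: 42, type: "movie", genre: ["science fiction", "space opera"], series: "star wars" },
  { objectID: 43, type: "toy", brand: "Lego", series: "star wars" },
  { objectID: 44, type: "movie", genre: ["comedy"] },
];

// series = "star wars" AND NOT type = "toy"
const starWarsNonToys = catalog.filter(
  and(facet("series", "star wars"), not(facet("type", "toy")))
);

// type = "toy" OR genre = "comedy"
const toysOrComedies = catalog.filter(or(facet("type", "toy"), facet("genre", "comedy")));
```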

Representing facets as input into your search index

The best facet data format allows for:

  • A totally customizable set of facets, as determined by your application/business needs.
  • An easily understood data structure. 
  • Flexibility, so that every record in the index can have a different set of attributes.
  • A scalable structure that allows for multiple facet values and category hierarchies.

In a word, JSON.
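The sketch below shows how two records in the same index can satisfy all four requirements at once. The `categories` attribute with one key per level (`lvl0`, `lvl1`) is only one assumed way of encoding a hierarchy, and the category names are invented for illustration.

```javascript
// Two records in the same index, each with its own set of attributes.
const indexRecords = [
  {
    objectID: 42,
    title: "Star Wars: Episode V - The Empire Strikes Back",
    genre: ["science fiction", "space opera"], // a facet with multiple values
    // Assumed hierarchy encoding: one key per level, each holding the full path.
    categories: { lvl0: "Movies", lvl1: "Movies > Science fiction" },
  },
  {
    objectID: 43,
    title: "LEGO: Star Wars Yoda",
    brand: "Lego", // an attribute the movie record does not have
    categories: { lvl0: "Toys", lvl1: "Toys > Building sets" },
  },
];

// List the attributes each record defines, to show that no shared schema is needed.
const attributeSets = indexRecords.map((record) => Object.keys(record));

// Round-trip through JSON text to confirm the structure serializes as-is.
const roundTripped = JSON.parse(JSON.stringify(indexRecords));
```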

What is JSON vs. relational databases

Here’s a very, very quick history: Relational databases established a standard with foreign keys, atomic data, and other such improvements over previous data technologies. Relational databases allowed people to normalize their data by distributing every item’s characteristics in a database over a system of small, clearly defined linked tables. It created homogeneity, with a reliable and easily understood consistent structure. However, this homogeneity did not always meet the needs of every application with different data needs. Thus, the solution was to create multiple databases with different structures, sometimes with the same underlying information — a difficult-to-maintain solution that required extra resources. 

The Entity-Attribute-Value (EAV) model added flexibility to the relational model: now, each item could be defined differently from the others in the same database. EAV was all about heterogeneity: it allowed a single database to contain a wide variety of information in an efficient manner, where every item could have its own unique and complete set of descriptive tags called key/values. EAV kept to the principle of no repetition of data by using multiple schemas and often localizing its unique values in a single table.

JSON both encapsulates the EAV principle and breaks away from EAV’s relational roots, most notably by doing away with the schema of tables and rows and by relaxing the principle of no repetition of data. JSON repeats and repeats information, which is why it’s so easy to use and so powerful and flexible. There are no more constraints: every record lives in a vacuum, containing its own unique set of attributes. There is also no predefined schema that an entity must follow. Now, there are only JSON objects (or records), keys, and values.

Why is JSON so good for building a searchable index? 

A search engine needs to treat every record in its index atomically, not relying on relationships with other records. It needs to de-normalize its data, not normalize it like a relational database. 
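To make de-normalization concrete, the following sketch starts from three small normalized tables and produces a self-contained record. The table layout, the ids, and the `denormalize` helper are assumptions made for this example only.

```javascript
// Hypothetical normalized tables, as a relational database might store them.
const movies = [{ id: 42, title: "Star Wars: Episode V - The Empire Strikes Back" }];
const studios = [
  { id: 1, name: "Lucasfilm" },
  { id: 2, name: "Walt Disney Studios" },
];
const movieStudios = [
  { movieId: 42, studioId: 1 },
  { movieId: 42, studioId: 2 },
];

// Resolve the foreign keys once, at indexing time, and copy the values into
// the record so that it no longer depends on any other table.
function denormalize(movie) {
  const studioNames = movieStudios
    .filter((link) => link.movieId === movie.id)
    .map((link) => studios.find((studio) => studio.id === link.studioId).name);
  return { objectID: movie.id, title: movie.title, studio: studioNames };
}

const denormalized = movies.map(denormalize);
console.log(JSON.stringify(denormalized, null, 2));
```

The studio names are now repeated in every record that needs them, which is exactly the repetition that a relational design avoids and a search index accepts.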

Additionally, the engine needs to be agnostic about what a record contains and therefore be capable of searching into any index, whether commercial products, movies, or professional services. It only needs to return any record that matches the search query. Most search engines do not rely on semantics; instead, they rely on textual matching, which means that a search engine only needs to match the content in an item’s attributes with the text in the search bar. The match can be full or partial. JSON, with its readability, focuses the data exclusively on its textual content.

JSON key/values are easy to read and easy to search.

[
  {
    "objectID": 42,
    "title": "Star Wars: Episode V - The Empire Strikes Back",
    "type": "movie",
    "genre": ["science fiction", "space opera"],
    "series": "star wars"
  },
  {
    "objectID": 43,
    "title": "LEGO: Star Wars Yoda",
    "type": "toy",
    "brand": "Lego",
    "series": "star wars"
  }
]

It’s pretty clear what’s going on here. We have two records. One is a movie, the other is a toy. They have different attributes: the movie has a “genre”, the toy has a “brand”. However, they both have “series”, which can optionally be used to link the two records on “Star Wars”.

Some search engines use JSON directly as the basis of their search, with no conversion required. Most, however, convert this data into the format of their own proprietary index. In either case, the process is the same: loop through every record and key, then save (or search) the values. The best algorithms expect nothing but keys and values, though most engines ask for a minimum of required attributes, such as an “objectID”: a unique identifier for every record, used to find and update records while indexing. But don’t be misled: the “objectID” doesn’t function like a primary key in a relational database. In search, there is little use for the primary/foreign key relationship. Instead of keys, search relies on facets to link records. You can see that in the example above: the toy and the movie are linked by the “series” facet.
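The “loop through every record and key” process can be sketched as follows. This is a simplified illustration, not how any specific engine builds its index: `buildFacetLookup` is a hypothetical helper that stores a list of objectIDs under each “key:value” pair, which is enough to show how the “series” facet links the two records.

```javascript
// The two records from the JSON example above.
const sampleRecords = [
  {
    objectID: 42,
    title: "Star Wars: Episode V - The Empire Strikes Back",
    type: "movie",
    genre: ["science fiction", "space opera"],
    series: "star wars",
  },
  { objectID: 43, title: "LEGO: Star Wars Yoda", type: "toy", brand: "Lego", series: "star wars" },
];

// Build a lookup of "key:value" -> objectIDs by visiting every record and key.
// Nothing is assumed about a record beyond its objectID.
function buildFacetLookup(records) {
  const lookup = new Map();
  for (const record of records) {
    for (const [key, raw] of Object.entries(record)) {
      if (key === "objectID") continue;
      const values = Array.isArray(raw) ? raw : [raw];
      for (const value of values) {
        const entry = `${key}:${String(value).toLowerCase()}`;
        if (!lookup.has(entry)) lookup.set(entry, []);
        lookup.get(entry).push(record.objectID);
      }
    }
  }
  return lookup;
}

const lookup = buildFacetLookup(sampleRecords);
const linkedBySeries = lookup.get("series:star wars");
console.log(linkedBySeries);
```

No foreign key is involved: the movie and the toy are linked only because they share a facet value.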

REST APIs and the joys of generating and visualizing JSON

Like JSON, REST simplified an existing standard, in this case by offering an alternative to SOAP. It did this for similar reasons: to add more flexibility with less formality. SOAP is a protocol with strict specifications and rules. A REST API does away with much of that: it provides an endpoint and accepts different data formats, such as CSV, XML, and JSON. It is up to the API’s developer to tell users what to send in the formatted data. The best APIs don’t require too much specific data; they focus on the information needed to perform their job.

For example, here’s how to search:

curl -X POST \
     -H "API-Key: 123" \
     -H "Application-Id: app_movies" \
     --data-binary '{ "params": "query=star wars" }' \
     "https://some-movie-api.com/indexes/movies/query"

The “--data-binary” option carries the whole message: execute a search using these parameters. In this case, no surprise, it is a search for “star wars”.
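For comparison, the same request can be assembled in JavaScript. The sketch below only builds the request and does not send it, since the URL, header names, and credentials are the placeholders from the curl example. `buildSearchRequest` is a hypothetical helper, and unlike the curl command it URL-encodes the query string.

```javascript
// Build the same request as the curl command above, without sending it.
function buildSearchRequest(query) {
  // URLSearchParams URL-encodes the query, so "star wars" becomes "star+wars".
  const params = new URLSearchParams({ query }).toString();
  return {
    url: "https://some-movie-api.com/indexes/movies/query", // placeholder endpoint
    options: {
      method: "POST",
      headers: { "API-Key": "123", "Application-Id": "app_movies" }, // placeholder credentials
      body: JSON.stringify({ params }),
    },
  };
}

const request = buildSearchRequest("star wars");
console.log(request.options.body);
// To send it against a real endpoint: fetch(request.url, request.options)
```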

From Curl to API Clients

The best search engines wrap Curl’s tedious syntax into API clients for different programming languages. For example, here’s how to run the same search and then create a record in the “movies” index using a JavaScript API client:

// `index` is the API client's handle to the "movies" index.
index.search("star wars");
index.save({
  "objectID": 42,
  "title": "Star Wars: Episode V - The Empire Strikes Back",
  "description": "An American space opera. Luke Skywalker, along with Han Solo, Princess Leia, and Chewbacca, fight Darth Vader and the Galactic Empire on behalf of the Rebel Alliance.",
  "genre": ["science fiction", "space opera"],
  "era": ["70s", "80s"],
  "story": "George Lucas",
  "saga": "Star Wars",
  "studio": ["Lucasfilm", "Walt Disney Studios"],
  "production facts": "The film is produced by Lucasfilm, now owned by Walt Disney Studios films.",
  "actors": ["Mark Hamill", "Harrison Ford", "Carrie Fisher", "Billy Dee Williams", "Anthony Daniels", "David Prowse", "Kenny Baker", "Peter Mayhew", "Frank Oz"],
  "characters": ["Luke Skywalker", "Han Solo", "Princess Leia", "Chewbacca", "Darth Vader", "Yoda", "C-3PO", "R2-D2"]
});

Whichever format you use, Curl or JavaScript, you can see how easy it is to encapsulate the full “Star Wars: Episode V – The Empire Strikes Back” example from the start of this article and to send it in a format that both humans and machines can read.
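To make the `save` and `search` calls above concrete, here is a toy in-memory stand-in for such a client. The `InMemoryIndex` class is purely hypothetical: it mimics the two method names used in this article, not the API of any real client, which would send HTTP requests like the Curl example instead of keeping records in memory.

```javascript
// A toy stand-in for an API client's index object (hypothetical, in-memory only).
class InMemoryIndex {
  constructor() {
    this.records = new Map(); // objectID -> record
  }

  // Add a record, or replace the record that has the same objectID.
  save(record) {
    this.records.set(record.objectID, record);
  }

  // Return the records in which every query word appears in some value.
  search(query) {
    const words = query.toLowerCase().split(/\s+/).filter(Boolean);
    return [...this.records.values()].filter((record) => {
      // Flatten array values (such as multi-value facets) into one text blob.
      const text = Object.values(record).flat().join(" ").toLowerCase();
      return words.every((word) => text.includes(word));
    });
  }
}

const index = new InMemoryIndex();
index.save({
  objectID: 42,
  title: "Star Wars: Episode V - The Empire Strikes Back",
  genre: ["science fiction", "space opera"],
  saga: "Star Wars",
});
index.save({ objectID: 43, title: "LEGO: Star Wars Yoda", brand: "Lego" });

const found = index.search("star wars");
console.log(found.map((record) => record.objectID));
```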

Parting words and a few more details on faceted search

One key takeaway is that facets are not optional. They play a role in every aspect of the search experience. Functionally, facets are used for searching data, filtering records, and ordering results. Technically, they organize your data and ensure its quality and precision, central to creating a quality search experience. 

The other key takeaway is that how you structure your facets in your index determines their functional effectiveness. Modern search needs the flexibility to adapt to the fast-paced changes of every online business. It also needs the diversity to manage a large and constantly evolving variety of user tastes and needs. JSON enables this kind of flexibility and diversity. Developers need a schema-less data structure like JSON to represent the multiplicity of facets and filtering in the best way possible, for example with nested attributes. We’ll see that in Part 2 of this series on facets & faceted search.


This is the first article in our Facets & Data series. Our focus in this series is technical, outlining the logic and data model behind faceted search.

  • This article, the first in the series, defines what faceting is and explains the critical role that facets play in structuring your data. It also illustrates how JSON is the most flexible way to represent your index data, including facets.
  • The second article introduces the most common data structures for facets: simple facet values, nested faceting, hierarchical categories, and user and AI tagging, all of which are used for different aspects of faceted search.
  • The third article, “Implementing faceted search with dynamic faceting”, continues with data as the central focus but discusses the process behind accessing the data. It looks at the search process, from query to execution to response, and shows how to generate facets dynamically.

About the author
Peter Villani

Sr. Tech & Business Writer
