Filter By Null or Missing Attributes

What happens when your index contains an attribute that is not present in all records?

For example, consider an online bookstore where people can buy books and also rate them from 0 to 5. Any record without the rating attribute is assumed not to be rated yet.

Our data is schemaless, so this is not a problem until you want to filter on records with and without a specific attribute.

Filtering becomes a problem when the presence or absence of a value carries meaning. The Algolia engine does not support filtering on null values or missing attributes. In the example above, if you wanted a single filter to match both books with a specific rating and books that aren’t rated yet, you would need to modify the data.

There are two approaches:

  • using the _tags attribute,
  • using a boolean attribute.

Dataset

Let’s look at the following three records: one with a valid rating attribute, a second with a null rating, and a third without a rating attribute:

[
  {
    "title": "The Shining",
    "author": "Stephen King",
    "rating": 5
  },
  {
    "title": "Fantastic Beasts and Where to Find Them",
    "author": "J. K. Rowling",
    "rating": null
  },
  {
    "title": "Run Away",
    "author": "Harlan Coben"
  }
]

Here, only the first record has a rating. The other two are assumed not to have been rated yet. Note that a null or missing rating is not the same as a rating of 0: a book with a rating of 0 has been rated.
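
The distinction between null, missing, and zero can be checked in application code before indexing. The following is a minimal PHP sketch, assuming records are held as associative arrays. The helper name hasRating and the fourth record are hypothetical, added only for illustration, and neither is part of the Algolia API client.

```php
<?php
// Hypothetical helper: true only when "rating" exists and is not null.
// isset() returns false for a missing key and for a null value,
// but true for 0, so a book rated 0 still counts as rated.
function hasRating(array $record): bool
{
    return isset($record['rating']);
}

$records = [
    ['title' => 'The Shining', 'author' => 'Stephen King', 'rating' => 5],
    ['title' => 'Fantastic Beasts and Where to Find Them', 'author' => 'J. K. Rowling', 'rating' => null],
    ['title' => 'Run Away', 'author' => 'Harlan Coben'],
    // Illustrative record, not part of the dataset above, to show the zero case.
    ['title' => 'Example Book', 'author' => 'Example Author', 'rating' => 0],
];

foreach ($records as $record) {
    echo $record['title'], ': ', hasRating($record) ? 'rated' : 'not rated', PHP_EOL;
}
```

Using isset() rather than a truthiness check such as empty() matters here, because empty() would treat a rating of 0 as unrated.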

Creating a Tag

At indexing time, you can compute a tag that indicates whether the rating attribute has a value or is null or missing.

[
  {
    "title": "The Shining",
    "author": "Stephen King",
    "rating": 5,
    "_tags": ["is_rated"]
  },
  {
    "title": "Fantastic Beasts and Where to Find Them",
    "author": "J. K. Rowling",
    "rating": null,
    "_tags": ["is_not_rated"]
  },
  {
    "title": "Run Away",
    "author": "Harlan Coben",
    "_tags": ["is_not_rated"]
  }
]
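
One way to compute these tags before sending records to the index is sketched below in PHP. It assumes records are associative arrays, and the helper name withRatingTag is hypothetical, not part of the Algolia API client. The sketch only prepares the data; pushing the result to the index is left to whatever indexing call your API client provides.

```php
<?php
// Hypothetical helper: returns a copy of the record with a tag
// describing whether it has a usable rating.
function withRatingTag(array $record): array
{
    // isset() is false for a missing key and for null, true for 0.
    $tag = isset($record['rating']) ? 'is_rated' : 'is_not_rated';

    // Keep any tags the record already has and avoid duplicates.
    $tags = $record['_tags'] ?? [];
    if (!in_array($tag, $tags, true)) {
        $tags[] = $tag;
    }
    $record['_tags'] = $tags;

    return $record;
}

$records = [
    ['title' => 'The Shining', 'author' => 'Stephen King', 'rating' => 5],
    ['title' => 'Fantastic Beasts and Where to Find Them', 'author' => 'J. K. Rowling', 'rating' => null],
    ['title' => 'Run Away', 'author' => 'Harlan Coben'],
];

// Tag every record, then print the result as JSON for inspection.
$tagged = array_map('withRatingTag', $records);
echo json_encode($tagged, JSON_PRETTY_PRINT), PHP_EOL;
```

Appending to an existing _tags array, rather than overwriting it, keeps any tags the records already carry for other purposes.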

To search for records where the attribute is missing or null, you can now filter on tags:

$index->search('query', [
  'filters' => '_tags:is_not_rated'
]);

Creating a Boolean Attribute

At indexing time, you can compute a boolean attribute named is_rated:

[
  {
    "title": "The Shining",
    "author": "Stephen King",
    "rating": 5,
    "is_rated": true
  },
  {
    "title": "Fantastic Beasts and Where to Find Them",
    "author": "J. K. Rowling",
    "rating": null,
    "is_rated": false
  },
  {
    "title": "Run Away",
    "author": "Harlan Coben",
    "is_rated": false
  }
]
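
The boolean can be derived in the same way before indexing. The PHP sketch below assumes records are associative arrays; the helper name withIsRated is hypothetical and not part of the Algolia API client.

```php
<?php
// Hypothetical helper: returns a copy of the record with an
// "is_rated" boolean derived from the "rating" attribute.
function withIsRated(array $record): array
{
    // isset() is false for a missing key and for null, true for 0,
    // so a book rated 0 is still flagged as rated.
    $record['is_rated'] = isset($record['rating']);

    return $record;
}

$records = [
    ['title' => 'The Shining', 'author' => 'Stephen King', 'rating' => 5],
    ['title' => 'Fantastic Beasts and Where to Find Them', 'author' => 'J. K. Rowling', 'rating' => null],
    ['title' => 'Run Away', 'author' => 'Harlan Coben'],
];

// Flag every record, then print the result as JSON for inspection.
$flagged = array_map('withIsRated', $records);
echo json_encode($flagged, JSON_PRETTY_PRINT), PHP_EOL;
```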

To search for records where the attribute is missing or null, you can now filter on the boolean attribute:

$index->search('query', [
  'filters' => 'is_rated = 0'
]);
