# Splitting and concatenation

> How Algolia splits and concatenates user queries to improve relevance.

export const Records = () => <Tooltip tip="A record is a searchable object in an Algolia index. Each record consists of named attributes." cta="Algolia records" href="/doc/guides/sending-and-managing-data/prepare-your-data#algolia-records">
    records
  </Tooltip>;

export const Index = () => <Tooltip tip="An Algolia index is a searchable dataset that consists of records and configuration settings. These settings define how the records are searched and ranked.">
    index
  </Tooltip>;

Algolia improves search relevance by splitting long words into shorter ones and combining (concatenating) short words into longer ones.
This helps users find results even when their query doesn't exactly match your indexed <Records />.
You can adjust this behavior in the [Algolia dashboard](https://dashboard.algolia.com//explorer/configuration/typo-tolerance)
or with the [`typoTolerance`](/doc/api-reference/api-parameters/typoTolerance) parameter.

To learn more about query processing,
see [Tokenization](/doc/guides/managing-results/optimize-search-results/handling-natural-languages-nlp/in-depth/tokenization).

<Info>
  By default, Algolia matches query terms at the **beginning of words** in your records (prefix matching). For alternatives, see [Match queries in the middle or end of words](/doc/guides/managing-results/optimize-search-results/override-search-engine-defaults/how-to/how-can-i-make-queries-within-the-middle-of-a-word).
</Info>

## Splitting

When processing user queries,
Algolia attempts to improve relevance by splitting a single query term into two separate words.
This helps return results when users accidentally concatenate words,
like typing `katherinejohnson` instead of `katherine johnson`.

Algolia splits query words into only two parts to improve relevance without sacrificing performance.
This reduces the number of generated tokens,
keeping search fast and efficient while still improving matches for concatenated terms.
For example, the query `jamesearljones` is split into `james` and `earljones`,
not into `james`, `earl`, and `jones`.

### How splits work

For each word in a query,
Algolia evaluates every possible two-part split.
For example, the query `katherinejohnson` could generate the following splits:

* `katherinejohnson`
* `k`, `atherinejohnson`
* `ka`, `therinejohnson`
* `kat`, `herinejohnson`
* `kath`, `erinejohnson`
* `kathe`, `rinejohnson`
* `kather`, `inejohnson`
* `katheri`, `nejohnson`
* `katherin`, `ejohnson`
* `katherine`, `johnson`
* `katherinej`, `ohnson`
* `katherinejo`, `hnson`
* `katherinejoh`, `nson`

Algolia splits query terms only if they're at least as long as the value defined by [`minWordSizefor1Typo`](/doc/api-reference/api-parameters/minWordSizefor1Typo).
By default, this is 4 characters,
so terms shorter than this (such as `car`) aren't split,
while longer terms (such as `kath` and `katherinejohnson`) can be.

<Note>
  The first part of the split can be up to 12 characters long,
  while the second part can be any length.
</Note>
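
The enumeration described above can be sketched in a few lines of Python. This is an illustrative model only, not Algolia's implementation: the function name `candidate_splits` and its parameters are hypothetical, with defaults taken from the 4-character minimum and the 12-character limit on the first part mentioned in this section.

```python
def candidate_splits(term, min_word_size=4, max_first_len=12):
    """Return every two-part split of `term` as (first, second) tuples.

    Illustrative sketch only. `min_word_size` mirrors the default of
    minWordSizefor1Typo, and `max_first_len` mirrors the 12-character
    limit on the first part.
    """
    # Terms shorter than the minimum size aren't split at all.
    if len(term) < min_word_size:
        return []
    # The first part has 1 to `max_first_len` characters,
    # and the second part must keep at least one character.
    last_cut = min(max_first_len, len(term) - 1)
    return [(term[:i], term[i:]) for i in range(1, last_cut + 1)]


splits = candidate_splits("katherinejohnson")
print(len(splits))   # 12
print(splits[8])     # ('katherine', 'johnson')
print(candidate_splits("car"))  # []
```

The 12 pairs correspond to the split entries in the list above, ending with `katherinejoh` and `nson`.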

Algolia uses a split as an **alternative search term** if *both* parts of the split exist as distinct words in your <Index />.
For example, if `katherine` and `johnson` are both in your records,
Algolia adds `katherine johnson` as an alternative search term.
If either word is missing from your records, Algolia ignores this split.

Alternative search terms are treated as [sequence expressions](/doc/guides/managing-results/optimize-search-results/handling-natural-languages-nlp/in-depth/tokenization#sequence-expressions),
which means that the split terms must be next to each other and in the same order in an attribute.

Algolia may generate multiple splits.
For example, it can split `nowhere` into `no` and `where`, or `now` and `here`.
It selects the split that matches the most records.
A split may not be used if the original query term yields better results.
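
As a rough illustration of this selection step, the following Python sketch picks the split whose two parts appear next to each other, in order, in the most records. It's a toy model under simplifying assumptions: records are plain strings, matching is on exact whole words (no prefix matching or typo tolerance), and the comparison against the original, unsplit term is left out. The function names are hypothetical.

```python
def adjacent_match_count(records, first, second):
    """Count records where `first` is immediately followed by `second`."""
    count = 0
    for text in records:
        words = text.lower().split()
        # Pair each word with its successor to check adjacency and order.
        if (first, second) in zip(words, words[1:]):
            count += 1
    return count


def best_split(term, records, max_first_len=12):
    """Return the (first, second) split matching the most records, or None."""
    best, best_count = None, 0
    for i in range(1, min(max_first_len, len(term) - 1) + 1):
        first, second = term[:i], term[i:]
        count = adjacent_match_count(records, first, second)
        if count > best_count:
            best, best_count = (first, second), count
    return best


records = [
    "Now here is the plan",
    "There is no where to go",
    "Here and now here we are",
]
print(best_split("nowhere", records))  # ('now', 'here')
```

In this sample data, `now here` appears in two records and `no where` in one, so the sketch keeps `now` and `here`.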

## Concatenation

Algolia concatenates tokens to improve matching for acronyms and contractions.

### Concatenation during indexing

During indexing,
Algolia [combines tokens](/doc/guides/managing-results/optimize-search-results/handling-natural-languages-nlp/in-depth/tokenization) separated by:

* `.` (period)
* `'` (apostrophe)
* `®` (registered symbol)
* `©` (copyright symbol)

This helps index acronyms such as `B.C.E.` and contractions such as `don't`.

For example, `hello.world` creates the tokens `hello`, `.`, and `world`,
and then `helloworld` after concatenation.
The `.` character is a separator and isn't indexed by default
(see the [`separatorsToIndex`](/doc/api-reference/api-parameters/separatorsToIndex) parameter).

Algolia doesn't index tokens shorter than three characters.
For example, `B.C.E.` creates `B`, `.`, `C`, `.`, `E`, and `BCE`.
It indexes only `BCE`, not `B`, `C`, `E`, or the separator `.`.
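
The following Python sketch models the two examples above for a single word without whitespace. It's a simplification, not the engine's tokenizer: it ignores the number-specific rules, assumes `separatorsToIndex` is empty, and assumes the three-character minimum applies to every token, including the concatenated one. The function name `indexed_tokens` is hypothetical.

```python
import re


def indexed_tokens(word, min_len=3):
    """Return the tokens this simplified model would index for `word`."""
    # Split on the separators that trigger concatenation: . ' ® ©
    parts = [part for part in re.split(r"[.'®©]", word) if part]
    # Keep only the individual tokens that are long enough to index.
    tokens = [part for part in parts if len(part) >= min_len]
    # When separators were present, add the concatenated form.
    if len(parts) > 1:
        joined = "".join(parts)
        if len(joined) >= min_len:
            tokens.append(joined)
    return tokens


print(indexed_tokens("hello.world"))  # ['hello', 'world', 'helloworld']
print(indexed_tokens("B.C.E."))       # ['BCE']
```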

### Concatenation at query time

Algolia performs the same concatenation in search queries as it does during indexing.
It also uses:

* **Bi-gram concatenation**. The engine merges each pair of adjacent tokens for the first five words in the query.
* **All-word concatenation**. If the query contains three or more words, the engine combines all words into a single token.

For example, the search query `a wonderful day in the neighborhood` results in these tokens:

* Initial tokenization: `a`, `wonderful`, `day`, `in`, `the`, `neighborhood`
* Bi-gram concatenation: `awonderful`, `wonderfulday`, `dayin`, `inthe`
* All-word concatenation: `awonderfuldayintheneighborhood`
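
A minimal Python sketch of these two rules reproduces the tokens in the example. It's illustrative only: the function name `query_concatenations` is hypothetical, words are split on whitespace, and the number-specific rules aren't applied.

```python
def query_concatenations(query, bigram_word_limit=5):
    """Return (bigrams, all_words) for a whitespace-separated query."""
    words = query.split()
    # Bi-gram concatenation: merge adjacent pairs within the first five words.
    head = words[:bigram_word_limit]
    bigrams = [left + right for left, right in zip(head, head[1:])]
    # All-word concatenation: only for queries with three or more words.
    all_words = "".join(words) if len(words) >= 3 else None
    return bigrams, all_words


bigrams, all_words = query_concatenations("a wonderful day in the neighborhood")
print(bigrams)    # ['awonderful', 'wonderfulday', 'dayin', 'inthe']
print(all_words)  # awonderfuldayintheneighborhood
```

Because only the first five words take part in bi-gram concatenation, `theneighborhood` isn't generated.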

### Concatenation with numbers

Algolia applies specific logic for concatenating tokens with numbers and separators:

* **If a token starts with a number, Algolia doesn't merge it with the tokens that follow it.** For example, `m.55` creates `m55`, but `5.mm` forms `5` and `mm`, not `5mm`. This avoids misinterpreting floating-point numbers, so `1.3GB` isn't treated as `13GB`.
* **When a number appears next to a separator, Algolia indexes each adjacent non-separator token individually, regardless of length.** For example, `3.GB` creates `3`, `.`, and `GB`. Algolia indexes `3` and `GB` but not `3GB`, because it starts with a number.
* **Algolia skips bi-gram concatenation when two adjacent tokens both start or end with digits.** This prevents irrelevant combinations in queries such as `XC90 2020 Volvo`, where merging the terms into `XC902020` would reduce relevance and produce inaccurate matches.
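
The last rule can be read as a simple check on each pair of adjacent tokens. The Python sketch below is one possible reading, not the engine's actual logic: it assumes a pair is skipped when each of the two tokens starts or ends with a digit. The function names are hypothetical.

```python
def touches_digit(token):
    """Return True if the token starts or ends with a digit."""
    return bool(token) and (token[0].isdigit() or token[-1].isdigit())


def skip_bigram(left, right):
    """Return True if this sketch would skip merging the two tokens."""
    # Assumption: skip only when both tokens start or end with a digit.
    return touches_digit(left) and touches_digit(right)


print(skip_bigram("XC90", "2020"))  # True: `XC902020` isn't generated
print(skip_bigram("red", "shoes"))  # False: `redshoes` can be generated
```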

<Note>
  Algolia applies specific logic to hyphenated attributes,
  which can affect search behavior.
  For example,
  for hyphenated [ISBN](https://en.wikipedia.org/wiki/ISBN) or part numbers,
  all-word concatenation or careful attribute formatting helps ensure good search relevance.
  For more information, see [Searching in hyphenated attributes](/doc/guides/managing-results/optimize-search-results/typo-tolerance/how-to/how-to-search-in-hyphenated-attributes).
</Note>

## Improve relevance for single-word and ambiguous queries

In some cases, especially with short or ambiguous queries,
Algolia may split or interpret terms in unexpected ways.
For example, a search for `Augusta` might return a less relevant result that contains `August a`,
because Algolia interprets this as a better match based on frequency or attribute position.

To improve relevance in these cases:

* **Use unordered attributes.** Make sure most of your [searchable attributes](/doc/guides/managing-results/must-do/searchable-attributes)
  (such as description, name, or title)
  are [`unordered`](/doc/guides/managing-results/must-do/searchable-attributes/how-to/configuring-searchable-attributes-the-right-way#word-position).
  This removes any differences in ranking due to the position of the matches.
* **Try prioritizing exact matches for single-word queries.** Consider setting [`exactOnSingleWordQuery`](/doc/api-reference/api-parameters/exactOnSingleWordQuery) to `word`.
  This boosts exact matches when the user's query is only one word.

If *specific* queries don't return the expected results,
consider:

* **Adding a `keywords` attribute.**
  Add a `keywords` field to your records with the exact words you want user queries to match.
  Put the attribute at the top of your searchable attributes list.
  This favors exact matches with the defined keywords,
  due to the [Attribute criterion](/doc/guides/managing-results/relevance-overview/in-depth/ranking-criteria#attribute).
* **Turning off typo tolerance for specific words.**
  Use [`disableTypoToleranceOnWords`](/doc/api-reference/api-parameters/disableTypoToleranceOnWords) to require exact matches for specified words.
  Be cautious with this approach,
  as users must spell the word exactly for it to match.
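
As a sketch, the settings discussed in this section could be combined into one index settings payload. The attribute names (`keywords`, `title`, `description`) and the word `augusta` are placeholders for illustration, and the snippet only builds and prints the payload. How you send it depends on your API client and isn't shown here.

```python
import json

# Hypothetical attribute names and words: replace them with your own.
settings = {
    "searchableAttributes": [
        # Listed first so that keyword matches rank higher
        # through the Attribute criterion.
        "keywords",
        # `unordered(...)` ignores the position of the match in the attribute.
        "unordered(title)",
        "unordered(description)",
    ],
    # Boost exact matches when the query is a single word.
    "exactOnSingleWordQuery": "word",
    # Require exact spelling for these words. Use sparingly.
    "disableTypoToleranceOnWords": ["augusta"],
}

print(json.dumps(settings, indent=2))
```

You don't need to apply all of these settings together. Each one addresses a different cause of unexpected results, so try them one at a time.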

After making changes, test your configuration with a variety of queries.
Only apply updates that improve relevance.
If possible, [A/B test](/doc/guides/ab-testing/what-is-ab-testing) the configuration before rolling it out to all users.
