
# Crawler concepts

> Learn more about configuration templates, helpers, actions, and crawler indices.

export const Records = () => <Tooltip tip="A record is a searchable object in an Algolia index. Each record consists of named attributes." cta="Algolia records" href="/doc/guides/sending-and-managing-data/prepare-your-data#algolia-records">
    records
  </Tooltip>;

export const Index = () => <Tooltip tip="An Algolia index is a searchable dataset that consists of records and configuration settings. These settings define how the records are searched and ranked.">
    index
  </Tooltip>;

## Configuration templates

Configuration templates help you create your crawler configuration.
They contain pre-built [actions](#actions) for extracting data from your site, based on known page layouts.
When creating your crawler, you can choose between these templates:

* **Default.**
* **[Static site generators](https://docsearch.algolia.com/docs/templates/#docusaurus-v1-template).**
  These templates extract content from site generators like Docusaurus and VuePress.

<Info>
  After choosing a template, you can edit the configuration to change or extend it.
</Info>

## Helpers

Helpers are functions that make it easier to extract relevant content from your page. You can use them in your [actions](#actions).
For example:

* If you have a page that you declared as an *Article* with metadata,
  you can use [`helpers.article`](/doc/tools/crawler/apis/configuration/actions#param-helpers-article) to extract <Records /> from it.
* If you have a long page, you can use [`helpers.splitContentIntoRecords`](/doc/tools/crawler/apis/configuration/actions#param-helpers-split-content-into-records) to split the page into smaller chunks.
* If you want to show code snippets in your search results, use [`helpers.codeSnippets`](/doc/tools/crawler/apis/configuration/actions#param-helpers-code-snippets).

For more information, see [Helpers](/doc/tools/crawler/apis/configuration/actions#param-record-extractor-helpers).
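
As a minimal sketch of how a helper fits into an action, the following example uses `helpers.splitContentIntoRecords` to split a long page into several smaller records. The index name, URL pattern, selectors, and byte limit are placeholders chosen for illustration, and the option names should be checked against the helpers reference linked above.

```js JavaScript icon=code theme={"system"}
// Sketch only: index name, URL pattern, and selectors are hypothetical.
const longPageAction = {
  indexName: "my-guides",
  pathsToMatch: ["https://www.example.com/guides/**"],
  recordExtractor: ({ url, $, helpers }) => {
    // Attributes copied into every record created from this page.
    const baseRecord = {
      url: url.href,
      title: $("head > title").text().trim(),
    };

    // Split the text inside <main> into records of limited size.
    return helpers.splitContentIntoRecords({
      baseRecord,
      $elements: $("main"),
      maxRecordBytes: 1000, // assumed limit, adjust to your plan and content
      textAttributeName: "text",
      orderingAttributeName: "part",
    });
  },
};
```

In a real configuration, this object would be one entry in the `actions` array.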

## Actions

Actions tell the crawler what information to extract from matching URLs.
They're part of your [Crawler configuration](/doc/tools/crawler/getting-started/crawler-configuration).
A crawler can have up to 30 [actions](/doc/tools/crawler/apis/configuration/actions).

Each action in the [configuration](/doc/tools/crawler/getting-started/crawler-configuration#the-editor) must include:

* `indexName`: the name of the Algolia <Index /> where you want to store the extracted records.
* `pathsToMatch`: URL patterns that this action applies to. For example, `https://www.algolia.com/blog/**` tells the crawler to run this action on all pages of the Algolia blog.
* `recordExtractor`: a function that defines what information to extract from each visited page, and formats it as records for your Algolia index. You can use [helpers](#helpers) to write less code.

```js JavaScript icon=code theme={"system"}
actions: [
  {
    // Index that stores the extracted records.
    indexName: "algolia-docs",
    // Apply this action to every page under /doc/.
    pathsToMatch: ["https://www.algolia.com/doc/**"],
    recordExtractor: ({ helpers }) => {
      return helpers.docsearch({
        recordProps: {
          // Selectors are listed from most to least specific.
          lvl1: ["header h1", "article h1", "main h1", "h1", "head > title"],
          content: ["article p, article li", "main p, main li", "p", "li"],
          // No selector here: records use the default value.
          lvl0: {
            selectors: "",
            defaultValue: "Documentation",
          },
          lvl2: ["article h2", "main h2", "h2"],
          lvl3: ["article h3", "main h3", "h3"],
          lvl4: ["article h4", "main h4", "h4"],
          lvl5: ["article h5", "main h5", "h5"],
          lvl6: ["article h6", "main h6", "h6"],
        },
        aggregateContent: true,
        recordVersion: "v3",
      });
    },
  },
],
```
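
The example above relies on `helpers.docsearch`. For comparison, here is a sketch of an action whose `recordExtractor` builds a single record by hand. It assumes the function receives the page URL as a `URL` object and a Cheerio-style `$` selector function. The index name, URL pattern, and attribute names are hypothetical.

```js JavaScript icon=code theme={"system"}
// Sketch only: names and selectors are placeholders, not recommended values.
const blogAction = {
  indexName: "my-blog",
  pathsToMatch: ["https://www.example.com/blog/**"],
  recordExtractor: ({ url, $ }) => {
    // Return an array of records. Here, one record per page.
    return [
      {
        objectID: url.href, // the page URL serves as a stable identifier
        title: $("head > title").text().trim(),
        description: $('meta[name="description"]').attr("content") ?? "",
      },
    ];
  },
};
```

Returning an empty array from `recordExtractor` would leave a matching page out of the index in this sketch.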

For complete configurations, see the [examples repository on GitHub](https://github.com/algolia/crawler-configurations-examples).

## Indices created by the crawler

An [index](/doc/guides/sending-and-managing-data/prepare-your-data/in-depth/prepare-data-in-depth#algolia-index) is where the Algolia Crawler stores the extracted data from your pages.
In most cases, you'll have one index for each content type, such as articles or products.
You can find all your indices, including those that weren't created by the Crawler,
in the [Algolia dashboard](https://dashboard.algolia.com/).

The Algolia Crawler creates three types of indices:

* **Production** indices don't have suffixes. The production index contains the records extracted by the latest crawl. If you manually start a new crawl while a scheduled crawl is still running, the new records are added to a temporary index instead.
* **Backup** indices have the `.bak` suffix. For extra safety, you can keep a backup of your last production index. To learn more, see the [`saveBackup`](/doc/tools/crawler/apis/configuration/save-backup) parameter.
* **Temporary** indices have the `.tmp` suffix. During a crawl, the Crawler adds extracted records to a temporary index. If the crawl is successful, the temporary index from the latest crawl replaces the production index.
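
To make the naming scheme above concrete, the following sketch lists the index names you might see in the dashboard for a given `indexName`. The `crawlerIndexNames` function is a hypothetical helper written for this illustration, not part of the Crawler API. It assumes a backup index only exists when `saveBackup` is turned on.

```js JavaScript icon=code theme={"system"}
// Hypothetical helper: derives index names from the suffixes described above.
function crawlerIndexNames(indexName, { saveBackup = false } = {}) {
  const names = {
    production: indexName, // no suffix
    temporary: `${indexName}.tmp`, // filled during a crawl
  };
  if (saveBackup) {
    names.backup = `${indexName}.bak`; // copy of the last production index
  }
  return names;
}

console.log(crawlerIndexNames("algolia-docs", { saveBackup: true }));
```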
