# actions

> Determines which web pages are translated into Algolia records and in what way.

* **Type**: `Action[]`
* **Required**

A single action defines:

* The URLs to crawl
* The extraction process for those websites
* The indices to which the extracted records are added

A single web page can match multiple actions. In this case, your crawler creates a record for each matched action.

## Examples

```js JavaScript icon=code theme={"system"}
{
  actions: [
    {
      indexName: 'dev_blog_algolia',
      pathsToMatch: ['https://blog.algolia.com/**'],
      fileTypesToMatch: ['pdf'],
      autoGenerateObjectIDs: false,
      schedule: 'every 1 day',
      recordExtractor: ({ url, $, contentLength, fileType, dataSources }) => { ... }
    },
  ],
}
```

## Parameters

### Action

**`indexName`**: Index name targeted by this action. This value is appended to the `indexPrefix` if specified.

**`pathsToMatch`**: URL patterns for web pages to which this action should apply. The patterns are evaluated using the [`micromatch`](https://github.com/micromatch/micromatch) library. You can use wildcard characters, negation, and more.

**`recordExtractor`**: A JavaScript function to extract content from a web page and turn it into Algolia records. The function should return a JSON array, which may be empty. An empty array means the page is skipped.

**Example:**

```js JavaScript icon=code theme={"system"}
recordExtractor: ({ url, $, contentLength, fileType }) => {
  return [
    {
      url: url.href,
      text: $("p").html(),
      // ... anything you want
    },
  ];
  // return []; skips the page
};
```

**`$`**: A [Cheerio instance](https://cheerio.js.org) with the HTML of the crawled page.

**`contentLength`**: Number of bytes of the crawled page.

**`dataSources`**: External data sources for the crawled page. Each key corresponds to an [`externalData`](/doc/tools/crawler/apis/configuration/external-data) object.

**Example:**

```js JavaScript icon=code theme={"system"}
{
  dataSources: {
    dataSourceId1: { data1: "val1", data2: "val2" },
    dataSourceId2: { data1: "val1", data2: "val2" },
  },
}
```

**`fileType`**: File type of the crawled page or document.

**`helpers`**: Helper functions for extracting content and turning it into Algolia records.

**`helpers.article`**: A function that extracts content from pages identified as *articles*. Articles are pages with the `og:type` meta tag: `<meta property="og:type" content="article"/>` or with the [JSON-LD schema](https://schema.org/) types: `Article`, `NewsArticle`, `Report`, or `BlogPosting`.

**Example:**

```js JavaScript icon=code theme={"system"}
recordExtractor: ({ url, $, helpers }) => {
  return helpers.article({ url, $ });
};
```

The `article` helper returns an object with the following properties:

```ts TypeScript icon=code theme={"system"}
{
  /**
   * The object's unique identifier,
   * in this case, the article's URL
   */
  objectID: string,

  /**
   * The article's URL (without parameters or hashes)
   */
  url: string,

  /**
   * The language the page content is written in
   * - html[attr=lang]
   */
  lang?: string,

  /**
   * The article's headline (selected from one of the following in order of preference):
   * - meta[property="og:title"]
   * - meta[name="twitter:title"]
   * - head > title
   * - First
```

non-html-documents). Non-HTML file types are first converted to HTML with Apache Tika, then processed as an HTML page.

**`autoGenerateObjectIDs`**: Whether to generate object IDs for records that don't already have an `objectID` field. If `false`, extracted records without object IDs throw an error.
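
With `autoGenerateObjectIDs` set to `false`, every record returned by `recordExtractor` needs its own `objectID`. The following is a minimal sketch of one way to do that, assuming the page URL is a suitable identifier and that the page title lives in `head > title`. The `stub$` function is a hypothetical stand-in for the Cheerio instance, used only to dry-run the extractor locally; it is not part of the crawler.

```js
// Sketch of an action that supplies its own object IDs.
const action = {
  indexName: 'dev_blog_algolia',
  pathsToMatch: ['https://blog.algolia.com/**'],
  autoGenerateObjectIDs: false,
  recordExtractor: ({ url, $ }) => {
    const title = $('head > title').text();
    // Returning an empty array skips the page.
    if (!title) return [];
    // Assumption: the page URL is unique enough to serve as the objectID.
    return [{ objectID: url.href, url: url.href, title }];
  },
};

// Hypothetical stub that mimics only the `$(selector).text()` call used above.
const stub$ = (selector) => ({
  text: () => (selector === 'head > title' ? 'Hello' : ''),
});

// Local dry run outside the crawler.
const records = action.recordExtractor({
  url: new URL('https://blog.algolia.com/post'),
  $: stub$,
});
console.log(records);
```

Setting the ID explicitly keeps records stable across crawls, at the cost of having to guarantee uniqueness yourself when one page produces several records.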