Create new crawlers

Since you can only crawl and index content from domains you own, you need to verify each domain you want to crawl.

Add domains

Sign in to the Algolia dashboard.
On the left sidebar, select Data sources.
Select Crawler:
- Click Add your domain and enter the domains or subdomains you want to crawl, for example, algolia.com, www.algolia.com, or support.algolia.com.
- If you’ve already added a domain, click the Domains tab.
Click Add domain.

Verify your domain

Email verification is the default, automated way of verifying your domain ownership. If you validated your email when you first set up an Algolia account, the domain of this email address is compared with your crawler site domain. If they match, your domain is verified. Check the verification status from the Domains page in the Crawler dashboard. If the site and email domains don’t match, you need to verify your site with one of these methods: meta tag, HTML file, DNS record, or robots.txt.
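The email check described above can be sketched as a plain domain comparison. This is only an illustration: the function names are hypothetical, and the exact-match rule is an assumption, since this page doesn't say how subdomains or casing are treated.

```python
def email_domain(email: str) -> str:
    """Return the lowercased domain part of an email address."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return domain.lower()


def email_matches_site(email: str, site_domain: str) -> bool:
    """Assumed rule: the email domain must equal the site domain exactly."""
    return email_domain(email) == site_domain.strip().lower().rstrip(".")


if __name__ == "__main__":
    print(email_matches_site("dev@example.com", "example.com"))  # same domain
    print(email_matches_site("dev@gmail.com", "example.com"))    # different domain
```

If the comparison fails for your account, use one of the manual verification methods below.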

If you can’t update your server files, choose the DNS record option.
If you can update files, the meta tag option is the best choice as it doesn’t require adding an extra HTML file. The robots.txt method is an older option that was the only way to verify in earlier versions of the Crawler.

Meta tags

To add a domain verification code to a meta tag, open the Meta tag tab.

Click Copy and place this copied tag in your home page’s header HTML near the other meta tags. The tag only needs to be added to one page. If you prefer it not to be on the home page, add it elsewhere and enter its location in Optional URL.

Don’t insert the tag as a Google Tag Manager (GTM) script tag. If you do, the Crawler ignores it.

After adding the tag and publishing the updated page, click Verify now next to the appropriate domain on the Domains page in the Crawler dashboard. Algolia confirms domain ownership after detecting the meta tag.
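Before clicking Verify now, you may want to confirm that the tag is present in the page's static HTML. The following is a minimal sketch using Python's standard `html.parser`. The tag name `algolia-site-verification` is an assumption borrowed from the DNS host name shown later on this page, so use the name from the tag you copied. The function name is hypothetical.

```python
from html.parser import HTMLParser

# Assumption: replace with the name attribute of the tag copied from the dashboard.
META_NAME = "algolia-site-verification"


class _MetaFinder(HTMLParser):
    """Collect the content attribute of every <meta> tag with a given name."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name.lower()
        self.contents: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == self.name:
            self.contents.append(attr.get("content") or "")


def find_verification_codes(html: str, name: str = META_NAME) -> list[str]:
    """Return the verification codes found in static HTML."""
    finder = _MetaFinder(name)
    finder.feed(html)
    finder.close()
    return finder.contents
```

Because the parser only sees the HTML as served, a tag injected by a script at runtime doesn't show up in the result.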

HTML file

To add a verification code to a custom HTML file, open the HTML file tab.

Click Copy.
Save the copied content as a new HTML file and upload it to your web server.
Add the file’s URL to Optional URL. The URL must be within the domain you’re verifying.
Click Verify now next to the appropriate domain on the Domains page in the Crawler dashboard.

Algolia confirms domain ownership after detecting the file.
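The requirement that the file's URL is within the domain you're verifying can be checked locally before you fill in Optional URL. This sketch compares host names with `urllib.parse`. Whether subdomains count as "within" the domain isn't stated here, so that behavior sits behind a hypothetical `allow_subdomains` flag that is off by default.

```python
from urllib.parse import urlsplit


def url_within_domain(url: str, domain: str, allow_subdomains: bool = False) -> bool:
    """Check that the URL's host is the given domain (or, optionally, a subdomain)."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower().rstrip(".")
    if host == domain:
        return True
    # Assumption: subdomain matching is opt-in because the page doesn't define it.
    return allow_subdomains and host.endswith("." + domain)
```

Comparing parsed host names rather than raw strings avoids false matches such as `example.com.evil.test`.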

robots.txt

To add a verification code to your robots.txt file, open the Robots.txt tab.

Click Copy and paste the copied code into your site’s robots.txt file.

# Algolia-Crawler-Verif: XXXX

User-Agent: *
Allow: /
# ...

After publishing the updated file, click Verify now next to the appropriate domain on the Domains page in the Crawler dashboard. Algolia confirms domain ownership after detecting the code.
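To confirm that the published robots.txt still carries the verification comment, you can search it for the `# Algolia-Crawler-Verif:` line shown above. The sketch below works on the file's text. The function name is hypothetical, and the case-sensitive match and tolerance for extra spaces are assumptions.

```python
import re
from typing import Optional

# Matches a comment line such as "# Algolia-Crawler-Verif: XXXX".
VERIF_PATTERN = re.compile(
    r"^[ \t]*#[ \t]*Algolia-Crawler-Verif:[ \t]*(\S+)[ \t]*$",
    re.MULTILINE,
)


def find_verif_code(robots_txt: str) -> Optional[str]:
    """Return the verification code from robots.txt content, or None if absent."""
    match = VERIF_PATTERN.search(robots_txt)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "# Algolia-Crawler-Verif: XXXX\n\nUser-Agent: *\nAllow: /\n"
    print(find_verif_code(sample))
```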

DNS

To add a verification code to your DNS records, open the DNS tab.

Screenshot of the 'DNS' tab in 'Create new crawlers,' showing a TXT record for verification with a 'Copy' button and 'Verify now' button.

Click Copy or take note of the provided DNS TXT record data.
On your DNS provider’s site, locate the section responsible for managing DNS and select an action to add a new record.
Copy the provided DNS TXT record data into the respective fields on your DNS provider’s site. For example:
- Type: TXT
- Host: algolia-site-verification
- Value/Answer: 269505BC631812DA (in the example)

It can take up to 72 hours for your DNS record to be updated. Once the DNS record has been updated, click Verify now next to the appropriate domain on the Domains page in the Crawler dashboard. Algolia confirms domain ownership after detecting the DNS record.
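While you wait for propagation, you can look up the TXT record yourself, for example with `dig +short TXT <record name>`, and compare the answers with the value from the dashboard. The helpers below are a sketch with hypothetical names. They assume the Host field is relative to your domain, which is common with DNS providers but not confirmed on this page.

```python
from typing import Iterable


def verification_record_name(domain: str, host: str = "algolia-site-verification") -> str:
    """Build the full record name, assuming the host is relative to the domain."""
    return f"{host}.{domain.strip('.').lower()}"


def txt_answers_contain(answers: Iterable[str], expected: str) -> bool:
    """Check TXT lookup answers for the expected value.

    Lookup tools often print TXT values wrapped in double quotes,
    so quotes and surrounding whitespace are stripped before comparing.
    """
    normalized = {answer.strip().strip('"') for answer in answers}
    return expected in normalized


if __name__ == "__main__":
    print(verification_record_name("example.com"))
    print(txt_answers_contain(['"269505BC631812DA"'], "269505BC631812DA"))
```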

Create a new crawler

After verifying your domain, you can create a new crawler.

Open the Crawler page in the Crawler dashboard.
Click New Crawler and enter the following information:
- Your crawler name. Enter a descriptive name for your crawler.
- App ID. Enter the same Algolia application ID you specified when adding a domain. The indices and extracted records are added to this application.
- Start URL. Enter a URL as the starting point for the crawler. The best starting URL is your domain’s home page. The Crawler uses your sitemap.xml to find its starting URLs. If your site doesn’t have a sitemap, enter the URL with the most links to other pages.
- Crawler template. If you want to configure a new crawler for one of the supported static site generators, select that configuration template. Otherwise, select the default template.
Click Create to finish the configuration of your crawler and run a test crawl.

Run the test crawl

The initial crawl visits up to 100 URLs to test whether the crawler can access your site, find links, extract content, and upload it to an Algolia index. For a summary of in-progress crawls, go to the Overview page.

Screenshot of a test crawl in progress with 'Pause crawling' and 'Restart crawling' buttons, monitoring status, and a success notification.

A successful test lets you view the status, extracted content, and discovered links for each URL.

Screenshot of a successful crawl result showing 100 URLs discovered, 83 records created, and 0 errors found.

Review the records the crawler created during this crawl in the Algolia dashboard.
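To build intuition for the 100-URL cap on the test crawl, here is a small breadth-first sketch over an in-memory link graph. It is not how the Algolia Crawler is implemented. The function name, the `links` mapping, and the visiting order are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List


def bounded_crawl(start_url: str, links: Dict[str, List[str]], max_urls: int = 100) -> List[str]:
    """Visit pages breadth-first from start_url, stopping after max_urls pages."""
    visited: List[str] = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(visited) < max_urls:
        url = queue.popleft()
        visited.append(url)
        # Queue each newly discovered link exactly once.
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

With a cap like this, which pages get visited depends on the start URL and on the links found first, which is one reason a start page with many links gives a more representative test.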

Next steps

The default configuration for a test crawl often falls short of what’s needed. For instance, you might need to edit the configuration to set up scheduled automatic crawls, choose specific URLs to include or exclude, and determine what information to extract from each page. The Crawler’s suggestions feature automatically indicates some useful next steps. For more information, see Configure a crawler with the visual UI or, for more fine-grained control, see Configure a crawler with the editor.
