Since you can only crawl and index content from domains you own, you need to verify each domain you want to crawl.

Add domains

  1. Open the Domains page in the Crawler dashboard.
  2. Click Add new domain.
  3. In the App ID field, enter your Algolia application ID, which you can find in the Algolia dashboard. The Crawler creates and updates indices in this application.
  4. In the Domains and subdomains field, enter the domains or subdomains you want to crawl, for example, algolia.com, www.algolia.com, or support.algolia.com.
  5. Click Add domain.

Verify your domain

Email verification is the default, automated way of verifying your domain ownership. If you validated your email when you first set up an Algolia account, the domain of this email address is compared with your crawler site domain. If they match, your domain is verified.

Check the verification status from the Domains page in the Crawler dashboard.
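To anticipate whether email verification is likely to succeed, you can compare the two domains yourself. The following is a minimal sketch, not Algolia's implementation: the function name is hypothetical, and it assumes that "match" means an exact, case-insensitive comparison, which this page doesn't specify.

```python
def email_matches_site(email, site_domain):
    """Return True when the email's domain equals the site domain.

    Assumption: the comparison is exact and case-insensitive.
    """
    # Everything after the last "@" is treated as the email domain.
    _, separator, email_domain = email.rpartition("@")
    if not separator or not email_domain:
        return False
    return email_domain.strip().lower() == site_domain.strip().lower()


print(email_matches_site("jane@example.com", "example.com"))       # True
print(email_matches_site("jane@example.com", "docs.example.org"))  # False
```

If the function returns False for your account email and site, expect to use one of the manual verification methods.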

If the site and email domains don’t match, you need to verify your site with one of these methods: meta tag, HTML file, DNS record, or robots.txt.

  • If you can’t update your server files, choose the DNS record option.
  • If you can update files, the meta tag option is the best choice as it doesn’t require adding an extra HTML file. The robots.txt method is an older option that was the only way to verify in earlier versions of the Crawler.

Meta tags

To add a domain verification code to a meta tag, open the Meta tag tab.

Copy this tag and place it in your home page’s header HTML near the other meta tags. The tag only needs to be added to one page. If you prefer it not to be on the home page, add it elsewhere and enter its location in Optional URL.

Don’t insert the tag as a Google Tag Manager (GTM) script tag. If you do, the Crawler ignores it.

Once you’ve added the tag and published the updated page, click Verify now next to the appropriate domain on the Domains page in the Crawler dashboard.

Algolia confirms domain ownership after detecting the meta tag.
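Before clicking Verify now, you can check that the published page really contains the tag in its static HTML. The sketch below uses Python's standard `html.parser` module. The meta tag name `example-site-verification` and the code `XXXX` are placeholders: use the exact name and content from the tag you copied in the Meta tag tab.

```python
from html.parser import HTMLParser


class MetaTagFinder(HTMLParser):
    """Collect the content attribute of every <meta> tag with a given name."""

    def __init__(self, meta_name):
        super().__init__()
        self.meta_name = meta_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        attr_map = {key: (value or "") for key, value in attrs}
        if attr_map.get("name", "").lower() == self.meta_name:
            self.values.append(attr_map.get("content", ""))


def has_verification_tag(html, meta_name, code):
    """Return True if the HTML contains <meta name=meta_name content=code>."""
    finder = MetaTagFinder(meta_name)
    finder.feed(html)
    return code in finder.values


# Placeholder page: the tag name and code are hypothetical.
page = """<html><head>
<meta charset="utf-8">
<meta name="example-site-verification" content="XXXX">
</head><body></body></html>"""

print(has_verification_tag(page, "example-site-verification", "XXXX"))  # True
```

This check only reads the HTML as served. It doesn't run JavaScript, so it won't see a tag that a script injects after the page loads.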

HTML file

To add a verification code to a custom HTML file, open the HTML file tab.

  1. Copy this content.
  2. Save it as a new HTML file and upload it to your web server.
  3. Add the file’s URL to Optional URL. The URL must be within the domain you’re verifying.
  4. Click Verify now next to the appropriate domain on the Domains page in the Crawler dashboard.

Algolia confirms domain ownership after detecting the file.
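To check that the file's URL belongs to the domain you're verifying before you enter it in Optional URL, you can compare hostnames. This is a minimal sketch with a hypothetical function name. It assumes that "within the domain" means the URL's hostname equals the domain exactly, which is a conservative reading because this page doesn't define the rule.

```python
from urllib.parse import urlsplit


def url_is_within_domain(url, domain):
    """Return True when the URL's hostname equals the given domain.

    Assumption: an exact, case-insensitive hostname match is required.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host == domain.strip().lower().rstrip(".")


print(url_is_within_domain("https://www.example.com/verify.html", "www.example.com"))  # True
print(url_is_within_domain("https://cdn.example.net/verify.html", "www.example.com"))  # False
```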

robots.txt

To add a verification code to your robots.txt file, open the Robots.txt tab.

Paste the code into your site’s robots.txt file.

 # Algolia-Crawler-Verif: XXXX

 User-Agent: *
 Allow: /
 # ...

Once you’ve published the updated file, click Verify now next to the appropriate domain on the Domains page in the Crawler dashboard.

Algolia confirms domain ownership after detecting the code.
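To confirm that the comment line survived your deployment, you can scan the published robots.txt content for it. The sketch below is an illustration only: the function name is hypothetical, and the whitespace tolerance in the pattern is an assumption, since the Crawler's own matching rules aren't described here.

```python
import re

# Matches a comment line such as "# Algolia-Crawler-Verif: XXXX".
VERIF_PATTERN = re.compile(r"^\s*#\s*Algolia-Crawler-Verif:\s*(\S+)\s*$")


def find_verification_codes(robots_txt):
    """Return every verification code found in the robots.txt text."""
    codes = []
    for line in robots_txt.splitlines():
        match = VERIF_PATTERN.match(line)
        if match:
            codes.append(match.group(1))
    return codes


robots = """# Algolia-Crawler-Verif: XXXX

User-Agent: *
Allow: /
"""

print(find_verification_codes(robots))  # ['XXXX']
```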

DNS

To add a verification code to your DNS records, open the DNS tab.

On your DNS provider’s site, open the DNS management section and add a new record. Copy the DNS TXT record data from the DNS tab into the matching fields. For example:

  • Type: TXT
  • Host: algolia-site-verification
  • Value/Answer: the code shown in the DNS tab, for example, 269505BC631812DA

It can take up to 72 hours for your DNS record to be updated.

Once the DNS record has been updated, click Verify now next to the appropriate domain on the Domains page in the Crawler dashboard.

Algolia confirms domain ownership after detecting the DNS record.
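Because DNS changes take time, it helps to check whether the TXT record is visible before clicking Verify now. The sketch below keeps the DNS query itself out of scope: `lookup_txt` is a hypothetical function that you supply (for example, a wrapper around your preferred DNS library or command-line tool) and that returns the TXT values for a record name. It also assumes that the Host value is relative to the domain you're verifying, which is how most DNS providers treat that field.

```python
def verification_record_name(domain, host="algolia-site-verification"):
    """Build the fully qualified name of the TXT record for a domain.

    Assumption: the host label is prepended to the domain being verified.
    """
    return f"{host}.{domain.strip().strip('.').lower()}"


def dns_record_is_visible(domain, expected_value, lookup_txt):
    """Return True if the expected code is among the TXT values found.

    lookup_txt is a caller-supplied function: record name -> list of strings.
    """
    name = verification_record_name(domain)
    # Some tools wrap TXT values in double quotes, so strip them.
    values = [value.strip().strip('"') for value in lookup_txt(name)]
    return expected_value in values


# Stand-in for a real DNS query, using the example value from this page.
fake_zone = {"algolia-site-verification.example.com": ['"269505BC631812DA"']}


def fake_lookup(name):
    return fake_zone.get(name, [])


print(verification_record_name("example.com"))
# algolia-site-verification.example.com
print(dns_record_is_visible("example.com", "269505BC631812DA", fake_lookup))  # True
```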

Create a new crawler

After verifying your domain, you can create a new crawler.

  1. Open the Crawlers page in the Crawler dashboard.
  2. Click New Crawler and enter the following information:

    • Your crawler name. Enter a descriptive name for your crawler.
    • App ID. Enter the same Algolia application ID you specified when adding a domain. The indices and extracted records are added to this application.
    • Start URL. Enter a URL as the starting point for the crawler. The best starting URL is your domain’s home page. The Crawler uses your sitemap.xml to find its starting URLs. If your site doesn’t have a sitemap, enter the URL with the most links to other pages.
    • Crawler template. If you want to configure a new crawler for one of the supported static site generators, select that configuration template. Otherwise, select the default template.
  3. Click Create to finish the configuration of your crawler and run a test crawl.

Run the test crawl

The initial crawl visits up to 100 URLs to test whether the crawler can access your site, find links, extract content, and upload the resulting records to an Algolia index. For a summary of in-progress crawls, go to the Overview page.
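To picture what a crawl capped at 100 URLs covers, consider the following sketch of a bounded traversal. It isn't the Crawler's implementation: the function names are hypothetical, the breadth-first order is an assumption, and the links come from a small in-memory stand-in for a site rather than from HTTP requests.

```python
from collections import deque


def bounded_crawl(start_url, get_links, max_urls=100):
    """Visit URLs reachable from start_url, stopping after max_urls pages.

    get_links is a caller-supplied function: URL -> list of linked URLs.
    """
    visited = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(visited) < max_urls:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def fake_links(url):
    """Stand-in site with 250 pages, each linking to the next two."""
    number = int(url.rsplit("p", 1)[1])
    return [f"/p{n}" for n in (number + 1, number + 2) if n < 250]


pages = bounded_crawl("/p0", fake_links)
print(len(pages))  # 100
```

On a site with more than 100 reachable pages, a capped crawl like this one leaves some pages unvisited.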

A successful test lets you view the status, extracted content, and discovered links for each URL.

Review the records the crawler created during this crawl in the Algolia dashboard.

Next steps

The default configuration for a test crawl often falls short of what’s needed. For instance, you might need to edit the configuration to set up scheduled automatic crawls, choose specific URLs to include or exclude, and determine what information to extract from each page.

For more information, see Configure a crawler with the visual UI or, for more fine-grained control, see Configure a crawler with the editor.
