Liberate your web content with Algolia Crawler

Let people searching for your content easily find it with our customizable, hosted web crawler that catalogs and stores your site's web pages.

Get a Demo | Start Free Trial

Leading brands use Algolia to power website search and discovery

Adobe
Dior
NPR

How our website crawler works

A site crawler tool that uncovers all your content, no matter where it's stored

Provide your users with great site search

Is your website content siloed in separate systems and managed by different teams? The first step in providing a high-quality site search experience is implementing a first-rate crawling process.

Our web spider can save your company time and lower your expenses by eliminating the need for building data pipelines between each of your content repositories and your site search software, as well as the project management that entails.

Turn your site into structured content

You can tell our website crawler exactly how to operate so that it accurately interprets your content. For example, in addition to standard web pages, you can ensure that it lets users search for and navigate news articles, job postings, and financial reports, including information that's in documents, PDFs, HTML, and JavaScript.

You don't need to add meta tags

You can have your content extracted without first adding meta tags to your site. Our web crawler doesn't rely on custom metadata. Instead, it provides your technical team with an easy-to-use editor for defining which content you want to extract and how to structure it.
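As a rough sketch of what such an extraction definition can look like (illustrative only — the domain, index name, and selectors below are placeholders, and the exact field names should be checked against the Crawler documentation):

```javascript
// Illustrative crawler configuration: extract a title, description, and body
// from pages matching a URL pattern, producing one search record per page.
{
  startUrls: ['https://www.example.com'],
  actions: [
    {
      indexName: 'site_pages',
      pathsToMatch: ['https://www.example.com/**'],
      recordExtractor: ({ url, $ }) => [
        {
          objectID: url.href,
          title: $('head > title').text(),
          description: $('meta[name="description"]').attr('content'),
          body: $('main').text(),
        },
      ],
    },
  ],
}
```

The point is that the extraction rules live in the crawler configuration, not in meta tags on your pages: the record extractor reads whatever markup is already there.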

Enrich your content to make it more relevant

To enhance search-result relevance for your users, you can enrich your extracted content with business web data, including from Google Analytics and Adobe Analytics. With Algolia Crawler, you can use data about visitor behavior and page performance to adjust your search engine rankings, attach categories to your content to power advanced navigation, and more.

Configure your crawling as needed

Want to save time and avoid unnecessary work? With our website crawler, you can index only selected parts of your site when you need them crawled.

Learn about the business value of Algolia Crawler

Configure our site crawler tool to scan site data on a fixed schedule.

You can configure our site crawler tool to scan your web data on a set schedule, such as every night at 9 p.m., with a recrawl at noon the next day.
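A scheduled crawl might be expressed like this — a sketch assuming a `schedule` configuration property that takes a human-readable interval expression; verify the exact syntax in the Crawler documentation:

```javascript
// Illustrative: recrawl the whole site nightly at 9 p.m.
{
  startUrls: ['https://www.example.com'],
  schedule: 'every day at 9:00 pm',
}
```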

Manually trigger a site crawl on specific sections of your website.

If necessary, you can manually trigger crawling of a particular section of your website, or even the whole thing.

Define what parts of your content you want crawled or avoided.

You can define which parts of your site, or which web pages, you want crawled (or avoided) by our web spider, or you can let it automatically crawl everywhere.
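Include/exclude rules can be sketched as follows (again illustrative: `pathsToMatch` and `exclusionPatterns` stand for the kind of glob-style URL filters a crawler config supports, but check the documentation for the exact option names):

```javascript
// Illustrative: crawl only the blog and docs sections, skip internal pages.
{
  startUrls: ['https://www.example.com'],
  exclusionPatterns: ['https://www.example.com/internal/**'],
  actions: [
    {
      indexName: 'site_content',
      pathsToMatch: [
        'https://www.example.com/blog/**',
        'https://www.example.com/docs/**',
      ],
      recordExtractor: ({ url, $ }) => [
        { objectID: url.href, title: $('title').text() },
      ],
    },
  ],
}
```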

Configure our crawler to explore and index login protected pages.

If you need to crawl pages protected by log-in credentials, it's not a problem: you can configure our crawler to take care of that.

Keep your searchable content up to date

Our production-ready crawler includes a set of tools that lets you provide continually fresh search content. These include URL Inspector, Monitoring, Data Analysis, and Path Explorer.

Get all the details and data for each site crawling session performed.

URL Inspector

On the Inspector tab, you can see and inspect all your crawled URLs, noting whether each crawl succeeded, when it was completed, and the records that were generated.

Get crawl reports on URLs, including errors.

Monitoring

On the Monitoring tab, you can view the details on the latest crawl, plus sort your crawled URLs by status (success, ignored, failed).

Analyze crawl data, and assess the quality of your web-crawler-generated index.

Data Analysis

On the Data Analysis tab, you can assess the quality of your web-crawler-generated index and see whether any records are missing attributes. 

Analyze crawling paths, URLs crawled, records extracted, and errors encountered.

Path Explorer

On the Path Explorer tab, you can see which paths the crawler has explored and, for each path, how many URLs were crawled, how many records were extracted, and how many errors occurred during the crawling process.

“We realized that search should be a core competence of the LegalZoom enterprise, and we see Algolia as a revenue generating product.”

Mrinal Murari
Tools team lead & senior software engineer
LegalZoom
Read the full story


Website Crawler FAQ

  • A web crawler (or "web spider") is a bot (software program) that systematically gathers and indexes web data so it can be found by people using a search engine.

    A website crawler achieves this by visiting a website (or multiple sites), downloading web pages, and diligently following links on sites to discover newly created content. The site crawler tool catalogs the information it discovers in a searchable index.

    There are several types of website crawler. Some crawlers find and index data across the entire World Wide Web. Large-scale, well-known web crawlers include Googlebot (Google), Bingbot (Microsoft Bing), Baidu Spider (Baidu), and YandexBot (Yandex). In addition, many smaller and lesser-known web crawlers focus their crawling on certain types of web data, such as images, videos, or email.
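    The visit-download-follow loop described above can be sketched in a few lines of Python. This is a toy breadth-first crawler run against an in-memory "site" (so it needs no network access); a real crawler would add politeness delays, robots.txt handling, and error handling:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects href targets from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch):
    """Breadth-first crawl: fetch each page once, follow its links, and build
    an index mapping URL -> page content (a stand-in for a searchable catalog)."""
    index, queue, seen = {}, deque([start_url]), {start_url}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        index[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index

# A tiny in-memory "website" so the example runs without network access.
SITE = {
    "https://example.com/": '<a href="/about">About</a><a href="/blog">Blog</a>',
    "https://example.com/about": "<p>About us</p>",
    "https://example.com/blog": '<a href="/">Home</a><p>Posts</p>',
}
index = crawl("https://example.com/", SITE.get)
print(sorted(index))  # all three pages discovered
```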

  • A database crawler is a specific type of web crawler that parses and catalogs information stored in database tables. Once this information is cataloged, people can find it by using search engines.

    Different types of databases require different configuration in order for the crawler to extract their information in an intelligent way. You specify the type of data and fields you want crawled and determine a crawling schedule.

    A database crawler treats each row in a table as a separate document, parsing and indexing column values as searchable fields. 

    A database crawler can also be set up to crawl various tables by using a plug-in. In a relational database, this allows the joining of rows from multiple tables that have the same key fields and treating them as one document. Then, when the document is displayed in search results, the data from the joined tables appears as additional fields.
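    The row-to-document idea can be sketched with Python's built-in sqlite3 module and a made-up two-table schema: each joined row becomes one searchable document whose fields are the column values.

```python
import sqlite3

# Hypothetical schema: products joined to categories on a shared key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, label TEXT);
    INSERT INTO categories VALUES (1, 'Shoes'), (2, 'Hats');
    INSERT INTO products VALUES (10, 'Trail runner', 1), (11, 'Sun hat', 2);
""")

def crawl_tables(conn):
    """Treat each joined row as one document, with column values as fields.
    The joined category label appears as an extra field on the document."""
    rows = conn.execute("""
        SELECT p.id, p.name, c.label
        FROM products p JOIN categories c ON p.category_id = c.id
        ORDER BY p.id
    """)
    return [{"objectID": pid, "name": name, "category": label}
            for pid, name, label in rows]

documents = crawl_tables(conn)
print(documents)
```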

  • Like other web content, a website's XML sitemap can be crawled by a web crawler. If a website lists a sitemap URL in its robots.txt file, the sitemap will be crawled automatically. You can also separately download and crawl the XML sitemap URLs with a tool such as Screaming Frog.

    To convert a sitemap file into a format that a program like Screaming Frog can crawl, you can import the file into Microsoft Excel and copy the URLs to a text file.

    If a sitemap has any "dirt" in it, that is, it references outdated pages that lead to error response codes (such as 404), redirects, or application errors, the data turned up and indexed by a crawler and made available to search engines can be error-prone. This is why it makes sense to spend the effort needed to crawl a sitemap and then correct any issues.

    How do you know if your sitemap is dirty? In Google Search Console (formerly Google Webmaster Tools), the "Sitemaps" section shows you both the number of pages submitted in the sitemap and the number of pages indexed. This ratio should be roughly 1 to 1. If many pages are submitted but few are indexed, there may be errors in the sitemap's URLs.
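    The sitemap check can be sketched in Python with the standard library: pull the `<loc>` URLs out of the sitemap XML, then compare submitted vs. indexed counts. The 0.8 threshold below is an arbitrary illustration, not a Google rule.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Extract all <loc> URLs from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(SITEMAP_NS + "loc")]

def looks_dirty(submitted, indexed, threshold=0.8):
    """Flag a sitemap when far fewer pages are indexed than submitted."""
    return indexed / submitted < threshold

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-page</loc></url>
</urlset>"""
urls = sitemap_urls(sitemap)
print(urls)
print(looks_dirty(submitted=len(urls), indexed=1))  # True: only half indexed
```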


  • The goal of a web crawler software program (a.k.a. "web spider") is to explore web pages, discover and fetch data, and index it so that it can be accessed by people using a search engine. A website crawler completes this mission by systematically examining a website (or multiple sites), downloading its web pages, and following its links to identify new content. The site crawler tool then catalogs the information it uncovers in a searchable index for quick retrieval.

  • Web crawling is having a software program (a "bot") systematically explore websites and index the data it finds, making it easy for people to locate by using a search engine.

    Web scraping, a slightly different form of gathering web data, involves collecting (downloading) specific types of information, for instance, about pricing. 

    In ecommerce, both of these types of data gathering are especially valuable because the data collected and analyzed can lead marketers to data-based decisions that boost sales.

    For instance, marketers can compare data about products sold on other sites with their own listings of the same products.

    If they find out that shoppers are routinely entering certain keywords in a search engine to locate a given product, they might decide to add those words to the product description to attract potential buyers to the product listing.

    Consumers typically want the best deals, and they can easily search for the lowest prices on the web. If a company sees that a competitor has a lower price on a product it offers, it can lower its own price so that prospective customers won't choose the competitor solely because of cost.

    By gathering product review and ranking data, marketers and businesspeople can uncover information about flaws in their own and competitors' products.

    They can also use crawler technology to monitor product reviews and rankings so that they can swiftly respond when people post negative comments, thereby improving their customer service.

    They can find out which products are bestsellers and potentially identify hot new markets.

    All of this revenue-impacting activity makes ecommerce an especially important and lucrative domain for web crawling and web scraping.