How our website crawler works

A site crawler tool that uncovers all your content, no matter where it's stored

Provide your users with great site search

Is your website content siloed in separate systems and managed by different teams? The first step in providing a high-quality site search experience is implementing a first-rate crawling process.

Our web spider can save your company time and lower your expenses by eliminating the need to build data pipelines between each of your content repositories and your site search software, along with the project management that entails.

Turn your site into structured content

You can tell our website crawler exactly how to operate so that it accurately interprets your content. For example, in addition to standard web pages, you can ensure that it lets users search for and navigate news articles, job postings, and financial reports, including information that's in documents, PDFs, HTML, and JavaScript.

You don't need to add meta tags

You can have your content extracted without first adding meta tags to your site. Our web crawler doesn't rely on custom metadata. Instead, it provides your technical team with an easy-to-use editor for defining which content you want to extract and how to structure it.
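With the Algolia Crawler, extraction rules live in a JavaScript configuration rather than in meta tags on your pages. Here is a minimal sketch of one extraction action; the index name, URL pattern, and CSS selectors are placeholders that depend on your site's markup:

```javascript
// Sketch of a Crawler extraction action. The recordExtractor receives the
// page URL and a Cheerio-like `$` for querying the rendered HTML.
actions: [
  {
    indexName: "news_articles", // hypothetical index name
    pathsToMatch: ["https://www.example.com/news/**"],
    recordExtractor: ({ url, $ }) => [
      {
        objectID: url.href,
        title: $("h1").text(),
        summary: $("meta[name='description']").attr("content"),
        publishedAt: $("time").attr("datetime"),
      },
    ],
  },
],
```

Each object returned by the extractor becomes one searchable record in the target index.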

Enrich your content to make it more relevant

To enhance search-result relevance for your users, you can enrich your extracted content with business web data, including from Google Analytics and Adobe Analytics. With Algolia Crawler, you can use data about visitor behavior and page performance to adjust your search engine rankings, attach categories to your content to power advanced navigation, and more.

Configure your crawling as needed

Schedule automatic crawling sessions

You can configure our site crawler tool to crawl your web data on a set schedule, such as every night at 9 p.m., with a recrawl at noon the next day.
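Scheduling is expressed directly in the crawler configuration. A minimal sketch, assuming a nightly run (the human-readable schedule string shown here is illustrative):

```javascript
// Hypothetical scheduling fragment of a crawler configuration:
// run a full crawl automatically every evening.
schedule: "every day at 9:00 pm",
```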

Manually set up a crawl

If necessary, you can manually trigger crawling of a particular section of your website, or even the whole thing.

Tell it where to go

You can define which parts of your site, or which web pages, you want crawled (or avoided) by our web spider, or you can let it automatically crawl everywhere.
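Scoping the crawl is also a configuration concern. A sketch of the relevant settings, using placeholder URLs for illustration:

```javascript
// Where the crawler starts, which sitemaps it reads,
// and which URL patterns it should skip entirely.
startUrls: ["https://www.example.com/"],
sitemaps: ["https://www.example.com/sitemap.xml"],
exclusionPatterns: ["https://www.example.com/internal/**"],
```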

Give permission

Configure our crawler to explore and index login-protected pages.
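Authenticated crawling can be set up by telling the crawler how to log in before it fetches protected pages. A rough sketch of a form-based login, with a hypothetical login URL and credentials:

```javascript
// Hypothetical login fragment: the crawler submits this request first
// and reuses the resulting session cookies for subsequent page fetches.
login: {
  fetchRequest: {
    url: "https://www.example.com/login",
    requestOptions: {
      method: "POST",
      body: "id=crawler-user&password=YOUR_PASSWORD",
    },
  },
},
```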

Keep your searchable content up to date

URL Inspector

On the Inspector tab, you can see and inspect all your crawled URLs, noting whether each crawl succeeded, when it was completed, and the records that were generated.

Monitoring

On the Monitoring tab, you can view details of the latest crawl and sort your crawled URLs by status (success, ignored, failed).

Data Analysis

On the Data Analysis tab, you can assess the quality of your web-crawler-generated index and see whether any records are missing attributes.

Path Explorer

On the Path Explorer tab, you can see which paths the crawler has explored and, for each one, how many URLs were crawled, how many records were extracted, and how many errors occurred during the crawling process.

The most advanced companies experiment every day with the crawler

“We realized that search should be a core competence of the LegalZoom enterprise, and we see Algolia as a revenue generating product.”
 

Mrinal Murari

Tools team lead & senior software engineer @ LegalZoom

Recommended content

What is a web crawler?

A web crawler is a bot—a software program—that systematically visits a website, or sites, and catalogs the data it finds.

30 days to improve our Crawler performance by 50%

How we reworked the internals of our crawler, looked for bottlenecks, and streamlined tasks to optimize this complex parallel and distributed software.

Algolia Crawler

An overview of what the Algolia Crawler can do for your website.

Website Crawler FAQ