Site Crawling & Federated Search: How to make content discoverable

Today’s users have high standards for their online experiences regardless of the interface or device. A disorganized site, where it is difficult to search for and find relevant content, often fails to meet the expectations of users, costing the business significant opportunities to engage users and meet their needs.

Whether you’re B2C or B2B, an organized site with an optimized search bar lets users search every part of your site you want them to reach, all from a single search bar.

In this article, we will discuss how site crawling and federated search can make your site more organized and your content more discoverable.


Why discoverability is key to the search experience on any site 

Regardless of your vertical, your internal search engine needs to access your site’s content in a systematic way to serve users relevant and effective results. Searches often fail not because the content or product doesn’t exist, but because the search is poorly optimized.

B2C companies can benefit from making products, product guides, videos, articles, and other content from various locations discoverable all at once. Why? Well, up to 43% of visitors immediately navigate to the search bar when they visit a site. Content that is discoverable forms the basis of a helpful search experience and offers benefits to the business, including:

  • Driving conversions, user engagement, and other metrics. When users find relevant results, they engage more with your site and are much more likely to purchase products, watch videos, read articles, and convert.
  • Decreasing bounce rate. Research from KISSmetrics shows that 12% of a website’s visitors will leave for a competitor’s site after an unsatisfactory search. If users can find what they need, they won’t have to go to other sites to look for it.


The challenge for B2B companies is ensuring users can easily find documentation and resources that may live in different locations. Organizing the content on your domains improves the user experience and empowers users to find content that helps them troubleshoot problems and answer their own questions. This, in turn, can decrease the number of support tickets your teams have to address, freeing up time for requests that actually need specialized knowledge.


Improve your search with site crawling and federated search 

Every user can benefit from an organized site. There are two main ways a site can meet this need:

  1. Get your content under control with a custom site crawler
  2. Show users comprehensive results with federated search 

Search is a powerful tool for helping users cut through irrelevant content and products to reach what they need. Without search, it is difficult for users to know where to start. Both site crawling and federated search are foundational drivers of search and discovery experiences that connect users to their current and future needs.


Structure your site with a site crawler

Site crawling is a powerful backend tool that makes content more organized so it can be discovered by users. A site crawler extracts and structures the content of a site and has the ability to make any object or record searchable.

A site crawler is implemented in a few steps:

  1. Define the entry point.
  2. The crawler extracts and formats data.
  3. Data is sent to the search provider.
  4. Your team can focus on the search UI, if building one.

Once the site crawler structures your data, you can use the results for a range of applications, including search. Note, though, that the extraction step typically requires some mid-level coding skills. The process is not entirely code-free, but that code gives developers a high degree of flexibility. Look for a site crawling tool that lets developers control when the crawler is called and when extraction occurs.
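The first three steps above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not how Algolia’s Crawler is implemented: `fetch` is a hypothetical callable that returns a page’s HTML (or `None`), so the example runs against an in-memory site instead of the network, and the record fields (`objectID`, `url`, `title`, `content`) are illustrative. Step 3 would pass the returned records to your search provider’s indexing API, which is not shown.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class PageExtractor(HTMLParser):
    """Collects the <title>, visible text, and outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.links = []
        self._in_title = False
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style contents
        if self._in_title:
            self.title += data.strip()
        elif data.strip():
            self.text_parts.append(data.strip())


def crawl(entry_point, fetch, max_pages=100):
    """Step 1: start at entry_point. Step 2: extract one structured record per page."""
    seen, queue, records = {entry_point}, deque([entry_point]), []
    while queue and len(records) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # hypothetical fetcher: returns HTML or None
        if html is None:
            continue  # unreachable page: skip it
        parser = PageExtractor()
        parser.feed(html)
        records.append({
            "objectID": url,
            "url": url,
            "title": parser.title,
            "content": " ".join(parser.text_parts),
        })
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link, _ = urldefrag(urljoin(url, href))
            # Only follow links that stay under the entry point.
            if link.startswith(entry_point) and link not in seen:
                seen.add(link)
                queue.append(link)
    return records
```

Keeping `fetch` separate from extraction mirrors the flexibility point above: developers decide when pages are fetched and how each one is turned into a record.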


The benefits of using a site crawler

Site crawlers offer some unique benefits, including:

  • Turning any webpage into structured content
  • Eliminating the need to build data pipelines between each of your content repositories
  • Enriching crawled and extracted content with business data
  • Crawling the site without editing its source code
  • Plugging the crawler into Google Analytics to enrich website records and improve relevance
  • Indexing any website that requires JavaScript to work (specific to Algolia’s crawler)
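Enriching crawled records with business data usually amounts to a join on the page URL. The sketch below assumes you have already exported page-view counts (for example, from your analytics tool) into a plain dictionary; `pageviews`, `business_data`, and the scoring rule are hypothetical, and nothing here calls the Google Analytics API. Popularity is used only to break ties between equally relevant records, similar in spirit to a custom ranking attribute in a search engine.

```python
def enrich(records, pageviews, business_data):
    """Attach analytics and business attributes to each crawled record, keyed by URL."""
    enriched = []
    for record in records:
        extra = business_data.get(record["url"], {})  # e.g. category, margin, stock
        views = pageviews.get(record["url"], 0)       # pages with no data default to 0
        enriched.append({**record, **extra, "pageviews": views})
    return enriched


def search(records, query):
    """Rank by how many query words match, then by pageviews as a tie-breaker."""
    words = query.lower().split()
    scored = []
    for record in records:
        text = (record["title"] + " " + record["content"]).lower()
        matched = sum(word in text for word in words)
        if matched:
            scored.append((matched, record["pageviews"], record))
    # Sort on the numeric keys only, most relevant and most popular first.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [record for _, _, record in scored]
```

Keeping enrichment as a separate step means business data can be refreshed on its own schedule without re-crawling the site.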

Additionally, there are competitive advantages and benefits based on the crawler you choose:

  • White box approach. Some vendors package the search engine and crawler together in a black box the user can’t decipher, so developers can’t use the crawler or the API independently. Providers like Algolia make a clear distinction between the search engine API and the crawler. With this white box approach, you can see the distinct functionality of each and use the one that best matches your needs.
  • Supports SEO and site monitoring. When crawlers go through the whole website for structuring purposes, they can also surface minor or major site construction errors, such as pages without titles, broken links, or SEO issues.
  • Building proofs of concept (POCs) or an initial demo of site search with the crawler. Algolia customers can use the crawler to demo Algolia and see if it’s a good fit. While this is not the crawler’s main purpose, it lets potential customers assess how the product works without additional coding or hiring someone to implement a demo.
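Because a crawler visits every page anyway, the same pass can flag the site errors mentioned above. This sketch assumes a hypothetical `pages` mapping produced by a crawl: each internal URL maps to its HTTP status, its `<title>` text, and the already-resolved internal URLs it links to. A link counts as broken if its target was never reached or returned a 4xx/5xx status.

```python
def audit(pages):
    """Flag pages with no <title> and internal links to missing or erroring pages."""
    issues = []
    for url, page in sorted(pages.items()):
        if page["status"] >= 400:
            continue  # error pages are reported through the links that point to them
        if not page["title"].strip():
            issues.append(("missing_title", url, None))
        for link in page["links"]:
            target = pages.get(link)
            if target is None or target["status"] >= 400:
                issues.append(("broken_link", url, link))
    return issues
```

Each issue is a `(kind, page, link)` tuple, which keeps the report easy to sort, count, or export alongside the crawled records.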


Make the search experience more comprehensive with federated search 

Products and content that are relevant to users are often stored in diverse product catalogs, domains and databases, but navigating to these different locations can create friction in the search experience. 

Federated search is a robust UI element that streamlines the search process by serving all the relevant results from multiple data sources at once. It extends the search-time merging method, which runs separate searches against different data locations using multiple indices, and presents a results list for each type of content in one combined interface.
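Search-time merging can be sketched as running the same query against each index separately and keeping the hits grouped by source, which is what lets the interface show one results list per content type. The in-memory `indices` dictionary, the title-substring matching, and the per-index `limits` are simplifying assumptions; a real implementation would send these queries to the search engine, often as a single batched request.

```python
def search_index(records, query, limit):
    """Naive per-index search: every query word must appear in the title."""
    words = query.lower().split()
    hits = [r for r in records if all(w in r["title"].lower() for w in words)]
    return hits[:limit]


def federated_search(indices, query, limits=None):
    """Run one query against every index and keep the results grouped per source."""
    limits = limits or {}
    return {
        # Each content type gets its own results list and its own size limit.
        name: search_index(records, query, limits.get(name, 3))
        for name, records in indices.items()
    }
```

Keeping results grouped, rather than merging them into one ranked list, is what allows relevance and display to be tuned separately for each content type; empty groups can simply be hidden in the UI.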

The Federated Search Interface schema

Constructing the federated search interface requires some planning and forethought about the type of search experience you want your users to have. Once set up, it can greatly enhance the user experience by giving each search a broad range and scope. It also gives the business more control, since the search experience can be curated by those who know the content, the products, and the business goals best.


5 benefits of federated search

Federated search can benefit sites across a variety of use cases in a number of ways:

  1. Allows product owners to fine-tune the relevance for each type of content
  2. Empowers site owners to control the experience they want visitors to have
  3. Supports browsability by making different categories easily accessible
  4. Lets the site remain searchable while products and contents are added to the site
  5. Increases security since only one search engine needs maintenance

Offering curated, easy-to-interpret results can go a long way in making the user experience more streamlined and users more engaged. Of course it’s important to carefully consider user needs when designing the interface so as not to overwhelm them with useless results. 


3 ways to use site crawling and federated search on your site 

Companies are employing site crawling and federated search in a number of ways. Here are a few examples where site crawlers and federated search have been applied to improve the user experience and the site:


Example 1: Making intranet content discoverable

It’s common for companies to host a number of different subdomains to strategically segment different types of content, products, and information. However, these various subdomains can frustrate internal users, especially when a company’s intranet content is also spread across 10-20 different sites. 

Businesses can use a site crawler to solve this exact problem. With the crawler, a company can gather all of its docs and PDFs on company policies, products and services, company strategy, and more into one single searchable place, so users no longer struggle to find what they need.


Example 2: Improve visibility into (and control over) international sites with site crawling 

Large companies maintain different iterations of their site in different countries and languages, which are often managed by local teams. For these companies, streamlining and replicating the search experience across sites is important, but gathering the data to do so is often a major issue. Local teams may not have access to all of the site data to properly implement the search. This might be because they lack the correct clearance level, don’t have access to necessary data because their CMS does not allow them control, or they lack engineering resources or time to gather the data.

This is where the site crawler can be leveraged. The site crawler can extract and structure the site contents without backend permissions. The company can use that data or send it to a search provider like Algolia to start building a consistent and robust search UI across the various sites.


Example 3: Federated search powered by the Crawler

Algolia’s search provides a great example of Federated Search and the Site Crawler working in tandem. In fact, Algolia’s Federated Search interface is partially powered by the crawler.

Algolia uses a separate crawler for each of its documentation, marketing website, and blog. These crawlers produce three different searchable indices. As a user types in the search bar, those three indices, along with indices populated through other integrations such as the Search API, return results in the Federated Search interface. A single query returns robust, easy-to-understand, and constantly up-to-date results from every part of the site.


Connect your users to what matters with site crawlers and federated search

Effective search is a cornerstone of a rewarding user experience on your site. Algolia offers powerful search tools that minimize the time to market and maximize your return on investment from search. Empower your developers to seamlessly unleash your content and improve your relevance with Algolia’s custom Crawler. Delight your users with Algolia’s out-of-the-box Instant Search tools, like Federated Search, Search as you type, Filters and Facets, and more. Great search experiences should be tailored to your users and your business needs. Read our eBook “7 ways to get more out of Algolia search” to learn even more ways to fine-tune your search strategies and improve discoverability, relevance, and user engagement. 

About the author

Samuel Bodin

Software Engineer, Crawler
