Search Party #18 — Crawling Edition

We were happy to organize our regular Search Party last Wednesday, June 12th, 2019. This time it was about crawling web content.

People tend to think crawling is about stealing other people’s data. Although some crawlers do that, crawling itself is simply the act of extracting content from websites. The motive is more often legitimate than illegal. During this event, we had three amazing talks that presented different ways to crawl web content and discussed some easily overlooked challenges when developing a crawler.

The challenges of crawling the web — and how to overcome them

Samuel Bodin, Algolia

In the first presentation, Samuel Bodin gave us a glimpse of how Algolia indexes complex documents such as PDFs, Word documents, and spreadsheets, and how it renders JavaScript-driven websites at enormous scale.

He also spoke about a common trap that websites hold for crawlers: the “Rabbit Hole”, a place where your crawler gets stuck forever.
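To make the “Rabbit Hole” concrete, here is a minimal sketch of one common way to guard against it. It is not Algolia's implementation: it simply assumes a breadth-first crawl with a depth limit, a page budget, and a set of already-seen URLs. The names `crawl`, `get_links`, `max_depth`, and `max_pages` are hypothetical and chosen for illustration only.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url, get_links, max_depth=3, max_pages=100):
    """Breadth-first crawl bounded by depth and page count.

    get_links(url) is a hypothetical callback that returns the hrefs
    found on a page; a real crawler would fetch and parse the page here.
    """
    start_url = urldefrag(start_url).url
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not follow links any deeper
        for href in get_links(url):
            # Resolve relative links and drop #fragments so that the same
            # page is not queued twice under different spellings.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc != host or link in seen:
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return visited
```

With these limits, a site that generates a fresh "next" link on every page (an endless calendar, for example) costs the crawler a bounded number of requests instead of trapping it forever.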

Last but not least, he gave a quick presentation on how Algolia handles security when crawling, especially when executing JavaScript written by customers on Algolia’s servers without exposing any sensitive data.

Writing a distributed crawler architecture

Nenad Tičarić, TNT Studio

In the second presentation, Nenad Tičarić talked about the architecture of a web crawler and how to code it quickly with the PHP framework Laravel.

He broke his presentation down into two parts. He started with a good overview of crawlers and introduced a few terms that you’ll likely want to know before digging into the subject. He also described how to design and architect an automatic web crawler at scale.

The second part focused on how to achieve that very simply with PHP, and more specifically Laravel, plus a few basic tools like Guzzle and Artisan.

Automatic extraction of structured data from the Web

Karl Leicht, Fabriks

For the last talk of the day, Karl Leicht spoke about how to achieve automatic and smart attribute extraction with a crawler.

How do you crawl millions of different websites? That’s the interesting question Karl asked us. He described how to scale your code without reinventing the wheel for every website.

We saw how to programmatically differentiate a listing page from a product page, the importance of microdata, and where to find the most valuable information on a page.
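As an illustration of how microdata can help tell the two page types apart, here is a small sketch. It is an assumed heuristic, not the method Karl presented: it counts elements marked up as a schema.org `Product` and treats a single one as a product page and several as a listing page. The names `ProductCounter` and `classify_page` are hypothetical.

```python
from html.parser import HTMLParser


class ProductCounter(HTMLParser):
    """Counts elements that declare a schema.org Product via microdata."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        itemtype = attributes.get("itemtype") or ""
        is_product = itemtype.rstrip("/").endswith("schema.org/Product")
        if "itemscope" in attributes and is_product:
            self.count += 1


def classify_page(html):
    """Assumed heuristic: one Product is a product page, several a listing."""
    counter = ProductCounter()
    counter.feed(html)
    if counter.count == 1:
        return "product"
    if counter.count > 1:
        return "listing"
    return "unknown"
```

Many real pages carry no microdata at all, or use JSON-LD instead, so a crawler running across millions of sites would need further signals on top of this one.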

The second part focused on the challenges of maintaining this code in the long run, with a long look at tests and monitoring.

The Next Event

We host our Search Party in our Paris office. It’s for everyone and… it’s free! Please join us next time.

Follow us on Eventbrite so you can be notified of the next event.

About the author
Julie Reboul

Sr. Developer Marketing Manager
