Jun 25th 2019 algolia
We were happy to organize our regular Search Party last Wednesday, June 12th, 2019. This time it was about crawling web content.
People tend to think crawling is about stealing other people’s data. Although some crawlers do that, crawling itself is simply the act of extracting content from websites. The motive is more often legitimate than illegal. During this event, we had three amazing talks that presented different ways to crawl web content and discussed some easily overlooked challenges when developing a crawler.
Samuel Bodin, Algolia
In the first presentation, Samuel Bodin gave us a glimpse into how Algolia indexes complex documents such as PDFs, Word documents, and spreadsheets, and how it renders JavaScript-heavy websites at enormous scale.
He also described a common trap for crawlers: the “Rabbit Hole”, a part of a website where your crawler gets stuck forever.
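To make the “Rabbit Hole” concrete, here is a minimal sketch of one common way to guard against it. It is not how Algolia's crawler works; the talk summary does not give those details. The names `normalize`, `crawl`, `get_links`, `max_depth`, and `max_pages` are hypothetical, and the link fetcher is passed in as a function so the example runs without network access.

```python
from collections import deque
from urllib.parse import urldefrag, urlsplit, urlunsplit


def normalize(url):
    """Collapse trivially different URLs (fragment, host case, trailing slash)."""
    url, _fragment = urldefrag(url)
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def crawl(start_url, get_links, max_depth=5, max_pages=1000):
    """Breadth-first crawl that stops at a depth limit and a page budget.

    get_links is a hypothetical callable: it takes a URL and returns the
    URLs linked from that page.
    """
    start = normalize(start_url)
    seen = {start}                 # URLs already queued, to avoid loops
    queue = deque([(start, 0)])    # (url, depth) pairs waiting to be visited
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue               # do not follow links any deeper
        for link in get_links(url):
            link = normalize(link)
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# A site that generates a new "next" page forever, like an endless calendar.
def endless_calendar(url):
    return [url + "/next", url + "#top"]


pages = crawl("https://example.com/calendar", endless_calendar, max_depth=3)
print(len(pages))
```

The seen set handles plain loops, while the depth limit and page budget handle sites that keep generating new URLs. Real crawlers usually combine these with per-host limits and URL pattern rules.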
Last but not least, he gave a quick overview of how Algolia handles the security concerns of crawling, especially how it executes JavaScript written by customers on Algolia’s servers without exposing any sensitive data.
Nenad Tičarić, TNT Studio
In the second presentation, Nenad Tičarić talked about the architecture of a web crawler and how to code one quickly with the PHP framework Laravel.
He broke his presentation down into two parts. He started with a good overview of crawlers and introduced a few terms that you’ll likely want to know before digging into the subject. He also described how to design and architect an automatic web crawler at scale.
The second part focused on how to achieve that simply with PHP, and more specifically Laravel, plus a few basic tools like Guzzle and Artisan.
Karl Leicht, Fabriks
For the last talk of the day, Karl Leicht spoke about how to achieve automatic and smart attribute extraction with a crawler.
How do you crawl millions of different websites? That’s the interesting question Karl asked us. He described how to scale your code without reinventing the wheel for every website.
We saw how to programmatically differentiate a listing page from a product page, the importance of microdata, and where to find the most valuable information on a page.
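As an illustration of how microdata can help tell the two page types apart, the sketch below counts schema.org `Product` items in a page. The rule it applies (one item means a product page, several mean a listing) is an assumed heuristic for this example, not the method Karl presented, and `ItemTypeCounter` and `classify_page` are hypothetical names.

```python
from html.parser import HTMLParser


class ItemTypeCounter(HTMLParser):
    """Counts elements that declare a schema.org Product via microdata."""

    def __init__(self):
        super().__init__()
        self.product_items = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        itemtype = (attrs.get("itemtype") or "").rstrip("/")
        if "itemscope" in attrs and itemtype.endswith("schema.org/Product"):
            self.product_items += 1


def classify_page(html):
    """Assumed heuristic: 1 Product item -> product page, 2+ -> listing."""
    counter = ItemTypeCounter()
    counter.feed(html)
    counter.close()
    if counter.product_items == 1:
        return "product"
    if counter.product_items > 1:
        return "listing"
    return "unknown"


product_html = """
<div itemscope itemtype="https://schema.org/Product">
  <h1 itemprop="name">Wool scarf</h1>
</div>
"""
listing_html = """
<ul>
  <li itemscope itemtype="https://schema.org/Product"><span itemprop="name">Scarf</span></li>
  <li itemscope itemtype="https://schema.org/Product"><span itemprop="name">Mittens</span></li>
</ul>
"""

print(classify_page(product_html), classify_page(listing_html))
```

Many sites have no microdata at all, or use JSON-LD instead, so a heuristic like this would be one signal among several rather than a complete classifier.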
The second part focused on the challenges of maintaining this code in the long run, with a long look at tests and monitoring.
We host our Search Party in our Paris office. It’s for everyone and… it’s free! Please join us next time.
Follow us on Eventbrite so you can be notified of the next event.
Hear great talks, meet our product specialists, grab some swag.