Hey everyone, today we're gonna chat about bots – what they are, how to spot them, and how to stop them from messing with your Algolia search. This is a brief summary of my presentation at Algolia DevCon about bots and how customers can mitigate bot traffic on their sites. For background, I'm a developer support engineer at Algolia and I've been with the company for about two years. I've always been interested in the bot situation on the web, so I'm excited to share this info with you.
Bots are basically computer programs that do the same tasks over and over again. They don't need a human to control them. Now, not all bots are bad. There are good bots out there, like personal assistant bots or chatbots, that are actually helpful. But the bad bots are programmed to do nasty stuff, like crawling sites they shouldn't or scraping content. These are big problems these days, and they can really mess up your applications.
Even Google's bots can be a pain. If your landing page has a search box that runs an empty search to show initial results, Google's crawlers can trigger that search every time they index the page. That can cause a ton of extra search traffic you don't want.
We've been fighting bad bots for a long time. Back in 2015, bad bots made up a smaller share of internet traffic, partly because so many new human users were coming online. But the bot creators got smarter and started making bots that could do more damage with less effort. In the past five years, bot traffic has gone way up. In 2023, bad bots made up a whopping 32% of all internet traffic!
Bots are always changing, and now we've got AI-powered bots that can act like humans, get around those "I'm not a robot" challenges, and launch even more sophisticated attacks.
Algolia has a public REST API, which means bots can access it and mess with your search. Since your sites are usually open to anyone, bots can search just like a regular user, but without any limits.
So, how can you tell if bots are messing with your search? Here are a few signs:
Weird searches: Let's say you sell candy. You'd expect searches for "sour gummies" or "chocolate bars," not "car insurance" or random sentences that don't make sense.
Too much search traffic: If you suddenly see way more search traffic than usual, it could be bots.
Strange analytics: If your analytics show a huge number of searches but not that many users, that's another red flag.
Remember: Not every weird search is a bot. Sometimes it's just a search relevance issue or your business is growing. You gotta dig deeper to figure out what's going on.
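To make that "strange analytics" check concrete, here's a rough sketch in plain Python of flagging days where the searches-per-user ratio spikes. The numbers and the 5x-the-median threshold are made-up illustrations, not Algolia recommendations – tune any real check against your own analytics history.

```python
# Rough heuristic: flag days where searches-per-user is far above baseline.
# The 5x-median threshold is an arbitrary illustration -- tune it against
# your own analytics history before relying on it.
from statistics import median

def flag_suspicious_days(daily_stats, multiplier=5):
    """daily_stats: list of (day, searches, unique_users) tuples."""
    ratios = [s / max(u, 1) for _, s, u in daily_stats]
    baseline = median(ratios)
    return [
        day
        for (day, s, u), r in zip(daily_stats, ratios)
        if r > multiplier * baseline
    ]

stats = [
    ("Mon", 1_200, 400),   # ~3 searches per user: normal
    ("Tue", 1_500, 500),   # ~3 searches per user: normal
    ("Wed", 90_000, 600),  # 150 searches per user: probably bots
]
print(flag_suspicious_days(stats))  # -> ['Wed']
```

A simple ratio check like this won't catch everything, but it's a cheap first pass before you dig into individual queries.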
Bots are always evolving, so there's no one-size-fits-all solution. But here are some ways to minimize their impact on your Algolia search:
Know your search data: Get familiar with your search records and what people usually search for. This will help you spot bot activity.
Check your analytics: Algolia tracks all searches. Keep an eye on your analytics dashboard for anything suspicious.
Rate limiting: While we cannot block specific IPs or users from searching in Algolia, generating an API key with a rate limit will set a maximum number of API calls per hour for each IP address. For more information, you can review this Support KB article.
Robots.txt: Use a robots.txt file to control what search engines can and can't crawl on your site.
Captcha: Use a captcha service to filter out basic bots.
Bot protection services: There are services like Cloudflare that offer bot protection.
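For the robots.txt tip above, a fragment like this asks well-behaved crawlers to stay away from your search result pages. The paths here are placeholders – adjust them to your own site's URL structure, and note that wildcard patterns like `/*?query=` are honored by major crawlers but aren't part of the original robots.txt convention.

```text
# robots.txt -- example paths only; adjust to your site's URL structure
User-agent: *
Disallow: /search
Disallow: /*?query=
```

Keep in mind robots.txt is a polite request, not an enforcement mechanism: good bots respect it, bad bots ignore it, which is why it pairs with the other measures on this list.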
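Algolia handles per-IP rate limiting for you through API keys, but if you proxy search requests through your own backend, the underlying idea looks something like this sketch: a sliding-window counter per client IP. The 100-requests-per-hour limit is an arbitrary example value, not an Algolia default.

```python
# Minimal sliding-window rate limiter per client IP -- a sketch of the idea
# behind per-IP API call limits, for a search proxy you control yourself.
# The 100 requests/hour default is an arbitrary example value.
import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, max_requests=100, window_seconds=3600):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(list)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        # Drop timestamps that fell out of the window, then check the count.
        recent = [t for t in self.hits[ip] if now - t < self.window]
        self.hits[ip] = recent
        if len(recent) >= self.max_requests:
            return False  # over the limit: reject (e.g. with HTTP 429)
        recent.append(now)
        return True

limiter = RateLimiter(max_requests=3, window_seconds=60)
print([limiter.allow("10.0.0.1", now=t) for t in (0, 1, 2, 3)])
# -> [True, True, True, False]
```

In production you'd want something sturdier (Redis-backed counters, for instance), but the principle is the same: count requests per IP per window and reject the overflow.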
Bots are a pain, but by understanding how they work and taking steps to protect your search, you can keep them from ruining your Algolia experience.
Check out the original video of this presentation on YouTube, or if you have questions about your own Algolia application, please reach out to your customer success contact here or get in touch with Support. Or, if you're new to Algolia and want more information, schedule a call with our team.
J Choi
Developer Support Engineer