Advice on how to deal with AI bots/scrapers?

zoey@lemmy.librebun.com · 1 day ago


doodledup@lemmy.world · 4 hours ago

How would I go about doing that? This seems to be the challenging part. You don’t want false positives and you also want replayability.

Natanael@infosec.pub · 4 hours ago

If you’ve already noticed that incoming traffic is weird, try to work out what distinguishes the sources you don’t want. You write rules that look at behaviors like user agent, order of requests, IP ranges, etc., put them in your web server config, and tell it to check whether an incoming request matches the rules as a session starts.
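To make the rule idea concrete, here's a rough Python sketch of the matching step. Everything in it is an assumption for illustration: the user agent substrings, the IP range (a reserved documentation range), the `SessionStart` shape, and the particular "order of requests" heuristic are all made up, and in practice you'd express the same checks in your web server's own config language.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical values for illustration only; derive real ones from your own logs.
SUSPECT_UA_SUBSTRINGS = ("examplebot", "python-requests")
SUSPECT_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".ico")


@dataclass
class SessionStart:
    ip: str
    user_agent: str
    paths: list = field(default_factory=list)  # first few requested paths, in order


def matched_rules(session: SessionStart) -> list:
    """Return the names of the rules this session start matches."""
    hits = []

    # Rule 1: user agent contains a known-unwanted substring.
    ua = session.user_agent.lower()
    if any(token in ua for token in SUSPECT_UA_SUBSTRINGS):
        hits.append("user_agent")

    # Rule 2: source address falls inside a range you have flagged.
    addr = ipaddress.ip_address(session.ip)
    if any(addr in net for net in SUSPECT_NETWORKS):
        hits.append("ip_range")

    # Rule 3 (assumed heuristic): several pages fetched in a row
    # without ever requesting a stylesheet, script, or image.
    if len(session.paths) >= 3 and not any(
        p.endswith(ASSET_SUFFIXES) for p in session.paths
    ):
        hits.append("request_order")

    return hits
```

Returning the rule names rather than a bare yes/no makes it easier to see which rule caused a false positive when you review the logs later.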

Unless you’re a high-value target for them, they won’t put endless resources into making their systems mimic regular clients. They might keep changing IP ranges, but that usually happens ~weekly, and you can just check the logs and ban new ranges within minutes. Changing client behavior to blend in is harder at scale: bots simply don’t look for the same things as humans in the same ways. They’re too consistent, and even when they try to be random, they’re too consistently random.
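For the "check the logs" part, something like the following can help spot new ranges quickly. It's only a sketch under assumptions: it expects you to have already pulled the client IPs out of your access log, it groups IPv4 addresses by /24 (the prefix length and the `min_requests` cutoff are arbitrary picks, not recommendations), and the function name is made up.

```python
import ipaddress
from collections import Counter


def busy_ranges(client_ips, min_requests=100, prefix=24):
    """Group IPv4 request sources by network prefix and return the busiest ranges."""
    counts = Counter()
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            continue  # IPv6 would need a different prefix length
        # strict=False lets us pass a host address and get its enclosing network.
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(net)] += 1
    # Busiest first, keeping only ranges at or above the cutoff.
    return [(net, n) for net, n in counts.most_common() if n >= min_requests]
```

A range showing up here is only a candidate. You'd still eyeball what those requests look like before banning anything, since a busy /24 can also be a university or a mobile carrier.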

When enough rules match, you throw in either a redirect or an internal URL rewrite rule for that session to point them to something different.
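Normally you'd do this in the web server config itself, but here's the same idea as a minimal WSGI middleware sketch in Python, in case your app sits behind one. The names (`decoy_middleware`, `count_matches`, `/decoy`) and the threshold of 2 are assumptions for illustration, and it decides per request rather than tracking a whole session, which is a simplification.

```python
def decoy_middleware(app, count_matches, threshold=2, decoy_path="/decoy"):
    """Wrap a WSGI app; internally rewrite matching requests to decoy_path.

    count_matches is a caller-supplied function (hypothetical here) that takes
    the WSGI environ and returns how many of your rules the request matched.
    """
    def wrapped(environ, start_response):
        if count_matches(environ) >= threshold:
            # Internal rewrite: the client never sees a redirect, it just
            # gets served whatever lives at decoy_path instead.
            environ = dict(environ, PATH_INFO=decoy_path)
        return app(environ, start_response)

    return wrapped
```

The rewrite variant has the advantage that the bot gets a normal 200 response and has no obvious signal that it was singled out; a redirect is simpler but easier for the other side to detect.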