pod.geraspora.de
Excerpt from a message I just posted in a #diaspora team internal forum category. The context here is that I recently got pinged about slowness/load spikes on the diaspora* project web infrastructure (Discourse, Wiki, the project website, ...), and looking at the traffic logs makes me impressively angry.
In the last 60 days, the diaspora* web assets received 11.3 million requests. That works out to 2.19 req/s - which honestly isn't that much. I mean, it's more than your average personal blog, but nothing that my infrastructure shouldn't be able to handle.
However, here's what's grinding my fucking gears. Looking at the top user agent statistics, here are the leaders:
2.78 million requests - or 24.6% of all traffic - are coming from Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot).
1.69 million requests - 14.9% - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonb...
Would it be possible to detect the gptbot (or similar) part of their user agent, and serve them different data? Can they detect that?
Yes, you can match on the user agent and then conditionally serve them other stuff (most webservers are fine with this). nepenthes and iocaine are the currently preferred/recommended tools for serving them bot mazes.
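A minimal sketch of that kind of user-agent filter, assuming a Python/WSGI stack (nothing in the thread specifies one); the pattern list and the decoy_app standing in for a nepenthes/iocaine-style maze are made up for illustration:

```python
import re

# Hypothetical pattern list of self-identified crawlers; a real deployment
# would maintain and extend this.
BOT_UA = re.compile(r"GPTBot|Amazonbot|ClaudeBot|Bytespider", re.IGNORECASE)

class BotFilter:
    """WSGI middleware: route crawler traffic to a decoy app, everyone else to the real site."""

    def __init__(self, app, decoy_app):
        self.app = app              # the real site
        self.decoy_app = decoy_app  # whatever you want the crawlers to see

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if BOT_UA.search(ua):
            return self.decoy_app(environ, start_response)
        return self.app(environ, start_response)
```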
The thing is that the crawlers will also lie (OpenAI definitely doesn't publish all of its source IPs, I've verified this myself), and will attempt a number of workarounds (like using residential proxies too).
Generating plausible-looking gibberish requires resources. Giving any kind of response to these bots is a waste of resources, even if it's gibberish.
My current approach is to have a robots.txt for bots that honor it, and to drop all traffic for 24h from IPs used by bots that ignore robots.txt or misbehave.
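A rough illustration of that approach (the comment doesn't say how the blocking is implemented; the paths and names below are hypothetical, and in practice the actual drop would happen in a firewall rather than in application code):

```python
import time

# Hypothetical paths that robots.txt disallows for crawlers.
DISALLOWED_PREFIXES = ("/admin", "/search")
BAN_SECONDS = 24 * 60 * 60  # drop traffic for 24h

banned_until: dict[str, float] = {}

def observe_request(ip: str, path: str, user_agent: str) -> None:
    """Ban an IP for 24h if a self-identified bot requests a disallowed path."""
    if "bot" in user_agent.lower() and path.startswith(DISALLOWED_PREFIXES):
        banned_until[ip] = time.time() + BAN_SECONDS

def is_banned(ip: str) -> bool:
    """Check whether an IP is still within its 24h ban window."""
    expiry = banned_until.get(ip)
    return expiry is not None and time.time() < expiry
```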
Can they detect that they’re being served different content though?