
At 39,000 requests per minute, websites are being "crushed" by AI crawlers. Meta and OpenAI are named, and developers are rolling out their best anti-crawling "weapons".

CSDN, 2025-08-22 19:26
This cat-and-mouse game will never end. Web crawlers will always evolve and find ways to bypass various traps.

"My website was crashed by crawlers, and I have to pay for the traffic myself. Meanwhile, others use my content to train AI models and gain a lot of attention."

Complaints like this have become common among website developers since AI bots took off. A recent report from cloud-service giant Fastly suggests the reality is even harsher than the anecdotes.

The report shows that AI crawlers are hammering the Internet. They sweep websites at high speed and account for about 80% of AI bot traffic; the remaining 20% comes from on-demand fetching.

For unprotected websites, these AI bots are no joke: peak traffic can reach 39,000 requests per minute, which works out to roughly 650 requests per second. In other words, an ordinary website may be hit tens of thousands of times by AI crawlers and scrapers within a single minute, absorbing heavy load every second.

The report also names the culprits: well-known mainstream AI giants such as Meta and OpenAI. In response, developers are gearing up for a counterattack.

Are AI crawlers destroying websites? Are Meta, Google, and OpenAI the "masterminds"?

In the report, Fastly divides AI bots into two categories according to their behavior and purpose: crawlers and fetchers.

Crawler bots work like search engines: they systematically scan websites, collect content, and use it to build searchable indexes or to train language models. This step is a prerequisite for the training phase of AI models.

Statistically, crawler bots account for nearly 80% of AI bot requests, with fetcher bots making up the remaining 20%.

Crawler bots usually harvest content from publicly accessible, authoritative websites: news sites, educational resources, government pages, technical documentation, and public datasets.

The report shows that almost all AI crawler traffic is divided among several companies: Meta, Google, and OpenAI together account for 95%. Among them, Meta accounts for 52%, Google 23%, and OpenAI 20%.

Fetcher bots act as the models' assistants: when an AI answers a question, they fetch relevant web pages or documents on the spot so the model can cite authoritative, up-to-date information. In other words, when generating an answer, the model relies not only on what it memorized during training but can also consult external data in real time. This process is part of the inference phase.

Data shows that ChatGPT-User and OpenAI SearchBot together account for roughly 98% of all fetch requests, meaning OpenAI, mainly via ChatGPT, has by far the greatest impact on websites' fetch traffic. Perplexity comes next with only 1.53% of fetch requests, though its share is gradually growing.
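For site operators who want to see these shares in their own traffic, a rough, hypothetical sketch of the measurement is below: it tallies requests in a combined-format access log by AI bot user-agent substrings. The token list, the log path, and the helper names are illustrative assumptions, not part of the Fastly report.

```python
import re
from collections import Counter

# Hypothetical sketch: count requests per AI bot in a combined-format access
# log, keyed by user-agent substring. The token list is illustrative only;
# check each vendor's documentation for the user agents it actually publishes.
AI_BOT_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
    "PerplexityBot", "Google-Extended", "meta-externalagent", "CCBot",
]

# In the combined log format the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_ai_bots(log_path: str) -> Counter:
    counts: Counter = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = UA_PATTERN.search(line)
            if not match:
                continue
            user_agent = match.group(1).lower()
            for token in AI_BOT_TOKENS:
                if token.lower() in user_agent:
                    counts[token] += 1
                    break
    return counts

if __name__ == "__main__":
    # "access.log" is a placeholder path for the server's access log.
    for token, hits in count_ai_bots("access.log").most_common():
        print(f"{token}: {hits}")
```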

The report further points out that the top four crawler operators, Meta, Google, OpenAI, and Anthropic (maker of Claude), seem particularly interested in commercial websites and keep a close watch on this type of content.

Looking at the traffic trend of AI crawlers, the data shows that Meta's crawlers have accelerated markedly in recent months.

At the same time, most crawlers' request patterns are fairly erratic. Sometimes they crawl quietly at normal traffic levels, so many websites never realize their content is being scraped.

At other times, though, the bot traffic becomes excessive: for days or even weeks it can soar to two to three times the normal level.

Behind the macro-level data are plenty of real cases, such as Triplegangers, a Ukraine-based website specializing in human 3D models that we have covered before.

Triplegangers sells 3D scan data; its seven employees spent more than a decade building the largest "digital human doubles" database on the Internet. Early this year, the site, which had been running smoothly, suddenly went down. CEO Oleksandr Tomchuk pulled in his engineers to investigate and found that, even after the site had updated its robots.txt, OpenAI's crawler was hitting it from 600 IP addresses, knocking it offline.

Tomchuk said that if the crawlers had been gentler, he might never have noticed the problem at all. He vented publicly: "Their crawlers are wrecking our website! This is basically a DDoS attack."

The invisible costs fall on website administrators and companies themselves

Indeed, poorly designed AI bots can inadvertently put enormous pressure on web servers, slowing sites down, interrupting service, and driving up operating costs. When large-scale bot traffic spikes, the headache only gets worse.

Fastly also shared some real cases in the report:

One crawler peaked at 1,000 requests per minute. That may not sound excessive, but for systems that depend on database queries, or services such as Gitea that serve Git repository browsing, even a short spike can make a site freeze, time out, or misbehave.

On-demand fetching is even more extreme: one fetcher once peaked at 39,000 requests per minute. Even without malicious intent, traffic at that level can overwhelm servers, consume bandwidth, and produce an effect similar to a DDoS attack.

Excessive bot traffic not only degrades the user experience but also drives up infrastructure costs and distorts site analytics.

Arun Kumar, a senior security researcher at Fastly, noted in the report that AI bots are changing how people access and experience the Internet, creating new and complex problems for digital platforms. Whether they collect data to train AI or fetch real-time answers, these bots raise new challenges around visibility, control, and cost. "You can't protect what you can't see. Without clear verification standards, the risks of AI automation will become a blind spot for digital teams."

Developers fight back with homemade traps: proof-of-work, ZIP bombs, and mazes

As AI adoption widens and related tooling matures, crawling incidents appear to be increasing rather than decreasing. Faced with crawlers that ignore the rules, developers are starting to fight back with a variety of ingenious defenses.

Use the "Proof - of - Work" tool Anubis

In January this year, FOSS developer Xe Iaso publicly called out Amazon's AI crawler for relentlessly scraping the Git code-hosting service Iaso runs, causing frequent crashes and making normal operation nearly impossible.

The public outcry had no deterrent effect; if anything, the crawling sometimes intensified.

Having had enough, Xe Iaso built a system called Anubis (https://git.xeserv.us/).

Anubis is an anti-crawling mechanism based on proof of work. When a user visits a site with Anubis enabled, the browser must first complete a SHA-256-based PoW challenge.

The challenge costs a modest amount of computation: ordinary users barely notice the delay, but for crawlers operating at scale the extra overhead significantly raises the cost of scraping, which acts as a deterrent.
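The sketch below shows the general shape of such a SHA-256 proof-of-work gate. It is a minimal illustration of the idea, not Anubis's actual code; the difficulty value and function names are assumptions.

```python
import hashlib
import secrets

# Minimal sketch of a SHA-256 proof-of-work gate in the spirit of Anubis
# (not its actual implementation). The server issues a random challenge and
# a difficulty; the client must find a nonce whose hash starts with enough
# zero bits; the server verifies the answer with a single hash.

DIFFICULTY_BITS = 18  # illustrative; real deployments tune this value

def issue_challenge() -> str:
    """Server side: hand out a random challenge string."""
    return secrets.token_hex(16)

def meets_difficulty(challenge: str, nonce: int, bits: int) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return int(digest, 16) >> (256 - bits) == 0  # top `bits` bits are zero

def solve(challenge: str, bits: int = DIFFICULTY_BITS) -> int:
    """Client side: brute-force a nonce (this is where the cost lands)."""
    nonce = 0
    while not meets_difficulty(challenge, nonce, bits):
        nonce += 1
    return nonce

if __name__ == "__main__":
    challenge = issue_challenge()
    nonce = solve(challenge)
    assert meets_difficulty(challenge, nonce, DIFFICULTY_BITS)
    print(f"challenge {challenge} solved with nonce {nonce}")
```

Solving takes many hash attempts on average, while verifying takes one; that asymmetry is what makes mass scraping expensive without bothering a single human visitor.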

A programmer fights content theft with homemade "ZIP bombs"

Some developers have taken more radical measures. Ibrahim Diallo discovered by chance that another website was stealing his blog content in real time: whenever someone visited their page, it immediately scraped his latest article, stripped his name and branding, and passed the piece off as its own.

At first, Diallo fought back manually, deliberately feeding the scraper false data so it would republish the wrong content. That quickly proved too tedious, so he turned to his secret weapon: ZIP bombs.

The bomb works like this: when a scraper visits his site, he returns a small, innocuous-looking compressed file. The scraper's server downloads it and tries to decompress it, at which point several gigabytes of junk data expand at once, crashing the system.
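A minimal sketch of the general decompression-bomb technique is below; it is not Diallo's actual setup, and the sizes and serving behavior described in the comments are illustrative assumptions.

```python
import gzip
import io

# Sketch of the general decompression-bomb technique (not the author's exact
# setup): highly repetitive data compresses at roughly 1000:1, so a response
# of a few megabytes on the wire balloons to gigabytes when a scraper's HTTP
# client honors "Content-Encoding: gzip" and inflates it automatically.

def make_gzip_bomb(uncompressed_gib: int = 10, chunk_mib: int = 64) -> bytes:
    """Build a gzip payload that expands to roughly `uncompressed_gib` GiB."""
    buf = io.BytesIO()
    chunk = b"\0" * (chunk_mib * 1024 * 1024)  # zeros compress extremely well
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        for _ in range(uncompressed_gib * 1024 // chunk_mib):
            gz.write(chunk)
    return buf.getvalue()

if __name__ == "__main__":
    payload = make_gzip_bomb(uncompressed_gib=1)  # 1 GiB demo build
    print(f"compressed size: {len(payload) / (1024 * 1024):.1f} MiB")
    # A web handler would return `payload` with the "Content-Encoding: gzip"
    # header, and only for requests it has already flagged as scrapers.
```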

Website captchas become a "DOOM challenge": defeat three enemies to prove you're human

Captchas are also being gamified. Guillermo Rauch, CEO of the cloud platform Vercel, recently launched an AI-assisted, DOOM-style captcha: users must defeat three enemies on "nightmare" difficulty to prove they are human before they can enter the site.

While this approach does block crawlers, it also makes the experience time-consuming and cumbersome for ordinary users.

Network infrastructure companies are also taking action

Large network infrastructure providers are stepping in as well. Cloudflare earlier released AI Labyrinth to deal with unauthorized crawlers: when the system detects abnormal behavior, it steers the crawler into a maze of fake pages, where it burns resources and loses its way. Cloudflare says AI crawlers issue more than 50 billion requests on its network every day, nearly 1% of total traffic.
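As a toy illustration of the maze idea (not Cloudflare's implementation), the sketch below serves endlessly generated pages whose links lead only to more generated pages; the paths, word list, and port are assumptions.

```python
import hashlib
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy sketch of a crawler maze (not Cloudflare's AI Labyrinth): every URL
# under /maze/ returns a generated page whose links lead only to more
# generated pages, so a bot that ignores the trap keeps walking indefinitely.

WORDS = ["data", "model", "report", "index", "archive", "notes", "draft", "spec"]

class MazeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Seed the RNG with the path so the same URL always yields the same page.
        seed = int(hashlib.sha256(self.path.encode()).hexdigest(), 16)
        rng = random.Random(seed)
        text = " ".join(rng.choices(WORDS, k=200))
        links = " ".join(
            f'<a href="/maze/{rng.randrange(10**9)}">more</a>' for _ in range(5)
        )
        body = f"<html><body><p>{text}</p><p>{links}</p></body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # In practice only requests flagged as abusive would be routed here.
    HTTPServer(("127.0.0.1", 8080), MazeHandler).serve_forever()
```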

Conclusion

Through these "anti - crawling mechanisms," AI companies that rely on crawlers to scrape content everywhere will have to pay more. Because the traffic is slowed down and resources are consumed, they have to increase server and hardware investment. Simply put, it makes it more costly and less cost - effective for crawlers to do the same job.

Arun Kumar suggested that small websites, especially those with a lot of dynamic content, start by configuring robots.txt to reduce traffic from well-behaved crawlers, and, if they have the technical capacity, deploy systems like Anubis for tighter control.
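A minimal robots.txt sketch along those lines is below. The user-agent tokens are examples that change over time, and the file only binds crawlers that choose to honor it, so check each vendor's current documentation.

```
# Minimal robots.txt sketch: ask well-behaved AI training crawlers to stay
# away while leaving ordinary search indexing alone. The user-agent tokens
# are examples and change over time; this only binds crawlers that honor it.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```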

In practice, though, if these defenses are misconfigured they can also hit legitimate visitors and degrade the user experience.

As Arun Kumar said, "This cat-and-mouse game will never end. Crawlers will always evolve and find ways to bypass various traps."

This article is from the WeChat official account "CSDN," compiled by Tu Min, and published by 36Kr with authorization.