Cloudflare Offers Simpler Way To Stop AI Bots

Unchecked AI bots scraping content for their training could spell the end of the open web if enterprises follow one analyst's advice to put their intellectual property behind a paywall.

Credit: Michael Rivera

Content distribution network Cloudflare is making it simpler for customers who have had enough of badly behaved bots to block them from their website.

It's long been possible to prevent well-behaved bots from crawling your corporate website by adding a "robots.txt" file listing who's welcome and who isn't - and content distribution networks such as Cloudflare offer visual interfaces to simplify the creation of such files.
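
As a rough illustration of how such a file works, the sketch below parses a hypothetical robots.txt with Python's standard `urllib.robotparser` module and asks whether a given crawler may fetch a page. The file contents, the choice of crawler tokens and the example.com URL are assumptions made for the example, not anything Cloudflare publishes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn away two AI crawlers, welcome everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
page = "https://example.com/articles/1"  # placeholder URL

# A well-behaved crawler asks this question before requesting the page.
gptbot_allowed = parser.can_fetch("GPTBot", page)
browser_allowed = parser.can_fetch("Mozilla/5.0", page)

print("GPTBot allowed:", gptbot_allowed)
print("Browser allowed:", browser_allowed)
```

Nothing in the file is enforced by the server: it only has an effect on crawlers that choose to read and obey it.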

But faced with the arrival of a new generation of badly behaved AI bots, scraping content to feed their large language models (LLMs), Cloudflare has introduced an even quicker way to block all such bots with one click.

"The popularity of generative AI has made the demand for content used to train models or run inference on skyrocket, and although some AI companies clearly identify their web scraping bots, not all AI companies are being transparent," Cloudflare staff wrote in a blog post.

According to the authors of the post, "Google reportedly paid $60 million a year to license Reddit's user generated content, Scarlett Johansson alleged OpenAI used her voice for their new personal assistant without her consent, and most recently, Perplexity has been accused of impersonating legitimate visitors in order to scrape content from websites. The value of original content in bulk has never been higher."

Last year, Cloudflare introduced a way for any of its customers, on any plan, to block specific categories of bots, including certain AI crawlers. These bots, Cloudflare said, honor the directives in sites' robots.txt files and do not use unlicensed content to train their models or to feed retrieval-augmented generation (RAG) applications.

To do this, it identifies bots by their "user-agent string" - a kind of calling card presented by browsers, bots and other tools requesting data from a web server.

"Even though these AI bots follow the rules, Cloudflare customers overwhelmingly opt to block them. We hear clearly that customers do not want AI bots visiting their websites, and especially those that do so dishonestly," the post said.

The top four AI web crawlers visiting sites protected by Cloudflare were Bytespider, Amazonbot, ClaudeBot and GPTBot, it said. Bytespider, the most frequent visitor, is operated by ByteDance, the Chinese company that owns TikTok. It visited 40.4% of protected websites, and is reportedly used to gather training data for ByteDance's LLMs, including those that support its ChatGPT rival Doubao. Amazonbot is reportedly used to index content to help Amazon's Alexa chatbot answer questions, while ClaudeBot gathers data for Anthropic's AI assistant Claude.

Blocking bad bots

Blocking bots based on their user-agent string will only work if such bots tell the truth about their identity - but there are signs that not all do, or not all the time.
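
The weakness is easy to see in a minimal sketch of user-agent filtering. The function below is a simplified stand-in, not Cloudflare's actual detection logic, and both sample header values are invented for the example: the first announces itself as a crawler, the second presents itself as an ordinary desktop browser.

```python
# Lower-cased tokens for the four crawlers named in Cloudflare's post.
AI_BOT_TOKENS = ("bytespider", "amazonbot", "claudebot", "gptbot")


def is_declared_ai_bot(user_agent: str) -> bool:
    """Return True if the User-Agent header names one of the listed crawlers."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_BOT_TOKENS)


# Illustrative header values (assumptions, not captured traffic).
honest_bot = "Mozilla/5.0 (compatible; GPTBot/1.0)"
disguised_bot = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"

print(is_declared_ai_bot(honest_bot))     # the crawler that identifies itself is caught
print(is_declared_ai_bot(disguised_bot))  # the one posing as a browser is not
```

A scraper that sends the second header passes the check no matter what it does with the content, which is why a filter based on the string alone cannot catch a bot that lies.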

In such cases, other measures will be necessary - and enterprises' main recourse against unwanted web scraping is normally reactive: pursuing legal action, according to Thomas Randall, director of AI market research at Info-Tech Research Group.

"While some software applications exist for web scraping prevention (such as DataDome and Cloudflare), these can only go so far: if an AI bot is rarely scraping a site, the bot may still go undetected," he said via email.

To justify legal action against the operators of bad bots, enterprises will need to do more than claim that the bot didn't leave when asked.

The best course of action, Randall said, is for "enterprises to hide intellectual property or other important information behind a membership paywall. Any scraping done behind the paywall is liable for legal action, reinforced with a clear restrictive copyright license on the site. The organization must, therefore, be prepared to legally follow through. Any scraping done on the public site is accepted as part of the organization's risk tolerance."

Randall noted that organizations with the resources to go further could consider rate-limiting connections to their site; automatically and temporarily blocking suspicious IP addresses; limiting the information given about why access has been blocked to a message such as "For help, contact support via [email protected]" in order to force a human interaction; and double-checking how much of their website is available through their mobile site and apps.
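
The first two of those measures, rate limiting and temporary IP blocking, could be combined along the lines of the sketch below. It is one possible approach under stated assumptions rather than anything Randall or Cloudflare prescribes: the class name, the thresholds and the sliding-window design are all hypothetical, and the caller supplies the current time (for example from `time.monotonic()`) so the behavior is easy to test.

```python
from collections import defaultdict, deque


class TemporaryBlockLimiter:
    """Sliding-window rate limiter that temporarily blocks IPs exceeding the limit."""

    def __init__(self, max_requests: int, window_seconds: float, block_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent allowed requests
        self._blocked_until = {}         # ip -> time at which the block expires

    def allow(self, ip: str, now: float) -> bool:
        """Return True if a request from `ip` at time `now` should be served."""
        until = self._blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False             # still inside the temporary block
            del self._blocked_until[ip]  # block has expired

        hits = self._hits[ip]
        # Forget requests that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()

        if len(hits) >= self.max_requests:
            # Over the limit: start a temporary block and reset the history.
            self._blocked_until[ip] = now + self.block_seconds
            hits.clear()
            return False

        hits.append(now)
        return True


# Example: at most 3 requests per 10 seconds, then a 60-second block.
limiter = TemporaryBlockLimiter(max_requests=3, window_seconds=10.0, block_seconds=60.0)
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, limiter.allow("203.0.113.7", now=t))
```

As Randall's caveat about infrequent scrapers suggests, a bot that stays under the threshold, or spreads its requests across many IP addresses, would not trip a limiter like this.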

"Ultimately, scraping cannot be stopped, but hindered at best," he said.
