66 billion bot requests analysis: AI bots rise, SEO tools shrink, and search engines hold their ground

For website owners, attracting visitors and turning them into clients has always been the main goal – and challenge. But today, it’s not only about getting to the top of search results. With hundreds of millions of people using AI tools, it’s also about getting on the AI radar.

Our analysis of 66.7 billion requests from web crawlers (also called bots or spiders) across 5+ million websites draws a new picture of the web, and one pattern stands out:

AI-driven bots – especially those powering assistants like ChatGPT, Siri, TikTok Search, and Petal Search – are steadily increasing their reach across the web. The role of AI in web discovery is becoming more “search-like”.

Even when the total number of AI-driven bot requests decreases, the share of websites they crawl keeps growing. On the other hand, LLM training bots like OpenAI’s GPTBot and Meta’s ExternalAgent show the opposite trend: fewer sites let them in, resulting in steep drops in coverage despite their heavy overall activity.

Traditional search bots remain stable and predictable. SEO and monitoring crawlers slowly shrink. Social and ad-related bots fluctuate but maintain modest, consistent coverage.

Let’s dive deep into the numbers to better understand who is really crawling the internet, how their behavior is changing, and what this means for you in 2026.

Understanding the new crawling landscape

Web crawlers are automated programs that discover and index information. Some do this to understand what’s on your website, while others look for information to answer user questions or collect data for AI model training.

We analyzed the user-agent strings that bots send when they visit a site and filtered out traffic that’s most likely human, so the analysis focuses only on automated systems. These account for around 30% of global web traffic according to Cloudflare Radar, and our data confirms this.
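To make the grouping concrete, here is a minimal Python sketch of keyword-based user-agent classification. The keyword lists, the `KNOWN_BOTS` mapping, and the `classify_user_agent` function are illustrative assumptions – the actual classification pipeline behind this report is not published.

```python
# Illustrative keyword buckets loosely mirroring the report's grouping.
# These rules are made up for demonstration, not the production logic.
SCRIPT_KEYWORDS = ("python", "curl", "wget", "go-http-client", "libwww")
GENERIC_BOT_KEYWORDS = ("bot", "crawler", "spider")
KNOWN_BOTS = {
    "googlebot": "search",
    "bingbot": "search",
    "gptbot": "ai-training",
    "oai-searchbot": "ai-assistant",
    "ahrefsbot": "seo",
    "facebookexternalhit": "social",
}

def classify_user_agent(ua: str) -> str:
    """Return a coarse category for a raw User-Agent string."""
    ua_lower = ua.strip().lower()
    if not ua_lower:
        return "empty"
    # Named bots first, so "gptbot" isn't swallowed by the generic "bot" rule.
    for token, category in KNOWN_BOTS.items():
        if token in ua_lower:
            return category
    if any(k in ua_lower for k in SCRIPT_KEYWORDS):
        return "script"
    if any(k in ua_lower for k in GENERIC_BOT_KEYWORDS):
        return "generic-bot"
    return "likely-human"  # excluded from the bot analysis

print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.2)"))  # -> ai-training
```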

The bubble chart below shows each bot’s total request volume against the percentage of websites it visits.

This instantly shows how differently bots behave: some crawl a handful of sites deeply, while others appear almost everywhere but only touch the surface.

The chart also highlights a few broad patterns:

  • Vaguely defined scripts and bots cover the vast majority of websites
  • Search engines remain the widest crawlers
  • AI-related bots are expanding their footprint
  • Many smaller, niche crawlers focus more on depth than breadth

We grouped the bots we could identify into six major categories based on their stated purpose, and used the AI.txt project’s classifications to identify the AI-related bots.

Request volume indicates activity; website coverage indicates influence. The analysis below focuses on reach – the percentage of sites each bot accesses – as a more revealing data point.
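As a rough illustration of the two metrics, the sketch below computes request volume (activity) and site coverage (reach) from toy log rows. The data, bot names, and site counts are invented purely for demonstration.

```python
from collections import defaultdict

# Toy log rows: (bot_name, site_id). In reality these come from
# billions of anonymized server-log entries; values here are made up.
log_rows = [
    ("google-bot", "site-1"), ("google-bot", "site-2"), ("google-bot", "site-2"),
    ("niche-bot", "site-1"), ("niche-bot", "site-1"), ("niche-bot", "site-1"),
]
total_sites = 2  # number of monitored sites in this toy example

requests = defaultdict(int)    # activity: how many requests a bot made
sites_seen = defaultdict(set)  # influence: how many distinct sites it touched

for bot, site in log_rows:
    requests[bot] += 1
    sites_seen[bot].add(site)

for bot in requests:
    coverage = 100 * len(sites_seen[bot]) / total_sites
    print(f"{bot}: {requests[bot]} requests, {coverage:.0f}% coverage")
# google-bot: 3 requests, 100% coverage
# niche-bot: 3 requests, 50% coverage
```

Two bots with identical request volumes can have very different coverage, which is why reach is the more revealing number here.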

Group 1: Scripts, empty user agents, and generic bots (mostly non-AI)

23B requests (34.6% of total)

Bots in this group are a combination of scripts (using keywords like python, curl, wget, etc.), empty user-agent strings, and generic bots (keywords: spider, crawler, bot, etc.). They often come from automation tools, plugins, or monitoring scripts that reuse generic browser identities. Some may even collect data at scale, but without clear labeling, it’s impossible to know whether they support AI training or just routine background tasks.

  • Scripts – 92.33% coverage, 7.7B requests
  • Empty strings – 51.67% coverage, 12.2B requests
  • Generic bots – 48.67% coverage, 3B requests

Nearly every site receives traffic from these vaguely identified sources, but unlike AI or search engine bots, they are not deliberate, purpose-driven crawlers. Traffic volumes fluctuate, but overall coverage remains stable.

Group 2: Classic search engine bots (mostly non-AI)

20.3B requests (30.5% of total)

These crawlers index the web for traditional search engines such as Google, Bing, or Baidu. They may indirectly feed AI systems, but it’s not their primary function.

  • google-bot – 72% average coverage, 14.7B requests
  • bing-bot – 57.67% coverage, 4.6B requests
  • yandex-bot – 19.33% coverage, 621M requests
  • duckduck-bot – 9% coverage, 42M requests
  • baidu-bot – 5.67% coverage, 166M requests
  • sogou-bot – 4.33% coverage, 68M requests

Despite AI dominating the narrative, classic search engines continue to scan large portions of the web. Google’s main bot in particular expanded its reach significantly, while the others held their ground. Baidu’s sharp November spike points to either expanded global indexing or a temporary crawl burst – the pattern will become clearer in the coming months.

Group 3: LLM training and data collection bots

10.1B requests (15.1% of total)

This group includes the bots explicitly tied to large language model (LLM) training, dataset building, or internal research.

  • meta-externalagent – 57.33% average coverage, 4B requests
  • openai-gptbot – 55.67% coverage, 1.7B requests
  • google-other – 9.67% coverage, 2.9B requests
  • claude-bot – 9.33% coverage, 1.4B requests
  • perplexity-bot – 1.67% coverage, 13M requests
  • commoncrawl-bot – 1% coverage, 30M requests

This group shows the strongest declines, largely due to websites blocking AI-training crawlers. GPTBot’s crash from 84% to 12% is the clearest signal of this trend. The only exception is google-other, likely due to Google’s expanding internal AI research.

Group 4: SEO and monitoring bots (mostly non-AI)

6.4B requests (9.7% of total)

These bots primarily support SEO analytics, uptime monitoring, content audits, and competitive intelligence. Some of them now feed AI marketing and content-generation systems.

  • ahrefs-bot – 60% average coverage, 3.1B requests
  • majestic-bot – 27.7% coverage, 1.1B requests
  • semrush-bot – 25% coverage, 1.1B requests
  • alibaba-bot – 4.67% coverage, 162M requests
  • dataprovider – 3.67% coverage, 125M requests
  • dotbot-bot – 3% coverage, 294M requests
  • uptimerobot-bot – 1% coverage, 253M requests
  • ahrefs-audit – 0% coverage, 228M requests

Declining coverage reflects two trends: these tools increasingly focus on actively optimized sites (where SEO matters most), and website owners are blocking resource-intensive crawlers.

Group 5: AI assistant and search bots

4.6B requests (6.9% of total)

These bots fetch content on demand to answer specific user queries in AI assistants and search tools. Unlike training bots, they serve users directly rather than building datasets, which may explain their expanding access.

  • openai-searchbot – 55.67% average coverage, 279M requests
  • tiktok-bot – 25.67% coverage, 1.4B requests
  • apple-bot – 24.33% coverage, 1.3B requests
  • petalsearch-bot – 18.33% coverage, 675M requests
  • openai-chatgpt – 9.33% coverage, 137M requests
  • amazon-bot – 4.67% coverage, 581M requests
  • google-readaloud – 4.33% coverage, 225M requests

Bots powering ChatGPT, TikTok, Siri, Petal, and other AI search tools and assistants are rapidly transitioning into major web discovery players. The biggest growth signals belong to OpenAI, Apple, and TikTok. These crawls are user-triggered and more targeted, reflecting the new paradigm where AI-driven discovery competes directly with classic search.

Group 6: Social and ad bots (mostly non-AI)

2.2B requests (3.3% of total)

This category of bots fetches metadata for link previews, ads, social posts, and messaging content. Large platforms repurpose some of this data internally.

  • meta-fbexternalhit – 69% average coverage, 1.3B requests
  • google-chromeprivacy – 18% coverage, 66M requests
  • google-adsbot – 9.33% coverage, 239M requests
  • mobile-whatsapp – 5% coverage, 58M requests
  • mobile-iMessage – 5% coverage, 26M requests
  • pinterest-bot – 4% coverage, 177M requests
  • google-adsense – 2.33% coverage, 273M requests
  • google-adstxt – 2% coverage, 15M requests
  • google-feedburner – 1% coverage, 30M requests

Social and ad bots are generally stable, but Meta’s link preview crawler is losing coverage – possibly due to explicit blocking or reduced use of Facebook’s sharing pipeline.

Key insight

Across all 66.7 billion records, one message stands out: AI crawlers are rapidly increasing their reach, even as AI training bots face growing resistance from content creators. Some of the most active AI-related bots now access over half of all monitored websites, rotating targets and building a near-complete picture of the web in a matter of weeks.

As AI search tools and assistants evolve into direct competitors to classic search engines, website owners face a strategic choice:

  • Publishers and content sites may want visibility in AI assistant responses (via tools like Web2Agent and llms.txt files) since these increasingly compete with Google for traffic.
  • Sites with proprietary content or APIs may block training bots to prevent commercial use of their data while allowing assistant bots that drive traffic.
  • High-traffic sites concerned about server load can use CDN AI Audit to selectively block resource-intensive crawlers.

The middle path – allowing assistant bots while blocking training bots – appears to be the emerging standard.
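For sites choosing that middle path, the usual lever is robots.txt. The sketch below blocks a few documented training crawlers while allowing on-demand assistant fetchers. The user-agent tokens come from the vendors’ public bot documentation, but they change occasionally, so verify the current names before deploying anything like this.

```
# Minimal robots.txt sketch of the "middle path".
# Token names follow vendor documentation and may change - verify before use.

# Block LLM training / dataset crawlers
User-agent: GPTBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: CCBot
Disallow: /

# Allow on-demand assistant and AI search fetchers
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Everyone else: default rules
User-agent: *
Allow: /
```

Keep in mind that robots.txt is advisory – well-behaved crawlers honor it, but enforcing it against non-compliant bots requires server- or CDN-level blocking.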

Methodology

We analyzed 66.7 billion anonymized log entries from 5 million websites hosted with us, covering three 6-day windows: June 13–18, August 20–25, and November 20–25 (all dates inclusive). Bot grouping is based on publicly documented user-agent descriptions, classifications, and observed crawling behavior. Only verified bot traffic was included; human visitors and noise unrelated to crawling were excluded.
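The report doesn’t detail how bot traffic was verified. One widely documented technique – shown here purely as an illustration, not necessarily the method used for this analysis – is forward-confirmed reverse DNS, which Google and Bing both recommend for validating their crawlers.

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS check, as documented by Google.

    Generic illustration of one verification technique only.
    """
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        resolved_ips = socket.gethostbyname_ex(host)[2]  # forward lookup
        return ip in resolved_ips
    except OSError:
        return False  # covers socket.herror and socket.gaierror

# Requires network access; the IP below sits in a published Googlebot range.
print(is_verified_googlebot("66.249.66.1"))
```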

The author

Gediminas G

Gediminas is a communications specialist passionate about technologies and their possibilities. His main responsibility is to help people understand Hostinger products and their features. He likes spending his free time bathing in the hot tub, grilling, playing poker, fishing, and other activities.