robots.txt and Googlebot: how blocking the wrong bot kills AI Overviews

Google runs many crawlers and user-agent tokens. Exactly one controls whether your pages can appear in AI Overviews and AI Mode: Googlebot. The others either do not matter for Search or control adjacent products like Gemini training. If you are trying to opt out of AI Overviews by blocking Google-Extended, you are blocking the wrong bot. If you are blocking Googlebot to "stop AI," you have removed yourself from Search entirely.

What Google says

“robots.txt directives for Googlebot is the control for site owners to manage access to how their sites are crawled for Search.”

Source: AI features in Search (Google)

Why this matters for AI Overviews

Google's crawler matrix is a frequent source of confusion. Here is the only summary you need:

Bot	Controls	Affects AI Overviews / AI Mode
Googlebot	Crawling for Google Search	YES. Blocking it removes you from Search entirely.
Google-Extended	Gemini training and grounding outside Search	NO. Explicitly does not affect Search.
GoogleOther / GoogleProducer / APIs-Google / etc.	R&D and product crawls	NO. Irrelevant to Search appearance.

The most repeated mistake: a team reads about AI and adds Disallow: / for Google-Extended in robots.txt to "block AI." A few weeks later they ask why their pages are still appearing in AI Overviews. They never blocked AI Overviews. They blocked Gemini training, which is a separate product.

Google's own words, in case anyone is still skeptical:

“Google-Extended does not impact a site's inclusion in Google Search nor is it used as a ranking signal in Google Search.”

Source: Google common crawlers

If you block Googlebot, Google cannot crawl the page. At most, a blocked URL can be indexed as a bare link with no content, so there is nothing for AI Overviews to draw on. The chain is mechanical, not philosophical.
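The difference between the two blocks can be checked mechanically. The sketch below uses Python's standard-library urllib.robotparser, which approximates Google's matching rules rather than reproducing them exactly. The helper name can_crawl, the two sample files, and the URL are illustrative assumptions, not anything Google publishes.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical file from a team that wanted to "block AI".
BLOCKS_EXTENDED = """\
User-agent: Google-Extended
Disallow: /
"""

# Hypothetical file that blocks the Search crawler itself.
BLOCKS_GOOGLEBOT = """\
User-agent: Googlebot
Disallow: /
"""

URL = "https://example.com/pricing"  # placeholder page

if __name__ == "__main__":
    # Googlebot is untouched by a Google-Extended block.
    print(can_crawl(BLOCKS_EXTENDED, "Googlebot", URL))        # True
    print(can_crawl(BLOCKS_EXTENDED, "Google-Extended", URL))  # False
    # Blocking Googlebot cuts off the Search crawl.
    print(can_crawl(BLOCKS_GOOGLEBOT, "Googlebot", URL))       # False
```

Only the second file affects whether the page can be crawled for Search, and therefore whether it can feed AI Overviews.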

How to fix it

Pattern A: Maximum visibility (recommended for marketing sites)

User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml

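Every public page is crawlable and indexable, AI Overviews are fully eligible, and Gemini training is allowed. Note that Googlebot obeys only its own group, so the /admin/ and /api/ rules under User-agent: * do not apply to it. Repeat them in the Googlebot group if Googlebot should skip those paths too.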

Pattern B: In Search and AI Overviews, opt out of Gemini training

User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml

This is the right pattern if your stance is "appear in Google Search and AI Overviews, but do not train Gemini on our content." Most B2B marketing sites land here.

Pattern C: Opt out of all third-party AI training, keep Google Search and AI features

User-agent: Googlebot
Allow: /

# Google
User-agent: Google-Extended
Disallow: /

# OpenAI
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /

# Anthropic
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /

# Perplexity
User-agent: PerplexityBot
Disallow: /

# Common Crawl (feeds many AI training datasets)
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

This is the strict-opt-out posture: Google Search and AI Overviews still work, but you are opted out of most third-party AI ingestion. You also lose Perplexity and ChatGPT citations as a tradeoff, since they cannot cite what they cannot crawl.
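A file like Pattern C is easy to get wrong by hand as the bot list grows. As a minimal sketch, the file could be generated from a list of tokens and then sanity-checked with Python's urllib.robotparser. The function names build_pattern_c and allowed are hypothetical, the token list simply mirrors the one above, and robotparser only approximates how each vendor's crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Tokens copied from Pattern C; extend or trim to match your own policy.
BLOCKED_TOKENS = [
    "Google-Extended",
    "GPTBot",
    "ChatGPT-User",
    "ClaudeBot",
    "anthropic-ai",
    "PerplexityBot",
    "CCBot",
]


def build_pattern_c(blocked, sitemap_url):
    """Build a robots.txt that allows Googlebot and blocks each listed token."""
    groups = ["User-agent: Googlebot\nAllow: /"]
    groups += [f"User-agent: {token}\nDisallow: /" for token in blocked]
    groups.append("User-agent: *\nAllow: /")
    groups.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(groups) + "\n"


def allowed(robots_txt, user_agent, url="https://example.com/"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = build_pattern_c(BLOCKED_TOKENS, "https://example.com/sitemap.xml")
    print(robots)
    for agent in ["Googlebot", *BLOCKED_TOKENS]:
        print(f"{agent}: {'allowed' if allowed(robots, agent) else 'blocked'}")
```

Running the check after every edit catches the case where a new Disallow group accidentally swallows Googlebot.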

What you should NOT do

User-agent: Googlebot
Disallow: /

This stops Googlebot from crawling the whole site, which takes you out of Search results and AI Overviews together. Almost always a mistake.

Common mistakes when implementing the fix

Blocking Google-Extended thinking it blocks AI Overviews. It does not.
Blocking Googlebot to "protect" content from AI. Removes you from Search entirely.
Using User-agent: Googlebot-AI or similar invented names. Not a real Google user agent. There is no Googlebot variant specifically for AI features.
Leaving Disallow: / from staging in production robots.txt. Identical risk profile to leaving noindex from staging.
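The mistakes above are regular enough to lint for before a deploy. The following is a rough sketch, not a full robots.txt parser: parse_groups and lint_robots are hypothetical helpers, and KNOWN_GOOGLEBOT_TOKENS is a short illustrative list rather than an authoritative registry, so check it against Google's current crawler documentation before relying on it.

```python
# Illustrative, non-exhaustive set of real Googlebot tokens (assumption:
# verify against Google's published crawler list).
KNOWN_GOOGLEBOT_TOKENS = {
    "googlebot",
    "googlebot-image",
    "googlebot-video",
    "googlebot-news",
}


def parse_groups(robots_txt):
    """Split robots.txt into (agents, rules) groups; rules are (field, path)."""
    groups, agents, rules = [], [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule line closed the previous group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def lint_robots(robots_txt):
    """Return warnings for the common mistakes listed above."""
    warnings = []
    for agents, rules in parse_groups(robots_txt):
        blocks_all = ("disallow", "/") in rules
        for agent in agents:
            if agent.startswith("googlebot") and agent not in KNOWN_GOOGLEBOT_TOKENS:
                warnings.append(f"unknown Googlebot token: {agent}")
            if blocks_all and agent == "googlebot":
                warnings.append("Googlebot is blocked from the whole site")
            if blocks_all and agent == "google-extended":
                warnings.append("Google-Extended block does not affect AI Overviews")
            if blocks_all and agent == "*":
                warnings.append("wildcard group blocks the whole site (staging leftover?)")
    return warnings


if __name__ == "__main__":
    sample = "User-agent: Googlebot-AI\nDisallow: /\n\nUser-agent: *\nDisallow: /\n"
    for warning in lint_robots(sample):
        print(warning)
```

A check like this fits naturally in CI, failing the build when the production robots.txt produces a Googlebot or wildcard warning.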
