There are eight Google crawlers. Exactly one controls whether your pages can appear in AI Overviews and AI Mode: Googlebot. The others either do not matter for Search or control adjacent products like Gemini training. If you are trying to opt out of AI Overviews by blocking Google-Extended, you are blocking the wrong bot. If you are blocking Googlebot to "stop AI," you have removed yourself from Search entirely.
What Google says
“robots.txt directives for Googlebot is the control for site owners to manage access to how their sites are crawled for Search.”
Why this matters for AI Overviews
Google's crawler matrix is a frequent source of confusion. Here is the only summary you need:
| Bot | Controls | Affects AI Overviews / AI Mode |
|---|---|---|
| Googlebot | Crawling for Google Search | YES. Blocking it removes you from Search entirely. |
| Google-Extended | Gemini training and grounding outside Search | NO. Explicitly does not affect Search. |
| GoogleOther, GoogleProducer, APIs-Google, and similar | R&D and product crawls | NO. Irrelevant to Search appearance. |
The most repeated mistake: a team reads about AI and adds Disallow: / for Google-Extended in robots.txt to "block AI." Then a few weeks later they ask why they are still appearing in AI Overviews. They never blocked AI Overviews. They blocked Gemini training, which is a separate product.
Google's own words, in case anyone is still skeptical:
"Google-Extended does not impact a site's inclusion in Google Search nor is it used as a ranking signal in Google Search." Source: Google common crawlers
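If you block Googlebot, Google cannot crawl the page, so its content never reaches the Google index (at most, a bare URL discovered through links may be listed). If the content is not in the index, it cannot appear in AI Overviews. The chain is mechanical, not philosophical.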
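The separation between the two user agents can be sanity-checked locally. The sketch below uses Python's standard-library urllib.robotparser on a robots.txt that blocks only Google-Extended. The helper name is_allowed and the example.com URL are illustrative assumptions, and Python's parser is a simplified matcher, not Google's own, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# The "block AI" mistake: only Google-Extended is disallowed.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/pricing"  # placeholder URL, not a real page
googlebot_ok = is_allowed(ROBOTS_TXT, "Googlebot", page)
extended_ok = is_allowed(ROBOTS_TXT, "Google-Extended", page)

# Googlebot is untouched by the Google-Extended group; only Google-Extended is blocked.
print(f"Googlebot allowed: {googlebot_ok}")
print(f"Google-Extended allowed: {extended_ok}")
```

Googlebot stays allowed because no group in the file names it, which is why this file has no effect on Search or AI Overviews eligibility.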
How to fix it
Pattern A: Maximum visibility (recommended for marketing sites)
User-agent: Googlebot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Sitemap: https://example.com/sitemap.xml
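Everything indexable. AI Overviews fully eligible. Gemini training also allowed. Note that a crawler with its own group follows only that group, so the /admin/ and /api/ rules under User-agent: * do not apply to Googlebot here. Repeat them under User-agent: Googlebot if you want Googlebot to skip those paths.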
Pattern B: In Search and AI Overviews, opt out of Gemini training
User-agent: Googlebot
Allow: /
User-agent: Google-Extended
Disallow: /
User-agent: *
Allow: /
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
This is the right pattern if your stance is "appear in Google Search and AI Overviews, but do not train Gemini on our content." Most B2B marketing sites land here.
Pattern C: Opt out of Gemini training and major third-party AI crawlers, keep Google Search and AI features
User-agent: Googlebot
Allow: /
# Google
User-agent: Google-Extended
Disallow: /
# OpenAI
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
# Anthropic
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
# Perplexity
User-agent: PerplexityBot
Disallow: /
# Common Crawl (feeds many AI training datasets)
User-agent: CCBot
Disallow: /
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
This is the strict-opt-out posture: Google Search and AI Overviews still work, but you are opted out of most third-party AI ingestion. You also lose Perplexity and ChatGPT citations as a tradeoff, since they cannot cite what they cannot crawl.
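Before shipping a file like Pattern C, it can help to print who is actually allowed. The following is a minimal sketch using Python's urllib.robotparser, assuming the Pattern C rules above. The access_report function and the SomeOtherBot name (standing in for any crawler the file does not list) are hypothetical. Python's parser applies the first matching rule in a group, whereas Google documents most-specific-match precedence, so the check is kept to the site root where both agree.

```python
from urllib.robotparser import RobotFileParser

# Pattern C from above, without comments or the Sitemap line.
PATTERN_C = """\
User-agent: Googlebot
Allow: /
User-agent: Google-Extended
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: *
Allow: /
"""

AGENTS = [
    "Googlebot", "Google-Extended", "GPTBot", "ChatGPT-User",
    "ClaudeBot", "anthropic-ai", "PerplexityBot", "CCBot",
    "SomeOtherBot",  # hypothetical unlisted crawler; falls through to "*"
]


def access_report(robots_txt: str, agents: list, url: str = "/") -> dict:
    """Map each user agent to whether it may fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


report = access_report(PATTERN_C, AGENTS)
for agent, allowed in report.items():
    print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

Only Googlebot and the unlisted crawler should come back as allowed. If Googlebot ever shows as blocked, the file has drifted into the "What you should NOT do" case.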
What you should NOT do
User-agent: Googlebot
Disallow: /
This removes you from Search, from AI Overviews and AI Mode, and from any other Google surface that depends on Googlebot's crawl. Almost always a mistake.
Common mistakes when implementing the fix
- Blocking Google-Extended thinking it blocks AI Overviews. It does not.
- Blocking Googlebot to "protect" content from AI. Removes you from Search entirely.
- Using User-agent: Googlebot-AI or similar invented names. Not a real Google user agent. There is no Googlebot variant specifically for AI features.
- Leaving Disallow: / from staging in production robots.txt. Identical risk profile to leaving noindex from staging.
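Several of these mistakes can be caught by a small pre-deploy check. The sketch below is one possible approach, not an official tool: lint_robots and KNOWN_GOOGLE_TOKENS are hypothetical names, and the token set contains only the Google user agents named in this article, so it is deliberately incomplete and should be extended from Google's published crawler list before real use.

```python
# Assumption: only the Google tokens named in this article. Extend from
# Google's crawler documentation (it lists more) before relying on this.
KNOWN_GOOGLE_TOKENS = {
    "googlebot", "google-extended", "googleother", "googleproducer", "apis-google",
}


def lint_robots(robots_txt: str) -> list[str]:
    """Return warnings for invented Google agents and blanket blocks (hypothetical)."""
    findings = []
    current_agents = []  # user agents of the group being read
    in_rules = False     # True once the current group has an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value)
            token = value.lower()
            if "google" in token and token not in KNOWN_GOOGLE_TOKENS:
                findings.append(f"unrecognized Google user agent: {value}")
        elif field in ("allow", "disallow"):
            in_rules = True
            # Conservative: flags "*" even if Googlebot has its own group.
            if field == "disallow" and value == "/":
                for agent in current_agents:
                    if agent.lower() in ("googlebot", "*"):
                        findings.append(f"blanket Disallow: / for {agent}")
    return findings


bad = "User-agent: Googlebot-AI\nDisallow: /private/\nUser-agent: *\nDisallow: /\n"
good = (
    "User-agent: Googlebot\nAllow: /\n"
    "User-agent: Google-Extended\nDisallow: /\n"
    "User-agent: *\nAllow: /\nDisallow: /admin/\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)
bad_findings = lint_robots(bad)
good_findings = lint_robots(good)
print(bad_findings)
print(good_findings)
```

Running a check like this in CI against the production robots.txt is one way to stop a staging Disallow: / from shipping unnoticed.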