// AI crawlers
GPTBot, ClaudeBot and the new robots.txt
The new generation of AI crawlers and how to handle them deliberately.
Every major AI provider now operates one or more named web crawlers. They all respect robots.txt, and a default 'User-agent: *' rule already covers them – but explicit per-bot rules are the cleanest way to control AI access, because they record exactly what you intend for each crawler.
The bots that matter right now
- GPTBot – OpenAI's training crawler.
- ChatGPT-User – fetches pages live when a ChatGPT user asks for them.
- OAI-SearchBot – powers ChatGPT's search experience.
- ClaudeBot – Anthropic's training crawler.
- Claude-Web / Claude-User – Anthropic's live retrieval agents.
- PerplexityBot – Perplexity's index crawler.
- Perplexity-User – Perplexity's live fetch on behalf of a user.
- Google-Extended – opt-out token for Google's generative training.
- Applebot-Extended – same idea for Apple Intelligence.
- CCBot – Common Crawl, used as a training source by many models.
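Notice the split in that list: some tokens exist to gather training data (GPTBot, ClaudeBot, CCBot, Google-Extended, Applebot-Extended, as the providers currently document them), while others fetch pages live on a user's behalf. That split means you can opt out of model training without vanishing from live AI answers. A sketch, naming only the training tokens:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

The live-retrieval agents (ChatGPT-User, Perplexity-User and friends) aren't named here, so they fall back to whatever your 'User-agent: *' group allows.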
A sensible default robots.txt
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

Being explicit costs nothing and signals intent. If a bot ever changes default behaviour, your file still says what you meant.
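If you want to confirm the file says what you think it says, Python's standard-library robots.txt parser makes a quick sanity check – a minimal sketch, with yourdomain.com as a placeholder for your real host:

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt.
rp = RobotFileParser("https://yourdomain.com/robots.txt")
rp.read()

# Under the allow-all default above, every bot should print True.
for bot in ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]:
    print(bot, rp.can_fetch(bot, "https://yourdomain.com/"))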
When to block
Block when content is licensed, paywalled, or genuinely sensitive. Don't block out of vague unease – for almost every business, being invisible in AI answers is a worse outcome than being cited.
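When you do block, scope the rule to what actually needs protecting instead of the whole site. A sketch, assuming a paywalled area at /members/ (the path is a placeholder):

User-agent: GPTBot
Disallow: /members/

User-agent: ClaudeBot
Disallow: /members/

The rest of the site stays crawlable, so you keep the visibility while fencing off the content you can't give away.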
Common mistakes
- Blocking GPTBot because a checklist somewhere said to.
- Allowing crawling but serving JS-only content the bot can't render.
- Forgetting to add new bots as providers launch them.
- Returning 403 to AI bots from a misconfigured WAF (see the spot check below).
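That last one is easy to test from outside your own network. A minimal sketch using Python's standard library – the User-Agent string is a simplified version of GPTBot's published one, and a clean 200 only rules out UA-based blocking, not IP-range rules:

import urllib.error
import urllib.request

# Spot check: request your homepage the way GPTBot identifies itself
# and confirm the WAF doesn't turn it away. The domain is a placeholder.
req = urllib.request.Request(
    "https://yourdomain.com/",
    headers={"User-Agent": "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"},
)
try:
    with urllib.request.urlopen(req) as resp:
        print("Status:", resp.status)  # expect 200
except urllib.error.HTTPError as err:
    print("Blocked:", err.code)  # a 403 here points at the WAF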
Want the full playbook?
This article is the appetiser. The GEO course covers the same ground in depth – annotated examples, copy-paste templates, real audit walkthroughs, and a 90-day roadmap. Lifetime access, no upsells.
Or just get a heads-up at launch.
