
// AI crawlers

GPTBot, ClaudeBot and the new robots.txt

The new generation of AI crawlers and how to handle them deliberately.

GEO-friendly team · 7 min read

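Every major AI provider now operates one or more named web crawlers, and the providers say those crawlers honour robots.txt. You don't have to name a bot for that to work: a crawler with no group of its own falls back to your 'User-agent: *' rules. Explicit per-bot rules are still the cleanest way to control AI access, because each bot gets a rule that states exactly what you intend. One exception to watch: fetchers acting on a live user request may treat robots.txt differently. Perplexity, for example, says Perplexity-User generally ignores it.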
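
To see the fallback in action, Python's standard-library urllib.robotparser can evaluate a robots.txt string offline. This is a minimal sketch: the example.com URL, the /private/ path and the choice of CCBot as the unnamed bot are illustrative assumptions. robotparser is also a simple parser that doesn't implement every detail of the current robots.txt standard (it applies the first matching rule rather than the longest one), so treat it as a sanity check, not a guarantee of how each crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the wildcard group blocks /private/,
# and GPTBot gets a group of its own.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt from a string and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/private/report.html"
    # CCBot has no group of its own, so the 'User-agent: *' rules apply.
    print("CCBot:", is_allowed(ROBOTS_TXT, "CCBot", url))    # CCBot: False
    # GPTBot matches its own group and ignores the wildcard group entirely.
    print("GPTBot:", is_allowed(ROBOTS_TXT, "GPTBot", url))  # GPTBot: True
```

Note the second result: naming a bot replaces the wildcard rules for that bot rather than adding to them.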

The bots that matter right now

  • GPTBot – OpenAI's crawler for collecting model training data.
  • ChatGPT-User – fetches pages live when a ChatGPT user asks for them.
  • OAI-SearchBot – powers ChatGPT's search experience.
  • ClaudeBot – Anthropic's training crawler.
  • Claude-Web / Claude-User – Anthropic's live retrieval agents.
  • PerplexityBot – Perplexity's index crawler.
  • Perplexity-User – Perplexity's live fetch on behalf of a user.
  • Google-Extended – opt-out token for Google's generative training.
  • Applebot-Extended – same idea for Apple Intelligence.
  • CCBot – Common Crawl, used as a training source by many models.
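
The list doubles as an audit checklist. The sketch below reads the User-agent lines from a robots.txt string and reports which of these tokens have no group of their own and therefore fall back to the wildcard rules. It is a minimal illustration, not a full robots.txt parser: the token tuple is copied from this article and will go stale as providers launch new bots, and the function names are hypothetical. Matching is case-insensitive, as robots.txt user-agent matching is.

```python
# Tokens from the list above (an assumption that will go stale; extend it
# as providers launch new bots).
AI_BOT_TOKENS = (
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-Web", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended", "Applebot-Extended", "CCBot",
)


def named_agents(robots_txt: str) -> set[str]:
    """Return the lower-cased values of every User-agent line."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    agents.discard("*")  # the wildcard is not a per-bot group
    return agents


def unnamed_bots(robots_txt: str, tokens: tuple[str, ...] = AI_BOT_TOKENS) -> list[str]:
    """Tokens with no group of their own; they fall back to 'User-agent: *'."""
    named = named_agents(robots_txt)
    return [token for token in tokens if token.lower() not in named]


if __name__ == "__main__":
    sample = "User-agent: *\nAllow: /\n\nUser-agent: GPTBot\nAllow: /\n"
    print(unnamed_bots(sample))  # every token except GPTBot
```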

A sensible default robots.txt

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

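Being explicit signals intent, and if a bot ever changes its default behaviour, your file still says what you meant. The one catch: a crawler obeys only the most specific group that matches it, so once GPTBot has its own group it ignores everything under 'User-agent: *'. Any Disallow lines you add to the wildcard group must be repeated in each named group.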
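
Because a named group replaces the wildcard group instead of adding to it, generating the file can be safer than hand-editing it. The helper below is hypothetical, not a standard tool: build_robots_txt, its parameters and the /checkout/ path in the usage lines are illustrative assumptions. It writes any shared rules into the wildcard group and into every named group, then appends the Sitemap line.

```python
def build_robots_txt(bots: list[str], shared_rules: list[str], sitemap_url: str) -> str:
    """Build a wildcard group plus one explicit group per bot (hypothetical helper).

    shared_rules (e.g. "Disallow: /checkout/") are copied into every group,
    because a crawler that matches a named group ignores 'User-agent: *'.
    """
    groups = []
    for agent in ["*", *bots]:
        # For simple Disallow rules like these, listing them before "Allow: /"
        # makes first-match parsers agree with longest-match parsers.
        lines = [f"User-agent: {agent}", *shared_rules, "Allow: /"]
        groups.append("\n".join(lines))
    groups.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        bots=["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"],
        shared_rules=["Disallow: /checkout/"],  # hypothetical path
        sitemap_url="https://yourdomain.com/sitemap.xml",
    ))
```

With an empty shared_rules list, the output has the same shape as the sample file above: one "Allow: /" group per agent, followed by the sitemap.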

When to block

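Block when content is licensed, paywalled, or genuinely sensitive, and remember that robots.txt is a request, not access control: anything truly sensitive also needs authentication. Don't block out of vague unease – invisibility is a worse outcome than citation for almost every business.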

Common mistakes

  • Blocking GPTBot because a checklist somewhere said to.
  • Allowing crawl but serving JS-only content the bot can't parse.
  • Forgetting to add new bots as providers launch them.
  • Returning 403 to AI bots from a misconfigured WAF.
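
The last mistake is easy to spot in server logs. A minimal sketch, assuming access logs in the common Apache/Nginx "combined" format; the token tuple, the regex and the blocked_ai_hits name are illustrative assumptions. It counts requests whose User-Agent contains a known AI bot token and that were answered with 403. Google-Extended and Applebot-Extended are left out because they are robots.txt control tokens, not crawlers with their own User-Agent strings. User-Agent headers can be spoofed, so treat matches as leads to investigate rather than proof.

```python
import re
import sys
from collections import Counter

# Tokens that show up in crawler User-Agent headers (assumed list, taken from
# this article). Google-Extended and Applebot-Extended are left out on purpose.
UA_TOKENS = (
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-User",
    "PerplexityBot", "Perplexity-User", "CCBot",
)

# Tail of a "combined" log line: "request" status bytes "referer" "user-agent"
COMBINED_TAIL = re.compile(r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"\s*$')


def blocked_ai_hits(log_lines, status: str = "403") -> Counter:
    """Count requests per AI bot token that were answered with `status`."""
    hits: Counter = Counter()
    for line in log_lines:
        match = COMBINED_TAIL.search(line)
        if not match or match["status"] != status:
            continue  # unparseable line or a different status code
        user_agent = match["ua"].lower()
        for token in UA_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # count each request once
    return hits


if __name__ == "__main__":
    # Usage: python blocked_ai_hits.py /path/to/access.log [more logs...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for token, count in blocked_ai_hits(log).most_common():
                print(f"{path}: {token} received {count} 403 responses")
```

A non-zero count for a bot you meant to allow usually points at a WAF or CDN rule, not at robots.txt.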

Module 03 · Robots.txt and AI crawlers

Want the full playbook?

This article is the appetiser. The GEO course covers the same ground in depth – annotated examples, copy-paste templates, real audit walkthroughs, and a 90-day roadmap. Lifetime access, no upsells.
