Technical • Dec 28, 2024 • 6 min read

How to Create a robots.txt File for AI Bots

Control which AI bots can access your website. Best practices, examples, and a free generator tool.

What is robots.txt?

The robots.txt file is a text file placed in your website's root directory that tells web crawlers (bots) which pages they can and cannot access. It's been around since 1994, but with the rise of AI bots, it's more important than ever.

Location:

https://yourwebsite.com/robots.txt

Why Control AI Bots?

You might want to control AI bot access to:

  • Protect proprietary or premium content
  • Preserve competitive advantages
  • Reduce server load (though AI bots are generally well-behaved)
  • Comply with legal or regulatory requirements
  • Control which sections are used for AI training

Basic robots.txt Structure

A robots.txt file consists of rules that specify which bots can access which content:

User-agent: [bot name]
Disallow: [path to block]
Allow: [path to allow]
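
For example, a minimal file with a single rule group looks like this (the bot name and path are placeholders):

# One rule group that applies only to GPTBot
User-agent: GPTBot
Disallow: /private/
Allow: /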

AI Bot User Agents

Here are the main AI bots you can control:

  • GPTBot (OpenAI): User-agent: GPTBot
  • ClaudeBot (Anthropic): User-agent: ClaudeBot
  • Google-Extended (Google): User-agent: Google-Extended
  • PerplexityBot (Perplexity): User-agent: PerplexityBot

Common robots.txt Configurations

1. Allow All AI Bots (Recommended)

Best for maximum AI discoverability:

User-agent: *
Allow: /

2. Block All AI Bots

If you want to prevent AI training on your content:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

3. Allow Public, Block Private (Smart Approach)

Allow AI bots on marketing pages but protect sensitive areas:

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Disallow: /user/
Allow: /blog/
Allow: /docs/
Allow: /products/

4. Allow Specific Bots Only

Trust certain AI companies but block others. Be aware that the catch-all group at the end also blocks search crawlers such as Googlebot unless you add explicit groups for them:

# Allow GPTBot and ClaudeBot
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Block all other bots
User-agent: *
Disallow: /

5. Rate Limiting

Allow bots but control crawl frequency:

User-agent: *
Crawl-delay: 10
# Crawl max 1 page every 10 seconds

⚠️ Note:

Not all bots respect Crawl-delay. GPTBot and ClaudeBot are well-behaved, but this directive is not universally supported.

How to Create & Deploy

1. Create the File

Use our free robots.txt generator or create a plain text file named robots.txt.

2. Upload to Root Directory

Place it in your website's root directory so it's accessible at:

https://yoursite.com/robots.txt

3. Test It

Visit yoursite.com/robots.txt in a browser to verify it's working.
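
Beyond checking it in a browser, you can verify individual rules programmatically. Here is a minimal sketch using Python's standard-library urllib.robotparser; the domain and paths are placeholders for your own site:

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt
rp = RobotFileParser()
rp.set_url("https://yoursite.com/robots.txt")
rp.read()

# Ask whether a given bot may fetch a given URL
print(rp.can_fetch("GPTBot", "https://yoursite.com/admin/"))  # False if /admin/ is disallowed
print(rp.can_fetch("GPTBot", "https://yoursite.com/blog/"))   # True if /blog/ is allowed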

Best Practices

Keep it simple

Don't over-complicate. Start with basic allow/disallow rules.

Test before deploying

Mistakes can block all bots. Double-check syntax.

Document your decisions

Add comments explaining why you blocked/allowed specific bots.

Review regularly

New AI bots emerge frequently. Update your robots.txt as needed.

Track before blocking

Monitor AI bot activity for a few weeks before making blocking decisions.
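
One lightweight way to track activity is to count AI bot requests in your server's access log. Below is a rough Python sketch; it assumes the common Apache/Nginx combined log format, where the user agent is the last quoted field, and a file named access.log, so adjust both for your setup:

from collections import Counter
import re

# User-agent substrings to look for.
# Google-Extended is a robots.txt control token rather than a separate
# crawler, so it will not appear as a user agent in access logs.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]
hits = Counter()

with open("access.log") as log:
    for line in log:
        quoted = re.findall(r'"([^"]*)"', line)
        if not quoted:
            continue
        user_agent = quoted[-1]  # last quoted field in combined log format
        for bot in AI_BOTS:
            if bot in user_agent:
                hits[bot] += 1

for bot, count in hits.most_common():
    print(f"{bot}: {count} requests")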

Common Mistakes to Avoid

❌ Wrong: Typos in user agent names

Make sure you spell bot names exactly right:

Wrong:

User-agent: GPT-Bot

Correct:

User-agent: GPTBot

❌ Wrong: Blocking Googlebot thinking it's AI

Googlebot (the search crawler) is different from Google-Extended (the token that controls AI training). Blocking Googlebot hurts your SEO!
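
If the goal is to opt out of Google's AI training without affecting search, handle the two tokens separately in the same file, for example:

# Opt out of AI training, keep Google Search crawling
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /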

❌ Wrong: Forgetting the trailing slash

Disallow rules are prefix matches: Disallow: /admin blocks /admin and /admin/, but also unrelated paths that start with the same text, such as /administration.

Use Disallow: /admin/ to block only that directory.
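
To see the difference concretely (the paths are illustrative):

User-agent: *
Disallow: /admin      # prefix match: also blocks /administration and /admin-login
Disallow: /private/   # blocks only /private/ and everything under it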

Generate Your robots.txt

Use our free tool to create a custom robots.txt for AI bots in seconds
