What is robots.txt?
The robots.txt file is a text file placed in your website's root directory that tells web crawlers (bots) which pages they can and cannot access. It's been around since 1994, but with the rise of AI bots, it's more important than ever.
Location:
https://yourwebsite.com/robots.txt
Why Control AI Bots?
You might want to control AI bot access in order to:
- Protect proprietary or premium content
- Preserve competitive advantages
- Reduce server load (though AI bots are generally well-behaved)
- Comply with legal or regulatory requirements
- Control which sections are used for AI training
Basic robots.txt Structure
A robots.txt file consists of rules that specify which bots can access which content:
User-agent: [bot name]
Disallow: [path to block]
Allow: [path to allow]
AI Bot User Agents
Here are the main AI bots you can control:
User-agent: GPTBot           # OpenAI
User-agent: ClaudeBot        # Anthropic
User-agent: Google-Extended  # Google (AI training)
User-agent: PerplexityBot    # Perplexity
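If you want to see how a site (including your own) currently addresses these bots, a quick check is to fetch its robots.txt and look for the tokens above. Here is a minimal Python sketch; https://example.com is a placeholder to replace with a real site:

# Sketch: report which AI crawler user agents a site's robots.txt mentions.
from urllib.request import urlopen

AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]

def mentioned_bots(site: str) -> dict:
    """Return which AI bot tokens appear in the site's robots.txt."""
    with urlopen(f"{site}/robots.txt") as resp:
        text = resp.read().decode("utf-8", errors="replace").lower()
    return {bot: bot.lower() in text for bot in AI_BOTS}

if __name__ == "__main__":
    for bot, present in mentioned_bots("https://example.com").items():
        print(f"{bot}: {'mentioned' if present else 'not mentioned'}")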
Common robots.txt Configurations
1. Allow All AI Bots (Recommended)
Best for maximum AI discoverability:
User-agent: *
Allow: /
2. Block All AI Bots
If you want to prevent AI training on your content:
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /
3. Allow Public, Block Private (Smart Approach)
Allow AI bots on marketing pages but protect sensitive areas:
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Disallow: /user/
Allow: /blog/
Allow: /docs/
Allow: /products/
4. Allow Specific Bots Only
Trust certain AI companies but block others:
# Allow GPTBot and ClaudeBot
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Block all other bots
User-agent: *
Disallow: /
5. Rate Limiting
Allow bots but control crawl frequency:
User-agent: *
Crawl-delay: 10  # Crawl at most 1 page every 10 seconds
⚠️ Note:
Not all bots respect Crawl-delay. GPTBot and ClaudeBot are well-behaved, but this directive is not universally supported.
How to Create & Deploy
Create the File
Use our free robots.txt generator, or create a plain text file named robots.txt.
Upload to Root Directory
Place it in your website's root directory so it's accessible at:
https://yoursite.com/robots.txt
Test It
Visit yoursite.com/robots.txt in a browser to verify it's working.
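For a more thorough check than eyeballing the file, you can verify the deployed rules with Python's built-in robots.txt parser. This is a minimal sketch; the site URL, bot list, and paths are placeholders to adjust for your own setup:

# Sketch: verify a deployed robots.txt with Python's standard library.
from urllib.robotparser import RobotFileParser

SITE = "https://yoursite.com"
BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]
PATHS = ["/", "/blog/", "/admin/"]

rp = RobotFileParser(f"{SITE}/robots.txt")
rp.read()  # fetch and parse the live file

for bot in BOTS:
    for path in PATHS:
        allowed = rp.can_fetch(bot, f"{SITE}{path}")
        print(f"{bot:16} {path:10} {'allowed' if allowed else 'blocked'}")

The same parser can also check a local draft before you upload it, e.g. rp.parse(open("robots.txt").read().splitlines()).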
Best Practices
Keep it simple
Don't over-complicate. Start with basic allow/disallow rules.
Test before deploying
Mistakes can block all bots. Double-check syntax.
Document your decisions
Add comments explaining why you blocked/allowed specific bots.
Review regularly
New AI bots emerge frequently. Update your robots.txt as needed.
Track before blocking
Monitor AI bot activity for a few weeks before making blocking decisions.
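One simple way to do this is to count AI bot requests in your server's access log. A rough sketch, assuming a standard nginx/Apache log format where the user agent string appears on each line; the log path is a placeholder:

# Sketch: count AI bot requests in a web server access log.
from collections import Counter

AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]

def count_bot_hits(log_path: str) -> Counter:
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            for bot in AI_BOTS:
                if bot in line:
                    hits[bot] += 1
    return hits

if __name__ == "__main__":
    for bot, n in count_bot_hits("/var/log/nginx/access.log").most_common():
        print(f"{bot}: {n} requests")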
Common Mistakes to Avoid
❌ Wrong: Typos in user agent names
Make sure you spell bot names exactly right:
Wrong:
User-agent: GPT-Bot
Correct:
User-agent: GPTBot
❌ Wrong: Blocking Googlebot thinking it's AI
Googlebot (Google's search crawler) is different from Google-Extended (the token that controls use of your content for AI training). Blocking Googlebot hurts your SEO!
❌ Wrong: Forgetting the trailing slash
Disallow rules are prefix matches: Disallow: /admin blocks /admin, /admin/settings, and also /administration.
Use Disallow: /admin/ if you only want to block the contents of the directory.
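Python's built-in parser applies the same prefix matching for simple rules like these, so you can sanity-check the difference yourself. A small sketch using hypothetical rules and paths:

# Sketch: how the trailing slash changes robots.txt matching.
from urllib.robotparser import RobotFileParser

def allowed(rules: str, path: str) -> bool:
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch("GPTBot", f"https://example.com{path}")

without_slash = "User-agent: *\nDisallow: /admin"
with_slash = "User-agent: *\nDisallow: /admin/"

print(allowed(without_slash, "/administration"))  # False -- blocked too
print(allowed(with_slash, "/administration"))     # True  -- only /admin/... is blocked
print(allowed(with_slash, "/admin/settings"))     # False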
Generate Your robots.txt
Use our free tool to create a custom robots.txt for AI bots in seconds