Question 1

What is robots.txt?

Accepted Answer

A robots.txt file is a plain-text file at the root of your website (https://example.com/robots.txt) that tells web crawlers which paths they may or may not visit. It is a voluntary protocol — well-behaved crawlers (Google, Bing, OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity) honor it; malicious crawlers ignore it. Robots.txt is not a security mechanism — sensitive content should be protected with authentication, not a Disallow rule.
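To make the "voluntary" part concrete, here is a minimal sketch of the check a well-behaved crawler performs before fetching a page, using Python's standard urllib.robotparser module. The rules in ROBOTS_TXT and the example.com URLs are hypothetical, chosen only for illustration. Nothing forces a crawler to run this check, which is why a Disallow rule cannot protect sensitive content.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "https://example.com/blog/"),
        ("Googlebot", "https://example.com/admin/users"),
        ("GPTBot", "https://example.com/blog/"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

A crawler that matches a named group (GPTBot here) follows that group; any other crawler falls back to the User-agent: * group.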

Question 2

How do I block AI crawlers from my site?

Accepted Answer

Add a user-agent group for each AI crawler with Disallow: /. The major user-agent tokens in 2026 are: GPTBot (OpenAI training), OAI-SearchBot (ChatGPT search), Google-Extended (a control token rather than a separate crawler; it governs whether Google may use your content for Gemini training), ClaudeBot (Anthropic training), Claude-SearchBot (Claude search indexing), Claude-User (Claude live answers), PerplexityBot (Perplexity search indexing), Perplexity-User (Perplexity live answers), and CCBot (Common Crawl). This generator includes them all as toggleable presets.
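As a rough sketch of what "a user-agent group for each AI crawler" looks like in practice, the Python snippet below builds the robots.txt text from a list of tokens. The token list is copied from the answer above. The function name build_block_rules is a hypothetical helper, not part of this generator or any library.

```python
# User-agent tokens named in the answer above.
AI_CRAWLER_TOKENS = [
    "GPTBot",
    "OAI-SearchBot",
    "Google-Extended",
    "ClaudeBot",
    "Claude-User",
    "Claude-SearchBot",
    "PerplexityBot",
    "Perplexity-User",
    "CCBot",
]


def build_block_rules(tokens: list[str]) -> str:
    """Return robots.txt text with one 'Disallow: /' group per token."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    # A blank line separates groups; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_block_rules(AI_CRAWLER_TOKENS), end="")
```

Each group in the output has the same two-line shape, for example "User-agent: GPTBot" followed by "Disallow: /". Crawlers not named in any group are unaffected.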

Question 3

Should I block AI crawlers?

Accepted Answer

It depends on your priorities. Blocking AI training crawlers keeps your content from being used to train models without permission, but it also reduces the chance that your brand is mentioned when users ask AI assistants about your category. For most marketing sites, allowing AI crawlers is the better choice: it is free AI visibility. For paywalled content, blocking is wise. The live-answer crawlers (Claude-User, Perplexity-User) are not training crawlers: they fetch your content on demand when answering a user query, which is closer to a browser request made on the user's behalf than to a bulk crawl. Blocking them reduces your AI search visibility without preventing training.
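One possible middle path is to block the training crawlers while leaving the live-answer fetchers allowed. The Python sketch below generates such a policy and checks it with the standard urllib.robotparser module. The two token groupings are an assumption based on the labels used in this FAQ, and build_selective_policy and can_fetch are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# Assumed grouping, following the labels used in this FAQ.
TRAINING_TOKENS = ["GPTBot", "Google-Extended", "ClaudeBot", "CCBot"]
LIVE_ANSWER_TOKENS = ["Claude-User", "Perplexity-User"]


def build_selective_policy(blocked: list[str], allowed: list[str]) -> str:
    """Return robots.txt text that blocks some tokens and allows the rest."""
    lines: list[str] = []
    for token in blocked:
        lines += [f"User-agent: {token}", "Disallow: /", ""]
    for token in allowed:
        lines += [f"User-agent: {token}", "Allow: /", ""]
    # Every other crawler stays allowed.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def can_fetch(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Check a policy offline by parsing the text instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    policy = build_selective_policy(TRAINING_TOKENS, LIVE_ANSWER_TOKENS)
    print(policy, end="")
    for agent in TRAINING_TOKENS + LIVE_ANSWER_TOKENS:
        print(agent, "allowed" if can_fetch(policy, agent) else "blocked")
```

The explicit Allow groups for the live-answer tokens are not strictly required, since the catch-all group already allows them, but they make the intent visible to anyone reading the file.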

Question 4

Where do I upload robots.txt?

Accepted Answer

Robots.txt must be served from the root of your domain at /robots.txt (for example, https://yoursite.com/robots.txt). It cannot live in a subdirectory, and each subdomain needs its own robots.txt. Make sure the file is served with Content-Type: text/plain. Test with: curl -I https://yoursite.com/robots.txt and check that the response is a 200 with a text/plain Content-Type header.
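The same check can be scripted. The Python sketch below derives the robots.txt location for any page URL (same scheme and host, path fixed to /robots.txt) and then sends a HEAD request, roughly what curl -I does. The function names robots_url_for and check_robots_headers are hypothetical, and yoursite.com is the placeholder host from the answer above.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    The file always sits at the root of the same scheme and host, so a
    subdomain such as shop.yoursite.com gets its own robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def check_robots_headers(page_url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Send a HEAD request and return (status code, content type).

    Needs network access. urlopen raises urllib.error.HTTPError for
    4xx/5xx responses, so a missing file surfaces as an exception.
    """
    request = Request(robots_url_for(page_url), method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.status, response.headers.get_content_type()


if __name__ == "__main__":
    print(robots_url_for("https://yoursite.com/blog/post?id=1"))
    print(robots_url_for("https://shop.yoursite.com/cart"))
```

A healthy result from check_robots_headers is a 200 status with a content type of text/plain.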

Robots.txt Generator
