Free Tool

Robots.txt Generator

Build a valid robots.txt with allow/disallow rules, a sitemap link, and AI crawler policies — all in one place.

Sitemap URL

Leave blank to omit.

Disallow paths (one per line)

Each path becomes a Disallow line under User-agent: *. Leave blank to default to Allow: /.
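For example, entering two hypothetical paths such as /private/ and /drafts/ (placeholders, not recommendations) would produce a group like this:

User-agent: *
Disallow: /private/
Disallow: /drafts/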

Crawl delay (optional, seconds)

Honored by Bing and Yandex. Google ignores Crawl-delay (use Search Console instead).
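A 10-second delay, for example, is written like this (the value itself is only illustrative):

User-agent: *
Crawl-delay: 10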

AI crawler policies
  • GPTBot (OpenAI): Trains ChatGPT models
  • OAI-SearchBot (OpenAI): ChatGPT live search results
  • ChatGPT-User (OpenAI): ChatGPT live-browse on user request
  • Google-Extended (Google): Trains Gemini and Vertex AI
  • ClaudeBot (Anthropic): Trains Claude models
  • Claude-User (Anthropic): Claude.ai live-browse on user request
  • Claude-SearchBot (Anthropic): Claude.ai live search results
  • PerplexityBot (Perplexity): Trains Perplexity
  • Perplexity-User (Perplexity): Perplexity live-browse on user request
  • CCBot (Common Crawl): Public web corpus used by many AI labs
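Each crawler you choose to block is typically written as its own group with a blanket Disallow; blocking GPTBot, for example, looks like this:

User-agent: GPTBot
Disallow: /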

# robots.txt generated at https://insitechat.ai/tools/robots-txt-generator

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Upload as robots.txt at the root of your domain (e.g. https://yoursite.com/robots.txt).

What is robots.txt?

A robots.txt file is a plain-text file at the root of your website that tells web crawlers which paths they may or may not visit. It is a voluntary protocol — well-behaved crawlers (Google, Bing, OpenAI, Anthropic) honor it; malicious crawlers ignore it.

Robots.txt is NOT a security mechanism. Listing /admin/ under Disallow only asks well-behaved crawlers not to fetch those URLs; it does nothing to prevent direct access. Sensitive paths should be protected with authentication, not a robots.txt rule.
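As a concrete illustration, the rule below only asks crawlers to skip /admin/; anyone can still request those URLs directly:

User-agent: *
Disallow: /admin/
# Advisory only: this does not password-protect /admin/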

AI crawler policy choices in 2026

The AI crawler landscape splits roughly into two categories:

  • Training crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot) — fetch your pages to train future model versions. Blocking them protects your content from being used in training, but reduces the chance your brand is in the next model's knowledge.
  • Live-answer crawlers (ChatGPT-User, OAI-SearchBot, Claude-User, Claude-SearchBot, Perplexity-User) — fetch your pages at runtime when a user asks the AI a relevant question. Blocking these directly reduces your AI search visibility.

For most marketing sites: allow everything. Free AI visibility is worth more than the abstract loss of letting AI labs train on public content.

For paywalled content / proprietary IP: block training crawlers (top group), keep live-answer crawlers (bottom group) allowed. Use the “Block training only” preset above.
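Roughly, that preset disallows the training crawlers while leaving live-answer crawlers and regular search bots untouched. The exact output may differ, but it amounts to something like:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /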

Where to upload robots.txt

Robots.txt MUST live at the root of your domain: https://yoursite.com/robots.txt. A copy in a subdirectory is ignored, and each subdomain needs its own file. Verify after upload with:

curl -I https://yoursite.com/robots.txt

Pair this generator with our LLM-Friendly Website Score tool to audit how AI crawlers will see your site overall.