Build a valid robots.txt with allow/disallow rules, a sitemap link, and AI crawler policies — all in one place.
# robots.txt generated at https://insitechat.ai/tools/robots-txt-generator
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
Upload as robots.txt at the root of your domain (e.g. https://yoursite.com/robots.txt).
A robots.txt file is a plain-text file at the root of your website that tells web crawlers which paths they may or may not visit. It is a voluntary protocol — well-behaved crawlers (Google, Bing, OpenAI, Anthropic) honor it; malicious crawlers ignore it.
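To see how a well-behaved crawler might interpret these rules, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `/private/` rule and the `ExampleBot` user agent are hypothetical, added only so that there is something to disallow. They are not part of the generator's default output.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; the /private/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/blog/post", "/private/report"):
        url = "https://yoursite.com" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

One caveat: Python's parser applies the first rule that matches, while RFC 9309 and major search engines prefer the most specific (longest) match. Placing the narrower `Disallow` before the broad `Allow` keeps both interpretations in agreement.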
Robots.txt is NOT a security mechanism. Listing /admin/ as Disallow only asks well-behaved crawlers not to fetch URLs under that path; it does not prevent direct access, and because robots.txt is publicly readable, it also reveals that the path exists. Sensitive paths should be protected with authentication, not a robots.txt rule.
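The AI crawler landscape splits roughly into two categories: training crawlers, which collect pages for model training, and live-answer crawlers, which fetch pages to answer a user's question in real time.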
For most marketing sites: allow everything. Free AI visibility is worth more than the abstract loss of letting AI labs train on public content.
For paywalled content / proprietary IP: block training crawlers (top group), keep live-answer crawlers (bottom group) allowed. Use the “Block training only” preset above.
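A "block training, allow live answers" policy could be generated along these lines. This is a sketch under assumptions: the user-agent tokens below are commonly published ones chosen for illustration. They are not necessarily the list the preset uses, and each should be checked against the vendor's current documentation. `build_robots_txt` is a hypothetical helper name.

```python
# Assumed user-agent tokens, for illustration only; verify with each vendor.
TRAINING_BOTS = ("GPTBot", "ClaudeBot", "Google-Extended", "CCBot")
LIVE_ANSWER_BOTS = ("ChatGPT-User", "Perplexity-User")


def build_robots_txt(sitemap_url, blocked=TRAINING_BOTS, allowed=LIVE_ANSWER_BOTS):
    """Build robots.txt text that disallows `blocked` bots and allows the rest."""
    lines = []
    # One group per blocked bot: disallow the whole site.
    for bot in blocked:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    # Explicit groups for live-answer bots make the intent visible,
    # even though the catch-all group below would allow them anyway.
    for bot in allowed:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    # Catch-all group for every other crawler, then the sitemap link.
    lines += ["User-agent: *", "Allow: /", "", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt("https://yoursite.com/sitemap.xml"))
```

Each bot gets its own group because a crawler follows the group that names it and falls back to `User-agent: *` only when no specific group matches.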
Robots.txt MUST live at the root of your domain: https://yoursite.com/robots.txt. Crawlers ignore a robots.txt placed in a subdirectory, and each subdomain needs its own file. Verify after upload with the following command, which should return a 200 status:
curl -I https://yoursite.com/robots.txt
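The placement rule above (root of the host, one file per subdomain) can be expressed as a small helper that derives the robots.txt location a crawler would request for any page. This is an illustrative sketch; `robots_txt_url` is a hypothetical name and `shop.yoursite.com` is an assumed subdomain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Keep the scheme and host (including any subdomain and port);
    # replace the path with /robots.txt and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("https://yoursite.com/blog/post?ref=home"))
    print(robots_txt_url("https://shop.yoursite.com/cart"))
```

The two calls yield different URLs, which is why a file uploaded to the main domain does not cover a subdomain.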
Pair this generator with our LLM-Friendly Website Score tool to audit how AI crawlers will see your site overall.