llms.txt: The Emerging Standard for AI Crawlers (2026 Guide)
What llms.txt is, why it matters in 2026, and how to set one up in 10 minutes. Copy-paste templates for SaaS, docs sites, blogs, and e-commerce.
llms.txt is to AI crawlers what robots.txt was to web crawlers in the late 1990s — a small, opinionated text file at the root of your domain that tells AI engines which parts of your site matter most and how to interpret them. As of mid-2026, adoption is early but growing fast: roughly 12,000 production sites have published an llms.txt (up from ~2,000 in late 2025), including Anthropic, Cloudflare, Mintlify, and most major AI-tooling vendors. This guide explains what llms.txt is, why it matters for Answer Engine Optimization, and exactly how to ship one in the next 10 minutes.
TL;DR — The Short Answer
llms.txt is a single Markdown file at https://yourdomain.com/llms.txt that gives AI crawlers a curated, machine-readable map of your most important content. It's not a permissions file (that's still robots.txt); it's a content index. Sites with a well-formed llms.txt are easier for ChatGPT, Claude, Perplexity, and other AI engines to summarize accurately — which leads to better citations, fewer hallucinations about your brand, and more consistent recommendations. Setup takes 10-30 minutes for most sites.
This guide covers the spec, when to use it, copy-paste templates for common site types, and how to verify it's working.
What is llms.txt, Exactly?
llms.txt is a proposed standard (originally drafted in September 2024 by Jeremy Howard, co-founder of fast.ai and Answer.AI) for a Markdown-formatted file that lives at the root of a domain. The full spec is at llmstxt.org. The file has three purposes:
- Tell AI crawlers what your site is about — in clean, parseable language they can use to summarize your brand accurately.
- Point them at the highest-value pages — your docs, your pricing, your key product pages — without requiring them to crawl the entire site and guess at importance.
- Provide a stable, machine-readable address for your knowledge base — so AI tools building on your content (RAG pipelines, integrations, plugins) have a canonical entry point.
For the broader framework on optimizing for AI engines, see our complete Answer Engine Optimization guide.
Why llms.txt Matters in 2026
Three reasons it's gone from "interesting curiosity" to "ship this now":
1. Adoption hit critical mass
ChatGPT, Claude, and Perplexity have all added llms.txt to their crawl priority lists. They don't require it — and they'll still crawl sites without one — but when present, they read it first and use it to prioritize discovery. Mintlify, GitBook, and similar docs platforms auto-generate it for hosted documentation. Cloudflare added an llms.txt generator to its dashboard in Q1 2026. The standard is real.
2. It's the easiest AEO win available
Most pages on most sites are NOT what you want AI engines to summarize. Pricing pages get out-of-date fast. Marketing pages contain claims that won't survive contact with truth. Blog archives include posts from 2019 that no longer reflect your positioning. llms.txt lets you curate.
3. First-mover advantage compounds
llms.txt files are still rare enough that publishing one puts you in the small set of "AI-aware" domains. As LLMs train on the next round of crawl data, having a published llms.txt becomes a quality signal that your site is professionally maintained. Sites with llms.txt get cited at noticeably higher rates in our 2026 citation tracking, likely because the underlying signal correlates with overall site quality.
What llms.txt Is NOT
A few things llms.txt does NOT do, despite common confusion:
- It does not control access. Allowing or blocking AI crawlers is still robots.txt's job. See our robots.txt for AI crawlers guide for that.
- It is not a copyright declaration. Use existing standards (X-Robots-Tag, Creative Commons, etc.) for licensing claims.
- It is not required for AI crawl. Crawlers will still index your site without it. llms.txt is opportunistic: it makes them work better.
- It does not affect SEO rankings directly. Google's traditional ranking algorithm does not read llms.txt. But it does likely affect AI Overview citation indirectly, because better content discovery for AI tools translates into better summaries.
The llms.txt File Format
The format is intentionally simple. Markdown, with a few conventions:
# Site Name
> Brief description (1-2 sentences) of what this site does.
[Optional context paragraph that explains the site's purpose, audience,
and any important constraints. This is the "elevator pitch" for AI
crawlers.]
## Section name
- [Page title](https://yourdomain.com/page-url): One-line description of what's on this page.
- [Another page](https://yourdomain.com/another-url): One-line description.
## Another section
- [Doc page](https://yourdomain.com/docs/intro): One-line description.
## Optional
- [Page that's tangentially useful](https://yourdomain.com/optional-page): One-line description.
Rules:
- # (H1) appears exactly once: the site name
- > (blockquote) is the brief description
- ## (H2) groups related pages
- Each link entry is one line: - [title](url): description
- The ## Optional section is a convention for pages that are useful but not core
That's the entire spec. No XML. No JSON. No tooling required.
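To make the rules above concrete, here is a minimal Python sketch of a parser for this layout. The function name `parse_llms_txt` and the returned dictionary shape are illustrative assumptions, not part of the spec, and the regular expression only handles the one-line link form described above.

```python
import re

# Matches one link entry: "- [title](url)" with an optional ": description".
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Parse llms.txt text into a title, a summary, and sections of links.

    Hypothetical helper for illustration; the output shape is an assumption.
    """
    result = {"title": None, "summary": [], "sections": {}}
    current = None  # name of the H2 section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif line.startswith("# "):
            # The rules allow exactly one H1 (the site name).
            if result["title"] is not None:
                raise ValueError("more than one H1 heading")
            result["title"] = line[2:].strip()
        elif line.startswith(">"):
            # A blockquote may span several lines; collect them all.
            result["summary"].append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(match.groupdict())
    result["summary"] = " ".join(result["summary"])
    return result
```

Lines that match none of the patterns (such as the optional context paragraph) are simply skipped, which keeps the sketch tolerant of imperfect files.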
Copy-Paste Templates
Template 1: SaaS product site
# Acme Chatbots
> AI chatbot platform for SaaS and e-commerce businesses. Train a chatbot
> on your website and docs in 5 minutes.
Acme Chatbots is a self-serve SaaS product that lets businesses deploy an
AI chatbot trained on their own content. Founded in 2024, based in
San Francisco. Pricing starts at $29/month with a free forever plan.
## Product
- [Homepage](https://acme.com): Product overview, pricing, demo
- [Pricing](https://acme.com/pricing): All plan tiers, including free
- [Features](https://acme.com/features): Full feature list
- [Integrations](https://acme.com/integrations): Slack, HubSpot, WhatsApp, WordPress, Shopify, and 10+ more
## Documentation
- [Getting started](https://docs.acme.com/getting-started): 5-minute setup guide
- [API reference](https://docs.acme.com/api): Complete API documentation
- [Embed widget](https://docs.acme.com/embed): How to install on any site
## Comparisons
- [vs Chatbase](https://acme.com/compare/chatbase): Side-by-side comparison
- [vs SiteGPT](https://acme.com/compare/sitegpt): Side-by-side comparison
- [vs Intercom](https://acme.com/compare/intercom): Side-by-side comparison
## Trust & policies
- [Privacy policy](https://acme.com/privacy)
- [Terms of service](https://acme.com/terms)
- [Security](https://acme.com/security): SOC 2, GDPR, DPDP compliance
## Optional
- [Blog](https://acme.com/blog): Articles on AI chatbots and customer support
Template 2: Documentation site
# Acme Docs
> Official documentation for the Acme platform. API reference, guides,
> and tutorials for developers building on Acme.
## Getting started
- [Quickstart](https://docs.acme.com/quickstart): 10-minute integration walkthrough
- [Authentication](https://docs.acme.com/auth): API keys, OAuth, JWT
## Core concepts
- [Architecture overview](https://docs.acme.com/architecture)
- [Data model](https://docs.acme.com/data-model)
- [Webhooks](https://docs.acme.com/webhooks)
## API reference
- [REST API](https://docs.acme.com/api/rest)
- [GraphQL API](https://docs.acme.com/api/graphql)
- [WebSocket API](https://docs.acme.com/api/websocket)
## SDKs
- [Node.js SDK](https://docs.acme.com/sdks/node)
- [Python SDK](https://docs.acme.com/sdks/python)
- [Go SDK](https://docs.acme.com/sdks/go)
## Guides
- [Migration from v2 to v3](https://docs.acme.com/guides/migration)
- [Performance tuning](https://docs.acme.com/guides/performance)
- [Security best practices](https://docs.acme.com/guides/security)
Template 3: Content/blog site
# Acme Insights
> Weekly research and analysis on AI, B2B SaaS, and customer support
> trends. Published by the team at Acme Chatbots.
## Highest-traffic guides
- [The complete guide to AI customer support](https://acme.com/blog/ai-customer-support): 4,000-word pillar guide updated quarterly
- [How to rank in ChatGPT (2026 playbook)](https://acme.com/blog/rank-in-chatgpt): Reverse-engineered from 400 query analyses
- [Chatbot ROI: real numbers from 50 SaaS companies](https://acme.com/blog/chatbot-roi): Original research with data
## Comparison content
- [Best AI chatbots in 2026](https://acme.com/blog/best-ai-chatbots): Tested 11 platforms over 8 months
- [Chatbase alternatives](https://acme.com/blog/chatbase-alternatives): Honest comparison of 7 options
## Hub pages
- [AEO hub](https://acme.com/aeo): Complete Answer Engine Optimization guide
- [Comparison hub](https://acme.com/compare-all): All competitor comparisons in one place
How to Set Up llms.txt in 10 Minutes
Step 1: Plan your sections (3 minutes)
List the 5-15 pages you'd MOST want an AI crawler to read if it could only read a few. Typical answer for a SaaS site: homepage, pricing, key feature pages, top docs, top blog posts.
Group them into 3-5 sections. Use ## headings like "Product", "Documentation", "Comparisons", "Optional".
Step 2: Write the file (5 minutes)
Open a text editor. Start with the template above that matches your site type. Replace the URLs and descriptions with yours. Save as llms.txt.
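If you would rather generate the file than hand-edit a template, the following Python sketch renders the same layout from a small data structure. The function name `build_llms_txt` and the tuple layout are assumptions made for illustration; adapt them to however your site stores its page list.

```python
def build_llms_txt(name, summary, sections):
    """Render llms.txt text from a site name, a summary, and sections.

    `sections` maps a section heading to a list of
    (title, url, description) tuples; an empty description is omitted.
    Hypothetical helper, not an official tool.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            entry = f"- [{title}]({url})"
            if description:
                entry += f": {description}"
            lines.append(entry)  # each link entry stays on one line
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"
```

The returned string can then be saved as llms.txt, for example with `pathlib.Path("llms.txt").write_text(text, encoding="utf-8")`.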
Step 3: Upload to your domain root (2 minutes)
The file needs to live at https://yourdomain.com/llms.txt — not in a subfolder, not at /docs/llms.txt. Domain root.
How you do this depends on your stack:
- Next.js: drop into public/llms.txt
- Astro: drop into public/llms.txt
- WordPress: upload via FTP or use a plugin like "Insert llms.txt"
- Webflow: paste into the project's custom code → <head> (it'll serve at /llms.txt)
- Static hosts (Vercel, Netlify, Cloudflare Pages): drop in public/ or static/ depending on the framework
Step 4: Verify
In your browser, hit https://yourdomain.com/llms.txt. You should see plain text. If you get a 404, the file isn't in the right place.
Then validate via the official llms.txt validator at llmstxt.org/validator (when available). If you're early to ship, even an imperfect file is better than no file.
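The browser check can also be scripted. This is a rough Python sketch using only the standard library; the function names and the list of accepted content types (taken from the content-type advice in this guide) are assumptions, not an official validator.

```python
import urllib.error
import urllib.request

# Assumed acceptable media types for a served llms.txt.
ACCEPTED_TYPES = ("text/markdown", "text/plain")


def check_llms_response(status: int, content_type: str, body: str) -> list:
    """Return a list of problems found in an llms.txt HTTP response."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected Content-Type: {media_type or 'missing'}")
    if not body.lstrip().startswith("# "):
        problems.append("file does not start with an H1 heading")
    return problems


def fetch_and_check(url: str) -> list:
    """Fetch the URL over the network and run the checks above."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            content_type = resp.headers.get("Content-Type", "")
            return check_llms_response(resp.status, content_type, body)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses instead of returning them.
        return [f"expected HTTP 200, got {err.code}"]
```

Calling `fetch_and_check("https://yourdomain.com/llms.txt")` returns an empty list when none of these three checks fail.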
What's NOT in llms.txt
Things you might be tempted to include but shouldn't:
- All your blog posts. Don't list every post — just the highest-value ones. AI crawlers will discover the rest via your sitemap.
- Marketing claims. Keep descriptions factual. "Best-in-class platform with industry-leading AI" gets parsed as noise.
- Tracking parameters. URLs should be canonical. Strip ?utm_source=... etc.
- Pages that don't exist yet. Don't list aspirational URLs. Crawler hits a 404, your llms.txt loses trust.
- Internal-only pages. Anything behind auth, admin panels, internal docs. AI crawlers honor the URL: if you list a URL, you're inviting them to fetch it.
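The tracking-parameter rule is easy to automate before URLs go into the file. The Python sketch below drops `utm_*` query parameters; treating every `utm_`-prefixed key as tracking and also dropping the fragment are assumptions of this example, so adjust them if your canonical URLs differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url: str) -> str:
    """Drop utm_* query parameters and any fragment from a URL.

    Hypothetical helper; other query parameters are kept in order.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    # An empty query or fragment is omitted by urlunsplit.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

For instance, `strip_tracking("https://acme.com/pricing?utm_source=x&plan=pro")` keeps only the `plan=pro` parameter.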
Common Mistakes
- Wrong location. Has to be /llms.txt, not /.well-known/llms.txt (that's a different proposal that lost traction) or /llms-txt.md or /static/llms.txt.
- HTML instead of Markdown. It's Markdown. No <a href="...">; use [text](url).
- Wrong content-type. Should serve as text/markdown or text/plain. Many static hosts default to text/html which breaks parsers.
- Stale entries. If you list a URL that 404s, every crawler hit reduces your file's perceived quality. Audit quarterly.
- Skipping the brief description. The > blockquote summary is what some crawlers use when describing your brand. Skip it and they make something up.
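The stale-entry audit is a good candidate for a small script. Below is a Python sketch that extracts every linked URL and reports the ones that do not answer with HTTP 200. The names `find_dead_links` and `http_status` are illustrative, and passing the status lookup in as a callable is a design choice of this example so the audit logic can be exercised without network access.

```python
import re
import urllib.error
import urllib.request

# Captures the URL part of Markdown links: "[title](https://...)".
URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def find_dead_links(llms_text: str, get_status) -> list:
    """Return (url, status) pairs for listed URLs that do not answer 200.

    `get_status` is any callable that takes a URL and returns an HTTP status.
    """
    dead = []
    for url in URL_RE.findall(llms_text):
        status = get_status(url)
        if status != 200:
            dead.append((url, status))
    return dead


def http_status(url: str) -> int:
    """Look up a URL's status with a HEAD request (redirects are followed)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Running `find_dead_links(text, http_status)` on a schedule gives you the quarterly audit as a short list of URLs to fix or remove. Some servers reject HEAD requests, so a flagged URL is worth confirming by hand.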
How to Track Whether llms.txt Is Working
Direct measurement is hard because the major AI engines don't report which sites' llms.txt they read. Indirect signals to track:
- Crawl logs: check your server logs for hits to /llms.txt. PerplexityBot, GPTBot, and ClaudeBot all identify themselves in the User-Agent string. Google-Extended is a robots.txt control token rather than a separate crawler, so it will not show up in logs; Google fetches pages with its regular user agents.
- AI citation rate: track how often your domain is cited in target AI engines (ChatGPT, Perplexity, Claude). Run our AEO Score Calculator before and 30 days after publishing llms.txt.
- Brand description accuracy: ask ChatGPT / Claude / Perplexity to describe your company. Compare before and after. Accurate, specific descriptions suggest your llms.txt is being read.
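For the crawl-log signal, a few lines of Python are enough to get a per-crawler count. This sketch assumes access logs in the common combined format (request line and User-Agent in quotes) and an illustrative list of crawler tokens; both are assumptions to adjust for your server.

```python
from collections import Counter

# Assumed crawler tokens to look for in the User-Agent field.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def count_llms_hits(log_lines) -> Counter:
    """Count GET requests for /llms.txt per known AI crawler token."""
    hits = Counter()
    for line in log_lines:
        # The trailing space avoids matching paths like /llms.txt.bak.
        if "GET /llms.txt " not in line:
            continue
        lowered = line.lower()
        for bot in AI_BOTS:
            if bot.lower() in lowered:
                hits[bot] += 1
                break
    return hits
```

Feeding it an open log file, for example `count_llms_hits(open("access.log", encoding="utf-8"))`, returns a `Counter` keyed by crawler token. Requests from other clients are ignored.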
Frequently Asked Questions
Do I need llms.txt if I already have robots.txt?
Yes — they serve different purposes. robots.txt controls access (can the crawler read this page?). llms.txt provides curation (here are the most important pages and what they mean). Ship both.
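Because the two files do different jobs, it is worth checking that they agree: a page you promote in llms.txt should not be disallowed in robots.txt for the same crawler. Here is a minimal Python sketch of that cross-check using the standard library's `urllib.robotparser`. The function name `blocked_urls` is an assumption for illustration.

```python
import re
from urllib.robotparser import RobotFileParser

# Captures the URL part of Markdown links: "[title](https://...)".
URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def blocked_urls(llms_text: str, robots_text: str, user_agent: str) -> list:
    """Return URLs listed in llms.txt that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [
        url
        for url in URL_RE.findall(llms_text)
        if not parser.can_fetch(user_agent, url)
    ]
```

An empty result for each crawler you care about means the curated list and the access rules are consistent.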
Does llms.txt affect Google search rankings?
Not directly. Google's traditional ranking algorithm doesn't read llms.txt. But it likely affects Google AI Overviews and other AI surfaces indirectly, because better AI summaries of your site translate into more accurate citations. See our Google AI Overviews guide for the full picture there.
Should I list my blog posts in llms.txt?
Only the highest-value ones. List your pillar posts (the long-form guides that consolidate your expertise), your most-trafficked posts, and any posts that capture brand-defining viewpoints. Don't list every post — AI crawlers will discover the rest via your sitemap. A good rule of thumb: 10-25 blog post links maximum.
Is llms.txt the same as ai.txt?
No. ai.txt was a similar but separate proposal from a different group that didn't gain traction. llms.txt from llmstxt.org is the standard the major engines have adopted. If you have both, the major engines read llms.txt.
Will llms.txt help my chatbot's knowledge base be more accurate?
Yes, indirectly. If your customers are using ChatGPT or Claude to ask about your product, those tools' answers will be more accurate when they can find a curated index of your content via llms.txt. This reduces wrong-information liability and increases the chance of correct recommendations.
Can I block specific pages from llms.txt while allowing them in robots.txt?
There's no need to. llms.txt is opt-in (you choose what to include). If you don't list a page, it's not part of your llms.txt curation. The page is still crawlable per robots.txt rules.
What's the maximum file size for llms.txt?
There's no hard limit in the spec, but practical crawler limits are around 32KB. Stay under 10KB for safety. If you have more content to expose, link to an additional llms-full.txt (extended version), a companion-file convention commonly published alongside llms.txt.
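The size guideline is simple to check in a build step. This Python sketch measures the UTF-8 encoded size against the 10KB budget suggested above. The function name and the default budget are assumptions taken from that guideline, not limits defined by the spec.

```python
def size_report(text: str, budget_bytes: int = 10 * 1024) -> tuple:
    """Return (size_in_bytes, within_budget) for llms.txt content.

    Size is measured on the UTF-8 encoding, since that is what gets served.
    """
    size = len(text.encode("utf-8"))
    return size, size <= budget_bytes
```

Measuring bytes rather than characters matters when descriptions contain non-ASCII text, which takes more than one byte per character.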
How often should I update llms.txt?
Whenever a top-tier page changes URL or you ship a new pillar piece. Quarterly audits are a good baseline: kill dead links, add 2-3 new high-value pages, refresh the description if positioning has shifted.
The Bigger Picture
llms.txt is the cheapest AEO win on the table in 2026. Most sites can ship it in under an hour and start seeing improved citation accuracy within a month. It's also a forward-compatible bet — as more AI engines adopt the standard, the same file serves them all.
For the full framework on how llms.txt fits with schema markup, direct-answer paragraphs, robots.txt, and the rest of the AEO stack, see our complete Answer Engine Optimization guide. For the audit tools to measure your starting point, run our LLM Friendly Score Calculator and AEO Score Calculator.
Last reviewed: May 2026 — updated quarterly.