
llms.txt: What It Actually Does, Who Uses It, and Whether You Need It

An honest guide to the llms.txt standard. What the spec requires, which companies publish one, whether any major LLM provider actually reads it, and a decision tree for whether your site needs one.

Max Tsygankov · Founder, Crawloria

Published May 6, 2026 · 11 min read


A markdown file at the root of your domain that lists your most important pages with one-sentence descriptions, formatted so an LLM can read it as a curated table of contents. That is llms.txt. The proposal is two years old as of this writing. It has 1,655 published implementations and zero confirmed major-provider crawlers reading it. Both of those facts are true and both matter.

Most articles on llms.txt either oversell it ("the new SEO standard for AI") or dismiss it ("nobody uses it, skip"). Neither framing matches reality. The honest version is that llms.txt is a low-cost convention that some agents do consume, that more agents probably will, and that has effectively zero downside as long as you publish a real file rather than a marketing artifact. This guide walks through what the spec requires, who has actually adopted it, what evidence we have about which agents read it, and a decision tree for whether your site is one of the cases where it matters.

What is llms.txt actually?

llms.txt is a plain-text markdown file you serve from https://your-domain.com/llms.txt. It contains a curated list of links to your important pages — written in markdown — so an LLM that retrieves it can navigate your site without crawling every page or guessing what's important. Done well, the file fits inside an LLM's context window as a compact map of high-signal pages, instead of forcing the model to spend tokens parsing your full HTML.

The spec at llmstxt.org was proposed by Jeremy Howard, co-founder of Answer.AI and fast.ai, on September 3, 2024.[1] His framing in the original proposal was practical: "Our expectation is that llms.txt will mainly be useful for inference, i.e. at the time a user is seeking assistance, as opposed to for training." The file is not a training-data signal. It is a runtime hint for agents that need to find their way around a site they already chose to read.

The spec mandates this structure, in order:

  1. An H1 heading with the project or site name. Required.
  2. A blockquote with a brief summary. Optional but expected.
  3. Zero or more markdown sections with detailed project information, containing no headings. Optional.
  4. Zero or more H2-delimited sections containing file lists — markdown bullet links in [name](url): description form. Optional but the main payload.

Only the H1 is technically required. Everything else is convention. In practice, a useful llms.txt has all four, structured so an LLM treating the file as a table of contents can walk the H2 sections like chapters.
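Because the structure is line-oriented markdown, a consumer can recover all four parts with a few string checks. Here is a minimal sketch in Python; the function name and return shape are illustrative, not part of the spec or any official tooling:

```python
import re

def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt file into its spec parts: H1 name,
    blockquote summary lines, and H2 sections of bullet links.
    Illustrative sketch only; assumes well-formed markdown."""
    result = {"name": None, "summary": [], "sections": {}}
    current = None  # H2 section we are currently inside, if any
    for line in text.strip().splitlines():
        if line.startswith("## "):          # new file-list section
            current = line[3:].strip()
            result["sections"][current] = []
        elif line.startswith("# ") and result["name"] is None:
            result["name"] = line[2:].strip()
        elif line.startswith(">"):          # blockquote summary
            result["summary"].append(line.lstrip("> ").strip())
        elif current and line.startswith("- "):
            # bullet links in [name](url): description form
            m = re.match(r"- \[(.+?)\]\((.+?)\):?\s*(.*)", line)
            if m:
                result["sections"][current].append(m.groups())
    return result
```

An agent walking the returned `sections` dict gets exactly the "chapters" view described above: section heading, then (name, url, description) triples.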

A second variant, llms-full.txt, is widely used in practice though not formalized in the original spec the way llms.txt is. The convention: the same index as llms.txt, plus content excerpts inline under each link, so an LLM can answer questions about the site without making additional HTTP requests. The official llms_txt2ctx tool generates llms-ctx.txt and llms-ctx-full.txt files from a base llms.txt, but most sites that publish both use the simpler llms-full.txt filename. We default to the latter in our free llms.txt generator.

Who actually publishes one today?

The directory at directory.llmstxt.cloud lists 1,655 entries split across websites (849), products (447), developer tools (358), AI companies (187), and finance companies (167) as of May 2026. The header on the directory describes itself as "a directory of websites and companies leading the adoption of the llms.txt standard."

Notable adopters with substantial files:

  • Anthropic — platform.claude.com/docs/llms.txt, ~900 tokens, points to the full developer reference. (Originally hosted at docs.anthropic.com, redirected to platform.claude.com in early 2026.)
  • Perplexity — ~4,000 tokens of curated documentation.
  • Cloudflare — ~49,000 tokens, one of the largest published llms.txt files. Cloudflare is also the only major infrastructure provider with a documented llms.txt fetch behavior in their AutoRAG retrieval product, which directly consumes llms.txt files when present.
  • Vercel AI SDK — ~293,000 tokens spanning the entire AI SDK reference.
  • Hugging Face — multi-project, including Transformers at ~813,000 tokens.
  • ElevenLabs, Coinbase, Zapier — all publish llms.txt files for their public docs or product pages.

The adoption pattern is concentrated in two segments: developer-tooling companies whose primary audience is engineers using LLMs (Vercel, Hugging Face, Anthropic, ElevenLabs), and infrastructure providers building agentic products themselves (Cloudflare, Coinbase). Most of these adopters point their llms.txt at clean API reference documentation rather than marketing pages — the goal is machine-readable content the LLM can ground answers in, not brand collateral. DTC e-commerce, B2B SaaS marketing sites, and content publishers have been slower adopters. That gap is part of the opportunity if you operate in those segments — most of your competitors don't have one yet.

Does any LLM provider actually read llms.txt?

The honest answer: no major LLM provider has publicly confirmed their crawlers consume llms.txt. Not OpenAI, not Anthropic, not Google, not Perplexity. This is the single most repeated fact in honest coverage of the standard.[2]

But "no public confirmation from major providers" is not the same as "nobody reads it." Here's what we can measure or directly observe:

  • Cursor and Continue (developer-focused AI IDEs) explicitly support pasting an llms.txt URL as a documentation reference. The user does the retrieval, but the file is consumed.
  • Cloudflare AutoRAG uses llms.txt where present as a hint for retrieval-augmented generation pipelines hosted on Cloudflare Workers. This is documented in Cloudflare's developer docs.
  • Mintlify, the documentation platform, auto-generates llms.txt for hosted sites and explicitly markets the feature as "exposing your docs to LLMs."
  • Custom retrieval pipelines built on LangChain, LlamaIndex, and similar frameworks frequently include llms.txt as a fallback retrieval target. This is not a single product feature but a common pattern.

What we don't have evidence for: ChatGPT Search, Claude's web access, Perplexity's index, or Google AI Overviews preferentially using llms.txt. Whatever they do with it (if anything) is opaque.

The reasonable interpretation is that llms.txt is currently a developer-tooling convention with growing crawler-side adoption that hasn't been publicly announced. It is structurally similar to robots.txt in 1994: nobody had to formally adopt it for the convention to spread, and major search engines started honoring it without formal announcements. Whether llms.txt follows that arc or fades is unknown.

How is llms.txt different from robots.txt and sitemap.xml?

This confusion is the single most common source of bad llms.txt files. The three files solve different problems and the differences matter.

  • robots.txt — crawl permission gate. Plain-text directives, read by all crawlers. What it says: "Bot X is allowed/disallowed on path Y."
  • sitemap.xml — complete URL inventory. XML, read by all search crawlers. What it says: "Here is every URL we want indexed."
  • llms.txt — curated content map. Markdown, read by retrieval-mode LLMs. What it says: "Here are the most important pages and what each is about."

The mistake most sites make is treating llms.txt as a second sitemap — dumping every URL into it. This is wrong. Sitemap.xml is the inventory; llms.txt is the table of contents. A useful llms.txt lists 5-25 pages, each with a clear title and one-sentence description that tells an LLM what it answers. An llms.txt with 5,000 entries is noise.

The second mistake is duplicating robots.txt syntax. llms.txt has no Disallow: or User-agent: directives. It's pure markdown content. If you want to control which crawlers read what, that goes in robots.txt — llms.txt is content, not access control.

The format trade-off is worth naming. robots.txt and sitemap.xml are machine-readable by syntax — specific keys, fixed grammar. llms.txt is machine-readable by convention — the markdown structure does the work. Both approaches are valid, but sitemap.xml carries no metadata signals about which page matters more or what each one answers. That's exactly what llms.txt adds: human-curated metadata about importance and topic, in a format an LLM can parse directly.

What does a valid llms.txt look like?

Here is the minimum viable file for a Shopify cookware brand:

# Made-for-Cooks Pans

> Carbon-steel cookware made in France for home cooks who want
> restaurant-grade pans that develop a natural seasoning over time.

We are a direct-to-consumer brand operating since 2018, shipping
to 47 countries. All pans are made in our partner foundry in
Vienne and seasoned by hand before shipping.

## Product collections

- [Carbon-steel skillets](https://example.com/skillets): The full skillet range, 8" to 14", with grain-flow forging.
- [Sauce pots](https://example.com/saucepots): Tin-lined copper sauce pots in three sizes.
- [Cookware sets](https://example.com/sets): Curated 5-piece and 9-piece sets at a discount.

## Care and seasoning

- [How to season carbon steel](https://example.com/guides/seasoning): Six-step seasoning process for new pans, with photos.
- [Care after every use](https://example.com/guides/daily-care): Wash, dry, and oil pattern that preserves seasoning.

## Company

- [Our story](https://example.com/about): How we partnered with a 1920s French foundry to make cookware in small batches.
- [Shipping and returns](https://example.com/shipping): International shipping rates, customs handling, and our 30-day return policy.
- [Contact](https://example.com/contact): Email, phone hours, and the fastest path to a human.

That file has eight links across three sections. It tells an LLM exactly what kind of business this is, what the product range is, and where to find the specific information a shopping agent or research user would want. It's about 200 words. Most useful llms.txt files are between 100 and 1,000 words.

Should you publish llms.txt? A decision tree

Here is the honest cost-benefit. Publishing a useful llms.txt is a content discoverability investment — small but real, and most useful for sites where retrieval-mode LLMs are the highest-value reader. It costs ten minutes if your site already has clear navigation, or one to two hours if you need to think through the structure. The benefit is uncertain — somewhere between "marginal SEO signal" and "real lift in agent retrieval quality" depending on which agents read your category. The risk is near-zero as long as the file is real, current, and not marketing-puffed.

  • Documentation site (API, SDK, product docs) — yes. Retrieval-mode LLMs are your highest-value traffic, and adoption is highest in this category.
  • B2B SaaS marketing site with clear product pages — yes. Sales-relevant agents are the closest thing to a measurable conversion path.
  • DTC e-commerce with curated collections — yes. AI shopping agents (ChatGPT Operator, Comet) increasingly browse merchant sites directly.
  • Content publisher with topic pillars — probably. Publish if your topic structure maps cleanly to H2 sections, each pointing at a pillar page with related spokes underneath; skip if your archive is sprawling and uncurated.
  • Single-page landing site — no. A homepage already tells an LLM everything; llms.txt would just duplicate it.
  • Heavily interactive product (web app behind login) — no. The value lives behind authentication, where llms.txt can't help.
  • Aggregator with thousands of similar entries — no. Hard to curate without it becoming a worse sitemap; consider a category-level llms.txt instead.

The decisive question for the borderline cases is: can you write a one-sentence description for each link that an LLM could use to answer "where on this site would I find X?" If yes, your site benefits from an llms.txt. If you'd be writing filler descriptions just to fill the file, you'll publish a worse signal than nothing.

A specific anti-pattern: an outdated llms.txt is worse than no llms.txt. If your file lists discontinued products, old pricing, or rebranded company names, agents that read it confidently surface stale information attributed to your domain. That hurts you more than it would help if the file were merely missing. Treat llms.txt as part of your published content lifecycle — update it on the same cadence as your homepage and product pages.

How to generate llms.txt for your site

There are four practical paths, ranked by effort:

  1. Hand-write it from your sitemap. If your site has 5-30 pages worth listing, this is fastest. Start with the H1 and summary, then group the most important URLs into 2-4 H2 sections. Aim for descriptions that answer "what question does this page answer?"

  2. Use the free Crawloria llms.txt generator. Paste your homepage URL. The generator fetches your sitemap.xml, reads up to 24 of your most important pages, extracts titles and meta descriptions, and assembles both llms.txt and llms-full.txt. Output is ready to copy or download. No signup, no email gate.

  3. Use a CMS plugin. Yoast SEO (WordPress) added one-click llms.txt generation in early 2026. Mintlify auto-generates it for hosted documentation sites. Several Shopify apps generate one for product collections. These work but are vendor-specific and tend to dump too many URLs.

  4. Pipeline-generate from your docs. For documentation sites, scripts that walk your docs tree and generate llms.txt in CI are a clean solution. Anthropic, Vercel, and Hugging Face all do this — typically tied to versioning, so each docs release ships with a current llms.txt. The advantage is the file stays current automatically.
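Whichever path assembles the file, the assembly step itself is trivial once you have curated page metadata. A hedged sketch of what a CI generation script might do; the function name and the `sections` input shape are assumptions for illustration, not any vendor's actual pipeline:

```python
def build_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Assemble an llms.txt string from curated page metadata.

    `sections` maps an H2 heading to a list of
    (title, url, description) tuples. Illustrative sketch: a real
    CI script would pull these tuples from docs front matter or a
    sitemap rather than hard-coding them."""
    parts = [f"# {name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        parts.append(f"## {heading}")
        parts.append("")
        for title, url, desc in pages:
            parts.append(f"- [{title}]({url}): {desc}")
        parts.append("")
    return "\n".join(parts).rstrip() + "\n"
```

Run in CI on every docs release, this keeps the file current automatically, which is the main advantage of path 4.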

Whichever path you pick, validate the output. The most common errors are: missing H1, missing blockquote summary (some agents skip files that don't have one), too many entries (>50 dilutes signal), and stale or marketing-puffed descriptions ("industry-leading platform" rather than "guides on how to season carbon-steel cookware").
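Those checks are easy to script. A rough sketch of a linter for the three structural errors just named; function and message strings are illustrative, and a real validator would check more (URL reachability, description quality):

```python
import re

def lint_llms_txt(text: str) -> list:
    """Flag the common llms.txt mistakes: missing H1, missing
    blockquote summary, and too many link entries.
    Illustrative sketch, not an official validator."""
    problems = []
    lines = text.splitlines()
    if not any(l.startswith("# ") for l in lines):
        problems.append("missing H1 heading")
    if not any(l.startswith(">") for l in lines):
        problems.append("missing blockquote summary")
    links = [l for l in lines if re.match(r"- \[.+?\]\(.+?\)", l)]
    if len(links) > 50:
        problems.append(f"too many entries ({len(links)}); >50 dilutes signal")
    return problems
```

Stale or marketing-puffed descriptions still need a human eye; no linter catches "industry-leading platform."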

How to test if llms.txt is working

Once published, two checks tell you if it's reachable and well-formed:

  1. Direct fetch. From a clean curl or browser, request https://your-domain.com/llms.txt and confirm HTTP 200 with content-type text/plain or text/markdown. Files served with text/html are technically valid markdown but some retrieval pipelines reject them. Files behind a Cloudflare challenge page never reach the agent.

  2. Crawloria audit. Run a free Crawloria audit on your homepage. Our llms.txt check parses your file, validates the H1, blockquote summary, and section structure, and flags common issues (missing summary, no sections, served as HTML instead of plain text). The same audit also checks that the seven AI bot crawlers can reach your site, which is a prerequisite for any of this to matter.
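The direct-fetch check in step 1 can be scripted. A minimal sketch using only the standard library; the function names and the User-Agent string are illustrative assumptions, and this deliberately skips redirect policy and challenge-page detection:

```python
from urllib.request import Request, urlopen

# Content types retrieval pipelines reliably accept for llms.txt
ACCEPTED_TYPES = ("text/plain", "text/markdown")

def content_type_ok(header: str) -> bool:
    """True if a Content-Type header value, ignoring parameters
    like charset, is text/plain or text/markdown."""
    return header.split(";")[0].strip().lower() in ACCEPTED_TYPES

def check_llms_txt(domain: str) -> tuple:
    """Fetch https://<domain>/llms.txt and return (status, type_ok).

    Sketch only: a real check would also follow redirects explicitly
    and detect challenge pages served in place of the file."""
    req = Request(f"https://{domain}/llms.txt",
                  headers={"User-Agent": "llms-txt-check/0.1"})
    with urlopen(req, timeout=10) as resp:
        return resp.status, content_type_ok(resp.headers.get("Content-Type", ""))
```

A 200 with `type_ok=False` usually means the file is served as text/html, the failure mode mentioned above.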

Beyond the structural check, there is no public way to verify that a specific LLM consumes your llms.txt. ChatGPT, Claude, and Perplexity don't expose retrieval logs to site owners. The closest signal is whether the agent can answer accurate, specific questions about your site after publishing the file — and that signal is noisy because the agents may already have your homepage in their training data.

FAQ

Is llms.txt the same as robots.txt for AI?

No. robots.txt is access control — it tells crawlers what they can and cannot fetch. llms.txt is curated content — it tells crawlers what to read first. The two work together: robots.txt allows GPTBot and ClaudeBot, llms.txt then guides those bots to your most important pages.

Do I need llms.txt to rank in ChatGPT?

Not directly. ChatGPT Search ranking depends on traditional authority signals — backlinks, domain age, content depth, structured data — and OpenAI has not confirmed they read llms.txt. But llms.txt removes one specific failure mode: ChatGPT correctly understanding what your site is about. If your homepage is sparse but your blog and pricing pages are rich, an llms.txt that points there improves the agent's read.

What's the difference between llms.txt and llms-full.txt?

llms.txt is the index — links with descriptions. llms-full.txt is the same index plus content excerpts inline, so an agent doesn't need to make additional HTTP requests to read each linked page. Most adopters publish both. Our generator produces both by default.

Should I list every page in llms.txt?

No. The single most common mistake is treating llms.txt as a second sitemap. Limit to 5-25 entries — your most important pages with clear, useful descriptions. A larger sitemap.xml handles complete URL inventory. llms.txt is the table of contents.

Where do I put the llms.txt file?

At the root of your domain: https://your-domain.com/llms.txt. Some sites also publish at /.well-known/llms.txt as a fallback. Crawloria checks both locations during an audit. Don't put it in a subdirectory or behind authentication.

Does llms.txt help with Google AI Overviews?

Probably not directly. Google has not confirmed their AI Overviews retrieval reads llms.txt. AI Overviews currently lean on Google's existing search index plus Schema.org structured data. If you're optimizing for AI Overviews specifically, structured data is a higher-leverage investment than llms.txt today.

How often should I update llms.txt?

Treat it like a small homepage. Update when: products launch or get discontinued, you rebrand or reposition, your top docs URLs change, or your site structure reorganizes. A stale llms.txt is worse than a missing one because it confidently sends agents to dead or wrong pages.

Is there an llms.txt validator?

Yes — Crawloria's free audit validates the file during a normal site scan. The validator at llms-txt.org provides a JSON-schema-style check for structural correctness. Several open-source CLI tools also exist on GitHub.

Is llms.txt the same as Generative Engine Optimization (GEO)?

No. GEO is a positioning term for the broader practice of optimizing for AI-driven search and answer engines. llms.txt is one tactic within GEO, not the discipline itself. GEO also includes structured data, content depth, and citability. llms.txt sits alongside those, not above them.

Does llms.txt improve citability in AI answers?

Possibly. The hypothesis is that an LLM with access to clear, curated source descriptions cites the matching page more reliably than one parsing your full HTML for context. There is no public benchmark on this, but it is the strongest theoretical argument for publishing one — especially for sites whose value depends on being cited correctly.

What's next

If you're working through AI agent readiness for your site, llms.txt is one of seven signals that matter. The others — bot access, content rendering, structured data, navigation friction, semantic markup, and real-time search visibility — are all measured in a Crawloria audit, with prioritized fixes for each.

For the bot-access prerequisite ("Is GPTBot even allowed to fetch your site?"), see ChatGPT Not Showing Your Website? 9 Causes and How to Fix Each. For the rendering question ("Can a vision-based agent actually read my homepage?"), see How AI Agents See Your Website: The 1568-Pixel Rule. For commerce sites specifically, Shopify ChatGPT Integration: 5 Walls Blocking AI Agent Checkout covers the agent-driven checkout path that depends on all of the above.

Generate your llms.txt with the free Crawloria generator and validate the result with a free Crawloria audit. Both take under two minutes, and together they cover the structural prerequisites for any agent retrieving your site to do it correctly.

Footnotes

  1. Jeremy Howard, "The /llms.txt file" — proposal published at https://llmstxt.org/ on September 3, 2024.

  2. Independent coverage by Ahrefs ("What Is llms.txt, and Should You Care About It?") and LinkBuildingHQ ("Should Websites Implement llms.txt in 2026?") both confirm as of early 2026 that no major LLM provider has publicly announced llms.txt as a documented input to their crawlers. Adoption signals come from documentation publishers (Mintlify, Yoast) and developer-tool companies (Cursor, Cloudflare AutoRAG).