Free tool · no signup, no rate limit

llms.txt Generator

llms.txt is a plain-text Markdown manifest served at /llms.txt that lists the canonical documentation URLs on a domain. AI assistants that support the convention fetch it to discover which pages to read. Enter a URL below and the crawler will produce a manifest grouped into Docs, API Reference, Guides, and Optional sections.

Advanced options: site name, summary, custom links

Overrides the H1 title at the top of the manifest.

Overrides the blockquote summary under the title.

Use this sitemap as the canonical URL source. Leave empty to auto-discover via link crawling and /sitemap.xml.

Appended as a ## Links section in the manifest.

Merged into the ## Optional section of the manifest.

Appended as a ## FAQ section in the manifest. Each entry becomes a ### question with the answer below.

Try: docs.stripe.com, nextjs.org, tailwindcss.com

How it works

Three steps. No setup.

01

Enter a URL

Paste a domain or documentation root. The crawler starts there and follows internal links breadth-first up to the configured depth.
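As an illustration of a breadth-first crawl bounded by depth, here is a minimal sketch. It is not the tool's actual code: the `links` map is a hypothetical in-memory stand-in for real page fetches, and `crawl_order` and `max_depth` are names chosen for this example.

```python
from collections import deque


def crawl_order(start, links, max_depth):
    """Return URLs in breadth-first order, at most max_depth hops from start.

    `links` is a hypothetical map of page -> outgoing internal links,
    used here instead of real HTTP requests.
    """
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # do not expand links beyond the configured depth
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order


site = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/billing", "/"],
    "/docs/billing": ["/docs/billing/invoices"],
}
print(crawl_order("/", site, max_depth=2))
# ['/', '/docs', '/blog', '/docs/billing']
```

With `max_depth=2`, `/docs/billing/invoices` is three hops from the start page and is therefore never visited.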

02

Crawl runs

The crawler reads robots.txt, fetches sitemap.xml if present, throttles to one request per second per host, and parses title, meta description, and headings from each page.
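A per-host throttle of one request per second could look roughly like the sketch below. The class name `HostThrottle` and its injectable `clock` and `sleep` parameters are assumptions made for this example so the logic can be exercised without real waiting; the tool's own implementation is not published here.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> scheduled time of the most recent request

    def wait(self, url):
        """Block until the host of `url` may be requested; return seconds slept."""
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay
```

Calling `throttle.wait(url)` immediately before each GET keeps requests to one host at least `min_interval` seconds apart, while requests to different hosts are not delayed by each other.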

03

Download llms.txt

The output is a Markdown manifest grouped by section (Docs, API Reference, Guides, Optional). Edit it inline, copy it to the clipboard, or download it. Serve it at https://your-domain.com/llms.txt with Content-Type text/plain.
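To make the shape of the manifest concrete, the sketch below renders one from already-collected page data: an H1 title, a blockquote summary, and one `##` heading per section. The `- [title](url): description` entry format follows the public llms.txt proposal as we understand it; the function name and the tuple layout are assumptions for this example.

```python
def render_manifest(site_name, summary, sections):
    """Render a manifest: H1 title, blockquote summary, one H2 per section.

    `sections` maps a heading to a list of (title, url, description) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        if not pages:
            continue  # skip sections with no pages
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in pages:
            entry = f"- [{title}]({url})"
            if description:
                entry += f": {description}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


text = render_manifest(
    "Example",
    "Docs for Example.",
    {
        "Docs": [("Start", "https://example.com/docs", "Intro")],
        "API Reference": [],
        "Optional": [("Blog", "https://example.com/blog", "")],
    },
)
print(text)
```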

What the crawler reads

URLs, titles, meta, and structure.

01 / Structure

Page hierarchy

Parses nav menus, breadcrumbs, and URL path segments to group pages into named sections. Pages under /docs/billing/* are clustered under a billing section.
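The URL-path part of this grouping can be sketched as follows. This covers path segments only (nav menus and breadcrumbs are left out), and `group_by_path` with its `root` parameter is a hypothetical helper, not the tool's API.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_path(urls, root="docs"):
    """Cluster URLs by the path segment that follows `root`."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        if len(segments) >= 2 and segments[0] == root:
            key = segments[1]  # e.g. "billing" for /docs/billing/invoices
        else:
            key = "other"
        groups[key].append(url)
    return dict(groups)


pages = [
    "https://example.com/docs/billing/invoices",
    "https://example.com/docs/billing/refunds",
    "https://example.com/docs/auth/keys",
    "https://example.com/pricing",
]
print(group_by_path(pages))
```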

02 / Docs

Documentation paths

Recognises common documentation roots (/docs, /guide, /learn, /handbook) and promotes them to the top of the manifest so AI agents land on the canonical entry points first.
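One way to promote documentation roots is a stable sort keyed on whether the path falls under one of the listed roots. The sketch below assumes that approach; `DOC_ROOTS` mirrors the roots named above, and the function names are illustrative.

```python
from urllib.parse import urlsplit

DOC_ROOTS = ("/docs", "/guide", "/learn", "/handbook")


def is_doc_path(path):
    """True if path is a documentation root or sits beneath one."""
    return any(path == root or path.startswith(root + "/") for root in DOC_ROOTS)


def promote_docs(urls):
    """Stable sort: documentation URLs first, original order preserved otherwise."""
    return sorted(urls, key=lambda u: not is_doc_path(urlsplit(u).path))


print(promote_docs([
    "https://example.com/pricing",
    "https://example.com/docs/start",
    "https://example.com/blog",
    "https://example.com/learn",
]))
```

Matching on `root + "/"` rather than a bare prefix keeps a path such as `/docsearch` from being treated as part of `/docs`.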

03 / API

API references

Detects OpenAPI specs (/openapi.json, /swagger.json), endpoint documentation pages, and SDK references. These are flagged so language models can answer integration questions with verified endpoint paths.
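A fetched /openapi.json or /swagger.json can be recognised by its top-level version field: OpenAPI 3.x documents carry an `openapi` string and Swagger 2.0 documents carry a `swagger` string. The check below is a minimal sketch of that idea; `spec_kind` is a hypothetical helper and does not validate the rest of the document.

```python
import json


def spec_kind(body):
    """Return 'openapi', 'swagger', or None for a fetched response body."""
    try:
        doc = json.loads(body)
    except ValueError:
        return None  # not JSON at all, e.g. an HTML error page
    if not isinstance(doc, dict):
        return None
    if isinstance(doc.get("openapi"), str):
        return "openapi"  # OpenAPI 3.x: version string under "openapi"
    if isinstance(doc.get("swagger"), str):
        return "swagger"  # Swagger 2.0: version string under "swagger"
    return None


print(spec_kind('{"openapi": "3.0.3", "paths": {}}'))
```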

04 / Time-sensitive

Blog and changelog

Identifies dated content (blog posts, release notes, changelog entries) by URL pattern and presence of dateModified metadata, then tags them as time-sensitive in the manifest.
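The two signals described here, URL pattern and dateModified metadata, can be combined as in the sketch below. The specific path patterns (`/blog`, `/changelog`, `/release(s)`, `/news`, and `/YYYY/MM`) are assumptions for illustration; the tool's real pattern list is not stated in this page.

```python
import re

# Hypothetical patterns for dated sections and /YYYY/MM style archives.
DATED_PATH = re.compile(r"/(blog|changelog|releases?|news)(/|$)|/\d{4}/\d{2}(/|$)")


def is_time_sensitive(path, metadata=None):
    """True if the path looks dated or the page carries dateModified metadata."""
    if DATED_PATH.search(path):
        return True
    return bool((metadata or {}).get("dateModified"))


print(is_time_sensitive("/blog/launch"))
print(is_time_sensitive("/docs/billing", {"dateModified": "2024-05-01"}))
```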

05 / Meta

Titles and descriptions

Extracts the title tag, meta description, and Open Graph og:description for each page. These become the short summary line next to each URL in the manifest.
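Extraction of these three fields can be sketched with the standard-library HTML parser. The class and function names are illustrative, and the choice to prefer the meta description over og:description when both exist is an assumption of this example, not a documented rule of the tool.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <title>, meta description, and og:description from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.og_description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            content = attrs.get("content") or ""
            if (attrs.get("name") or "").lower() == "description":
                self.description = content
            elif (attrs.get("property") or "").lower() == "og:description":
                self.og_description = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html):
    """Return (title, summary); summary falls back to og:description."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    summary = parser.description or parser.og_description
    return parser.title.strip(), summary.strip()


page = (
    "<html><head><title> Billing </title>"
    '<meta name="description" content="How billing works.">'
    '<meta property="og:description" content="OG text">'
    "</head></html>"
)
print(summarize(page))
```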

06 / Sitemaps

Sitemap.xml and robots.txt

Reads /robots.txt first and skips any path disallowed for the crawler user-agent, then fetches /sitemap.xml to seed the URL list. No URL is fetched in violation of your stated rules.
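Filtering sitemap URLs through robots.txt rules can be sketched with the standard library alone. The example works on already-downloaded text, and the user-agent name `ExampleBot` is a placeholder, since the crawler's real user-agent string is not given here.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def allowed_sitemap_urls(robots_txt, sitemap_xml, user_agent):
    """Return sitemap <loc> URLs that robots.txt permits for user_agent."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [u for u in urls if robots.can_fetch(user_agent, u)]


robots_txt = "User-agent: *\nDisallow: /private/\n"
sitemap_xml = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/docs/start</loc></url>"
    "<url><loc>https://example.com/private/admin</loc></url>"
    "</urlset>"
)
# "ExampleBot" is a placeholder user-agent for this sketch.
print(allowed_sitemap_urls(robots_txt, sitemap_xml, "ExampleBot"))
```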

FAQ

Frequently asked questions about llms.txt.

What is llms.txt?

llms.txt is a proposed convention for a plain-text Markdown manifest served at the root of a domain (example.com/llms.txt). It lists which URLs on the site are worth reading and how they are grouped. Functionally, it parallels robots.txt (crawler permissions) and sitemap.xml (URL discovery), but its audience is large language models rather than search engine crawlers.

Why would I publish an llms.txt?

AI assistants (ChatGPT, Claude, Perplexity, Gemini) increasingly fetch live web pages when answering. Without guidance they may request marketing pages, blog snippets, and outdated material instead of canonical documentation. An accurate llms.txt points them at the right entry points so that answers about your product cite verified, current content.

Is the tool free?

Yes. Single-shot generation for any public domain is free with no signup. Paid tiers cover scheduled regeneration, multi-domain workspaces, and HTTP API access. One-off use stays free permanently.

How does the crawler work technically?

The crawler fetches /robots.txt first to read disallow rules for its user-agent. It then fetches /sitemap.xml, if present, to seed the URL queue. Each URL is requested with a single HTTP GET, throttled to one request per second per host. The response HTML is parsed for title, meta description, Open Graph tags, and internal anchor hrefs. JavaScript is not executed by default, so client-rendered single-page applications may yield little or no content.
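The internal-link step can be sketched as follows: collect anchor hrefs from the static HTML, resolve them against the page URL, and keep only same-host http(s) links. The names are illustrative. Because no JavaScript runs, a client-rendered shell with no anchors in its markup produces an empty list.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(page_url, html):
    """Resolve hrefs against page_url; keep unique same-host http(s) links."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlsplit(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            if absolute not in links:
                links.append(absolute)
    return links


html = (
    '<a href="billing">Billing</a>'
    '<a href="/api#auth">Auth</a>'
    '<a href="https://other.example/x">External</a>'
    '<a href="mailto:a@example.com">Mail</a>'
    '<a href="/api">API</a>'
)
print(internal_links("https://example.com/docs/", html))
# ['https://example.com/docs/billing', 'https://example.com/api']
```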

Does this use AI to write the file?

Only for section labeling. A small classifier groups URLs into the four standard sections (Docs, API, Guides, Optional) based on path patterns. Page titles and descriptions are taken verbatim from the source markup. The crawler does not invent summaries or hallucinate content, and the output is fully editable before download.
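Path-pattern grouping into the four sections can be approximated with ordered rules where the first match wins. The patterns below are assumptions made for this sketch and are not the tool's actual classifier.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rules, checked in order; the first matching pattern wins.
RULES = [
    ("API Reference", re.compile(r"(^|/)(api|reference|sdk)(/|$)")),
    ("Guides", re.compile(r"(^|/)(guides|tutorials|how-to)(/|$)")),
    ("Docs", re.compile(r"^/(docs|guide|learn|handbook)(/|$)")),
]


def classify(url):
    """Return the manifest section for a URL, defaulting to Optional."""
    path = urlsplit(url).path
    for section, pattern in RULES:
        if pattern.search(path):
            return section
    return "Optional"


for u in (
    "https://example.com/docs/api/charges",
    "https://example.com/docs/guides/webhooks",
    "https://example.com/docs/start",
    "https://example.com/blog/launch",
):
    print(classify(u), u)
```

Putting the more specific API and Guides rules ahead of the Docs rule lets nested paths such as /docs/api/charges land in API Reference rather than Docs.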

Where do I place the generated file?

Serve the file at https://your-domain.com/llms.txt with Content-Type text/plain and HTTP 200 status. Do not require authentication. AI agents that support the convention fetch this exact path. An optional companion file, /llms-full.txt, can hold expanded content for assistants that support it.
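If your site is served by a Python application, the serving requirements can be sketched as a small WSGI app. This is only an illustration: `make_app` is a hypothetical name, and the `charset=utf-8` parameter is an addition of this example. Most static hosts only need the file placed in the web root.

```python
def make_app(manifest):
    """Build a WSGI app that serves `manifest` at /llms.txt as text/plain."""
    body = manifest.encode("utf-8")

    def app(environ, start_response):
        is_get = environ.get("REQUEST_METHOD") == "GET"
        if is_get and environ.get("PATH_INFO") == "/llms.txt":
            start_response("200 OK", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found\n"]

    return app
```

To try it locally, pass the app to `wsgiref.simple_server.make_server("", 8000, make_app(text))` and call `serve_forever()` on the result.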

Will this affect SEO or Google rankings?

Not directly. Search engines like Google still use sitemap.xml, internal links, and traditional crawl signals for ranking. llms.txt affects discovery and citation by AI assistants when they fetch live content, which is a parallel channel alongside search. Publishing llms.txt does not block, replace, or interfere with sitemap.xml or robots.txt.

What user-agent does the crawler send?

The crawler identifies itself with a clearly named user-agent so that server logs and rate limiters can recognize it. Block, allow, or throttle it the same way you would any other named crawler, via robots.txt or your edge configuration.

Can I regenerate the file automatically?

Scheduled regeneration is a paid-tier feature. The generator can be configured to re-crawl your domain on a weekly or monthly cadence and write the updated llms.txt to your origin via webhook or S3 upload. Single-shot manual generation remains free and unlimited.

What data is stored when I run the tool?

For anonymous runs, the submitted URL and aggregate metadata are logged for abuse prevention and rate-limiting. Generated output is discarded after delivery. No page content is persisted server-side. For signed-in users, run history is stored on the account and can be deleted at any time.