The core shift

Traditional SEO optimizes a page to rank on a results page that a human clicks. AI SEO optimizes a page to be retrieved, quoted, and synthesized by a language model that may never send a click at all. Different audience, different objective.

The old question: does my page rank for this keyword? The new question: when an AI answers this query, does my content make it into the answer?

Quick definitions

A few terms you'll hit immediately:

- LLM: a large language model, the engine behind ChatGPT, Claude, Gemini, and friends.
- RAG (retrieval-augmented generation): a system that fetches relevant chunks of content and hands them to an LLM before it answers.
- Entity: a distinct person, organization, or concept that a knowledge graph can identify, as opposed to a keyword string.
- Grounding: the verifiable source data an AI ties its answer back to.
- Citation: an AI answer linking to or naming your page as a source.

With that out of the way, here's what actually moves the needle.

llms.txt and llms-full.txt

The llms.txt spec, proposed in 2024, is the simplest idea in AI SEO: a markdown file at the root of your site that lists your content in a format LLMs can parse without scraping HTML.

Two files, two jobs:

- llms.txt is a curated index: a short markdown file that names the site, summarizes it, and links to your most important pages.
- llms-full.txt is the expanded version: the full content of those pages inlined into a single file, for systems that want everything in one fetch.

Declare them in robots.txt:

LLMs-txt: https://yoursite.com/llms.txt
LLMs-full-txt: https://yoursite.com/llms-full.txt

And link to them from your HTML head:

<link rel="llms-txt" href="/llms.txt">

Not every AI system reads these files yet. But the cost to add them is close to zero, and a lot of sites getting cited already publish them.
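
For reference, here's a minimal llms.txt following the proposed spec; the site name, URLs, and descriptions are placeholders:

    # Your Site Name

    > One-sentence summary of what the site covers and who it's for.

    ## Guides
    - [AI SEO basics](https://yoursite.com/ai-seo): How LLMs retrieve and cite content
    - [Schema for LLMs](https://yoursite.com/schema): JSON-LD patterns that double as grounding data

    ## About
    - [Who we are](https://yoursite.com/about): Author bios and contact details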

Schema.org as grounding data

Structured data (JSON-LD, schema.org) was built for Google rich results. It turns out to be near-perfect food for LLMs too. When you tag a page as a BlogPosting with author, datePublished, wordCount, and articleSection, you're handing an LLM a pre-parsed fact sheet about the content.

The schemas that matter most for AI right now:

- Article or BlogPosting for the content itself
- Person for authors
- Organization for the publisher
- BreadcrumbList so the site's structure is explicit

sameAs is the most underrated field. It connects your content to entity graphs like Wikidata, Wikipedia, and Crunchbase. If an LLM is trying to decide whether two "John Smiths" are the same person, sameAs is the link that proves it.
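
Here's what that pre-parsed fact sheet looks like in practice, as a minimal sketch with placeholder names, dates, and URLs:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "BlogPosting",
      "headline": "AI SEO: the core shift",
      "datePublished": "2026-01-15",
      "dateModified": "2026-02-01",
      "wordCount": 1800,
      "articleSection": "SEO",
      "author": {
        "@type": "Person",
        "name": "Jane Doe",
        "sameAs": [
          "https://www.wikidata.org/wiki/Q00000000",
          "https://en.wikipedia.org/wiki/Jane_Doe"
        ]
      },
      "publisher": {
        "@type": "Organization",
        "name": "Your Site Name",
        "url": "https://yoursite.com"
      }
    }
    </script>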

Entity-first, not keyword-first

Keyword SEO was about ranking for strings. Entity SEO is about being the authoritative node for a concept.

"Language models don't match keywords. They match meaning, against an internal knowledge graph."

You win when your site is the cleanest source on an entity. Practically:

- Use one consistent name for each entity everywhere it appears, and give it one canonical page.
- Mark the entity up with Person or Organization schema, and use sameAs to tie it to Wikidata, Wikipedia, or other graphs.
- Interlink related entity pages so the relationships are explicit, not implied.

AI crawler directives

Different AI systems send different crawlers. You control what they see through robots.txt:

User-agent: GPTBot           # OpenAI
User-agent: ClaudeBot        # Anthropic
User-agent: PerplexityBot    # Perplexity
User-agent: Google-Extended  # Google's generative models
User-agent: CCBot            # Common Crawl
Allow: /

Default position: allow them all. Blocking AI crawlers means opting out of being cited. Most sites that care about showing up in AI answers allow everything.
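
If you do have a reason to opt out of one system, you can block it while staying open to the rest. A sketch; the choice of CCBot here is illustrative, not a recommendation:

    User-agent: CCBot
    Disallow: /

    User-agent: *
    Allow: /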

Semantic chunking for RAG

When a RAG system retrieves your content, it doesn't grab the whole page. It grabs a chunk, usually a few paragraphs. If the chunk doesn't make sense without the rest of the page, it gets discarded.

Write chunks that stand alone:

- Open each section with the point, not a wind-up that depends on the previous section.
- Name the subject in full instead of leaning on "it" or "this approach" from three paragraphs back.
- Make headings descriptive enough that the section makes sense under that heading alone.
- Repeat essential context (product name, version, date) where it matters, even if it feels redundant on a full read.

Every heading section is a standalone document in miniature.
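
To make the mechanics concrete, here's a rough Python sketch of the heading-based splitting many RAG pipelines use. The splitting rule and sample text are assumptions for illustration, not any specific system's implementation:

    import re

    def chunk_by_heading(markdown: str) -> list[str]:
        # Split at the start of each H2 heading so every chunk keeps its
        # heading attached -- roughly what a RAG ingestion pipeline does
        # before embedding and indexing sections.
        sections = re.split(r"(?m)^(?=## )", markdown)
        return [s.strip() for s in sections if s.strip()]

    doc = """Intro that only makes sense next to the page title.

    ## What llms.txt does
    llms.txt is a curated markdown index at the site root.

    ## What llms-full.txt does
    llms-full.txt inlines the full content into one file."""

    for chunk in chunk_by_heading(doc):
        print(chunk)
        print("---")

The intro paragraph in that sample is the failure case: retrieved on its own, it carries no context, which is exactly why it would get discarded.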

Citation surface

Being cited by an LLM matters more than ranking on page one, because AI answers often don't show ten blue links. There's one answer, maybe three sources linked underneath.

Surfaces to optimize for:

- ChatGPT's web-search answers and the source links they attach
- Perplexity, which cites sources on nearly every answer
- Google's AI Overviews above the traditional results
- Microsoft Copilot and other assistants that attribute inline

What gets cited: recent, specific, well-structured content with clear authorship and timestamps. Generic content that reads like it was written by committee gets skipped.
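
On timestamps specifically, make the dates both visible and machine-readable. A small sketch using the standard HTML time element, with placeholder names and dates:

    <p>
      By <a href="/about/jane-doe">Jane Doe</a> ·
      Published <time datetime="2026-01-15">January 15, 2026</time> ·
      Updated <time datetime="2026-02-01">February 1, 2026</time>
    </p>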

What's bleeding edge right now

A few things moving fast as of 2026:

What to actually do

If you have a site and you care about this:

  1. Add llms.txt and llms-full.txt.
  2. Declare them in robots.txt and in <link rel="llms-txt">.
  3. Add JSON-LD schema to every page (Article, Person, Organization, BreadcrumbList).
  4. Use sameAs on your Person schema.
  5. Write content in self-contained chunks with clear H2 structure.
  6. Keep publish and modified timestamps visible and accurate.
  7. Don't block AI crawlers unless you have a specific reason.

Traditional SEO isn't going away. But the next ten years of search optimization are about being the source an AI reaches for, not the link a human clicks.