
How AI Search Engines Decide What to Cite: The RAG Pipeline Explained

When a buyer asks ChatGPT “what’s the best CRM for enterprise sales teams,” the AI doesn’t Google the question and copy the top result. It runs a multi-stage pipeline that decomposes the query, retrieves candidate passages from an index, scores those passages for relevance and authority, synthesizes an answer, and decides which sources deserve a citation link. That pipeline is called Retrieval-Augmented Generation (RAG), and understanding how it works is the difference between GEO tactics that earn citations and GEO tactics that waste time.

80% of enterprise software developers now call RAG the most effective way to ground LLMs in factual data, according to Data Nucleus’ 2025 enterprise survey. Every major AI search platform runs some version of it: ChatGPT retrieves from Bing’s index, Perplexity from its own crawl index, Google AI Overviews from Google’s organic index. The models are different. The pipeline architecture is the same. If you understand the pipeline, you understand why certain content gets cited and why most content doesn’t.

This guide walks through each stage of the RAG pipeline as it applies to AI search, explains what happens to your content at each step, and maps every stage to the specific content decision that determines whether you get cited or skipped.

What RAG Is (and Why It Matters for GEO)

RAG is an AI architecture that combines two steps: retrieving relevant information from external sources, then generating a response grounded in that retrieved context. The concept was introduced by Patrick Lewis et al. (Meta AI, UCL, NYU) in a 2020 paper and has since become the production standard for every AI search platform.

Without RAG, an LLM answers from memory. Its training data has a cutoff date, and anything after that date doesn’t exist. With RAG, the LLM searches the web (or a private knowledge base), reads the results, and generates an answer that cites its sources. RAG is what makes AI search engines useful for current information and what makes citation optimization possible.

| Without RAG | With RAG |
| --- | --- |
| Answers from training data only | Retrieves live information before answering |
| Knowledge cutoff limits accuracy | Can access content published yesterday |
| No citations possible (no source to attribute) | Cites the specific URLs it retrieved and used |
| Hallucination risk when asked about recent events | Grounded in retrieved evidence |
| Cannot be optimized through content changes | Can be influenced by content structure, authority, and freshness |

The last row is the one that matters for GEO. RAG makes AI search optimizable. Without it, there’s no retrieval step to influence. With it, every stage of the pipeline becomes an optimization surface.

The Seven Stages of the RAG Pipeline

AI search engines don’t retrieve and cite in one step. The pipeline has distinct stages, and your content can fail at any one of them. Understanding where failures happen tells you where to focus your optimization effort.

Stage 1: Query Decomposition

When a user types “what’s the best project management tool for remote engineering teams under 50 people,” the AI doesn’t search for that entire string. It breaks the query into multiple sub-queries and searches for each one separately. This is called query fan-out.

The sub-queries might be: “best project management tools 2026,” “project management for remote teams,” “project management tools for engineering,” and “project management pricing under 50 users.” Each sub-query retrieves its own set of candidate documents. Pages that match multiple sub-queries are more likely to be cited in the final answer.

Pages covering both the primary query and its sub-queries are 161% more likely to be cited than pages matching only one, according to research compiled by ZipTie.dev analyzing AI citation patterns across multiple studies. 51% of citations go to pages covering both the main query and sub-queries. This is why comprehensive topic coverage matters more than single-keyword optimization.

What you control: The breadth of topics covered on a single page. A page that answers “what is it,” “who uses it,” “how to choose,” and “what does it cost” in one URL matches more sub-queries than a page that only answers one of those questions.
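To make the fan-out concrete, here is a toy Python sketch. The sub-queries and page headings are invented for illustration, and the token-overlap function is a crude stand-in for the semantic matching real systems use:

```python
# Hypothetical fan-out for "best project management tool for remote
# engineering teams under 50 people" (sub-queries invented for illustration).
sub_queries = [
    "best project management tools 2026",
    "project management for remote teams",
    "project management tools for engineering",
    "project management pricing under 50 users",
]

def subquery_matches(page_headings, sub_queries, threshold=0.3):
    """Count how many sub-queries at least one heading on the page matches.
    Naive token overlap stands in for real semantic similarity."""
    def overlap(heading, query):
        h, q = set(heading.lower().split()), set(query.lower().split())
        return len(h & q) / len(q)
    return sum(
        1 for q in sub_queries
        if any(overlap(h, q) >= threshold for h in page_headings)
    )

# A comprehensive page matches more sub-queries than a narrow one.
broad_page = [
    "Best Project Management Tools for 2026",
    "Project Management for Remote Teams",
    "How Much Does Project Management Software Cost?",
]
narrow_page = ["Features", "About Us"]

print(subquery_matches(broad_page, sub_queries))   # matches all four
print(subquery_matches(narrow_page, sub_queries))  # matches none
```

The broad page enters four retrieval pools; the narrow page enters none. That is the mechanical reason comprehensive coverage beats single-keyword pages.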

Stage 2: Document Retrieval

Each sub-query is converted into a semantic vector (an embedding) and matched against the platform’s index. Semantic search matches by meaning, not keywords. A query about “reducing employee churn” can retrieve a page about “staff retention strategies” even if the exact phrase “employee churn” never appears. This is fundamentally different from keyword-based SEO, where matching the exact query string was critical.
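A minimal sketch of that matching, using hand-made four-dimensional vectors. Real embedding models produce hundreds of dimensions; the numbers here are invented purely to show why meaning beats keywords:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Invented 4-d "embeddings" -- real models use hundreds of dimensions.
query          = [0.9, 0.1, 0.8, 0.2]      # "reducing employee churn"
retention_page = [0.85, 0.15, 0.75, 0.25]  # "staff retention strategies"
recipe_page    = [0.1, 0.9, 0.05, 0.8]     # "easy weeknight dinner recipes"

# The retention page wins despite sharing zero keywords with the query.
print(cosine(query, retention_page) > cosine(query, recipe_page))  # True
```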

92% of ChatGPT agent queries rely on the Bing Search API, according to Search Engine Land’s October 2025 analysis. If your page isn’t indexed in Bing, it’s invisible to ChatGPT’s retrieval stage regardless of content quality.

Perplexity crawls the web independently. Google AI Overviews use Google’s own index.

What you control: Whether you’re indexed on each platform (submit sitemaps to Bing and Google). Whether your headings and opening sentences semantically match the queries your buyers ask. Whether your robots.txt allows GPTBot, PerplexityBot, and ClaudeBot.
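If you manage robots.txt directly, a minimal configuration that allows the three crawlers named above looks like this (verify the user-agent strings against each platform's current crawler documentation before deploying):

```
# robots.txt -- explicitly allow the major AI search crawlers
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /
```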

Stage 3: Chunking and Passage Extraction

Retrieved pages aren’t read as single documents. The RAG system breaks each page into chunks, typically at heading boundaries (H2, H3). Each chunk is scored independently. A 3,000-word article might produce eight to twelve chunks, and only one or two of those chunks will survive to the next stage.

The chunk that gets scored is the text under a single heading. If that heading says “Pricing” and the first sentence says “Let’s explore the various pricing considerations for this category,” the chunk opens with filler. If the heading says “How Much Does Project Management Software Cost?” and the first sentence says “Mid-market project management tools cost between $8 and $30 per user per month,” the chunk opens with the answer.

44.2% of all LLM citations come from the first 30% of a page’s content, according to Growth Memo’s February 2026 citation distribution analysis.

Within each chunk, the first one to two sentences are what determines whether it survives scoring. The rest of the chunk provides supporting evidence, but the opening is the filter.

What you control: Heading text (question format matches sub-queries). First two sentences of each section (must directly answer the heading’s question). Self-containment (no references to other sections).
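A simplified Python sketch of heading-boundary chunking. Production chunkers also enforce token budgets and overlap windows, which this omits; the sample page is invented:

```python
import re

def chunk_by_headings(page_markdown):
    """Split a page into chunks at H2/H3 boundaries (a simplification:
    real chunkers also cap chunk size and may overlap windows)."""
    chunks, current = [], {"heading": "(intro)", "lines": []}
    for line in page_markdown.splitlines():
        if re.match(r"^#{2,3}\s", line):  # a new H2/H3 starts a new chunk
            chunks.append(current)
            current = {"heading": line.lstrip("#").strip(), "lines": []}
        else:
            current["lines"].append(line)
    chunks.append(current)
    return [
        {"heading": c["heading"], "text": "\n".join(c["lines"]).strip()}
        for c in chunks
    ]

page = """Intro paragraph.

## How Much Does It Cost?
Mid-market tools cost between $8 and $30 per user per month.

## Who Uses It?
Remote engineering teams under 50 people."""

chunks = chunk_by_headings(page)
# Each chunk is scored independently; the heading plus the opening
# sentence is what survives (or doesn't).
```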

Stage 4: Passage Scoring and Reranking

Surviving chunks are scored on multiple dimensions. The exact scoring varies by platform, but the research consistently identifies the same signals.

| Scoring Signal | What It Measures | How to Optimize |
| --- | --- | --- |
| Semantic relevance | How closely the passage answers the specific sub-query | Match heading text and opening sentence to buyer questions |
| Factual specificity | Whether the passage contains verifiable claims with numbers | Include one attributed stat per section: number + source + year |
| Source authority | Whether the passage names credible sources for its claims | Cite research by name. "According to Gartner's 2025 report" beats "studies show." |
| Content freshness | How recently the page was updated | Date-stamp articles. Refresh quarterly. AI-cited content is 25.7% fresher than traditionally ranked content (Ahrefs, 2025). |
| Entity clarity | Whether the passage clearly identifies what it's about | Name products, companies, and people explicitly. Avoid pronouns and vague references. |

The Princeton GEO study (Aggarwal et al., KDD 2024) tested nine optimization methods against these scoring signals. Adding specific statistics improved visibility by 41%. Adding expert quotes improved it by 28%. Keyword stuffing decreased it by 3%. The scoring system rewards specificity and penalizes artificial optimization.

What you control: Whether every passage contains a named source with a number. Whether entities are explicitly named. Whether content is dated and recent.
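As an illustration only, here is a toy linear scorer over the five signals in the table above. The weights are invented, not measured from any platform; real rerankers are learned models, not hand-tuned formulas:

```python
def score_passage(passage, weights):
    """Weighted sum over the scoring signals (a linear toy model)."""
    return sum(weights[s] * passage.get(s, 0.0) for s in weights)

# Invented weights -- no platform publishes its real ones.
weights = {
    "semantic_relevance": 0.40,
    "factual_specificity": 0.25,
    "source_authority":   0.15,
    "freshness":          0.10,
    "entity_clarity":     0.10,
}

# Two passages answering the same sub-query equally well...
specific = {"semantic_relevance": 0.9, "factual_specificity": 0.9,
            "source_authority": 0.8, "freshness": 0.7, "entity_clarity": 0.9}
vague    = {"semantic_relevance": 0.9, "factual_specificity": 0.2,
            "source_authority": 0.1, "freshness": 0.7, "entity_clarity": 0.4}

# ...but the one with an attributed stat and named entities scores higher.
print(score_passage(specific, weights) > score_passage(vague, weights))  # True
```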

Stage 5: Context Assembly

The top-scored passages from multiple sources are assembled into a context window. The LLM reads this assembled context, not the original web pages. Typically, four to seven sources make it into the context window for a single query.

This is where the competitive dynamics play out. Your passage is sitting next to three to six competitor passages. The LLM reads all of them and decides which facts to include in its response and which sources to attribute them to.

67% of ChatGPT’s top 1,000 cited pages are “dead citations” like Wikipedia, app stores, and homepages that brands can’t displace, according to research cited by Security Boulevard in February 2026.

The competition for the remaining citable positions is intense. Your passage needs to contain a fact that the other passages in the context window don’t.

What you control: Whether your passage contains a unique data point, proprietary insight, or comparison table that competitors lack. This is the Depth dimension in GEO scoring: does your content say something the other sources don’t?
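A rough Python sketch of context assembly: keep the best passage per source, then take the top few sources overall. The source names and scores are hypothetical:

```python
def assemble_context(candidates, max_sources=6):
    """Keep the best-scoring passage per source, then the top sources overall."""
    best = {}
    for c in candidates:
        if c["source"] not in best or c["score"] > best[c["source"]]["score"]:
            best[c["source"]] = c
    ranked = sorted(best.values(), key=lambda c: c["score"], reverse=True)
    return ranked[:max_sources]

# Hypothetical candidate passages surviving the scoring stage.
candidates = [
    {"source": "yoursite.com",     "score": 0.82},
    {"source": "yoursite.com",     "score": 0.61},  # loses to its own sibling
    {"source": "competitor-a.com", "score": 0.78},
    {"source": "wikipedia.org",    "score": 0.91},
]

context = assemble_context(candidates)
# Three sources enter the window; your page contributes exactly one passage.
```

Note the second `yoursite.com` passage never reaches the window: a page usually contributes one passage, which is why front-loading your strongest material matters.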

Stage 6: Response Generation

The LLM synthesizes an answer from the assembled context. It doesn’t copy any single source verbatim. It reads all the passages, understands the concepts, and writes a new response that combines information from multiple sources.

This is why content structure matters more than prose quality. The LLM isn’t evaluating your writing style. It’s extracting facts: names, numbers, comparisons, definitions. A beautifully written paragraph with no specific claims produces nothing the LLM can use. A plainly written paragraph with a specific stat, a named source, and a clear definition gives the LLM a building block for its response.

What you control: Whether your content contains extractable facts (not just opinions or commentary). Whether those facts are structured as clear claims with attribution.

Stage 7: Citation Assignment

The LLM decides which sources to cite. Not every source in the context window gets a citation. ChatGPT only cites 15% of the pages it retrieves, according to AirOps’ March 2026 analysis. The citation decision is the final filter.

A source earns a citation when the LLM can trace a specific fact in its response back to a specific passage from that source. If the LLM uses your stat (“sales cycles average 84 days for deals above $50K”) and your passage is the only source with that number, you get the citation. If three sources in the context window all say the same thing, the LLM may paraphrase the consensus and cite nobody.

| Citation Outcome | Why It Happens | How to Earn It |
| --- | --- | --- |
| Cited with link | LLM used a specific fact that traces back to your passage uniquely | Publish original data, proprietary benchmarks, or unique comparisons |
| Mentioned without link | LLM recognizes your brand but doesn't use a specific fact from your page | Improve content structure so specific claims are extractable |
| Retrieved but not cited | Your page made it to context assembly but didn't contribute a unique fact | Add data points competitors don't have |
| Not retrieved | Your page wasn't in the index or didn't match the sub-queries | Check indexation. Match headings to buyer questions. |
| Not eligible | AI crawlers are blocked in your robots.txt | Allow GPTBot, PerplexityBot, ClaudeBot |

What you control: Whether your content contains facts unique enough to be worth attributing. Original data, proprietary analysis, and structured comparison tables are the content types most likely to earn citation attribution.

How Each Platform Implements RAG Differently

The pipeline is the same architecture, but each platform’s implementation differs in ways that matter for optimization.

| Pipeline Element | ChatGPT | Perplexity | Google AI Overviews |
| --- | --- | --- | --- |
| Index source | Bing (92% of queries) | Own crawl index + web | Google's organic index |
| Search trigger | 31% of prompts trigger web search; 53.5% of commercial prompts (Profound/Blyskal, 2026) | Nearly all queries trigger retrieval | Triggered on ~48% of Google searches (BrightEdge, February 2026) |
| Chunking behavior | Reads pages in reading mode (no JS/CSS) for 46% of visits (Search Engine Land, October 2025) | Full page rendering | Standard Google crawl |
| Citation style | Source links at bottom of response | Inline numbered references with source cards | Cited cards linking to source pages |
| Source overlap with other platforms | Only 11% domain overlap with Perplexity (Averi, 2026) | Only 11% overlap with ChatGPT | 13.7% URL overlap between AI Overviews and AI Mode (Ahrefs, September 2025) |
| Preferred content type | Wikipedia and encyclopedic content: 47.9% of top citations (Averi, 2026) | Reddit and niche directories: 46.7% + 24% (Averi, 2026) | YouTube and multi-modal content: 23.3% (Averi, 2026) |
| Freshness weight | Moderate (training data + live retrieval hybrid) | High (live index, fresh content favored) | High (tied to Google crawl cycle) |

The platform differences explain why optimizing for “AI search” as a single category fails. A page optimized for ChatGPT’s Bing-based retrieval may be invisible on Perplexity’s independent index. Content that ranks well for Google AI Overviews may never surface in ChatGPT because it’s not in Bing’s index.

Mapping the Pipeline to Your Content

Every stage of the RAG pipeline maps to a specific content decision. This table is the cheat sheet.

| Pipeline Stage | What the AI Does | What You Should Do |
| --- | --- | --- |
| Query decomposition | Breaks the query into 3–5 sub-queries | Cover multiple related questions in one URL. Answer "what/who/how/cost" on a single page. |
| Document retrieval | Searches the platform's index by semantic similarity | Submit sitemaps to Bing and Google. Write headings that semantically match buyer questions. Allow AI crawlers. |
| Chunking | Breaks your page into passages at heading boundaries | Make every H2/H3 section self-contained. No "as mentioned above" references. |
| Passage scoring | Ranks passages by relevance, specificity, authority, freshness | Open every section with a direct answer + one attributed stat. Date-stamp and refresh quarterly. |
| Context assembly | Places your passage next to 3–6 competitor passages | Include data, comparisons, or analysis that competitors lack. This is where unique value wins. |
| Response generation | Extracts facts from passages to build the answer | Write specific, extractable claims. Names, numbers, and named sources. Not opinions or commentary. |
| Citation assignment | Attributes facts to the source that contributed them uniquely | Publish original data. Build comparison tables. Be the primary source, not the summarizer. |

How to Choose Which Pipeline Stage to Optimize First

The RAG pipeline fails silently at different stages for different pages, and optimizing the wrong stage burns cycles. Use these rules to find the binding failure before writing new content.

  • If your page is not in Bing’s index, stage 2 is your failure point. Submit sitemaps and verify GPTBot, PerplexityBot, and ClaudeBot are allowed in robots.txt before touching the content itself.

  • If your H2s are generic labels, stage 3 is the failure point. Rewrite headings to match buyer sub-queries, because chunking is where most content silently drops out.

  • If your opening sentences hedge or restate the heading, stage 4 (scoring) is the failure point. Every chunk should open with a direct answer plus one attributed stat.

  • If competitors with weaker authority outrank you in AI answers, stage 5 is the failure point. The context assembly is preferring their specific data over your generic prose. Publish original numbers or a comparison table.

  • If you are retrieved but not cited, stage 7 is the failure point. Res AI’s 1,000-query Perplexity study (2026) found only 5.9% of Perplexity citations go to vendor sites. A unique stat is the fastest way into that cited 5.9%.

  • If multiple engines retrieve you but the citations do not overlap, treat ChatGPT and Perplexity as separate optimization targets, not a single channel. Only 11% of domains are cited by both (Averi, 2026).

Start at whichever stage has the lowest current throughput; everything downstream is wasted effort until that gate opens.

Frequently Asked Questions

Why does query decomposition matter more than keyword matching?

AI engines no longer match the exact user query against the index; they rewrite the query into 3 to 5 sub-queries and retrieve against each one. Pages that only answer one of those sub-queries compete in a narrower retrieval pool and lose against pages that match several. The keyword-density tactics that worked for Google’s earlier ranking models do not survive this rewrite step.

What is the difference between retrieval and citation inside the pipeline?

Retrieval is the stage where a page enters the candidate pool; citation is the stage where a specific passage gets attributed in the final answer. A page can be retrieved and still lose the citation if its passage does not contribute a unique fact. ChatGPT only cites 15% of pages it retrieves (AirOps, March 2026).

Why does chunking happen at heading boundaries instead of paragraph boundaries?

Headings are the most reliable structural signal inside HTML, so the chunker uses them as natural break points. A passage under a clear H2 answer capsule is easier to score and attribute than a mid-paragraph sentence. This is why generic section labels like “Features” silently cost pages citations: the chunk opens with no semantic match against any sub-query.

Does the length of a chunk matter for passage scoring?

Yes, with diminishing returns. A chunk in the 40 to 80 word range is long enough to hold a claim with attribution and short enough to survive the relevance filter. Longer chunks get truncated; shorter chunks lack the context the model needs to bind the claim to the heading.

What counts as a “unique fact” during citation assignment?

A specific number, a named entity, or a comparison that no other passage in the context window contains. Res AI’s 852-article B2B citation structure study (2026) found top-cited pages contain an average of 7.52 structural elements versus 5.51 for bottom-cited pages. The structural elements are where the unique facts live.

How many passages from a single page can make it into the context window?

Usually one, occasionally two. A 3,000 word article with twelve chunks competes against itself for the same citation slot. This is why front-loading the strongest answer into the first section matters more than spreading evidence across the whole page.

Why does Perplexity cite so many more sources per response than ChatGPT?

Perplexity’s retrieval stage pulls a wider candidate pool, and its citation assignment is less aggressive about deduplication. Res AI’s 1,000-query Perplexity study (2026) measured 7.6 sources per response. ChatGPT runs retrieval on only 31% of prompts and cites 15% of retrieved pages, which produces fewer total citations per answer.

Does the RAG pipeline ever skip retrieval entirely?

Yes. ChatGPT only triggers web search on 31% of prompts, rising to 53.5% for commercial queries (Profound/Blyskal, 2026). For everything else, the LLM answers from its training memory with no citation step at all. This is why low-intent queries produce recommendations without sources even on the same model.

How does “reading mode” affect the pipeline stages?

Reading mode affects stage 2 onward. ChatGPT loads 46% of bot visits without CSS, JavaScript, or images (Search Engine Land, October 2025). Any content that requires a browser to render is invisible to the retrieval, chunking, and scoring stages for nearly half of the traffic. Plain HTML text is the safest format for every stage.

Can original data really move the needle inside a RAG system?

Yes, because original data is the cleanest way to win stage 7 (citation assignment). The LLM attributes a fact to the source that uniquely contributes it. Res AI’s own published studies show up in AI citations precisely because no other source has the underlying numbers. Commissioning a small dataset, running a survey, or publishing a proprietary benchmark is one of the highest-impact RAG optimizations.

Res AI engineers content for every stage of the RAG pipeline: prompt libraries mapped to sub-queries, answer capsules optimized for passage scoring, comparison tables that survive context assembly, and original data that earns citation attribution. Published directly to your CMS.

See how it works →

Your content is invisible to AI. Res fixes that.

Get cited by ChatGPT, Perplexity, and Google AI Overviews.