
How AI selects its sources

When ChatGPT, Perplexity or Google AI generates an answer, it chooses which sources to cite. That is not a random process. AI runs through a selection mechanism that determines which websites are trustworthy enough, which content is most relevant and which passages are most quotable. The top 15 domains capture 68% of all AI citations. Domain Authority predicts only 4% of AI citations, while E-E-A-T correlates at 0.81. This article explains exactly how that selection process works and what determines whether AI chooses you as a source.

68% of AI citations go to the top 15 domains

r=0.81 correlation between E-E-A-T and AI citations

4% of AI citations predicted by Domain Authority

7.4 average sources per Perplexity answer

The shift from ranking to source selection

With Google, it is about ranking: which page sits at position one? With AI search engines, it is about source selection: which page gets cited as a source in the answer? These are fundamentally different processes. Ranking is a fixed list. Source selection is a dynamic choice made fresh for every question. The same website can be chosen as a source for one question and skipped for a similar one.

The difference becomes clear when you look at the numbers. Only 38% of sources cited in Google AI Overviews rank in the top 10 of regular Google search results. That means 62% of cited sources are pages you would not even find on the first page of a normal Google search. AI selects sources on different criteria than Google ranks pages. The correlation between a high Google position and an AI citation is only r=0.18, which is a weak relationship.

Domain Authority is losing ground

Domain Authority, for years one of the most important predictors of Google ranking, predicts only 4% of AI citations. The correlation is r=0.18. That is almost negligible. By comparison, E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) correlates at r=0.81 with AI citations. That is an enormous difference. AI rewards expertise and trustworthiness, not domain size. A specialist with strong credentials gets cited ahead of a large portal with generic content.
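One way to relate the two kinds of figure quoted above is to square the correlation coefficient, which gives the share of variance a linear relationship accounts for. The sketch below assumes that "predicts X%" is meant in that sense; the article does not say how the 4% figure was derived, and the helper name `variance_explained` is hypothetical.

```python
# Convert a correlation coefficient r into r squared, expressed as a
# percentage: the share of variance a linear fit would account for.
# Assumption: "predicts X%" in the text means r squared. The source
# does not state its method.

def variance_explained(r: float) -> float:
    """Return r squared as a percentage, rounded to one decimal place."""
    return round(r * r * 100, 1)

for label, r in [("Domain Authority", 0.18), ("E-E-A-T", 0.81)]:
    print(f"{label}: r={r} -> about {variance_explained(r)}% of variance")
```

On this reading, r=0.18 corresponds to roughly 3% and r=0.81 to roughly 66%, which is in the same range as the 4% quoted for Domain Authority.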

This opens the door for smaller businesses and niche sites. A specialised website about solar panels with strong E-E-A-T signals (author information, expert credentials, citations in trade publications) can be chosen by AI over a large energy portal with high Domain Authority but generic content. AI looks for the best source per specific question, not the biggest website in the industry.

For UK businesses, this is particularly relevant. A sole trader electrician in Birmingham with a well-structured website, NICEIC or NAPIT registration clearly displayed, detailed service descriptions and strong Checkatrade reviews can be cited by AI ahead of a national chain with higher Domain Authority but less specific content. The playing field is more level in AI search than it ever was in traditional SEO.

But there is a caveat. Concentration is high: the top 15 domains capture 68% of all AI citations. Reddit dominates with 40% of all citations across platforms. Wikipedia, YouTube and major news sites take a large share of the rest. That means there is a relatively small share of citations left for the rest of the internet. Getting cited as a smaller site is possible, but you need to optimise specifically for it.

The rules have changed. Domain Authority is no longer the key. E-E-A-T, semantic completeness and platform-specific optimisation now determine whether you get chosen as a source.

How RAG works: training data versus real-time retrieval

AI search engines use two sources of knowledge. The first is training data: the enormous amount of text the model was trained on. ChatGPT, Gemini and Claude processed billions of web pages, books and documents during training. That knowledge is baked into the model. But training data has a limitation: it is frozen at the moment of training. Everything published after the training period is missing.

The second source is RAG: Retrieval-Augmented Generation. This is the mechanism by which AI retrieves real-time information from the web. When you ask a question to Perplexity or ChatGPT with web search enabled, the system first runs a retrieval step: it searches for relevant pages on the live web. Those pages are processed and the relevant passages are fed as context to the language model. The model then generates an answer based on both its trained knowledge and the retrieved pages.
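The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration, not how any named platform works: `PAGES`, `retrieve` and `build_prompt` are hypothetical, the URLs are placeholders, and simple word overlap stands in for a real search index.

```python
# Toy retrieval-augmented generation (RAG) flow: retrieve pages first,
# then build the context a language model would answer from.
# All names and data are illustrative assumptions.

PAGES = {
    "example.com/boiler-repair": "Boiler repair costs and common boiler faults explained",
    "example.com/loft-conversion": "Loft conversion planning permission and costs",
    "example.com/solar-panels": "Solar panels installation guide and savings",
}

def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return up to k URLs whose text shares the most words with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(body)), url) for url, body in PAGES.items()]
    # Pages with no shared words are never retrieved, so they can never be cited.
    ranked = sorted((pair for pair in scored if pair[0] > 0), reverse=True)
    return [url for _, url in ranked[:k]]

def build_prompt(question: str) -> str:
    """Combine the retrieved passages and the question into one prompt."""
    context = "\n".join(f"[{url}] {PAGES[url]}" for url in retrieve(question))
    return f"Sources:\n{context}\n\nQuestion: {question}"

print(build_prompt("how much does boiler repair cost"))
```

The point of the sketch is the order of operations: a page that is not returned by `retrieve` never appears in the prompt, whatever its quality.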

The retrieval step as bottleneck

RAG is critical for source selection because it determines which pages actually get retrieved and processed. The retrieval system acts as a pre-filter. Only pages that pass the retrieval step have a chance of being cited. 73% of errors in AI answers originate in the retrieval phase, not in the generation phase. That means the problem is usually not that AI misinterprets your content, but that it does not retrieve your content in the first place.

The retrieval step works through a combination of search queries and semantic matching. AI internally formulates one or more search queries based on the user's question. It searches via a search index (Perplexity and ChatGPT use Bing among others) and retrieves the most relevant results. Those results are then filtered on quality, relevance and trustworthiness.

Modern RAG systems are becoming increasingly sophisticated. CRAG (Corrective RAG) checks the quality of retrieved documents and rejects sources that are not trustworthy enough. These systems are continuously improved, which keeps raising the bar for source selection. For UK businesses, this means the basics come first: a page that is not indexed or cannot be crawled never reaches these quality checks.
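The idea of checking retrieved documents before they reach the model can be sketched as a simple threshold filter. This is written in the spirit of a corrective check, not as the published CRAG method: the `Retrieved` class, the `relevance` score and the 0.6 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Retrieved:
    url: str
    relevance: float  # hypothetical 0..1 score from some evaluator

def filter_retrieved(
    docs: list[Retrieved], threshold: float = 0.6
) -> tuple[list[Retrieved], bool]:
    """Keep documents at or above the threshold.

    Returns (kept, needs_fallback). needs_fallback is True when nothing
    passed, which is the point where a corrective system would search again.
    """
    kept = [d for d in docs if d.relevance >= threshold]
    return kept, not kept

docs = [Retrieved("example.com/guide", 0.82), Retrieved("example.com/thin-page", 0.31)]
kept, needs_fallback = filter_retrieved(docs)
print([d.url for d in kept], needs_fallback)
```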

73% of errors in AI answers originate in the retrieval phase. The problem is usually not that AI misinterprets you, but that it does not find you.

For your visibility, this means two things. First: your page must be findable in the search index AI uses. Ensure your page is indexed in Google and Bing, your sitemap is up to date and your page is technically accessible to crawlers. Second: your page must survive the quality filters. Structured data, clear author information and semantically complete content increase the chance your page does not get filtered out in the retrieval step. Learn more in what AI reads on your website.
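Whether a page is technically accessible to crawlers can be checked against your robots.txt with Python's standard library. The sketch below parses an example robots.txt and tests a few crawler names. The rules and the URL are hypothetical, and the user-agent strings (GPTBot, bingbot, PerplexityBot) are assumptions that should be confirmed against each vendor's own documentation.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content. The rules are hypothetical.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Report, per crawler name, whether robots.txt permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

result = allowed_agents(
    ROBOTS_TXT,
    "https://example.com/services/",
    ["GPTBot", "bingbot", "PerplexityBot"],
)
for agent, ok in result.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

In this example the blanket `Disallow: /` for one crawler blocks it from every page, while the other crawlers fall back to the general rule and can still reach the services page.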

Platform differences: each AI system selects differently

ChatGPT

Averages 3.86 citations per answer. Leans heavily on web consensus. 87% of citations match the top 10 results from Bing. Prefers sources confirmed across multiple other sources. Wikipedia is the dominant source, followed by established news sites.

Strategy: be visible on Wikipedia and in major media

Gemini

Cites notably many brand-owned domains: 52% of its citations point to the websites of the brands mentioned in the answer. That is significantly higher than other platforms. Your own website is a stronger source for Gemini than for ChatGPT or Perplexity.

Strategy: invest in your own website content

Perplexity

Averages 7.42 sources per answer, the most citation-dense platform. Works like a search engine with an AI layer. Reddit dominates at 46.7% of citations. Values source diversity: cites from multiple sources for balanced answers. Most accessible platform for smaller businesses.

Strategy: be active on Reddit and diverse platforms

Google AI Overviews

Averages 6 to 8 sources per answer. Most integrated with the traditional search index. Prefers well-structured pages with schema markup and complete FAQ sections. The platform where technical optimisation has the greatest effect.

Strategy: schema markup and structured FAQ sections
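A structured FAQ section is usually expressed as schema.org FAQPage markup in JSON-LD. The following is a minimal sketch that builds such markup from question and answer pairs. The helper name `faq_jsonld` and the sample text are hypothetical, and whether a given platform actually uses the markup is not something this sketch can confirm.

```python
import json

def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)

print(faq_jsonld([
    ("Does a high Google ranking guarantee AI citations?",
     "No. A strong position helps a page get retrieved, but not cited."),
]))
```

The output goes inside a `<script type="application/ld+json">` element on the page that displays the same questions and answers.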

Only 11% of domains are cited by both ChatGPT and Perplexity. 89% of cited domains are platform-specific. A multi-platform strategy is essential.
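An overlap figure like this can be computed from two lists of cited domains. The sketch assumes overlap means shared domains divided by all distinct domains cited on either platform; the article does not say how its 11% was measured, and the domain lists below are made up for illustration.

```python
def overlap_share(a: set[str], b: set[str]) -> float:
    """Share of all distinct domains that appear in both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical citation logs for two platforms.
platform_one = {"reddit.com", "wikipedia.org", "example.com"}
platform_two = {"reddit.com", "youtube.com", "example.org"}

print(f"{overlap_share(platform_one, platform_two):.0%} of domains cited on both")
```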


E-E-A-T and author quality: the new key to source selection

E-E-A-T correlates at r=0.81 with AI citations. That makes it by far the strongest predictor of source selection. But what does AI actually measure when it assesses E-E-A-T? It looks at a combination of signals that together form the trust profile of a source.

Author information is one of the strongest signals. Content with a named author who has credentials (job title, qualifications, experience) scores significantly better than anonymous content. AI can cross-reference the author with other sources: does the same author appear on LinkedIn, in trade publications, on conference speaker lists? The more external confirmation of the author's expertise, the stronger the E-E-A-T signal.

For UK professionals, this is directly actionable. A solicitor writing about employment law should have their Law Society number, years of practice and areas of specialisation clearly displayed. An accountant writing about Making Tax Digital should show their ICAEW or ACCA membership. A builder discussing loft conversions should display their Federation of Master Builders membership and relevant certifications.

Author profile as a ranking factor

Practically, this means: put author information on every content page. Create author pages with a bio, photo, qualifications and publications. Use Person schema (JSON-LD) to make this information machine-readable. Link the author page to external profiles (LinkedIn, professional bodies). The richer the author profile, the stronger the E-E-A-T score for all content written by that author.
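Person schema in JSON-LD can be generated from a few fields. The sketch below uses real schema.org properties (`name`, `jobTitle`, `url`, `sameAs`, `memberOf`), while the helper name `person_jsonld`, the author and every URL are placeholders.

```python
import json

def person_jsonld(
    name: str, job_title: str, profile_url: str, same_as: list[str], member_of: str
) -> str:
    """Return a script tag containing schema.org Person markup."""
    data = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "url": profile_url,
        "sameAs": same_as,  # external profiles that confirm the same person
        "memberOf": {"@type": "Organization", "name": member_of},
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'

snippet = person_jsonld(
    name="Jane Example",  # placeholder author
    job_title="Chartered Accountant",
    profile_url="https://example.com/authors/jane-example",
    same_as=["https://www.linkedin.com/in/jane-example"],
    member_of="ICAEW",
)
print(snippet)
```

Place the output on the author page and keep the visible bio consistent with the fields in the markup.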

Beyond author quality, AI looks at organisational authority. Is your business mentioned in trade publications? Do you have awards or certifications? Are your experts cited by others? Each of these signals strengthens the E-E-A-T of your entire website. An accountancy firm whose founder is regularly quoted in industry publications builds an authority signal that benefits all content on that firm's website.

Experience signals are becoming increasingly important. AI looks for content written from direct experience. Case studies, client results and practical examples signal that the author does not merely write about the subject but actually practises it. A marketing agency that describes how it increased a specific client's AI visibility by 40%, including the approach and the results, delivers stronger experience signals than a theoretical article about the same techniques.

Transparency builds trust

Trustworthiness is expressed through transparency. AI values sources that make clear who is behind the content, how they earn their money and what interests they have. A comparison site that transparently states it receives affiliate commissions scores better than one without that disclaimer. For UK businesses, this includes clear terms, a physical address, Companies House registration and contact details. Transparency builds trust with both users and AI.

Semantic completeness: the complete answer wins

AI measures how completely your content covers a topic. This measurement is called semantic completeness and it is one of the strongest predictors of source selection. Content with a semantic completeness score of 8.5 or higher (on a scale of 10) is 4.2 times more likely to be cited than content with a lower score. AI prefers sources that cover the topic in one go, so it does not need to piece together an answer from multiple sources.

Semantic completeness is not about word count. It is about coverage: do you address all the relevant subtopics? An article about "setting up a limited company" that only describes the steps but says nothing about costs, tax advantages, accountant fees, share capital and alternatives (sole trader, partnership) is semantically incomplete. AI "knows" which subtopics belong to a topic because it has processed millions of documents about that topic.

Measuring completeness as a checklist

How do you measure this yourself? Look at the headings of the top 10 pages for your topic. Which subtopics are consistently covered? Those subtopics are the expected components of a complete answer. If your content addresses all those subtopics and adds a few unique angles, you score highly on semantic completeness. Use the heading structure of your article as a checklist: each H2 should cover a logical subtopic.
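The heading checklist described above can be approximated with a short script. It is a rough sketch under two assumptions: headings are compared by exact text, which a real audit would loosen, and "consistently covered" means a subtopic appears on at least half of the competitor pages. The function names and sample headings are hypothetical.

```python
from collections import Counter

def expected_subtopics(
    competitor_headings: list[list[str]], min_share: float = 0.5
) -> set[str]:
    """Subtopics that appear on at least min_share of competitor pages."""
    counts = Counter(
        heading
        for page in competitor_headings
        for heading in {h.lower() for h in page}  # count each heading once per page
    )
    needed = min_share * len(competitor_headings)
    return {topic for topic, n in counts.items() if n >= needed}

def missing_subtopics(
    own_headings: list[str],
    competitor_headings: list[list[str]],
    min_share: float = 0.5,
) -> set[str]:
    """Expected subtopics that your own headings do not cover yet."""
    own = {h.lower() for h in own_headings}
    return expected_subtopics(competitor_headings, min_share) - own

competitors = [
    ["Costs", "Steps", "Tax"],
    ["Costs", "Steps"],
    ["Costs", "Alternatives"],
    ["Steps", "Tax"],
]
print(sorted(missing_subtopics(["Steps"], competitors)))
```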

Topical authority strengthens semantic completeness at the website level. A website with ten related articles about a topic, each with cross-links to the others, builds a knowledge cluster that AI recognises. AI does not only cite individual pages; it also assesses the broader context of the website. A website that consistently publishes in-depth content about a specific area of expertise is more likely to be selected as a trustworthy source.

A concrete example: a UK law firm that has separate pages about employment tribunals, unfair dismissal, redundancy rights, settlement agreements, discrimination claims and whistleblowing builds topical authority across the entire subject of employment law. When someone asks AI about a specific subtopic, AI factors in that this source also covers the broader topic. That increases the trustworthiness of the individual page.

From isolated articles to knowledge clusters

The strategy is clear: do not write a single article about your core topic. Build a cluster of related articles, each with its own angle, that together cover the entire topic. Link them internally. Ensure each article is semantically complete on its own. This builds both page-level and website-level signals that AI uses in source selection. Read more about how to structure content for AI in how AI assembles its answers.
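Internal linking within a cluster can be audited by listing which pages fail to link to which others. The sketch assumes you already have a map of each page's outgoing internal links, for example from a crawl. The function name `missing_links` and the paths are placeholders.

```python
def missing_links(links: dict[str, set[str]]) -> list[tuple[str, str]]:
    """List (source, target) pairs where one cluster page does not link to another."""
    pages = sorted(links)
    return [
        (src, dst)
        for src in pages
        for dst in pages
        if src != dst and dst not in links[src]
    ]

# Hypothetical cluster: each key is a page, each value its internal links.
cluster = {
    "/unfair-dismissal": {"/redundancy"},
    "/redundancy": {"/unfair-dismissal"},
    "/settlement": {"/redundancy"},
}
for src, dst in missing_links(cluster):
    print(f"{src} does not link to {dst}")
```

Requiring every page to link to every other is a strict rule chosen for simplicity. In a larger cluster, linking each article to a hub page and its closest neighbours may be enough.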

Strategy: how to get chosen as a source

Invest in E-E-A-T

Author information on every page. Person schema for experts. External mentions in trade publications. Case studies with concrete results. E-E-A-T is the strongest predictor of AI citations at r=0.81. For UK regulated professions, display your professional body membership prominently.

Optimise per platform

ChatGPT follows web consensus: be visible on Wikipedia and in major media. Perplexity values diversity and Reddit mentions. Gemini prefers your own website. Google AI Overviews reward schema markup and FAQ sections. Only 11% of domains are cited on both ChatGPT and Perplexity.

Build content clusters

Write multiple related articles about your area of expertise. Link them internally. Build topical authority. Content with a semantic completeness score of 8.5+ is 4.2x more likely to be cited. Cover all the subtopics your competitors cover, then add unique angles.

Be findable in the search index

73% of RAG errors originate in the retrieval phase. Ensure your page is indexed in Google and Bing. Keep your sitemap up to date. Make your page technically accessible to crawlers. If AI cannot retrieve you, it cannot cite you. Submit new pages via Google Search Console and Bing Webmaster Tools.
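Keeping a sitemap current is easier when it is generated rather than edited by hand. A minimal sketch using Python's standard library follows. The URLs and dates are placeholders, and `build_sitemap` is a hypothetical helper rather than part of any tool mentioned above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """Build a sitemap XML string from (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # YYYY-MM-DD
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    ("https://example.com/", "2025-01-15"),
    ("https://example.com/services/", "2025-01-10"),
])
print(sitemap)
```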

Diversify your presence

Do not rely solely on your website. Be present on Trustpilot, Checkatrade, Google Business, industry directories and relevant Reddit communities. Each platform feeds into AI's source pool. The more places AI can find consistent information about your business, the more confident it is in citing you.

Monitor your citations

Source preferences shift rapidly. Reddit went from marginal to 40% of all citations in two years. What works today can be outdated in six months. Monitor which platforms cite you and adjust your strategy based on current data. VestVale automates this monitoring across all major AI platforms.

Frequently asked questions

Does a high Google ranking guarantee AI citations?

No. The correlation between Google position and AI citation is only 0.18. 62% of sources cited in AI Overviews are not in the Google top 10. AI selects on quotability, E-E-A-T and semantic completeness, not on traditional ranking position. A strong Google position helps you get retrieved, but it does not guarantee you get cited.

Why does Domain Authority matter so little for AI?

Domain Authority is a third-party metric, developed by Moz, that estimates the strength of a site's backlink profile. It is a signal associated with traditional search ranking. AI platforms do not appear to use backlinks as a direct selection criterion. They evaluate content quality, E-E-A-T signals and semantic completeness instead. A specialist website with fewer backlinks but stronger expertise signals can outperform a high-DA generic site in AI citations.

Can small businesses compete with large sites for AI citations?

Yes, for specific queries. AI looks for the best source per question, not the biggest website. A specialist plumber in Bristol with detailed, specific content about boiler repairs is more likely to be cited for "emergency boiler repair Bristol" than a national directory with thin listings. Specificity, E-E-A-T and semantic completeness are the equalising factors.

Why does each AI platform cite different sources?

Each platform uses different retrieval mechanisms, different source indexes and different quality filters. ChatGPT uses Bing and favours web consensus. Perplexity searches broadly and values diversity. Gemini leans on Google's Knowledge Graph. Only 11% of cited domains appear on both ChatGPT and Perplexity. This is why a multi-platform monitoring approach is essential.

Is AI selecting your business as a source?

VestVale automatically monitors whether ChatGPT, Gemini, Claude and Google AI cite your business. Find out where you stand across all platforms.

From £19.95/mo ex. VAT. Cancel monthly.