The citation source stack changes by therapeutic area

Most generative engine optimisation (GEO) frameworks treat "authoritative sources" as a single category: get cited by regulators, medical societies, and peer-reviewed journals, and you win. The May 2026 PharmaGEO public index shows that framing is too coarse to be useful. Across the four therapeutic areas (TAs) tracked on Perplexity (atopic dermatitis, lung cancer, psoriasis, and obesity), the dominant citation source changes from one TA to the next: a society guideline in one, FDA labels in another. The strategic implication is direct: a content team that builds one source strategy for all of its portfolio brands will be optimised for some and misaligned for the rest.

This article presents the raw source-use data by TA, maps those patterns onto a four-archetype framework, and draws out TA-specific playbook recommendations that teams can act on immediately.

The data: four TAs, four different source hierarchies

The table below shows the top cited domains for each therapeutic area on Perplexity, drawn from the May 2026 PharmaGEO public index. Citation use counts represent the number of times each domain appeared as a source across the full prompt set for that TA.

Source domain	Atopic Dermatitis	Lung Cancer	Psoriasis	Obesity
nccn.org	—	218 uses (≈85% of total)	—	—
aad.org	168 uses (#1)	—	114 uses (#2)	—
pmc.ncbi.nlm.nih.gov	94 uses (#2)	present	154 uses (#1)	present
aafp.org	92 uses (#3)	—	—	—
psoriasis-hub.com	—	—	70 uses (#3)	—
nice.org.uk	—	—	40 uses (#4)	—
dermnetnz.org	—	—	40 uses (#5)	—
accessdata.fda.gov (Zepbound label)	—	—	—	36 uses (#1)
accessdata.fda.gov (Wegovy label)	—	—	—	32 uses (#2)
novo-pi.com (Wegovy PI)	—	—	—	26 uses (#3)
nejm.org	—	16 uses (KEYNOTE-189)	—	24 uses (#4)

The structural differences across these four TAs are not noise. They reflect genuinely different epistemic architectures — each TA has a different type of authority that AI retrieval systems defer to. Oncology defers to treatment guidelines so heavily that one organisation effectively owns the source stack. Metabolic disease defers to the regulatory record. Inflammatory dermatology distributes weight across three source types simultaneously. Understanding which archetype applies to your TA is the first decision in any GEO content build.

Four source archetypes govern pharma GEO

Across the four TAs studied, every top-cited domain falls into one of four archetypes. The archetype determines both the citation weight a source carries and the content format that earns that weight.

Archetype	Example domains	Citation strength	Primary TAs
Society guideline	aad.org, nccn.org, ESMO, EASD	Highest in regulated TAs with consolidated guidance	Oncology, AD, Psoriasis
Regulatory primary	accessdata.fda.gov, ema.europa.eu (EPAR), nice.org.uk	Heavy in metabolic and safety-loaded TAs	Obesity, Metabolic
Peer-reviewed literature	nejm.org, pmc.ncbi.nlm.nih.gov, thelancet.com	Universal floor — present in every TA	All TAs
Specialist hub / disease-state	psoriasis-hub.com, dermnetnz.org, aafp.org	Engine-dependent; Perplexity favours strongly	Psoriasis, AD (primary care tier)
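
One way to make the archetype weighting concrete is to tally the counted uses from the first table by archetype. The sketch below is illustrative only. The domain-to-archetype mapping is an assumption based on the archetype table (novo-pi.com does not appear there and is treated as regulatory primary because it hosts prescribing information), domains listed only as "present" are omitted because no count is given, and shares are computed over the counted top domains rather than each TA's full citation total.

```python
from collections import defaultdict

# Assumed mapping, following the archetype table. novo-pi.com is not listed
# there; it is treated as regulatory primary because it hosts the Wegovy PI.
ARCHETYPE = {
    "nccn.org": "society guideline",
    "aad.org": "society guideline",
    "accessdata.fda.gov": "regulatory primary",
    "novo-pi.com": "regulatory primary",
    "nice.org.uk": "regulatory primary",
    "nejm.org": "peer-reviewed literature",
    "pmc.ncbi.nlm.nih.gov": "peer-reviewed literature",
    "psoriasis-hub.com": "specialist hub",
    "dermnetnz.org": "specialist hub",
    "aafp.org": "specialist hub",
}

# Counted uses per TA from the source table. Entries listed only as
# "present" (no count given) are left out rather than guessed.
USES = {
    "atopic dermatitis": {"aad.org": 168, "pmc.ncbi.nlm.nih.gov": 94, "aafp.org": 92},
    "lung cancer": {"nccn.org": 218, "nejm.org": 16},
    "psoriasis": {
        "pmc.ncbi.nlm.nih.gov": 154,
        "aad.org": 114,
        "psoriasis-hub.com": 70,
        "nice.org.uk": 40,
        "dermnetnz.org": 40,
    },
    # The two FDA label pages (Zepbound 36, Wegovy 32) share one domain.
    "obesity": {"accessdata.fda.gov": 36 + 32, "novo-pi.com": 26, "nejm.org": 24},
}


def archetype_shares(ta):
    """Return each archetype's share of the counted uses for one TA."""
    totals = defaultdict(int)
    for domain, uses in USES[ta].items():
        totals[ARCHETYPE[domain]] += uses
    counted = sum(totals.values())
    return {name: uses / counted for name, uses in totals.items()}


def dominant_archetype(ta):
    """Return the archetype with the largest share of counted uses."""
    shares = archetype_shares(ta)
    return max(shares, key=shares.get)


for ta in USES:
    print(f"{ta}: {dominant_archetype(ta)}")
```

On these counts, society guideline leads in atopic dermatitis and lung cancer, peer-reviewed literature leads in psoriasis, and regulatory primary leads in obesity.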

Why the archetype distribution matters more than any single domain ranking

A brand team that targets aad.org because AAD ranks #1 in the AD source data may not realise that the same domain ranks #2 in psoriasis and is absent from the top sources for obesity and lung cancer. The domain is not universally authoritative; it is authoritative specifically within inflammatory skin disease. The strategic implication is not "get mentioned on aad.org" as a general rule. It is "if your TA is inflammatory dermatology, aad.org is a must-win surface, and if it is metabolic disease, FDA label retrieval is the must-win surface instead."

Oncology: NCCN is not one source — it is a monopoly

The lung cancer data is the starkest illustration of guideline monopoly in pharma GEO. Of approximately 258 total citation uses counted for lung cancer on Perplexity in May 2026, roughly 218 were attributable to nccn.org across its multiple indexed documents: the NSCLC v5.2025 PDF (136 uses), the NSCLC landing page updated in 2026 (64 uses), and the v5.2026 update (18 uses). The next largest source — a single NEJM trial publication for KEYNOTE-189 — contributed 16 uses. NCCN's share of the citation stack is approximately 85%.
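
The share figure can be checked directly from the counts in the paragraph above. This is a minimal arithmetic sketch; the variable names are illustrative, and the 258 total is the approximate figure reported in the text.

```python
# Counts as reported for lung cancer on Perplexity (May 2026).
nccn_documents = {
    "NSCLC v5.2025 PDF": 136,
    "NSCLC landing page": 64,
    "v5.2026 update": 18,
}
total_uses = 258          # approximate total reported in the text
nejm_keynote_189 = 16     # next largest single source

nccn_uses = sum(nccn_documents.values())
nccn_share = nccn_uses / total_uses
nejm_share = nejm_keynote_189 / total_uses

print(f"NCCN: {nccn_uses} uses, {nccn_share:.1%} of total")
print(f"NEJM KEYNOTE-189: {nejm_share:.1%} of total")
```

The three NCCN documents sum to 218 uses, about 84.5% of the total, which rounds to the 85% quoted here.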

NCCN guidelines are the canonical treatment reference for US oncologists, comprehensively indexed, updated regularly, and structured with the explicit evidence tables and recommendation language that retrieval systems can directly quote. When an HCP queries Perplexity about treatment sequencing in NSCLC, the answer is structurally sourced from NCCN because that is where the authoritative answer lives. The practical consequence for oncology brands: visibility in AI is mediated almost entirely by visibility within NCCN. A brand that is not in a category 1 or 2 recommendation is structurally unlikely to appear in first-line answers about its own indication.

Metabolic disease: the AI answer layer is already a safety surface

The obesity source data reveals a structural property of metabolic disease GEO that has significant compliance implications. The top two sources by use count on Perplexity for obesity queries are both FDA label pages on accessdata.fda.gov — the Zepbound label at 36 uses and the Wegovy label at 32 uses. The third source is a manufacturer-hosted prescribing information page (novo-pi.com, 26 uses). NEJM clinical trial publications rank fourth at 24 uses.

FDA labels include boxed warnings, contraindications, REMS requirements where applicable, and full adverse event profiles. When Perplexity answers an obesity treatment query, it is drawing from these documents automatically, without any prompting from the person asking. The AI answer for "what are the treatment options for obesity" includes thyroid tumour warnings, pancreatitis warnings, and contraindication language — not because the AI is trying to be a safety resource, but because the FDA label is simply the most authoritative indexed source available.

Brand teams that treat the AI answer layer as a marketing surface are misreading the data. In obesity, it is structurally a mandatory safety delivery channel. The implication: ensure that safety language in AI answers is drawn from the most current, accurate label version, not an outdated indexed version or a third-party paraphrase. DailyMed, run by the US National Library of Medicine, is a crawlable public source of the prescribing information that manufacturers submit to the FDA; confirming that it reflects the current label is as important as maintaining any branded content asset.

Off-label leakage compounds the safety exposure

The obesity source data is complicated by a pattern visible in the May 2026 PharmaGEO public index: Perplexity's obesity share-of-voice chart includes Ozempic at approximately 6% and Mounjaro at approximately 3%, even though both are type 2 diabetes brands and neither is approved for chronic weight management. The AI is connecting these brands to obesity queries through their shared molecules (Ozempic shares semaglutide with Wegovy; Mounjaro shares tirzepatide with Zepbound), regardless of approved indication. In a source environment where FDA labels are the #1 and #2 cited documents, every off-label mention is a medical, legal, and regulatory (MLR) exposure that arrives through a safety-optimised source channel. This is a systemic AI behaviour, not a brand-specific problem, but it lands on brand teams as a brand-specific compliance question.
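
A leakage audit of this kind can be sketched as a simple check of which brands an AI answer mentions against the TA of the query. Everything here is hypothetical scaffolding: the `INDICATIONS` map is a deliberately simplified stand-in for a maintained label database, the function name is illustrative, and the sample answer text is invented for the example rather than taken from Perplexity.

```python
import re

# Simplified, assumed brand-to-indication map for illustration only.
# A real audit would draw on current label data, not a hard-coded dict.
INDICATIONS = {
    "Wegovy": {"obesity"},
    "Zepbound": {"obesity"},
    "Ozempic": {"type 2 diabetes"},
    "Mounjaro": {"type 2 diabetes"},
}


def off_label_mentions(answer, query_ta):
    """Return brands named in the answer whose indications exclude query_ta."""
    flagged = []
    for brand, indications in INDICATIONS.items():
        pattern = rf"\b{re.escape(brand)}\b"
        mentioned = re.search(pattern, answer, flags=re.IGNORECASE)
        if mentioned and query_ta not in indications:
            flagged.append(brand)
    return sorted(flagged)


# Invented sample answer, used only to exercise the function.
sample_answer = (
    "Options include Wegovy and Zepbound; "
    "Ozempic is sometimes discussed as well."
)
print(off_label_mentions(sample_answer, "obesity"))
```

Run across a full prompt set at regular intervals, a check like this would give MLR teams a running count of off-label mentions per brand.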

Dermatology: a three-way source equilibrium

Psoriasis sits in an intermediate position structurally. It shows neither the guideline monopoly of oncology nor the regulatory dominance of obesity; its source distribution is genuinely pluralistic. PubMed Central leads at 154 uses, the AAD follows at 114, and a specialist disease hub, psoriasis-hub.com, places third at 70 uses, ahead of NICE and DermNet NZ at 40 uses each.

The presence of psoriasis-hub.com at 70 uses, outranking NICE, is the most operationally surprising finding in the psoriasis data. NICE is a national guidance and health technology assessment body with significant institutional authority; psoriasis-hub.com is a specialist disease-state website. That Perplexity cites the specialist hub more often than NICE suggests that its retrieval favours topical specificity alongside institutional authority: a disease-specific hub that publishes comprehensive, keyword-rich content on biologic treatment options can retrieve ahead of NICE guidance documents that are indexed less frequently or are less granular on specific treatment comparisons.

This finding is consistent with broader analyses of AI citation behaviour. An analysis of 1.2 million ChatGPT responses published by Kevin Indig in Search Engine Land found that 44.2% of citations come from the first 30% of a source document. If Perplexity behaves similarly, specialist hubs that front-load their treatment comparison content, putting mechanism, indication, and efficacy data in the first screen of any article, will tend to retrieve ahead of regulatory documents that bury the relevant information in structured appendices.
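
Front-loading can be checked mechanically on a draft. The sketch below is a rough proxy rather than a model of how any engine retrieves: it measures where each key term first appears as a fraction of the document's character length and compares that with the 30% cutoff from the analysis above. The function names, the term list, and the sample text are all illustrative assumptions.

```python
from typing import Dict, List, Optional


def first_mention_fraction(text: str, term: str) -> Optional[float]:
    """Position of the first occurrence of term, as a fraction of text length."""
    index = text.lower().find(term.lower())
    if index == -1 or not text:
        return None
    return index / len(text)


def front_loaded(text: str, terms: List[str], cutoff: float = 0.30) -> Dict[str, bool]:
    """Map each term to True if it first appears within the cutoff fraction."""
    result = {}
    for term in terms:
        position = first_mention_fraction(text, term)
        result[term] = position is not None and position <= cutoff
    return result


# Invented draft: mechanism up front, efficacy buried at the end.
draft = (
    "Mechanism: IL-17 inhibition. "
    + "filler " * 50
    + "Efficacy: PASI 90 at week 16."
)
print(front_loaded(draft, ["mechanism", "efficacy", "indication"]))
```

In this draft only "mechanism" passes; "efficacy" appears too late and "indication" is missing, so both would be candidates for moving into the opening section.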

Atopic dermatitis: the society-led, literature-backstopped model

AD sits closest to a "clean" version of the society-guideline archetype. AAD ranks first at 168 uses, PubMed Central ranks second at 94 uses, and AAFP — the American Academy of Family Physicians — ranks third at 92 uses, reflecting that AD management is frequently handled in primary care. The source mix is coherent: a specialist society leads, a literature database backstops, and a primary care society provides the generalist access point.

The AD data also illustrates the PubMed floor rule most clearly. PubMed Central records 94 uses in AD and 154 in psoriasis, and it is present in the lung cancer and obesity stacks as well: every TA in the May 2026 data has pmc.ncbi.nlm.nih.gov in its top sources. PubMed-indexed publication is the minimum viable GEO citation across all of pharma. If clinical evidence for a brand is not accessible through PubMed, retrieval systems have no peer-reviewed path to surface it. This is not a differentiation strategy; it is table stakes.

The EMA EPAR channel: the underused citation vector

One pattern cutting across TAs — particularly in inflammatory dermatology — is the appearance of EMA European Public Assessment Report (EPAR) documents in English-language AI answers. EPARs are official regulatory documents published on ema.europa.eu, written in English, comprehensively indexed, and structured with the explicit clinical summary language that retrieval systems can quote directly. They are functionally equivalent to FDA labels for the EU approval pathway. Pharma brands that have EU approval but have not ensured their English EPAR is well-indexed and linked from owned content are leaving a high-authority citation channel unused. The English EPAR for a biologic drug is one of the most citable documents available for that brand in the AI retrieval graph.

General population citation patterns: the channels pharma cannot use

One structurally important piece of context for pharma GEO is the divergence between pharma source stacks and the general web citation ecosystem. According to the AI Platform Citation Source Index 2026, Reddit accounts for approximately 40% of ChatGPT citations across the general web, and Wikipedia accounts for 26–48%. Pharma brands cannot use either channel compliantly: Reddit’s user-generated content cannot be controlled, and Wikipedia’s editorial standards preclude promotional content. The society guideline, regulatory primary, and peer-reviewed literature archetypes are the compliant alternatives: the channels that carry institutional authority in the pharma citation graph. They are not optional; they are the only compliant path to AI visibility at scale.

TA-specific source stack recommendations

The data produces three distinct playbooks, one per structural TA type:

Oncology: align to NCCN and ESMO update cadences

With NCCN accounting for approximately 85% of Perplexity citation uses in lung cancer, the dominant tactical question for oncology brands is: where does our brand sit in the NCCN recommendation hierarchy, and how does that change with each version update? The citation data shows that both older (v5.2025) and newer (v5.2026) versions appear in source stacks simultaneously, creating a window of competing signals during version transitions. Brands should ensure owned clinical commentary addresses updated recommendation language within days of each NCCN release. For EU-exposed oncology brands, the same logic applies to ESMO update cycles.
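
The size of that overlap can be estimated from the counts reported in the oncology section. This is a small sketch under one stated assumption: the NSCLC landing page (64 uses) is excluded because it is not tied to a single guideline version.

```python
# Versioned NCCN documents and their reported uses for lung cancer.
versioned_uses = {"v5.2025": 136, "v5.2026": 18}

versioned_total = sum(versioned_uses.values())
older_share = versioned_uses["v5.2025"] / versioned_total

print(f"Older version share of versioned citations: {older_share:.1%}")
```

On these counts the older version still accounts for roughly 88% of versioned NCCN citations, which is why owned commentary needs to address the updated language quickly after each release.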

Metabolic: optimise FDA label retrieval and the DailyMed mirror

For obesity and metabolic disease brands, the priority is ensuring the FDA label — already the #1 or #2 cited source — is current and accessible through multiple indexed paths: verify DailyMed reflects the most recent label, ensure manufacturer-hosted prescribing information pages are fully indexed, and link directly to current label language from any clinical education content. Track off-label leakage from same-molecule sister brands through regular Perplexity audits so MLR teams have advance signal on AI-surfaced safety language.

Dermatology: society plus EU regulatory plus selective specialist hub syndication

For psoriasis and AD brands, the three-archetype equilibrium requires a three-track content strategy: (1) maintain AAD representation and ensure AAD-published clinical data is well-indexed; (2) for brands with EU approval, connect owned content to ema.europa.eu EPARs and publish optimised EPAR summaries; (3) selectively syndicate to specialist disease hubs (dermnetnz.org and psoriasis-hub.com are already top-five citation sources for psoriasis) with front-loaded treatment comparison content, since the ChatGPT analysis cited earlier found that 44.2% of citations come from the first 30% of a source document.

No universal playbook. Only TA-specific ones.

The core finding from the May 2026 PharmaGEO source data: the sources Perplexity cites for oncology bear almost no structural resemblance to those cited for obesity, and neither resembles the psoriasis or AD stack. A brand team that treats “authoritative pharma sources” as a single category will be well-calibrated for one TA and misaligned for the rest. The same four archetypes (society guideline, regulatory primary, peer-reviewed literature, and specialist hub) account for every top-cited source across the four TAs. What changes is the weighting: which archetype dominates determines which content investment delivers the highest citation yield. Getting that weighting right is TA-specific work. There is no shortcut through a generic best-practices framework, because the citation graph does not have one either.

