This benchmark measures page readiness. It does not guarantee AI citations.
That limitation is not fine print. It is central to using the benchmark honestly.
What It Can Support
- Whether key pages are crawlable and indexable.
- Whether pages are snippet-eligible.
- Whether source content is visible in HTML.
- Whether claims have evidence, methodology, or official citations.
- Whether schema matches visible content.
- Whether internal routes connect benchmark, support, service, proof, and contact pages.
- Whether crawler access is being monitored.
- Whether a page is built more like a reusable source than like generic service copy.
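As one illustration of the crawlability item above, here is a minimal sketch of a robots.txt check using Python's standard `urllib.robotparser`. The robots.txt content, the URL, and the helper name `crawler_access` are assumptions made for the example, not part of the benchmark's tooling. Passing a robots.txt check is only one input to crawlability and says nothing about indexability or citations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would read the site's own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""


def crawler_access(robots_txt: str, url: str, user_agents: list[str]) -> dict[str, bool]:
    """Report, per user agent, whether robots.txt permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


# Example URL and agent list are placeholders for illustration.
access = crawler_access(
    ROBOTS_TXT,
    "https://example.com/services/",
    ["Googlebot", "GPTBot"],
)
print(access)
```

A result like this is a snapshot. It should be re-run whenever robots.txt changes, which is why crawler access is listed as something to monitor rather than something to check once.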
What It Cannot Prove
- Stable LLM ranking position.
- Guaranteed AI Overview, ChatGPT, Claude, Perplexity, Copilot, or other AI-answer citations.
- Platform-wide visibility from one prompt test.
- Causal attribution between one page edit and one AI citation.
- Lead quality without analytics and CRM data.
- Hidden platform trust, authority, or weighting systems.
- Market-wide service-business averages.
- Vertical-level performance where the vertical sample is too small.
2026.06 Sample Limits
| Limitation | Why It Matters |
|---|---|
| 50 reviewed records | Strong enough for a narrow benchmark, not enough for sweeping market claims. |
| Anonymized domains | Protects audited businesses, but prevents third-party URL-level rechecks from the public dataset. |
| Uneven vertical mix | Consulting, accounting, and agency sites make up 35 of 50 rows. |
| Public-page only | No private analytics, CRM, Search Console, server-log, or conversion data. |
| Snapshot timing | Crawler access, HTTP responses, and page content can change after collection. |
| Semantic review still has caveats | 28 rows were approved with caveats, so review status should stay visible. |
| No actual citation tracking in this dataset | DUCR measures readiness, not confirmed AI answer citations. |
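The uneven vertical mix suggests guarding any vertical-level summary with a minimum sample size. The sketch below shows one way to do that. The cutoff of 10, the field names `vertical` and `score`, and the rows themselves are all made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import median

MIN_VERTICAL_SAMPLE = 10  # assumed cutoff for illustration, not a benchmark rule


def vertical_medians(records, min_n=MIN_VERTICAL_SAMPLE):
    """Return the median score per vertical, or None where the sample is too small."""
    scores_by_vertical = defaultdict(list)
    for record in records:
        scores_by_vertical[record["vertical"]].append(record["score"])
    return {
        vertical: median(scores) if len(scores) >= min_n else None
        for vertical, scores in scores_by_vertical.items()
    }


# Made-up rows: one vertical with enough records, one without.
sample = (
    [{"vertical": "consulting", "score": s} for s in [2, 3, 4, 4, 5, 6, 3, 4, 5, 4]]
    + [{"vertical": "legal", "score": s} for s in [7, 9]]
)
result = vertical_medians(sample)
print(result)
```

Returning `None` instead of a number keeps an under-sampled vertical visibly unreported, rather than letting a two-row median pass as a finding.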
Correct Claim Shape
Use this kind of language:
"In a reviewed anonymized sample of 50 service-business websites, pages were generally accessible to search and AI-retrieval crawlers, but median citable readiness was only 4/30."
Do not use this kind of language:
"Service-business websites cannot get cited by AI unless they follow this system."
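The second claim overstates what the evidence can prove. The dataset measures readiness and contains no citation tracking, so it cannot show what a site must do to get cited.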
Revision Policy
When official platform guidance changes, update the methodology and changelog. When the scoring model changes, preserve the old version and explain the change.
When the dataset expands beyond 50 records, update the sample-size language before changing any benchmark claims. Do not silently blend 2026.06 findings with later runs.
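The no-blending rule can be enforced in analysis code rather than left to memory. The sketch below is a minimal example under assumed names: each record carries a `version` field, and `median_score` is a hypothetical helper that refuses to summarize rows from more than one benchmark run. The `2026.09` label in the usage lines is invented for the example.

```python
from statistics import median


def median_score(records):
    """Median score for records that all belong to a single benchmark version."""
    versions = {record["version"] for record in records}
    if len(versions) != 1:
        # Mixed (or missing) versions: fail loudly instead of blending runs.
        raise ValueError(f"refusing to blend benchmark versions: {sorted(versions)}")
    return median(record["score"] for record in records)


# Made-up rows for illustration.
rows_2026_06 = [
    {"version": "2026.06", "score": 4},
    {"version": "2026.06", "score": 6},
]
print(median_score(rows_2026_06))

try:
    median_score(rows_2026_06 + [{"version": "2026.09", "score": 10}])
except ValueError as error:
    print(error)
```

Raising an error makes blending a deliberate act: anyone who wants a combined figure has to write the combination explicitly and update the sample-size language along with it.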