AI Visibility Benchmark Methodology for Service-Business Websites

This methodology explains how SEO Informatica audits service-business websites for DUCR readiness: Discoverable, Understandable, Citable, and Routable.

The 2026.06 benchmark uses a reviewed anonymized sample of 50 service-business websites. Automated crawl fields were collected first, then rows were manually reviewed for semantic fit, source quality, and obvious automation errors before final scoring.

Sample Criteria

A site qualifies if it sells a service, has at least one service page and one conversion route, exposes indexable public HTML, targets customers through search or local discovery, and can be audited without admin access.
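
The criteria above can be read as a single all-or-nothing predicate. The following is a minimal sketch, not the benchmark's actual intake code; the `Candidate` class and its field names are hypothetical and exist only to illustrate the rule.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical intake record; field names are illustrative assumptions."""
    sells_service: bool
    service_page_count: int
    conversion_route_count: int
    has_indexable_public_html: bool
    targets_search_or_local_discovery: bool
    auditable_without_admin_access: bool


def qualifies(site: Candidate) -> bool:
    """Return True only when every sample criterion is met."""
    return (
        site.sells_service
        and site.service_page_count >= 1
        and site.conversion_route_count >= 1
        and site.has_indexable_public_html
        and site.targets_search_or_local_discovery
        and site.auditable_without_admin_access
    )
```

A site with no service page fails even if every other criterion holds, which matches the "at least one service page and one conversion route" wording.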

The 2026.06 sample includes 50 unique anonymized domains across consulting, accounting, agency, home services, legal, dental, pest control, junk removal, roofing, wellness, and med spa categories.

Exclusion Rules

Sites are excluded if they are parked, login-only, under construction, PDF-only, app-only, non-service, duplicate, inaccessible, or owner-excluded, or if they fall into adult, gambling, weapons, or illegal categories.

During semantic review, records are removed or replaced if the reviewed page is not a service-business page, if the browser-rendered page is blocked or misleading, if the row appears duplicated, or if crawler output cannot support the scoring fields.

Pages Audited Per Site

Page Type	Required
Homepage	Yes
Primary service page	Yes
Supporting informational page	Preferred
About page	Preferred
Contact or lead route	Yes
Location page	Optional
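
As an illustration of how the table above might be enforced, the sketch below lists which page types are missing at a given requirement level. The dictionary keys and the `missing_pages` helper are hypothetical names, not fields from the benchmark dataset.

```python
# Requirement levels copied from the table above; the keys are assumed names.
PAGE_REQUIREMENTS = {
    "homepage": "required",
    "primary_service_page": "required",
    "supporting_informational_page": "preferred",
    "about_page": "preferred",
    "contact_or_lead_route": "required",
    "location_page": "optional",
}


def missing_pages(found: set[str], level: str = "required") -> list[str]:
    """List page types at the given level that were not found for a site."""
    return [
        page_type
        for page_type, requirement in PAGE_REQUIREMENTS.items()
        if requirement == level and page_type not in found
    ]
```

Calling `missing_pages(found)` with the default level shows whether a site can be audited at all; calling it with `level="preferred"` shows gaps that do not block the audit.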

Automated Checks

The crawler checks status codes, robots rules, sitemap presence, canonicals, meta robots, snippet controls, headings, schema types, visible text, internal links, lists, tables, download links, WAF signals, CAPTCHA signals, and JavaScript challenge signals.
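
To make one of these checks concrete, the sketch below collects indexing and snippet directives from a page's robots meta tag and from an `X-Robots-Tag` header value. It is an assumption about how such a check could work, not the benchmark crawler's code; `RobotsMetaParser` and `indexing_directives` are hypothetical names, and the header handling is simplified (it does not interpret user-agent prefixes such as `googlebot: noindex`).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def indexing_directives(html: str, x_robots_tag: str = "") -> set[str]:
    """Merge meta robots directives with a raw X-Robots-Tag header value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {d.strip().lower() for d in x_robots_tag.split(",") if d.strip()}
    return parser.directives | header
```

A page would then be flagged when the returned set contains `noindex` or `nosnippet`.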

For sitemap checks, the crawler limits sitemap exploration so one large site cannot dominate the run. The 2026.06 crawler used bounded sitemap discovery with three limits: at most eight sitemap candidates fetched, at most 20,000 sitemap URLs parsed, and an eight-second timeout on each sitemap request.
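
A bounded discovery loop along these lines could look like the sketch below. Only the three limits come from the methodology; the function names, the queue-based traversal, and the injectable `fetch` callable are assumptions made for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Iterable, Optional

# Limits stated in the methodology; everything else here is an assumption.
MAX_SITEMAP_CANDIDATES = 8
MAX_SITEMAP_URLS = 20_000
SITEMAP_TIMEOUT_SECONDS = 8


def fetch_xml(url: str, timeout: float) -> Optional[str]:
    """Fetch a sitemap body, returning None on any network or URL error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}loc' down to 'loc'."""
    return tag.rsplit("}", 1)[-1]


def discover_sitemap_urls(
    start_urls: Iterable[str],
    fetch: Callable[[str, float], Optional[str]] = fetch_xml,
) -> list[str]:
    """Walk sitemap indexes breadth-first, stopping at either cap."""
    queue = deque(start_urls)
    seen: set[str] = set()
    page_urls: list[str] = []
    fetched = 0
    while queue and fetched < MAX_SITEMAP_CANDIDATES and len(page_urls) < MAX_SITEMAP_URLS:
        sitemap_url = queue.popleft()
        if sitemap_url in seen:
            continue
        seen.add(sitemap_url)
        fetched += 1
        body = fetch(sitemap_url, SITEMAP_TIMEOUT_SECONDS)
        if body is None:
            continue
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            continue
        # Read <loc> only from direct <sitemap>/<url> entries.
        locs = [
            child.text.strip()
            for entry in root
            for child in entry
            if local_name(child.tag) == "loc" and child.text
        ]
        if local_name(root.tag) == "sitemapindex":
            queue.extend(locs)  # nested sitemaps count against the candidate cap
        else:
            page_urls.extend(locs[: MAX_SITEMAP_URLS - len(page_urls)])
    return page_urls
```

Passing `fetch` as a parameter keeps the limits testable without network access, and a failed or malformed sitemap still counts toward the candidate cap so a slow site cannot extend the run.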

Bot Access Checks

The run checks separate crawler/user-agent fields for Googlebot, Bingbot, OAI-SearchBot, GPTBot, ChatGPT-User, ClaudeBot, Claude-SearchBot, Claude-User, PerplexityBot, and Perplexity-User.

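These fields are kept separate on purpose. Training crawlers, search/indexing crawlers, and user-directed fetchers serve different purposes and can be allowed or blocked independently; for example, a site that blocks GPTBot may still allow OAI-SearchBot. Collapsing them into one "AI bot" bucket would hide that difference and produce misleading recommendations.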
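
The per-agent separation can be sketched with Python's standard `urllib.robotparser`. This is an illustrative assumption about the check, not the benchmark's crawler: the role labels in `BOT_ROLES` reflect one reading of the vendors' crawler documentation, `robots_access` is a hypothetical helper, and a robots.txt allowance says nothing about WAF or CAPTCHA blocking.

```python
from urllib.robotparser import RobotFileParser

# User agents from the methodology; role labels are an assumed grouping.
BOT_ROLES = {
    "Googlebot": "search",
    "Bingbot": "search",
    "OAI-SearchBot": "search",
    "GPTBot": "training",
    "ChatGPT-User": "user_fetch",
    "ClaudeBot": "training",
    "Claude-SearchBot": "search",
    "Claude-User": "user_fetch",
    "PerplexityBot": "search",
    "Perplexity-User": "user_fetch",
}


def robots_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Evaluate robots.txt rules separately for each tracked user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in BOT_ROLES}
```

Given a robots.txt that disallows only GPTBot and ClaudeBot, the result keeps the search and user-directed agents marked as allowed, which is the distinction a single "AI bot" field would lose.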

Manual Semantic Review

The reviewer checks page role, target audience, service/entity clarity, proof, author/date/reviewer visibility, methodology/process explanation, limitations, CTA route, source support, claim restraint, and whether the page looks like a valid service-business page in a browser.

The final 2026.06 dataset contains:

Manual Review Status	Count
`semantic_review_approved`	22
`semantic_review_approved_with_caveat`	28
Critical blockers	0

Scoring Process

The raw audit file is scored by `scripts/score_ducr.py` against `data/ducr_scoring_rubric.json`. Semantic review overrides are applied with `scripts/apply_semantic_review.py`, then the reviewed records are scored again.

The 2026.06 benchmark uses a 100-point DUCR rubric:

Layer	Points
Discoverable	25
Understandable	25
Citable	30
Routable	20
Total	100
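
To show how the layer weights combine, here is a minimal sketch that turns a per-layer pass rate into points. It is not the logic of the scoring script, whose internals are not described here; the `ducr_score` function and the idea of a 0-to-1 pass rate per layer are assumptions for illustration.

```python
# Layer maxima from the rubric table above.
LAYER_POINTS = {
    "discoverable": 25,
    "understandable": 25,
    "citable": 30,
    "routable": 20,
}


def ducr_score(pass_rates: dict[str, float]) -> dict[str, float]:
    """Convert assumed per-layer pass rates (0.0 to 1.0) into rubric points."""
    scores: dict[str, float] = {}
    for layer, max_points in LAYER_POINTS.items():
        # Clamp so malformed input cannot exceed a layer's maximum.
        rate = min(max(pass_rates.get(layer, 0.0), 0.0), 1.0)
        scores[layer] = round(rate * max_points, 1)
    scores["total"] = round(sum(scores.values()), 1)
    return scores
```

Because Citable carries 30 of the 100 points, a site that is fully discoverable and routable but only half citable cannot score above 85.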

Full scoring detail: /ai-visibility-benchmark/ducr-score/

Reproducibility Notes

Each record stores `collector_version`, `crawl_timestamp`, `crawl_user_agent`, and `source_html_hash`. Public statistics must be traceable to dataset fields and scoring rules.
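
A record carrying these four fields could be assembled as in the sketch below. The field names come from the methodology; the `build_provenance` helper, the SHA-256 choice for the HTML hash, and the ISO 8601 UTC timestamp format are assumptions.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def build_provenance(
    html: bytes,
    collector_version: str,
    user_agent: str,
    crawled_at: Optional[datetime] = None,
) -> dict[str, str]:
    """Build the reproducibility fields for one crawled page."""
    crawled_at = crawled_at or datetime.now(timezone.utc)
    return {
        "collector_version": collector_version,
        "crawl_timestamp": crawled_at.isoformat(),
        "crawl_user_agent": user_agent,
        # SHA-256 is an assumed algorithm; the source does not name one.
        "source_html_hash": hashlib.sha256(html).hexdigest(),
    }
```

Hashing the raw HTML bytes lets a later run detect whether a page changed between crawls without storing the page itself.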

The public dataset withholds domains and URLs, but preserves anonymized site IDs and domain hashes so duplicate handling and record-level analysis remain possible without exposing audited businesses.
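
One way to produce domain hashes that support duplicate detection without revealing the domain is a keyed hash over a normalized hostname. The sketch below is an assumption, not the benchmark's documented scheme: `domain_hash`, the HMAC-SHA256 construction, the private `secret`, and the normalization rules are all hypothetical.

```python
import hashlib
import hmac


def domain_hash(domain: str, secret: bytes) -> str:
    """Return a keyed hash of a normalized domain (assumed scheme)."""
    normalized = domain.strip().lower().rstrip(".")
    if normalized.startswith("www."):
        normalized = normalized[4:]
    # A private key prevents confirming a guess by hashing known domains.
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

Two rows for the same business then share a hash even when one was recorded with a `www.` prefix or different casing, while readers without the key cannot test candidate domains against the published values.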

Official Guidance Used

The methodology aligns crawler and indexing checks with official platform documentation where available:

Source	How It Informs The Benchmark
OpenAI crawler documentation	Separates OAI-SearchBot, GPTBot, and ChatGPT-User roles.
Google Search Central robots.txt documentation	Separates crawl control from indexing control.
Google Search Central robots meta documentation	Supports noindex, nosnippet, data-nosnippet, and X-Robots-Tag checks.
Anthropic crawler documentation	Separates ClaudeBot, Claude-SearchBot, and Claude-User roles.
Perplexity robots.txt documentation	Supports PerplexityBot robots-access checks.

Limitations

This benchmark measures public page readiness. It does not prove AI citation outcomes, traffic volume, CRM lead quality, hidden platform trust, or platform-wide visibility.

Full limitation notes: /ai-visibility-benchmark/limitations/