SEO Informatica

AI Visibility Benchmark Methodology for Service-Business Websites

This methodology explains how SEO Informatica audits service-business websites for DUCR readiness: Discoverable, Understandable, Citable, and Routable.

The 2026.06 benchmark uses a reviewed anonymized sample of 50 service-business websites. Automated crawl fields were collected first, then rows were manually reviewed for semantic fit, source quality, and obvious automation errors before final scoring.

Sample Criteria

A site qualifies if it sells a service, has at least one service page and one conversion route, exposes indexable public HTML, targets customers through search or local discovery, and can be audited without admin access.
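The criteria above can be read as a single all-or-nothing predicate. The following is a minimal sketch under that reading; the `SiteCandidate` class and its field names are hypothetical and are not taken from the benchmark's own code.

```python
from dataclasses import dataclass


@dataclass
class SiteCandidate:
    """Hypothetical record of what a reviewer observed about one site."""
    sells_service: bool
    service_page_count: int
    conversion_route_count: int
    has_indexable_public_html: bool
    targets_search_or_local_discovery: bool
    auditable_without_admin_access: bool


def qualifies(site: SiteCandidate) -> bool:
    """Return True only when every sample criterion is met."""
    return (
        site.sells_service
        and site.service_page_count >= 1
        and site.conversion_route_count >= 1
        and site.has_indexable_public_html
        and site.targets_search_or_local_discovery
        and site.auditable_without_admin_access
    )
```

A site with no service page fails even if every other field is true, which matches the "at least one service page and one conversion route" wording.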

The 2026.06 sample includes 50 unique anonymized domains across consulting, accounting, agency, home services, legal, dental, pest control, junk removal, roofing, wellness, and med spa categories.

Exclusion Rules

The sample excludes websites that are parked, login-only, under construction, PDF-only, app-only, non-service, adult, gambling, weapons-related, illegal, duplicate, inaccessible, or owner-excluded.

During semantic review, records are removed or replaced if the reviewed page is not a service-business page, if the browser-rendered page is blocked or misleading, if the row appears duplicated, or if crawler output cannot support the scoring fields.

Pages Audited Per Site

Page Type                        Required
Homepage                         Yes
Primary service page             Yes
Supporting informational page    Preferred
About page                       Preferred
Contact or lead route            Yes
Location page                    Optional
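One way to apply the page-type table is to report which required and preferred page types were not found for a site. This is a sketch only; the snake_case keys and the `missing_page_types` helper are illustrative names, not fields from the published dataset.

```python
# Requirement level per page type, copied from the table above.
PAGE_REQUIREMENTS = {
    "homepage": "required",
    "primary_service_page": "required",
    "supporting_informational_page": "preferred",
    "about_page": "preferred",
    "contact_or_lead_route": "required",
    "location_page": "optional",
}


def missing_page_types(found: set[str]) -> dict[str, list[str]]:
    """Group page types that were not found by their requirement level."""
    missing: dict[str, list[str]] = {"required": [], "preferred": []}
    for page_type, level in PAGE_REQUIREMENTS.items():
        if level in missing and page_type not in found:
            missing[level].append(page_type)
    return missing
```

Under this reading, a site is auditable when the `required` list comes back empty; missing preferred pages are noted but do not block the audit.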

Automated Checks

The crawler checks status codes, robots rules, sitemap presence, canonicals, meta robots, snippet controls, headings, schema types, visible text, internal links, lists, tables, download links, WAF signals, CAPTCHA signals, and JavaScript challenge signals.
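As an illustration of a few of these checks (meta robots, canonicals, and headings), the sketch below uses only Python's standard `html.parser`. It is not the benchmark crawler; the `PageSignals` class and `extract_signals` function are hypothetical names, and schema, link, and WAF checks are left out.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collect meta robots directives, the canonical URL, and heading counts."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_robots: list[str] = []
        self.canonical: str | None = None
        self.heading_counts: dict[str, int] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            directives = a.get("content", "").split(",")
            self.meta_robots += [d.strip().lower() for d in directives if d.strip()]
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None
        elif tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.heading_counts[tag] = self.heading_counts.get(tag, 0) + 1


def extract_signals(html: str) -> dict:
    """Parse one HTML document and return the collected signals."""
    parser = PageSignals()
    parser.feed(html)
    return {
        "meta_robots": parser.meta_robots,
        "canonical": parser.canonical,
        "heading_counts": parser.heading_counts,
    }
```

This only inspects the raw HTML it is given, so directives delivered through an `X-Robots-Tag` response header or injected by JavaScript would need separate handling.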

For sitemap checks, the crawler limits sitemap exploration so one large site cannot dominate the run. The 2026.06 crawler used bounded sitemap discovery with a maximum of eight sitemap candidates, 20,000 sitemap URLs parsed, and an eight-second sitemap timeout per request.
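The bounds described above might be enforced roughly as follows. This is a sketch, not the 2026.06 crawler: the function names are hypothetical, and it assumes that sitemap index files count toward the eight-candidate cap, which the methodology does not state.

```python
import urllib.request
import xml.etree.ElementTree as ET
from collections import deque

MAX_SITEMAP_CANDIDATES = 8      # sitemap files fetched per site
MAX_SITEMAP_URLS = 20_000       # page URLs parsed per site
SITEMAP_TIMEOUT_SECONDS = 8     # timeout per sitemap request


def http_fetch(url: str, timeout: float) -> bytes:
    """Default fetcher; swap in a stub for offline testing."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}loc' down to 'loc'."""
    return tag.rsplit("}", 1)[-1]


def discover_sitemap_urls(seeds, fetch=http_fetch) -> list[str]:
    """Walk sitemap files breadth-first until one of the bounds is hit."""
    queue, seen, urls, fetched = deque(seeds), set(), [], 0
    while queue and fetched < MAX_SITEMAP_CANDIDATES and len(urls) < MAX_SITEMAP_URLS:
        sitemap_url = queue.popleft()
        if sitemap_url in seen:
            continue
        seen.add(sitemap_url)
        fetched += 1
        try:
            root = ET.fromstring(fetch(sitemap_url, SITEMAP_TIMEOUT_SECONDS))
        except (OSError, ET.ParseError):
            continue  # unreachable or malformed sitemap: skip it
        is_index = local_name(root.tag) == "sitemapindex"
        for entry in root:
            loc = next(
                (c.text.strip() for c in entry if local_name(c.tag) == "loc" and c.text),
                None,
            )
            if loc is None:
                continue
            if is_index:
                queue.append(loc)  # child sitemap, fetched later if budget allows
            else:
                urls.append(loc)
                if len(urls) >= MAX_SITEMAP_URLS:
                    break
    return urls
```

Passing the fetcher as a parameter keeps the limits testable without network access, and a site with hundreds of child sitemaps still costs at most eight requests.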

Bot Access Checks

The run checks separate crawler/user-agent fields for Googlebot, Bingbot, OAI-SearchBot, GPTBot, ChatGPT-User, ClaudeBot, Claude-SearchBot, Claude-User, PerplexityBot, and Perplexity-User.

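These agents are intentionally recorded separately. Training crawlers, search/indexing crawlers, and user-directed fetchers serve different purposes and can be allowed or blocked independently. Collapsing them into one "AI bot" bucket would produce misleading recommendations, such as treating a blocked training crawler as a loss of search visibility.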
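A per-agent robots.txt check could look like the sketch below, which uses Python's standard `urllib.robotparser`. The role labels in `BOT_ROLES` are an assumption based on a reading of the vendors' public crawler documentation, not benchmark fields, and `bot_access` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Assumed role per user agent; verify against each vendor's current documentation.
BOT_ROLES = {
    "Googlebot": "search_indexing",
    "Bingbot": "search_indexing",
    "OAI-SearchBot": "search_indexing",
    "GPTBot": "training",
    "ChatGPT-User": "user_directed",
    "ClaudeBot": "training",
    "Claude-SearchBot": "search_indexing",
    "Claude-User": "user_directed",
    "PerplexityBot": "search_indexing",
    "Perplexity-User": "user_directed",
}


def bot_access(robots_txt: str, page_url: str) -> dict[str, dict]:
    """Return one role/allowed field per user agent for a single page URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        bot: {"role": role, "allowed": parser.can_fetch(bot, page_url)}
        for bot, role in BOT_ROLES.items()
    }
```

Note that `urllib.robotparser` applies the first matching rule within a group, which can differ from the longest-match precedence some crawlers document, so results on complex robots.txt files deserve a manual check.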

Manual Semantic Review

The reviewer checks page role, target audience, service/entity clarity, proof, author/date/reviewer visibility, methodology/process explanation, limitations, CTA route, source support, claim restraint, and whether the page looks like a valid service-business page in a browser.

The final 2026.06 dataset contains:

Manual Review Status                    Count
semantic_review_approved                22
semantic_review_approved_with_caveat    28
Critical blockers                       0

Scoring Process

The raw audit file is scored by scripts/score_ducr.py against data/ducr_scoring_rubric.json. Semantic review overrides are applied with scripts/apply_semantic_review.py, then the reviewed records are scored again.

The 2026.06 benchmark uses a 100-point DUCR rubric:

Layer             Points
Discoverable      25
Understandable    25
Citable           30
Routable          20
Total             100
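To show how the layer weights combine into a 100-point total, here is a minimal sketch. It assumes each layer is reduced to an earned fraction between 0 and 1 before weighting; the actual rules live in data/ducr_scoring_rubric.json, whose structure is not shown here, so `ducr_score` is illustrative only.

```python
# Maximum points per layer, copied from the rubric table above.
LAYER_POINTS = {
    "discoverable": 25,
    "understandable": 25,
    "citable": 30,
    "routable": 20,
}


def ducr_score(layer_fractions: dict[str, float]) -> dict[str, float]:
    """Weight each layer's earned fraction (0.0 to 1.0) by its maximum points."""
    scores: dict[str, float] = {}
    for layer, max_points in LAYER_POINTS.items():
        fraction = layer_fractions.get(layer, 0.0)
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"{layer} fraction must be between 0 and 1")
        scores[layer] = round(max_points * fraction, 1)
    scores["total"] = round(sum(scores.values()), 1)
    return scores
```

For example, fractions of 0.8, 0.6, 0.5, and 1.0 for the four layers give 20 + 15 + 15 + 20 = 70 points.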

Full scoring detail: /ai-visibility-benchmark/ducr-score/

Reproducibility Notes

Each record stores collector_version, crawl_timestamp, crawl_user_agent, and source_html_hash. Public statistics must be traceable to dataset fields and scoring rules.
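A record carrying these four fields could be assembled as sketched below. The choice of SHA-256 for `source_html_hash` and the ISO 8601 UTC timestamp format are assumptions; the methodology names the fields but not their encodings, and `build_record` is a hypothetical helper.

```python
import hashlib
from datetime import datetime, timezone


def build_record(html_bytes: bytes, collector_version: str, crawl_user_agent: str) -> dict:
    """Attach provenance fields to one crawled page."""
    return {
        "collector_version": collector_version,
        # Assumed format: timezone-aware ISO 8601 timestamp in UTC.
        "crawl_timestamp": datetime.now(timezone.utc).isoformat(),
        "crawl_user_agent": crawl_user_agent,
        # Assumed algorithm: SHA-256 over the raw response body.
        "source_html_hash": hashlib.sha256(html_bytes).hexdigest(),
    }
```

Hashing the raw bytes rather than decoded text means a later re-crawl can confirm whether the scored HTML is byte-for-byte the same page.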

The public dataset withholds domains and URLs, but preserves anonymized site IDs and domain hashes so duplicate handling and record-level analysis remain possible without exposing audited businesses.
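One plausible way to produce a domain hash that supports duplicate detection without revealing the domain is a keyed hash over a normalized hostname. The sketch below assumes HMAC-SHA-256 with a private key and a simple normalization (lowercase, no trailing dot, no leading "www."); the benchmark's actual hashing scheme is not documented here.

```python
import hashlib
import hmac


def normalize_domain(domain: str) -> str:
    """Lowercase the hostname and drop a trailing dot and a leading 'www.'."""
    normalized = domain.strip().lower().rstrip(".")
    if normalized.startswith("www."):
        normalized = normalized[len("www."):]
    return normalized


def domain_hash(domain: str, secret_key: bytes) -> str:
    """Keyed hash: equal domains collide, but the key is needed to test guesses."""
    message = normalize_domain(domain).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

A keyed hash is used in this sketch because an unkeyed hash of a short domain name can be reversed by hashing a list of candidate domains and comparing.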

Official Guidance Used

The methodology aligns crawler and indexing checks with official platform documentation where available:

Source                                             How It Informs The Benchmark
OpenAI crawler documentation                       Separates OAI-SearchBot, GPTBot, and ChatGPT-User roles.
Google Search Central robots.txt documentation     Separates crawl control from indexing control.
Google Search Central robots meta documentation    Supports noindex, nosnippet, data-nosnippet, and X-Robots-Tag checks.
Anthropic crawler documentation                    Separates ClaudeBot, Claude-SearchBot, and Claude-User roles.
Perplexity robots.txt documentation                Supports PerplexityBot robots-access checks.

Limitations

This benchmark measures public page readiness. It does not prove AI citation outcomes, traffic volume, CRM lead quality, hidden platform trust, or platform-wide visibility.

Full limitation notes: /ai-visibility-benchmark/limitations/