This methodology explains how SEO Informatica audits service-business websites for DUCR readiness: Discoverable, Understandable, Citable, and Routable.
The 2026.06 benchmark uses a reviewed anonymized sample of 50 service-business websites. Automated crawl fields were collected first, then rows were manually reviewed for semantic fit, source quality, and obvious automation errors before final scoring.
Sample Criteria
A site qualifies if it sells a service, has at least one service page and one conversion route, exposes indexable public HTML, targets customers through search or local discovery, and can be audited without admin access.
The 2026.06 sample includes 50 unique anonymized domains across consulting, accounting, agency, home services, legal, dental, pest control, junk removal, roofing, wellness, and med spa categories.
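The qualification criteria work as one predicate: a site enters the sample only if every condition holds. The sketch below is in Python, matching the scoring scripts. It assumes a hypothetical `SiteProfile` record whose field names are illustrative and are not the benchmark's actual schema:

```python
from dataclasses import dataclass


@dataclass
class SiteProfile:
    # Hypothetical fields; the benchmark's real record schema may differ.
    sells_service: bool
    service_pages: int
    conversion_routes: int
    indexable_public_html: bool
    targets_search_or_local: bool
    needs_admin_access: bool


def qualifies(site: SiteProfile) -> bool:
    """Return True only if every sample criterion holds."""
    return (
        site.sells_service
        and site.service_pages >= 1
        and site.conversion_routes >= 1
        and site.indexable_public_html
        and site.targets_search_or_local
        and not site.needs_admin_access
    )
```

For example, `qualifies(SiteProfile(True, 3, 1, True, True, False))` returns `True`. The same profile with `service_pages=0` does not qualify.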
Exclusion Rules
Exclude parked, login-only, under-construction, PDF-only, app-only, non-service, adult, gambling, weapon, illegal, duplicate, inaccessible, or owner-excluded websites.
During semantic review, records are removed or replaced if the reviewed page is not a service-business page, if the browser-rendered page is blocked or misleading, if the row appears duplicated, or if crawler output cannot support the scoring fields.
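The two rule sets can be combined into a single keep-or-replace decision. This minimal Python sketch assumes that each candidate row carries a set of hypothetical flag labels produced during the crawl and the review. The label spellings are illustrative:

```python
from __future__ import annotations

# Pre-sample exclusion categories from the rule list (label spelling is illustrative).
EXCLUDED_CATEGORIES = frozenset({
    "parked", "login_only", "under_construction", "pdf_only", "app_only",
    "non_service", "adult", "gambling", "weapon", "illegal", "duplicate",
    "inaccessible", "owner_excluded",
})

# Reasons a reviewer removes or replaces a record during semantic review.
REVIEW_REMOVAL_REASONS = frozenset({
    "not_service_business_page",
    "render_blocked_or_misleading",
    "duplicate_row",
    "crawl_output_insufficient",
})


def review_decision(flags: set[str]) -> str:
    """Return 'remove_or_replace' if any flag matches a rule, else 'keep'."""
    if flags & (EXCLUDED_CATEGORIES | REVIEW_REMOVAL_REASONS):
        return "remove_or_replace"
    return "keep"
```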
Pages Audited Per Site
| Page Type | Required |
|---|---|
| Homepage | Yes |
| Primary service page | Yes |
| Supporting informational page | Preferred |
| About page | Preferred |
| Contact or lead route | Yes |
| Location page | Optional |
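The requirement levels in the table can be checked per site. The following sketch reports which required and preferred page types were not located. The page-type keys are hypothetical labels for the table rows:

```python
from __future__ import annotations

# Requirement level per page type, mirroring the table above; keys are hypothetical labels.
PAGE_REQUIREMENTS = {
    "homepage": "required",
    "primary_service": "required",
    "supporting_informational": "preferred",
    "about": "preferred",
    "contact_or_lead_route": "required",
    "location": "optional",
}


def page_coverage(found: set[str]) -> dict[str, list[str]]:
    """List the required and preferred page types that were not found for a site."""
    def missing(level: str) -> list[str]:
        return sorted(p for p, lvl in PAGE_REQUIREMENTS.items() if lvl == level and p not in found)

    return {"missing_required": missing("required"), "missing_preferred": missing("preferred")}
```

A non-empty `missing_required` list flags a site whose required pages could not be located.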
Automated Checks
The crawler checks status codes, robots rules, sitemap presence, canonicals, meta robots, snippet controls, headings, schema types, visible text, internal links, lists, tables, download links, WAF signals, CAPTCHA signals, and JavaScript challenge signals.
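As an illustration only, here is how a few of these fields (canonical, meta robots, H1 headings) could be extracted from raw HTML with Python's standard-library `html.parser`. This is not the benchmark crawler. It ignores X-Robots-Tag headers, bot-specific meta tags, and JavaScript-rendered markup:

```python
from __future__ import annotations

from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect a few on-page signals: canonical URL, meta robots directives, and H1 count."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: str | None = None
        self.meta_robots: list[str] = []
        self.h1_count = 0

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # html.parser lowercases tag and attribute names; valueless attributes become "".
        a = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            # Directives are comma-separated, e.g. "noindex, nosnippet".
            self.meta_robots.extend(
                d.strip().lower() for d in a.get("content", "").split(",") if d.strip()
            )
        elif tag == "h1":
            self.h1_count += 1
```

Usage: create a `HeadSignals()`, call `feed(html_text)`, then read `canonical`, `meta_robots`, and `h1_count`.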
For sitemap checks, the crawler bounds sitemap exploration so that one large site cannot dominate the run. The 2026.06 crawler capped sitemap discovery at eight sitemap candidates and 20,000 parsed sitemap URLs, with an eight-second timeout per sitemap request.
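A bounded walk of this kind might look like the following Python sketch. It reads "sitemap candidates" as sitemap documents fetched for one site, which is an assumption. The `fetch` callable is hypothetical and stands in for whatever HTTP client applies the eight-second timeout:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Optional

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_SITEMAPS = 8          # sitemap documents fetched (assumed reading of "candidates")
MAX_URLS = 20_000         # page URLs parsed before stopping
TIMEOUT_SECONDS = 8.0     # handed to the fetcher for each sitemap request

# Hypothetical fetcher: (url, timeout) -> XML text, or None on failure/timeout.
Fetcher = Callable[[str, float], Optional[str]]


def discover_sitemap_urls(seeds: list[str], fetch: Fetcher) -> list[str]:
    """Breadth-first sitemap walk that stops at whichever cap is reached first."""
    queue = deque(seeds)
    seen: set[str] = set()
    urls: list[str] = []
    while queue and len(seen) < MAX_SITEMAPS and len(urls) < MAX_URLS:
        sitemap_url = queue.popleft()
        if sitemap_url in seen:
            continue
        seen.add(sitemap_url)
        body = fetch(sitemap_url, TIMEOUT_SECONDS)
        if body is None:
            continue  # fetch failed or timed out
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            continue
        is_index = root.tag == f"{SITEMAP_NS}sitemapindex"
        for loc in root.iter(f"{SITEMAP_NS}loc"):
            text = (loc.text or "").strip()
            if not text:
                continue
            if is_index:
                queue.append(text)  # child sitemap, fetched later if budget remains
            else:
                urls.append(text)
                if len(urls) >= MAX_URLS:
                    break
    return urls
```

Because the sitemap-count cap includes index files, an index that lists 20 child sitemaps yields URLs from at most seven of them.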
Bot Access Checks
The run checks separate crawler/user-agent fields for Googlebot, Bingbot, OAI-SearchBot, GPTBot, ChatGPT-User, ClaudeBot, Claude-SearchBot, Claude-User, PerplexityBot, and Perplexity-User.
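These roles are intentionally kept separate because training crawlers, search/indexing crawlers, and user-directed fetchers do different jobs. Treating them as one "AI bot" bucket would produce misleading recommendations. For example, a site might block a search crawler when it only meant to opt out of model training.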
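Evaluating robots.txt once per user-agent token, rather than once for a generic "AI bot", can be sketched with Python's standard-library `urllib.robotparser`. This is an approximation and not the benchmark's checker. Python's parser applies the first matching rule in file order, whereas Google and RFC 9309 use the most specific (longest) matching path, so results can differ for files that mix Allow and Disallow rules:

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# User-agent tokens checked as separate fields, as listed above.
AGENTS = (
    "Googlebot", "Bingbot", "OAI-SearchBot", "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User", "PerplexityBot", "Perplexity-User",
)


def robots_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Evaluate robots.txt once per user agent instead of one shared 'AI bot' verdict."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}
```

With a file that disallows only `GPTBot`, the result shows `GPTBot` blocked while `OAI-SearchBot` and `ChatGPT-User` remain allowed. A single bucket would hide exactly this distinction.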
Manual Semantic Review
The reviewer checks page role, target audience, service/entity clarity, proof, author/date/reviewer visibility, methodology/process explanation, limitations, CTA route, source support, claim restraint, and whether the page looks like a valid service-business page in a browser.
The final 2026.06 dataset contains:
| Manual Review Status | Count |
|---|---|
| semantic_review_approved | 22 |
| semantic_review_approved_with_caveat | 28 |
| Critical blockers | 0 |
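The two approval statuses together cover the whole sample (22 + 28 = 50). A small consistency check like the following Python sketch can catch a dropped or duplicated row before scoring. The status labels come from the table. The function itself is hypothetical:

```python
from __future__ import annotations

from collections import Counter

EXPECTED_SITES = 50  # sample size stated for the 2026.06 benchmark


def review_summary(statuses: list[str]) -> Counter:
    """Tally manual review statuses and check they cover the whole sample."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total != EXPECTED_SITES:
        raise ValueError(f"expected {EXPECTED_SITES} reviewed sites, got {total}")
    return counts
```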
Scoring Process
The raw audit file is scored by scripts/score_ducr.py against data/ducr_scoring_rubric.json. Semantic review overrides are applied with scripts/apply_semantic_review.py, then the reviewed records are scored again.
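The score, override, and re-score sequence can be illustrated with the minimal Python sketch below. The internals of scripts/score_ducr.py and scripts/apply_semantic_review.py are not described here, so the rubric shape, the field names, and the `site_id` key are all hypothetical:

```python
from __future__ import annotations

import copy

# Hypothetical rubric shape; the real data/ducr_scoring_rubric.json format may differ.
RUBRIC = {"checks": [
    {"field": "has_service_schema", "points": 5},
    {"field": "cta_visible", "points": 4},
]}


def score(record: dict, rubric: dict) -> int:
    """Sum the points for every rubric check whose field is True on the record."""
    return sum(c["points"] for c in rubric["checks"] if record.get(c["field"]) is True)


def apply_overrides(records: list[dict], overrides: dict[str, dict]) -> list[dict]:
    """Return reviewed copies; reviewer values replace crawler values, raw records stay intact."""
    reviewed = []
    for record in records:
        updated = copy.deepcopy(record)
        updated.update(overrides.get(record["site_id"], {}))
        reviewed.append(updated)
    return reviewed


raw = [{"site_id": "S001", "has_service_schema": True, "cta_visible": False}]
reviewed = apply_overrides(raw, {"S001": {"cta_visible": True}})
print(score(raw[0], RUBRIC), score(reviewed[0], RUBRIC))  # 5 9
```

Copying the records instead of mutating them keeps the raw crawl values available, so the raw and reviewed scores can be compared side by side.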
The 2026.06 benchmark uses a 100-point DUCR rubric:
| Layer | Points |
|---|---|
| Discoverable | 25 |
| Understandable | 25 |
| Citable | 30 |
| Routable | 20 |
| Total | 100 |
Full scoring detail: /ai-visibility-benchmark/ducr-score/
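The layer maxima sum to 100, and a site's total is the sum of its four layer scores. The Python sketch below also clamps each layer to its maximum. That clamp is an assumed safeguard and is not a stated rubric rule:

```python
from __future__ import annotations

# Layer maxima from the rubric table.
LAYER_MAX = {"Discoverable": 25, "Understandable": 25, "Citable": 30, "Routable": 20}
assert sum(LAYER_MAX.values()) == 100


def ducr_total(layer_points: dict[str, float]) -> float:
    """Clamp each layer to [0, max] (an assumed safeguard) and add the layers."""
    return sum(min(max(layer_points.get(layer, 0.0), 0.0), cap) for layer, cap in LAYER_MAX.items())
```

For example, layer scores of 30, 20, 18.5, and 12 give 75.5, because Discoverable is capped at 25.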
Reproducibility Notes
Each record stores collector_version, crawl_timestamp, crawl_user_agent, and source_html_hash. Public statistics must be traceable to dataset fields and scoring rules.
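One way to populate these provenance fields is sketched below in Python. The page does not name the hash algorithm, so SHA-256 over the raw HTML bytes is an assumption, as are the example version and user-agent strings:

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    collector_version: str
    crawl_timestamp: str
    crawl_user_agent: str
    source_html_hash: str


def make_provenance(html: bytes, collector_version: str, user_agent: str) -> Provenance:
    """Stamp a crawled page; SHA-256 over the raw bytes is an assumed hash choice."""
    return Provenance(
        collector_version=collector_version,
        crawl_timestamp=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        crawl_user_agent=user_agent,
        source_html_hash=hashlib.sha256(html).hexdigest(),
    )
```

Re-hashing a stored HTML snapshot and comparing the result with `source_html_hash` confirms that a statistic was computed from that exact page version.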
The public dataset withholds domains and URLs, but preserves anonymized site IDs and domain hashes so duplicate handling and record-level analysis remain possible without exposing audited businesses.
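Domain hashes support duplicate detection only if URL variants of the same domain hash identically. The following Python sketch normalizes the host first and then applies a keyed hash (HMAC-SHA256). The normalization rules and the use of a private key are assumptions; this page does not state whether the benchmark uses a key:

```python
import hashlib
import hmac
from urllib.parse import urlsplit


def normalize_domain(url_or_host: str) -> str:
    """Lowercase the host and drop a leading 'www.' so variants collapse together."""
    target = url_or_host if "//" in url_or_host else "//" + url_or_host
    host = urlsplit(target).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_hash(url_or_host: str, key: bytes) -> str:
    """Keyed SHA-256 (HMAC) of the normalized domain; the key stays private."""
    return hmac.new(key, normalize_domain(url_or_host).encode("utf-8"), hashlib.sha256).hexdigest()
```

The key matters because anyone can reverse a plain, unkeyed hash of a domain by hashing a list of candidate domains and matching the results. A private key blocks that lookup while still making duplicates visible.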
Official Guidance Used
The methodology aligns crawler and indexing checks with official platform documentation where available:
| Source | How It Informs The Benchmark |
|---|---|
| OpenAI crawler documentation | Separates OAI-SearchBot, GPTBot, and ChatGPT-User roles. |
| Google Search Central robots.txt documentation | Separates crawl control from indexing control. |
| Google Search Central robots meta documentation | Supports noindex, nosnippet, data-nosnippet, and X-Robots-Tag checks. |
| Anthropic crawler documentation | Separates ClaudeBot, Claude-SearchBot, and Claude-User roles. |
| Perplexity robots.txt documentation | Supports PerplexityBot robots-access checks. |
Limitations
This benchmark measures public page readiness. It does not prove AI citation outcomes, traffic volume, CRM lead quality, hidden platform trust, or platform-wide visibility.
Full limitation notes: /ai-visibility-benchmark/limitations/