This page hosts the anonymized reviewed dataset for the SEO Informatica AI Visibility Benchmark 2026.06.

The dataset is designed to support benchmark claims about public service-business page readiness. It is not designed to expose the audited domains, publish private company findings, or claim market-wide averages.

Dataset Summary

Field	Value
Dataset name	SEO Informatica AI Visibility Benchmark Dataset
Version	2026.06
Run ID	`semantic-reviewed-50-2026-06-04`
Sample size	50 reviewed records
Unique anonymized domains	50
Collection period	June 3, 2026 UTC / June 4, 2026 IST
Domain publication	Domains and URLs withheld; all records use `domain_public=false`
Proposed public license	Attribution to SEO Informatica required; confirm the final license before live publication
Reviewed CSV download	/downloads/ai-visibility-benchmark-2026-06-reviewed.csv
Reviewed JSON download	/downloads/ai-visibility-benchmark-2026-06-reviewed.json
Codebook	/downloads/ai-visibility-benchmark-codebook-2026-06.md
Rubric	/downloads/ducr-scoring-rubric-2026-06.json

What Is Public

The public dataset contains anonymized site IDs, domain hashes, verticals, page-quality fields, crawler-access fields, index/snippet fields, schema fields, DUCR scores, review status, and provenance fields.

The public dataset does not expose live domains, live URLs, private analytics, CRM data, server logs, client names, or owner-identifying review notes.
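
Readers who download the reviewed CSV may want to confirm the withholding rule themselves. The following is a minimal sketch, assuming the CSV carries a `domain_public` column whose values are written as the text `false`. The `site_id` column name and the three sample rows are hypothetical and are not taken from the real file.

```python
import csv
import io


def rows_exposing_domains(csv_text: str) -> list[int]:
    """Return 1-based data-row numbers whose domain_public flag is not 'false'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = []
    for row_number, row in enumerate(reader, start=1):
        # A missing or empty value is treated as a failure, not as 'false'.
        value = (row.get("domain_public") or "").strip().lower()
        if value != "false":
            flagged.append(row_number)
    return flagged


# Made-up sample for illustration only; column names are assumptions.
sample = "site_id,domain_public\nsite_001,false\nsite_002,False\nsite_003,true\n"
print(rows_exposing_domains(sample))  # prints [3]
```

An empty result on the real file would be consistent with the statement that all records use `domain_public=false`.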

Field Groups

Site and sample metadata
Audited page URL fields with public values withheld
Crawl access fields
Index and snippet fields
Page-structure fields
Entity and source-clarity fields
Schema fields
DUCR scoring fields
Manual semantic review status
Provenance fields

Review Status

Review Status	Count
`semantic_review_approved`	22
`semantic_review_approved_with_caveat`	28

The two statuses together cover all 50 reviewed records (22 + 28). Rows marked with caveats were still accepted into the benchmark after semantic review. Keep the caveat label visible so future readers can see that the sample was manually checked rather than accepted directly from automation.
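
The counts in the table can be rechecked against a downloaded copy. This is a minimal sketch, assuming the status values have already been read from the file into a list. The name of the column that holds them is not stated on this page, so loading is left out, and the `statuses` list below is a stand-in built from the published counts.

```python
from collections import Counter

# Counts as published in the Review Status table.
EXPECTED_COUNTS = {
    "semantic_review_approved": 22,
    "semantic_review_approved_with_caveat": 28,
}


def review_count_mismatches(statuses):
    """Return {status: (expected, actual)} for every status whose count differs."""
    actual = Counter(statuses)
    mismatches = {}
    for status in sorted(set(actual) | set(EXPECTED_COUNTS)):
        expected_n = EXPECTED_COUNTS.get(status, 0)
        actual_n = actual.get(status, 0)
        if expected_n != actual_n:
            mismatches[status] = (expected_n, actual_n)
    return mismatches


# Stand-in data: replace with the status values read from the reviewed file.
statuses = (
    ["semantic_review_approved"] * 22
    + ["semantic_review_approved_with_caveat"] * 28
)
print(review_count_mismatches(statuses))  # prints {}
```

Any status value that is not in the table shows up in the result with an expected count of 0, so unexpected labels are reported as well as miscounts.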

Downloads

Reviewed CSV: /downloads/ai-visibility-benchmark-2026-06-reviewed.csv
Reviewed JSON: /downloads/ai-visibility-benchmark-2026-06-reviewed.json
Summary JSON: /downloads/ai-visibility-benchmark-summary-2026-06.json
Benchmark stats JSON: /downloads/benchmark-stats-2026-06.json
Codebook: /downloads/ai-visibility-benchmark-codebook-2026-06.md
DUCR rubric JSON: /downloads/ducr-scoring-rubric-2026-06.json
Semantic review overrides: /downloads/semantic-review-overrides-2026-06-04.csv
SHA256 checksums: /downloads/SHA256SUMS.txt
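
Downloaded files can be checked against the checksum list. The sketch below assumes `SHA256SUMS.txt` follows the common `sha256sum` layout of one `<hex digest>  <filename>` pair per line. This page does not confirm that layout, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def parse_sha256sums(text: str) -> dict[str, str]:
    """Parse 'digest  filename' lines (assumed sha256sum layout) into a dict."""
    sums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum prefixes the name with '*' in binary mode; drop it.
        sums[name.lstrip("*")] = digest.lower()
    return sums


def sha256_of_file(path: Path) -> str:
    """Hash a file in chunks so large downloads are not loaded at once."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_downloads(sums_path: Path, download_dir: Path) -> dict[str, bool]:
    """Map each listed filename to True if it exists and its digest matches."""
    expected = parse_sha256sums(sums_path.read_text(encoding="utf-8"))
    results = {}
    for name, digest in expected.items():
        target = download_dir / name
        results[name] = target.is_file() and sha256_of_file(target) == digest
    return results
```

Calling `verify_downloads(Path("SHA256SUMS.txt"), Path("."))` from the folder that holds the downloads returns one `True` or `False` per listed file. A missing file is reported as `False` rather than raising an error.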

How To Cite The Dataset

Use the dataset version and date, not a vague page title.

Recommended citation format:

SEO Informatica. "Service-Business AI Visibility Benchmark Dataset." Version 2026.06. Collected June 3, 2026 UTC / June 4, 2026 IST. Reviewed anonymized sample of 50 service-business websites.

Known Gaps

This dataset uses public-page audits. It cannot measure private analytics, CRM outcomes, hidden platform behavior, personalized AI answers, or guaranteed citation probability.

The sample is also vertically uneven. Consulting, accounting, and agency sites account for 35 of 50 rows (70%), leaving 15 rows for all other verticals combined. Treat vertical-level comparisons as descriptive unless a vertical's row count is high enough to support stronger analysis.
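
One way to apply this caution is to tabulate rows per vertical and flag small groups before comparing them. This is a minimal sketch: the `min_rows` threshold of 10 is an arbitrary placeholder, not a value defined by the benchmark, and the vertical labels in the example are invented.

```python
from collections import Counter


def vertical_summary(verticals, min_rows=10):
    """Return row count, sample share, and a small-group flag per vertical.

    min_rows is a hypothetical cutoff; choose one that suits the analysis.
    """
    counts = Counter(verticals)
    total = sum(counts.values())
    return {
        vertical: {
            "rows": rows,
            "share": rows / total,
            "descriptive_only": rows < min_rows,
        }
        for vertical, rows in counts.items()
    }


# Invented labels and counts, for illustration only.
example = ["consulting"] * 12 + ["legal"] * 3
for vertical, stats in vertical_summary(example).items():
    print(vertical, stats)
```

Verticals flagged `descriptive_only` can still be reported, but their figures are better read as observations about the sampled sites than as estimates for the vertical.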