This page hosts the anonymized reviewed dataset for the SEO Informatica AI Visibility Benchmark 2026.06.
The dataset is designed to support benchmark claims about the readiness of public-facing service-business pages. It is not designed to expose the audited domains, publish private company findings, or claim market-wide averages.
Dataset Summary
| Field | Value |
|---|---|
| Dataset name | SEO Informatica AI Visibility Benchmark Dataset |
| Version | 2026.06 |
| Run ID | semantic-reviewed-50-2026-06-04 |
| Sample size | 50 reviewed records |
| Unique anonymized domains | 50 |
| Collection period | June 3, 2026 UTC / June 4, 2026 IST |
| Domain publication | Domains and URLs withheld; all records use domain_public=false |
| Proposed public license | Attribution required to SEO Informatica; confirm final license before live publication |
| Reviewed CSV download | /downloads/ai-visibility-benchmark-2026-06-reviewed.csv |
| Reviewed JSON download | /downloads/ai-visibility-benchmark-2026-06-reviewed.json |
| Codebook | /downloads/ai-visibility-benchmark-codebook-2026-06.md |
| Rubric | /downloads/ducr-scoring-rubric-2026-06.json |
What Is Public
The public dataset contains anonymized site IDs, domain hashes, verticals, page-quality fields, crawler-access fields, index/snippet fields, schema fields, DUCR scores, review status, and provenance fields.
The public dataset does not expose live domains, live URLs, private analytics, CRM data, server logs, client names, or owner-identifying review notes.
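The withholding rule can be spot-checked on a downloaded copy of the reviewed CSV. The following is a minimal sketch, not the publisher's own validation: it assumes the `domain_public` flag is stored as the text `false` (in any letter case), and the `site_id` column name in the sample is hypothetical.

```python
import csv
import io


def find_exposed_rows(csv_text: str) -> list[int]:
    """Return 1-based data-row numbers whose domain_public flag is not 'false'.

    Assumption: the flag is serialized as text. A missing or empty value is
    also reported, since it cannot be confirmed as withheld.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    exposed = []
    for row_number, row in enumerate(reader, start=1):
        flag = (row.get("domain_public") or "").strip().lower()
        if flag != "false":
            exposed.append(row_number)
    return exposed


# Toy input for illustration only; these are not rows from the dataset.
sample = (
    "site_id,domain_public\n"
    "site_001,false\n"
    "site_002,False\n"
    "site_003,true\n"
)
print(find_exposed_rows(sample))  # [3]
```

An empty result on the real file would be consistent with the statement that all records use domain_public=false.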
Field Groups
- Site and sample metadata
- Audited page URL fields with public values withheld
- Crawl access fields
- Index and snippet fields
- Page-structure fields
- Entity and source-clarity fields
- Schema fields
- DUCR scoring fields
- Manual semantic review status
- Provenance fields
Review Status
| Review Status | Count |
|---|---|
| semantic_review_approved | 22 |
| semantic_review_approved_with_caveat | 28 |
Rows marked with a caveat were still accepted into the benchmark after semantic review. The caveat label should stay visible so that future readers can see the sample was manually checked rather than accepted blindly from automated output.
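The review-status counts can be re-derived from the reviewed CSV. This is a minimal sketch that assumes the status is stored in a column named `review_status`; that name is hypothetical, so check the codebook for the actual field name before running it on the real file.

```python
import csv
import io
from collections import Counter


def count_review_status(csv_text: str, column: str = "review_status") -> Counter:
    """Tally rows by review status. The default column name is an assumption."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Toy input for illustration only; these are not rows from the dataset.
status_sample = (
    "site_id,review_status\n"
    "s1,semantic_review_approved\n"
    "s2,semantic_review_approved_with_caveat\n"
    "s3,semantic_review_approved_with_caveat\n"
)
counts = count_review_status(status_sample)
print(counts["semantic_review_approved"])              # 1
print(counts["semantic_review_approved_with_caveat"])  # 2
```

On the published file, the two counts should match the reported 22 and 28 and sum to the sample size of 50.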
Downloads
- Reviewed CSV: /downloads/ai-visibility-benchmark-2026-06-reviewed.csv
- Reviewed JSON: /downloads/ai-visibility-benchmark-2026-06-reviewed.json
- Summary JSON: /downloads/ai-visibility-benchmark-summary-2026-06.json
- Benchmark stats JSON: /downloads/benchmark-stats-2026-06.json
- Codebook: /downloads/ai-visibility-benchmark-codebook-2026-06.md
- DUCR rubric JSON: /downloads/ducr-scoring-rubric-2026-06.json
- Semantic review overrides: /downloads/semantic-review-overrides-2026-06-04.csv
- SHA256 checksums: /downloads/SHA256SUMS.txt
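Downloaded files can be checked against the checksum file before use. The sketch below assumes SHA256SUMS.txt follows the common `sha256sum` layout, with one `<hex digest>  <filename>` pair per line; the page does not confirm that layout. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def parse_sha256sums(text: str) -> dict[str, str]:
    """Map filename -> expected hex digest, assuming sha256sum-style lines."""
    expected = {}
    for line in text.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2 or parts[0].startswith("#"):
            continue  # skip blank, comment, or malformed lines
        digest, name = parts
        # sha256sum prefixes the name with '*' in binary mode; drop it.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large downloads do not load into memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_downloads(sums_path: Path, directory: Path) -> dict[str, bool]:
    """Return filename -> True if the file exists and its digest matches."""
    expected = parse_sha256sums(Path(sums_path).read_text(encoding="utf-8"))
    results = {}
    for name, digest in expected.items():
        file_path = Path(directory) / name
        results[name] = file_path.is_file() and sha256_of(file_path) == digest
    return results
```

For example, `verify_downloads(Path("SHA256SUMS.txt"), Path("."))` checks every listed file in the current directory and reports any missing or mismatched file as `False`.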
How To Cite The Dataset
Use the dataset version and date, not a vague page title.
Recommended citation format:
SEO Informatica. "Service-Business AI Visibility Benchmark Dataset." Version 2026.06. Collected June 3, 2026 UTC / June 4, 2026 IST. Reviewed anonymized sample of 50 service-business websites.
Known Gaps
This dataset uses public-page audits. It cannot measure private analytics, CRM outcomes, hidden platform behavior, personalized AI answers, or guaranteed citation probability.
The sample is also vertically uneven. Consulting, accounting, and agency sites account for 35 of the 50 rows (70%), so comparisons between verticals should be treated as descriptive unless a vertical's row count is large enough to support stronger analysis.
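One way to apply this caution is to compute each vertical's row count and share, then flag verticals that fall below a chosen minimum. The sketch below is illustrative: the `min_rows` threshold of 10 is a hypothetical cutoff, not a value defined by the benchmark, and the labels in the example are placeholders, not the real distribution.

```python
from collections import Counter


def vertical_summary(verticals: list[str], min_rows: int = 10) -> dict[str, dict]:
    """Summarize rows per vertical and flag small groups as descriptive only.

    min_rows is a hypothetical cutoff chosen for illustration.
    """
    counts = Counter(verticals)
    total = sum(counts.values())
    return {
        name: {
            "rows": rows,
            "share": rows / total,
            "descriptive_only": rows < min_rows,
        }
        for name, rows in counts.items()
    }


# Placeholder labels and counts; not the dataset's actual vertical mix.
toy_verticals = ["vertical_a"] * 12 + ["vertical_b"] * 3
summary = vertical_summary(toy_verticals)
print(summary["vertical_a"]["descriptive_only"])  # False
print(summary["vertical_b"]["descriptive_only"])  # True

# The concentration stated above: 35 of 50 rows.
print(35 / 50)  # 0.7
```

Reporting the row count next to every vertical-level figure lets readers judge for themselves how much weight a comparison can bear.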