AI-search visibility starts with access, but access is not the whole game. In the SEO Informatica 2026.06 benchmark, the reviewed 50-site sample was mostly open to search and AI-retrieval crawlers. The bigger problem was that the pages were weak sources.
2026.06 Access Findings
| Field | Count | Percent |
|---|---|---|
| Primary service page returned HTTP 200 | 50/50 | 100% |
| Googlebot allowed | 50/50 | 100% |
| Bingbot allowed | 50/50 | 100% |
| OAI-SearchBot allowed | 50/50 | 100% |
| GPTBot allowed | 49/50 | 98% |
| ChatGPT-User allowed | 49/50 | 98% |
| ClaudeBot allowed | 50/50 | 100% |
| Claude-SearchBot allowed | 50/50 | 100% |
| Claude-User allowed | 50/50 | 100% |
| PerplexityBot allowed | 50/50 | 100% |
| Perplexity-User allowed | 50/50 | 100% |
| Noindex present | 0/50 | 0% |
| Nosnippet present | 0/50 | 0% |
The sample does not support a lazy claim that "AI visibility is blocked by robots.txt." In this reviewed run, every listed crawler was allowed on at least 49 of the 50 sites, and no audited page carried noindex or nosnippet. Citation readiness was the gap.
Access Quality Signals
| Signal | Count | Percent |
|---|---|---|
| Sitemap found | 48/50 | 96% |
| Key audited pages found in sitemap | 6/50 | 12% |
| Service page self-canonicalized | 34/50 | 68% |
| WAF block signal observed | 6/50 | 12% |
| CAPTCHA signal observed | 8/50 | 16% |
| JavaScript challenge signal observed | 0/50 | 0% |
The sitemap result is the telling one: 48 of 50 sites had a sitemap somewhere, but only 6 of 50 listed the audited key pages in it, so the sitemap rarely served as evidence that those pages were discoverable.
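The sitemap check described above can be sketched as follows. This is a minimal illustration, not the benchmark's own tooling: the function names and the example URLs are hypothetical, it assumes a plain `urlset` sitemap that has already been downloaded, and it does not follow sitemap index files.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_in_sitemap(sitemap_xml: str) -> set[str]:
    """Return every <loc> value in the sitemap, with trailing slashes removed."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip().rstrip("/")
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text
    }


def missing_key_pages(sitemap_xml: str, key_pages: list[str]) -> list[str]:
    """Return the key pages (hypothetical audit targets) not listed in the sitemap."""
    listed = urls_in_sitemap(sitemap_xml)
    return [url for url in key_pages if url.rstrip("/") not in listed]


# Hypothetical sitemap: it exists, but only lists the home page.
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "</urlset>"
)

missing = missing_key_pages(
    SAMPLE_SITEMAP,
    ["https://example.com/", "https://example.com/services/"],
)
print(missing)
```

A site like this sample would count under "Sitemap found" but not under "Key audited pages found in sitemap", which is the pattern the table shows for most of the sample.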
Bot Taxonomy
| Agent | Role |
|---|---|
| Googlebot | Google Search crawling and indexing. |
| Bingbot | Bing crawling and Microsoft search discovery. |
| OAI-SearchBot | ChatGPT Search surfacing. |
| GPTBot | OpenAI training crawler, tracked separately from search. |
| ChatGPT-User | User-directed ChatGPT fetches. |
| ClaudeBot | Anthropic training crawler. |
| Claude-SearchBot | Claude search retrieval. |
| Claude-User | User-directed Claude fetches. |
| PerplexityBot | Perplexity search indexing and surfacing. |
| Perplexity-User | User-directed Perplexity fetches. |
The important principle is separation. If a business wants search and answer visibility, blocking a training crawler is a different decision from blocking a search/retrieval crawler.
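To illustrate the separation, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `access_by_agent` helper are hypothetical examples, not a recommended policy, and each crawler's own robots.txt matching may differ in detail from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: opt out of one training crawler, leave everything else open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def access_by_agent(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user agent to whether this robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = access_by_agent(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "PerplexityBot"],
    "https://example.com/services/",
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under this example policy only GPTBot is blocked; the search and user-directed agents remain allowed, which is the distinction the taxonomy is meant to preserve.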
Access Checks
- robots.txt allow/block status
- HTTP status per target page
- WAF/CDN challenge signals
- CAPTCHA signals
- JavaScript challenge signals
- key page access through sitemap and internal links
- server-log evidence where available
- page-level noindex and snippet restrictions
- X-Robots-Tag status
- canonical target clarity
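Several of the page-level checks in this list (HTTP status, noindex and snippet restrictions, X-Robots-Tag, canonical target) can be read from a single fetched response. The sketch below assumes the status code, headers, and HTML have already been retrieved; `page_signals` and its output keys are hypothetical names, and the parsing is deliberately simple (it ignores user-agent-scoped `X-Robots-Tag` values such as `googlebot: noindex`).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect <meta name="robots"> directives and the rel=canonical target."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href") or None


def page_signals(url: str, status: int, headers: dict, html: str) -> dict:
    """Summarize page-level access signals for one already-fetched page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Header names are case-insensitive, so compare in lowercase.
    header = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    directives = parser.directives | {
        d.strip().lower() for d in header.split(",") if d.strip()
    }
    canonical = parser.canonical
    return {
        "status_ok": status == 200,
        # "none" is shorthand for "noindex, nofollow".
        "noindex": "noindex" in directives or "none" in directives,
        "nosnippet": "nosnippet" in directives,
        "self_canonical": canonical is not None
        and canonical.rstrip("/") == url.rstrip("/"),
    }


SAMPLE_HTML = (
    '<html><head><link rel="canonical" href="https://example.com/services">'
    "</head><body>Services</body></html>"
)
signals = page_signals(
    "https://example.com/services/", 200, {"X-Robots-Tag": "nosnippet"}, SAMPLE_HTML
)
print(signals)
```

Checking both the meta tag and the header matters because either one can carry a restriction that the other does not show.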
Common Access Mistakes
- Allowing Googlebot while blocking OAI-SearchBot, Claude-SearchBot, or PerplexityBot.
- Confusing training crawler consent with search/retrieval access.
- Using WAF rules that return 403, 429, CAPTCHA, or challenge pages to legitimate fetches.
- Making core content visible only after client-side scripts run.
- Treating robots.txt as an indexing control instead of a crawl-control file.
- Publishing AI visibility claims while the source page has no dataset, methodology, limitations, or download path.
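The WAF mistake in the list above is easier to spot when each fetch is labeled rather than eyeballed. The following is a rough heuristic sketch only: the marker strings, the `classify_fetch` name, and the labels are illustrative assumptions, not the detection rules used in the benchmark, and real challenge pages vary by vendor.

```python
# Illustrative marker strings; real challenge pages differ by WAF/CDN vendor.
CAPTCHA_MARKERS = ("captcha",)
JS_CHALLENGE_MARKERS = ("checking your browser", "enable javascript")


def classify_fetch(status: int, body: str) -> str:
    """Label one fetch result using status code and simple body markers."""
    text = body.lower()
    if any(marker in text for marker in CAPTCHA_MARKERS):
        return "captcha"
    if any(marker in text for marker in JS_CHALLENGE_MARKERS):
        return "js_challenge"
    if status in (403, 429):
        return "blocked"
    if status == 200:
        return "ok"
    return "other"


# Hypothetical fetch results for three pages.
samples = [
    (200, "<h1>Our services</h1>"),
    (403, "Access denied"),
    (200, "Please complete the CAPTCHA to continue"),
]
labels = [classify_fetch(status, body) for status, body in samples]
print(labels)
```

Body markers are checked before the status code because a challenge page can be served with HTTP 200, in which case the status alone would look healthy.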
Source Notes
OpenAI documents separate roles for OAI-SearchBot, GPTBot, and ChatGPT-User. Google documents that robots.txt controls crawler access and that noindex and snippet controls only take effect on pages a crawler can fetch. Anthropic documents separate ClaudeBot, Claude-SearchBot, and Claude-User agents. Perplexity documents how PerplexityBot handles robots.txt.
Checklist Download
Access checklist: /downloads/ai-crawler-access-checklist-2026-06.csv