AI Crawler Access Study: OAI-SearchBot, Claude, Perplexity, Bing, and Googlebot

AI-search visibility starts with access, but access is not the whole game. In the SEO Informatica 2026.06 benchmark, the reviewed 50-site sample was mostly open to search and AI-retrieval crawlers. The bigger problem was that the pages were weak sources.

2026.06 Access Findings

Field	Count	Percent
Primary service page returned HTTP 200	50/50	100%
Googlebot allowed	50/50	100%
Bingbot allowed	50/50	100%
OAI-SearchBot allowed	50/50	100%
GPTBot allowed	49/50	98%
ChatGPT-User allowed	49/50	98%
ClaudeBot allowed	50/50	100%
Claude-SearchBot allowed	50/50	100%
Claude-User allowed	50/50	100%
PerplexityBot allowed	50/50	100%
Perplexity-User allowed	50/50	100%
Noindex present	0/50	0%
Nosnippet present	0/50	0%

The sample does not support the lazy claim that "AI visibility is blocked by robots.txt." In this reviewed run, access was present: every tracked agent was allowed on at least 49 of 50 sites. Citation readiness was not.

Access Quality Signals

Signal	Count	Percent
Sitemap found	48/50	96%
Key audited pages found in sitemap	6/50	12%
Service page self-canonicalized	34/50	68%
WAF block signal observed	6/50	12%
CAPTCHA signal observed	8/50	16%
JavaScript challenge signal observed	0/50	0%

The sitemap result is telling: 48 of 50 sites had a sitemap, but only 6 listed the audited key pages, so few made those pages easy to confirm through sitemap evidence.
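
One way to check whether key pages are listed in a sitemap is sketched below. It is a minimal illustration, not the benchmark's own tooling: the function names, the example.com URLs, and the trailing-slash normalization are all assumptions, and it reads a single urlset file rather than following a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset> and <loc> elements.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def normalize(url: str) -> str:
    """Assumed normalization: trim whitespace and any trailing slash."""
    return url.strip().rstrip("/")


def sitemap_urls(xml_text: str) -> set[str]:
    """Collect every <loc> value from one sitemap document."""
    root = ET.fromstring(xml_text)
    return {normalize(loc.text) for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text}


def missing_from_sitemap(xml_text: str, key_pages: list[str]) -> list[str]:
    """Return the key pages that the sitemap does not list."""
    listed = sitemap_urls(xml_text)
    return [page for page in key_pages if normalize(page) not in listed]


# Hypothetical sitemap content for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/services/</loc></url>
</urlset>"""

KEY_PAGES = ["https://example.com/services", "https://example.com/pricing"]
missing = missing_from_sitemap(SAMPLE_SITEMAP, KEY_PAGES)
```

A site can pass the "sitemap found" check and still return a non-empty list here, which is the gap the table describes.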

Bot Taxonomy

Agent	Role
Googlebot	Google Search crawling and indexing.
Bingbot	Bing crawling and Microsoft search discovery.
OAI-SearchBot	ChatGPT Search surfacing.
GPTBot	OpenAI training crawler, tracked separately from search.
ChatGPT-User	User-directed ChatGPT fetches.
ClaudeBot	Anthropic training crawler.
Claude-SearchBot	Claude search retrieval.
Claude-User	User-directed Claude fetches.
PerplexityBot	Perplexity search indexing and surfacing.
Perplexity-User	User-directed Perplexity fetches.

The important principle is separation. If a business wants search and answer visibility, blocking a training crawler is a different decision from blocking a search/retrieval crawler.
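
The separation can be illustrated with Python's standard urllib.robotparser. This is a sketch under stated assumptions: the robots.txt content and the example.com URL are hypothetical, and the standard-library parser's matching rules may differ in detail from how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opts out of one training crawler, leaves everything else open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Agent tokens taken from the taxonomy table above.
AGENTS = [
    "Googlebot", "Bingbot", "OAI-SearchBot", "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
]


def access_report(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Map each agent token to whether robots.txt allows it to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


report = access_report(ROBOTS_TXT, "https://example.com/services/", AGENTS)
blocked = [agent for agent, allowed in report.items() if not allowed]
```

With this file only GPTBot ends up in the blocked list, while OAI-SearchBot and ChatGPT-User stay allowed: a training opt-out that leaves search and retrieval access untouched.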

Access Checks

robots.txt allow/block status
HTTP status per target page
WAF/CDN challenge signals
CAPTCHA signals
JavaScript challenge signals
key page access through sitemap and internal links
server-log evidence where available
page-level noindex and snippet restrictions
X-Robots-Tag status
canonical target clarity
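
The page-level checks in this list (noindex, snippet restrictions, X-Robots-Tag, canonical target) can be sketched as a single function over a response that has already been fetched. The names below are hypothetical and the logic is simplified on purpose: it ignores agent-scoped X-Robots-Tag values such as "googlebot: noindex" and compares canonical URLs only after trimming a trailing slash.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects meta robots directives and the canonical link from HTML."""

    def __init__(self):
        super().__init__()
        self.robots_directives = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.robots_directives += [d.strip().lower() for d in content.split(",")]
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href") or None


def page_signals(url: str, headers: dict, html: str) -> dict:
    """Summarize indexing and snippet restrictions for one fetched page."""
    parser = HeadSignals()
    parser.feed(html)
    # Header names are case-insensitive, so match on the lowercased key.
    header_value = next(
        (v for k, v in headers.items() if k.lower() == "x-robots-tag"), ""
    )
    directives = set(parser.robots_directives)
    directives |= {d.strip().lower() for d in header_value.split(",") if d.strip()}
    canonical = parser.canonical
    return {
        "noindex": "noindex" in directives,
        "nosnippet": "nosnippet" in directives,
        "canonical": canonical,
        "self_canonical": canonical is not None
        and canonical.rstrip("/") == url.rstrip("/"),
    }


# Hypothetical page for illustration only.
SAMPLE_HTML = (
    '<html><head><meta name="robots" content="index, nosnippet">'
    '<link rel="canonical" href="https://example.com/services/"></head></html>'
)
signals = page_signals(
    "https://example.com/services", {"X-Robots-Tag": "noindex"}, SAMPLE_HTML
)
```

Merging the header and the meta tag into one directive set reflects that either location can carry a restriction, so checking only the HTML can miss a header-level noindex.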

Common Access Mistakes

Allowing Googlebot while blocking OAI-SearchBot, Claude-SearchBot, or PerplexityBot.
Confusing training crawler consent with search/retrieval access.
Using WAF rules that return 403, 429, CAPTCHA, or challenge pages to legitimate fetches.
Making core content visible only after client-side scripts run.
Treating robots.txt as an indexing control instead of a crawl-control file.
Publishing AI visibility claims while the source page has no dataset, methodology, limitations, or download path.
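
The WAF mistake above can be spotted with a rough classifier over a fetched response. This is a heuristic sketch only: the marker strings and labels are illustrative assumptions, not the signals the benchmark used, and a legitimate page that merely mentions "captcha" would be misclassified.

```python
# Status codes named in the list above as typical WAF or rate-limit responses.
BLOCK_STATUSES = {403, 429}

# Assumed, illustrative body markers; real challenge pages vary by vendor.
CAPTCHA_MARKERS = ("captcha",)
JS_CHALLENGE_MARKERS = ("enable javascript", "checking your browser")


def classify_fetch(status: int, body: str) -> str:
    """Label a response as ok, blocked, captcha, js_challenge, or other."""
    text = body.lower()
    if any(marker in text for marker in CAPTCHA_MARKERS):
        return "captcha"
    if any(marker in text for marker in JS_CHALLENGE_MARKERS):
        return "js_challenge"
    if status in BLOCK_STATUSES:
        return "blocked"
    if status == 200:
        return "ok"
    return "other"


# Hypothetical responses for illustration only.
results = [
    classify_fetch(200, "<h1>Services</h1>"),
    classify_fetch(403, "Forbidden"),
    classify_fetch(403, "Please complete the CAPTCHA to continue"),
    classify_fetch(503, "Checking your browser before accessing the site"),
]
```

Body markers are checked before the status code because challenge pages are sometimes served with a 200 status, which a status-only check would count as success.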

Source Notes

OpenAI documents separate roles for OAI-SearchBot, GPTBot, and ChatGPT-User. Google documents that robots.txt controls crawler access, and that noindex and snippet controls only take effect on pages the crawler is allowed to fetch. Anthropic documents separate ClaudeBot, Claude-SearchBot, and Claude-User agents. Perplexity documents how PerplexityBot handles robots.txt.

Checklist Download

Access checklist: /downloads/ai-crawler-access-checklist-2026-06.csv