SEO Informatica

AI Crawler Access Study: OAI-SearchBot, Claude, Perplexity, Bing, and Googlebot

AI-search visibility starts with access, but access is not the whole game. In the SEO Informatica 2026.06 benchmark, the reviewed 50-site sample was almost entirely open to search and AI-retrieval crawlers: every site allowed Googlebot, Bingbot, OAI-SearchBot, Claude-SearchBot, and PerplexityBot. The bigger problem was that the pages were weak sources to cite.

2026.06 Access Findings

Field                                     Count   Percent
Primary service page returned HTTP 200    50/50   100%
Googlebot allowed                         50/50   100%
Bingbot allowed                           50/50   100%
OAI-SearchBot allowed                     50/50   100%
GPTBot allowed                            49/50   98%
ChatGPT-User allowed                      49/50   98%
ClaudeBot allowed                         50/50   100%
Claude-SearchBot allowed                  50/50   100%
Claude-User allowed                       50/50   100%
PerplexityBot allowed                     50/50   100%
Perplexity-User allowed                   50/50   100%
Noindex present                           0/50    0%
Nosnippet present                         0/50    0%

The sample does not support the easy claim that "AI visibility is blocked by robots.txt." In this reviewed run, access was almost always present; citation readiness was not.
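A minimal sketch of how a per-agent "allowed" field could be computed and then tallied in the table's count/percent style, assuming each site's robots.txt text has already been fetched. It uses Python's standard `urllib.robotparser`; `agent_access` and `tally` are hypothetical helpers, not the benchmark's actual tooling. Note that the stdlib parser applies the first matching rule in file order, while Google documents longest-match precedence, so results can differ on files that mix Allow and Disallow for overlapping paths.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the bot taxonomy in this study.
AGENTS = [
    "Googlebot", "Bingbot",
    "OAI-SearchBot", "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
]


def agent_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return robots.txt allow (True) or block (False) status per agent for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse already-fetched text; no network call
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


def tally(per_site: list[dict[str, bool]]) -> dict[str, str]:
    """Format allowed counts across sites as 'count/total percent%'."""
    total = len(per_site)
    rows = {}
    for agent in AGENTS:
        allowed = sum(site[agent] for site in per_site)
        rows[agent] = f"{allowed}/{total} {round(100 * allowed / total)}%"
    return rows


open_robots = "User-agent: *\nAllow: /\n"
gpt_blocked = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
sites = [
    agent_access(open_robots, "https://a.example/services/"),
    agent_access(gpt_blocked, "https://b.example/services/"),
]
print(tally(sites)["GPTBot"])  # 1/2 50%
```

Run over 50 sites, the same tally would produce rows like "GPTBot allowed 49/50 98%".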

Access Quality Signals

Signal                                  Count   Percent
Sitemap found                           48/50   96%
Key audited pages found in sitemap      6/50    12%
Service page self-canonicalized         34/50   68%
WAF block signal observed               6/50    12%
CAPTCHA signal observed                 8/50    16%
JavaScript challenge signal observed    0/50    0%

The sitemap result is the most telling: 48 of 50 sites published a sitemap, but only 6 listed the audited key pages in it, so for most sites those pages could not be confirmed through sitemap evidence.
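The key-page sitemap check can be sketched as follows, assuming the sitemap XML and the list of audited key pages are already in hand. `normalize`, `sitemap_urls`, and `missing_key_pages` are hypothetical helpers; a real audit would also follow sitemap index files and resolve redirects, which this sketch skips.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def normalize(url: str) -> str:
    """Hypothetical normalization: ignore surrounding whitespace and a trailing slash."""
    return url.strip().rstrip("/")


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect <loc> values from a <urlset> sitemap (index files are not followed)."""
    root = ET.fromstring(sitemap_xml)
    return {
        normalize(loc.text)
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_key_pages(sitemap_xml: str, key_pages: list[str]) -> list[str]:
    """Return the audited key pages that the sitemap does not list."""
    listed = sitemap_urls(sitemap_xml)
    return [page for page in key_pages if normalize(page) not in listed]


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
</urlset>"""
print(missing_key_pages(sitemap, ["https://example.com/", "https://example.com/services/"]))
# ['https://example.com/services/']
```

A site like the one in this example would count toward "Sitemap found" but not toward "Key audited pages found in sitemap."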

Bot Taxonomy

Agent               Role
Googlebot           Google Search crawling and indexing.
Bingbot             Bing crawling and Microsoft search discovery.
OAI-SearchBot       Surfacing and linking sites in ChatGPT search.
GPTBot              OpenAI training crawler, tracked separately from search.
ChatGPT-User        User-directed ChatGPT fetches.
ClaudeBot           Anthropic training crawler.
Claude-SearchBot    Claude search retrieval.
Claude-User         User-directed Claude fetches.
PerplexityBot       Perplexity search indexing and result surfacing.
Perplexity-User     User-directed Perplexity fetches.

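The key principle is separation. OpenAI and Anthropic each publish distinct agents for training, search retrieval, and user-directed fetches. A business that wants search and answer visibility can block a training crawler such as GPTBot or ClaudeBot without blocking OAI-SearchBot or Claude-SearchBot; these are separate decisions.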
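As an illustration, the separation can be written directly into robots.txt and verified with Python's standard `urllib.robotparser`. The policy below, which blocks GPTBot and ClaudeBot while leaving every other agent in the taxonomy open, is an example and not a recommendation for any particular site, and `check_policy` is a hypothetical helper. robots.txt records intent; how each agent honors it is defined in that vendor's own documentation.

```python
from urllib.robotparser import RobotFileParser

# Example policy only: opt out of two training crawlers while leaving
# search/retrieval crawlers and user-directed fetchers open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

TRAINING = ["GPTBot", "ClaudeBot"]
RETRIEVAL = [
    "Googlebot", "Bingbot", "OAI-SearchBot", "ChatGPT-User",
    "Claude-SearchBot", "Claude-User", "PerplexityBot", "Perplexity-User",
]


def check_policy(robots_txt: str, url: str) -> tuple[bool, bool]:
    """Return (all training agents blocked, all retrieval agents allowed) for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    training_blocked = all(not parser.can_fetch(a, url) for a in TRAINING)
    retrieval_open = all(parser.can_fetch(a, url) for a in RETRIEVAL)
    return training_blocked, retrieval_open


print(check_policy(ROBOTS_TXT, "https://example.com/services/"))  # (True, True)
```

Running a check like this after each robots.txt change catches the case where a training opt-out accidentally also blocks a search/retrieval agent.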

Access Checks

  • robots.txt allow/block status
  • HTTP status per target page
  • WAF/CDN challenge signals
  • CAPTCHA signals
  • JavaScript challenge signals
  • key-page discoverability through the sitemap and internal links
  • server-log evidence where available
  • page-level noindex and snippet restrictions
  • X-Robots-Tag status
  • canonical target clarity
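Several of the page-level checks above (HTTP status, noindex and nosnippet in meta robots or the X-Robots-Tag header, CAPTCHA signals, canonical target) can be approximated from a single fetched response. This is a minimal sketch using Python's standard `html.parser`, assuming the status code, headers, and HTML have already been retrieved. `HeadScanner` and `page_signals` are hypothetical, and the CAPTCHA markers are an assumed heuristic (class names used by common CAPTCHA widgets), not the benchmark's detection rules.

```python
from html.parser import HTMLParser

# Assumed heuristic: class names used by common CAPTCHA widgets.
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "cf-turnstile")


class HeadScanner(HTMLParser):
    """Collect meta robots directives and the first rel=canonical href."""

    def __init__(self):
        super().__init__()
        self.robots = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots.append(a.get("content", "").lower())
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if self.canonical is None:
                self.canonical = a.get("href", "")


def page_signals(url: str, status: int, headers: dict[str, str], html: str) -> dict[str, bool]:
    scanner = HeadScanner()
    scanner.feed(html)
    scanner.close()
    # Header names are case-insensitive, so look up X-Robots-Tag in lowercase.
    x_robots = {k.lower(): v for k, v in headers.items()}.get("x-robots-tag", "").lower()
    # Simplified: plain substring match; does not expand "none" or honor per-agent prefixes.
    directives = " ".join(scanner.robots + [x_robots])
    return {
        "http_200": status == 200,
        "blocked_status": status in (403, 429),
        "noindex": "noindex" in directives,
        "nosnippet": "nosnippet" in directives,
        "captcha_marker": any(m in html.lower() for m in CAPTCHA_MARKERS),
        "self_canonical": scanner.canonical is not None
        and scanner.canonical.rstrip("/") == url.rstrip("/"),
    }
```

Because the function only reads one response, the result reflects what a single fetch saw; WAF and CAPTCHA behavior can differ by user agent, IP range, and request rate.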

Common Access Mistakes

  • Allowing Googlebot while blocking OAI-SearchBot, Claude-SearchBot, or PerplexityBot.
  • Confusing training crawler consent with search/retrieval access.
  • Using WAF rules that return 403, 429, CAPTCHA, or challenge pages to legitimate fetches.
  • Making core content visible only after client-side scripts run.
  • Treating robots.txt as an indexing control instead of a crawl-control file.
  • Publishing AI visibility claims while the source page has no dataset, methodology, limitations, or download path.
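Two of the mistakes above can be linted cheaply. The sketch below checks whether a key phrase appears in the server-delivered HTML outside `<script>` and `<style>` (a rough stand-in for what a fetcher that does not execute JavaScript would receive), and whether robots.txt disallows a page, in which case a page-level noindex on that page cannot be read. `content_in_raw_html` and `noindex_unreadable` are hypothetical helpers, and treating raw HTML as the crawler's view is an assumption; rendering behavior varies by crawler.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RawTextExtractor(HTMLParser):
    """Collect text nodes from raw HTML, skipping <script> and <style> contents."""

    SKIP = ("script", "style")

    def __init__(self):
        super().__init__()
        self.parts = []
        self._depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if not self._depth:
            self.parts.append(data)


def content_in_raw_html(html: str, key_phrase: str) -> bool:
    """True if key_phrase is present in the HTML text before any script runs."""
    extractor = RawTextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(" ".join(extractor.parts).split()).lower()
    return " ".join(key_phrase.split()).lower() in text


def noindex_unreadable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """True if robots.txt blocks the agent from url, so a page-level noindex cannot be seen."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)
```

A page that fails the first check and passes the second is the pattern described above: crawlable, but with its core content delivered only through client-side scripts.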

Source Notes

OpenAI documents separate roles for OAI-SearchBot, GPTBot, and ChatGPT-User. Google documents that robots.txt controls crawler access and that noindex and snippet controls only take effect on pages the crawler is allowed to fetch. Anthropic documents separate ClaudeBot, Claude-SearchBot, and Claude-User agents. Perplexity documents how PerplexityBot handles robots.txt.

Checklist Download

Access checklist: /downloads/ai-crawler-access-checklist-2026-06.csv

Related pages: