AI Search Audit: What It Is & How to Run One
An AI search audit checks whether AI engines can crawl, extract, and cite your site. The three layers, a DIY pass for each, and what audits typically find.

Max Tsygankov · Founder, Crawloria
Published June 24, 2026 · 11 min read
Intro
Search for "ai search audit" and the results split into two camps: sales pages for agencies and tools that will run one for you, and one enterprise framework so long it takes a full sitting just to read. What's missing is the middle: a clear explanation of what an AI search audit actually covers, and a version you can run yourself this afternoon without buying anything.
That gap is what this guide fills. It defines the audit, organizes every check into a three-layer model (crawl, extract, cite), walks through a hands-on pass for each layer with explicit pass criteria, and reports the failure patterns we see most often in audits run through Crawloria. By the end you will know whether your site is visible to AI search, and if it is not, you will know which layer is broken.
What is an AI search audit?
An AI search audit is a structured check of whether AI search engines and assistants (ChatGPT, Perplexity, Google AI Overviews, Gemini, Claude) can discover your website, parse its content, and cite or recommend it in their answers. The output is a list of failures ranked by severity, each tied to a fix.
It differs from a traditional SEO audit in what it tests. An SEO audit asks whether Google can index your pages and how they rank against competitors for keywords. An AI search audit asks whether a different set of crawlers can reach you, whether your content survives conversion into the plain text and structured data that language models consume, and whether your brand actually appears when an AI answers questions in your category. There is overlap (a crawlable, well-structured site helps both), but passing an SEO audit tells you surprisingly little about AI visibility. A site can rank on page one of Google while blocking every AI crawler at the firewall, and in our audit work we see exactly that: page-one sites shut out of AI search without knowing it.
The traffic at stake is no longer hypothetical. On our own site, ChatGPT is the second-largest traffic referrer after Google (GA4, June 2026), and we are a four-month-old tool site, not a media property.
The three layers: crawl, extract, cite
Every check in an AI search audit belongs to one of three layers, and the layers are sequential. A failure at layer 1 makes layers 2 and 3 irrelevant; there is no point polishing schema markup on pages no AI crawler can reach.
| Layer | Question | Typical failure | Symptom |
|---|---|---|---|
| 1. Crawl | Can AI bots reach your pages? | Firewall or bot-management rules blocking AI crawlers | Absent from AI answers entirely |
| 2. Extract | Can AI systems parse what they fetch? | Content rendered only by JavaScript; missing or broken schema | Crawled but misrepresented or skipped |
| 3. Cite | Do AI engines reference you? | Weak entity signals; no third-party corroboration | Parsed but never recommended |
The order also sets the audit's priority logic: fix the lowest-numbered failing layer first, because every later layer depends on it. The sections below run one layer at a time, and each check follows the same format: what to check, how to check it by hand, and what counts as a pass.
Layer 1: Can AI crawlers reach your site?
This layer fails more often than site owners expect, usually through security defaults nobody reviewed.
Check 1.1 — robots.txt rules per bot. Open yourdomain.com/robots.txt and read it against the AI crawlers that matter: GPTBot and OAI-SearchBot (OpenAI), PerplexityBot, ClaudeBot (Anthropic), Google-Extended. Look for Disallow: / under any of their user-agent blocks, and check whether a blanket User-agent: * disallow is catching them unintentionally. Our guide to the four classes of AI bots explains which bot does what and why you might allow some and block others. Pass: every AI crawler you want traffic from can access your key pages, and any blocks that exist are deliberate decisions.
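A minimal sketch of check 1.1 using Python's standard-library robots.txt parser. The `blocked_agents` helper and the sample rules are illustrative assumptions; paste in the contents of your own file. Note that Google-Extended is a robots.txt control token rather than a crawler with its own user-agent string, so it belongs in this check but not in the HTTP-level ones.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in check 1.1. Google-Extended is a robots.txt control token only.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]


def blocked_agents(robots_txt, url="https://yourdomain.com/", agents=AI_AGENTS):
    """Return the agents that the given robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical file: GPTBot is blocked outright, everyone else only from /private/.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""
print(blocked_agents(sample))
```

Python's parser applies rules in file order, which can differ from how an individual crawler resolves conflicting Allow and Disallow lines, so treat the result as a quick read rather than a verdict.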
Check 1.2 — edge security and bot management. robots.txt is a request; your CDN or firewall is an enforcement layer that can override your intentions. Fetch a page pretending to be an AI crawler and look at the HTTP status:
curl -I -A "GPTBot" https://yourdomain.com/
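A 403, a CAPTCHA challenge page, or an endless redirect means your edge layer blocks the bot regardless of what robots.txt says. Because -I sends a HEAD request and prints only the response headers, re-run the command without it when you need to see whether the body is real content or a challenge page. This test also only spoofs the user-agent string: bot-management products that verify crawler IP ranges may treat your request differently from the real bot's, so read the result as a first signal and confirm it against your logs (check 1.3). Cloudflare's bot-fight settings and similar products on other CDNs ship with defaults that treat AI crawlers as threats. Pass: HTTP 200 with real page content for each AI user agent you intend to allow.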
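To repeat that request for several crawlers at once, here is a rough Python equivalent of the curl check. It sends a GET rather than a HEAD so the body size can be inspected. The function names are hypothetical, and the short tokens stand in for the crawlers' full user-agent strings, which is an assumption your edge rules may or may not match on.

```python
import urllib.error
import urllib.request

# Crawlers that identify themselves in the User-Agent header.
FETCH_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot"]


def status_for_agent(url, user_agent, timeout=10):
    """GET `url` with the given User-Agent; return (status_code, body_size_in_bytes)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses land here; the body is ignored.
        return error.code, 0


def agents_failing(url, agents=FETCH_AGENTS, fetch=status_for_agent):
    """Map each agent that did not get a 200 with a non-empty body to what it got."""
    failing = {}
    for agent in agents:
        status, size = fetch(url, agent)
        if status != 200 or size == 0:
            failing[agent] = (status, size)
    return failing
```

Calling `agents_failing("https://yourdomain.com/")` returns an empty dict when every agent gets a 200 with content. A 200 can still be a challenge page, so open the body by hand for anything that looks suspiciously small.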
Check 1.3 — server logs. Status codes tell you bots can visit; logs tell you whether they do. Grep a recent log window for the AI user agents above and note which ones appear and which pages they hit. No PerplexityBot visits in 30 days on a content-rich site suggests a reachability or discovery problem upstream. Pass: the crawlers you allow show up in your logs over a normal month.
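The log grep can be sketched in a few lines of Python. This assumes access logs in the common combined format with the request line in quotes; `crawler_hits` and `silent_agents` are hypothetical helpers, and Google-Extended is left out because it never appears as a user agent in logs.

```python
import re
from collections import Counter

LOG_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot"]
# Matches the quoted request line, e.g. "GET /pricing HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)')


def crawler_hits(log_lines, agents=LOG_AGENTS):
    """Count requests per path for each AI crawler token found in the log lines."""
    hits = {agent: Counter() for agent in agents}
    for line in log_lines:
        lowered = line.lower()
        match = REQUEST_RE.search(line)
        path = match.group(1) if match else "?"
        for agent in agents:
            if agent.lower() in lowered:
                hits[agent][path] += 1
    return hits


def silent_agents(hits):
    """Agents with no recorded visits in the log window."""
    return [agent for agent, paths in hits.items() if not paths]
```

Feed it a month of lines, for example `crawler_hits(open("access.log", encoding="utf-8", errors="replace"))`, where the file path is a placeholder for wherever your server writes its logs. Anything returned by `silent_agents` is a crawler you allow that has not visited.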
Layer 2: Can AI systems extract your content?
Reaching the page is not the same as reading it. AI pipelines convert pages into plain text and structured data, and that conversion is far less forgiving than a browser.
Check 2.1 — content without JavaScript. AI crawlers generally fetch the raw HTML response and, per vendor documentation and observed behavior, do not reliably execute JavaScript the way Googlebot does. View your page source (not DevTools' rendered view: the actual view-source:) and search for your main content. If the body text, product details, or pricing only appear after client-side rendering, AI systems get an empty shell. Pass: your substantive content is present in the initial HTML response.
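The view-source check can be approximated in code: strip the tags and scripts from the raw HTML response and see whether your key phrases survive. This is a sketch under the assumption that a crawler reads only text outside script and style blocks; `missing_phrases` is a hypothetical helper, and the phrases are whatever you consider your substantive content.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def raw_html_text(raw_html):
    """Return the whitespace-normalized text present in the initial HTML."""
    extractor = _TextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    return " ".join(" ".join(extractor.chunks).split())


def missing_phrases(raw_html, phrases):
    """Return the phrases that do not appear in the raw HTML's text."""
    text = raw_html_text(raw_html).lower()
    return [phrase for phrase in phrases if phrase.lower() not in text]
```

Pass it the unrendered response body (for example, what curl prints) together with two or three sentences copied from the page as a visitor sees it. Any phrase that comes back is content that only exists after client-side rendering.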
Check 2.2 — structured data validity. Run your key pages through validator.schema.org. For articles, check Article schema; for products, check that Product schema carries complete attributes (price, availability, identifiers) matching the visible page. Schema that contradicts the page, or describes the wrong entity type, is worse than none. Pass: valid schema of the correct type on every key template, with no errors and no drift from visible content.
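Before pasting pages into the validator one by one, a quick local pre-check can list which JSON-LD types a template emits and flag Product markup missing the attributes named above. This sketch is not a substitute for validator.schema.org: it ignores `@graph` wrappers, microdata, and RDFa, and the identifier keys it looks for are one reasonable selection, not an exhaustive list.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collects the raw contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def jsonld_items(html):
    """Parse every JSON-LD block; raises ValueError if a block is malformed JSON."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    items = []
    for raw in collector.blocks:
        data = json.loads(raw)
        items.extend(data if isinstance(data, list) else [data])
    return items


def product_gaps(item):
    """List the price, availability, and identifier attributes a Product lacks."""
    offers = item.get("offers") or {}
    if isinstance(offers, list):
        offers = offers[0] if offers else {}
    gaps = [f"offers.{key}" for key in ("price", "availability") if key not in offers]
    if not any(key in item for key in ("sku", "gtin", "gtin13", "mpn")):
        gaps.append("identifier")
    return gaps
```

Comparing the extracted values against the visible page, the "no drift" half of the pass criterion, still has to be done by eye.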
Check 2.3 — heading structure and answerability. AI retrieval works at the passage level: engines pull sections, not whole pages. Scan each important page for one H1, descriptive H2s that read as questions or claims, and a direct answer in the first sentence under each heading. A page that buries its answers in paragraph five of an unbroken text wall extracts poorly. Pass: a reader (or a parser) can lift any section and have it make sense standing alone.
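The mechanical half of this check, counting H1s and listing H2s, is easy to script. The sketch below pulls the heading outline from a page's HTML; `heading_report` is a hypothetical helper, and judging whether each H2 reads as a question or claim with a direct answer beneath it remains a manual read.

```python
from html.parser import HTMLParser


class _HeadingCollector(HTMLParser):
    """Collects (tag, text) pairs for h1, h2, and h3 elements in document order."""

    def __init__(self):
        super().__init__()
        self._current = None
        self._text = []
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._current = tag
            self._text = []

    def handle_data(self, data):
        if self._current:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._text).split())
            self.headings.append((tag, text))
            self._current = None


def heading_report(html):
    """Summarize the heading outline: H1 count and the H2 titles in order."""
    collector = _HeadingCollector()
    collector.feed(html)
    collector.close()
    h1_count = sum(1 for tag, _ in collector.headings if tag == "h1")
    return {
        "h1_count": h1_count,
        "single_h1": h1_count == 1,
        "h2_titles": [text for tag, text in collector.headings if tag == "h2"],
        "outline": collector.headings,
    }
```

Read the `h2_titles` list on its own: if the titles alone do not tell you what questions the page answers, a passage-level retriever is unlikely to do better.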
Check 2.4 — page weight and response time. Crawlers operate on budgets. Multi-megabyte HTML documents and slow server responses get truncated or abandoned. Check your key pages' HTML size and time-to-first-byte; if the document is bloated with inlined assets, the content an AI needs may sit beyond what it bothers to read. Pass: lean HTML documents and consistently fast responses on the pages that matter.
Layer 3: Do AI engines actually cite you?
The first two layers are mechanical; this one measures the outcome. It needs a test set, not a one-off question.
Check 3.1 — prompt-set spot check. Write 10-15 questions a real customer would ask in your category: branded ("what is [brand]?"), category-level ("best [category] for [use case]"), and comparison questions. Run them logged-out in ChatGPT, Perplexity, and Google's AI Mode, and record whether your brand is mentioned, whether your pages are cited as sources, and who appears instead of you. Pass: you appear in the majority of branded answers and at least some category answers; competitors do not monopolize your core comparisons.
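Recording the results in a consistent shape makes monthly re-runs comparable. A minimal sketch of the tally, assuming you log each answer by hand as a row with a question kind and whether your brand was mentioned; the field names and the `passes_spot_check` helper are hypothetical, and the competitor half of the pass criterion is left to manual review.

```python
from collections import defaultdict


def mention_rates(results):
    """Share of answers mentioning the brand, per question kind.

    `results` is a list of dicts with keys "kind" ("branded", "category",
    or "comparison") and "mentioned" (bool), one per prompt and engine.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in results:
        totals[row["kind"]] += 1
        hits[row["kind"]] += bool(row["mentioned"])
    return {kind: hits[kind] / totals[kind] for kind in totals}


def passes_spot_check(results):
    """Majority of branded answers and at least one category answer mention you."""
    rates = mention_rates(results)
    return rates.get("branded", 0) > 0.5 and rates.get("category", 0) > 0
```

Keep the raw rows, not just the rates: the record of who appeared instead of you is what tells you where to look next.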
Check 3.2 — brand entity check. Ask each engine directly: "What is [your brand]?" Wrong descriptions, confusion with a similarly named company, or visible staleness signal weak entity grounding — usually a thin or inconsistent footprint across the third-party sources engines lean on. Pass: engines describe your business accurately without prompting hints.
Check 3.3 — citation source review. When you do get cited, note which of your URLs carry the citations. If answers consistently cite a stale blog post instead of your current product page, you have an internal authority problem worth fixing deliberately. Pass: the pages you want representing you are the ones engines pick.
Failures at this layer with clean results on layers 1 and 2 mean the problem is content and reputation rather than infrastructure. The fixes live in a different discipline; our guide to techniques for boosting visibility in AI search covers that ground, and the ChatGPT-specific optimization guide goes deeper on one engine.
What AI search audits typically find
Since Crawloria's public launch in February 2026 we have watched the same failure patterns recur across the audits run through the tool. Three observations are worth pre-loading before you run your own pass.
First, the layer-1 problem is usually invisible to its owner. Sites blocking AI crawlers at the edge almost never did it on purpose; a security product's default setting did it for them, and nothing in their analytics flags it, because blocked bots leave no trace in GA4. In our audit work, edge-level blocking and JavaScript-only content are the two findings that surprise site owners most.
Second, audits get abandoned when they are slow or manual. Roughly a third of the audits started on our own tool never run to completion (internal instrumentation, June 2026) — and that is a tool that automates the work. A manual audit competing with your actual job has worse odds, which is an argument for doing a focused two-hour pass with explicit pass criteria rather than an open-ended investigation.
Third, layer-3 disappointment usually traces back to layers 1 and 2 anyway. The temptation is to start with the interesting question ("why doesn't ChatGPT recommend us?") and skip the plumbing. If that question is yours, our walkthrough of why ChatGPT isn't showing your website runs the same diagnosis symptom-first.
DIY audit vs automated audit
The honest scoping: the manual pass above takes two to three hours for a typical site and covers all three layers, with layer 3 requiring repeat runs over several weeks before trends mean anything. It costs nothing and teaches you how your site looks to a machine, which has value on its own.
The automated route covers the mechanical layers faster. Crawloria's free audit runs the crawl and extract checks (bot access across the major AI crawlers, rendering, schema, structure) in a few minutes and returns a scored report with the failures ranked. It does not replace the layer-3 prompt work, because no crawler can tell you what ChatGPT says about your brand; pair the automated report with the prompt-set check from 3.1 and you have the full picture in under an hour.
Use the DIY version when you want depth and understanding; use the automated version when you want the mechanical layers verified now and your time spent on the citation layer instead.
Where to start
- Run the 60-second versions of checks 1.1 and 1.2 (robots.txt read + one curl per AI user agent). These catch the most damaging failures.
- View-source your top three pages and confirm the content is in the HTML (check 2.1).
- Validate schema on one page per template type (check 2.2).
- Write your prompt set of 10-15 questions and run it once across ChatGPT and Perplexity (check 3.1) to set a baseline.
- Automate the mechanical layers with a free Crawloria audit and compare its findings against your manual pass.
- Re-run the prompt set monthly. The infrastructure checks are mostly one-time; the citation layer is the one that moves.
FAQ
How long does an AI search audit take?
The manual version in this guide takes two to three hours for a typical site across all three layers, of which the first citation pass is roughly 30-45 minutes. Automated tools compress the mechanical layers to minutes. The citation layer needs repeated runs over weeks to show reliable trends, so treat the first pass as a baseline rather than a verdict.
Is an AI search audit different from an SEO audit?
Yes. An SEO audit tests indexability and ranking factors for traditional search engines. An AI search audit tests a different bot population (GPTBot, PerplexityBot, ClaudeBot), stricter extraction conditions (little or no reliable JavaScript rendering, passage-level retrieval), and a different outcome: citations in generated answers rather than positions on a results page. Passing one does not imply passing the other.
How often should I repeat the audit?
Infrastructure layers: after any platform migration, CDN or security change, or redesign, and otherwise quarterly. Citation layer: monthly, with the same prompt set, so changes are comparable. A security product update silently blocking AI crawlers is the classic regression worth catching early.
Can I run an AI search audit for free?
Yes. Every manual check in this guide uses free tooling: your browser, curl, validator.schema.org, and logged-out sessions in the AI engines themselves. Crawloria's audit automates the crawl and extract layers free as well. Paid tools mostly add continuous monitoring and citation tracking at scale on top of these same fundamentals.