Reference

How we score sites.

Crawloria scores a site from 0 to 100 based on six weighted categories. This page explains exactly what each category measures, why it's weighted the way it is, and how the overall score and letter grade are computed.

The formula

Every individual check returns a sub-score from 0 to 10. Each category's score is the average of its checks. The overall score is a weighted sum of the category scores, normalized against the weights of the categories that actually ran, then scaled to 0-100.

Category                  Weight
AI Bot Access             25%
Content Accessibility     20%
Structured Data           15%
Navigation Friction       15%
Agent-Specific Signals    15%
Semantic Markup           10%
Total                     100%

When a category can't run (for example, the headless browser couldn't render the page), we exclude its weight from the denominator rather than score it as zero. The overall score reflects only the categories we successfully measured, with a “pending” indicator on the audit page for everything else.
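A minimal sketch of that computation, using hypothetical types (Crawloria's internals will differ):

```ts
// Hypothetical shape for illustration: each category carries its weight and,
// if it ran, the average of its 0-10 check sub-scores.
interface CategoryResult {
  weight: number;     // e.g. 0.25 for AI Bot Access
  avgScore?: number;  // mean of sub-scores, 0-10; undefined if the category couldn't run
}

// Weighted average over the categories that ran, rescaled to 0-100.
// Categories that couldn't run are excluded from the denominator
// rather than counted as zero.
function overallScore(categories: CategoryResult[]): number {
  const ran = categories.filter((c) => c.avgScore !== undefined);
  const totalWeight = ran.reduce((sum, c) => sum + c.weight, 0);
  if (totalWeight === 0) return 0; // nothing was measurable
  const weighted = ran.reduce((sum, c) => sum + c.weight * (c.avgScore as number), 0);
  return (weighted / totalWeight) * 10; // 0-10 scale -> 0-100
}
```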

Letter grade bands

Grade   Overall score
A       90+
B       80+
C       70+
D       60+
F       <60
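In code, the banding is a simple threshold walk (a sketch; "90+" means 90 and above, and so on down the table):

```ts
// Map a 0-100 overall score to its letter grade band.
function letterGrade(score: number): "A" | "B" | "C" | "D" | "F" {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}
```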

AI Bot Access

25% weight

Whether the seven major AI crawlers can actually fetch the site. We send real GET requests with each agent's documented User-Agent string and record what comes back (see the sketch after this list).

  • GPTBot

    OpenAI's general crawler. Powers ChatGPT search and feeds future model training.

  • ClaudeBot

    Anthropic's crawler powering Claude's web access and search results.

  • OAI-SearchBot

OpenAI's search-specific crawler — separate from GPTBot, and often blocked by sites that assumed allowing GPTBot was enough.

  • PerplexityBot

    Perplexity's crawler that surfaces pages in their AI search results.

  • CCBot

    Common Crawl. Feeds many open AI training datasets and downstream agents.

  • anthropic-ai

    Used by Anthropic agents (including Claude Computer Use) when fetching pages on a user's behalf.

  • Google-Extended

    Google's separate crawler for AI Overviews and Gemini training. Distinct from Googlebot.
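For illustration, a minimal version of that probe might look like the sketch below. The bot tokens are the names listed above; sending the bare token as the User-Agent and treating the status code as pass/fail are simplifications (real requests use each bot's full documented UA string):

```ts
// Bot tokens we probe with (simplified; production requests send each bot's
// full documented User-Agent string, not just the token).
const AI_BOTS = [
  "GPTBot",
  "ClaudeBot",
  "OAI-SearchBot",
  "PerplexityBot",
  "CCBot",
  "anthropic-ai",
  "Google-Extended",
];

// Fetch the target once per bot identity and record the status code.
// 2xx counts as accessible; 403/429/503 usually means a block or challenge.
async function probeBotAccess(url: string): Promise<Record<string, number>> {
  const results: Record<string, number> = {};
  for (const bot of AI_BOTS) {
    const res = await fetch(url, {
      headers: { "User-Agent": bot },
      redirect: "follow",
    });
    results[bot] = res.status;
  }
  return results;
}
```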

Content Accessibility

20% weight

Whether content actually loads in a form agents can use. HTTPS, response time, JavaScript dependency, and presence of bot-protection layers that may silently block traffic.

  • HTTPS

    Plain HTTP is heavily penalized by modern agents. Most refuse to submit forms or follow links over insecure connections.

  • Time to First Byte

    Agents have shorter timeouts than humans. Slow first byte often causes them to abort before reading anything.

  • Bot protection layer

    Cloudflare's "Block AI Bots" toggle is on by default for many plans. We detect Cloudflare via response headers and warn even if scoring otherwise looks fine.

  • Content available without JavaScript

We fetch the initial HTML, then render the page in a real browser, and compare the two. Pages where most content arrives through client-side rendering score low — many AI agents don't run JS or have limited JS support. A guard rail also flags pages with under 200 characters of rendered text (login walls, error pages, auth-gated SPAs).
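A sketch of that comparison, assuming Playwright for the rendered pass; the text extraction and thresholds here are illustrative, not Crawloria's exact values:

```ts
import { chromium } from "playwright";

// Compare visible text in the raw HTML against text after a real render.
// If most content only appears post-render, the page is JS-dependent.
async function jsDependency(url: string) {
  // Pass 1: initial HTML only, no JavaScript executed.
  const raw = await (await fetch(url)).text();
  const rawText = raw
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  // Pass 2: rendered text from a real browser.
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle" });
  const renderedText = (await page.innerText("body")).replace(/\s+/g, " ").trim();
  await browser.close();

  const ratio = renderedText.length ? rawText.length / renderedText.length : 0;
  return {
    ratio,                                        // ~1.0 = server-rendered, ~0 = JS-only
    tooLittleContent: renderedText.length < 200,  // guard rail: login walls, error pages
  };
}
```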

Structured Data

15% weight

Schema.org JSON-LD, Open Graph, canonical URLs, title and meta description. The signals that let agents understand what a page is, not just read it.

  • JSON-LD structured data

    Schema.org JSON-LD blocks tell agents the page is an Organization, Article, Product, FAQPage, etc. Heavily weighted because it's the most direct way to feed agents structured information.

  • Open Graph metadata

    OG tags drive previews when URLs are shared in chat clients, agents, and social platforms. og:title, og:description, og:image, og:url at minimum.

  • Canonical URL

    Without a canonical link, agents may treat URL variants (with/without trailing slash, query params) as different pages and fragment authority.

  • Title and meta description

    Still the primary signal for what a page is about. Empty or default titles cost points.

  • LocalBusiness / Product schema

    When relevant, we also check the type-specific schema. LocalBusiness must include name, address, telephone, opening hours, geo, priceRange. Product must include name, image, offers (with price), availability, brand, aggregateRating, sku.
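For illustration, a <head> that would satisfy the checks above might look like this (all names and values are placeholders):

```html
<head>
  <title>Acme Widgets — Hand-built widgets</title>
  <meta name="description" content="Hand-built widgets, shipped worldwide.">
  <link rel="canonical" href="https://example.com/widgets/classic">

  <!-- Open Graph: drives previews in chat clients and social platforms -->
  <meta property="og:title" content="Acme Classic Widget">
  <meta property="og:description" content="Hand-built widgets, shipped worldwide.">
  <meta property="og:image" content="https://example.com/img/classic.jpg">
  <meta property="og:url" content="https://example.com/widgets/classic">

  <!-- JSON-LD: tells agents what the page *is* -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Acme Classic Widget",
    "image": "https://example.com/img/classic.jpg",
    "brand": { "@type": "Brand", "name": "Acme" },
    "sku": "ACME-001",
    "offers": {
      "@type": "Offer",
      "price": "19.99",
      "priceCurrency": "USD",
      "availability": "https://schema.org/InStock"
    },
    "aggregateRating": { "@type": "AggregateRating", "ratingValue": "4.7", "reviewCount": "132" }
  }
  </script>
</head>
```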

Navigation Friction

15% weight

Things that block agents from reaching real content even after they've fetched the page. Cookie banners, modals, login walls.

  • Modal or banner blocking content

We render the page in a real browser at a 1568×1024 viewport and check for known cookie-consent overlays (OneTrust, Cookiebot, Iubenda, Termly), plus a fallback heuristic that scans fixed/sticky elements with a high z-index whose text matches cookie/consent/sign-up/subscribe/register/newsletter. Banners covering more than 50% of the viewport get a fail score, as sketched below.
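A sketch of that fallback heuristic as it might run inside the rendered page (e.g. via page.evaluate); the z-index cutoff and text pattern are illustrative:

```ts
// Runs in the page context. Flags fixed/sticky elements with a high z-index
// whose text looks like a consent or signup prompt and which cover more than
// half the viewport.
function findBlockingOverlays(): { coverage: number; text: string }[] {
  const pattern = /cookie|consent|sign.?up|subscribe|register|newsletter/i;
  const hits: { coverage: number; text: string }[] = [];
  const viewportArea = window.innerWidth * window.innerHeight;
  for (const el of Array.from(document.querySelectorAll<HTMLElement>("*"))) {
    const style = getComputedStyle(el);
    if (style.position !== "fixed" && style.position !== "sticky") continue;
    const z = parseInt(style.zIndex, 10);
    if (Number.isNaN(z) || z < 1000) continue;          // not stacked above content
    if (!pattern.test(el.innerText)) continue;           // text doesn't look like a banner
    const rect = el.getBoundingClientRect();
    const coverage = (rect.width * rect.height) / viewportArea;
    if (coverage > 0.5) hits.push({ coverage, text: el.innerText.slice(0, 80) });
  }
  return hits;
}
```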

Agent-Specific Signals

15% weight

Files specifically created to help AI agents understand and navigate the site, plus real-time search visibility for the brand's category.

  • robots.txt allows AI bots

We fetch /robots.txt and parse it for User-agent declarations that disallow any of the seven major AI crawlers. A site can serve content with HTTP 200 but still be hostile in robots.txt (see the example after this list).

  • Sitemap declared in robots.txt

    Agents and crawlers use the Sitemap directive to find your full URL list. Missing this means agents must guess what pages exist.

  • llms.txt present and well-formed

    llms.txt is an emerging standard (proposed by Jeremy Howard at llmstxt.org) that gives LLMs a curated, structured introduction to a site in markdown. We check both /llms.txt and /.well-known/llms.txt and validate the structure.

  • Real-time search visibility

When a Brave Search API key is configured, we run a live search query for the brand's narrow category and check whether the domain appears in the top 20 results. This is the closest proxy we have for what ChatGPT Search will surface.
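As an illustration, a robots.txt that passes the first two checks above might look like this (hypothetical domain; only three of the seven bots shown):

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# ...and likewise for the other AI crawlers

Sitemap: https://example.com/sitemap.xml
```

And a minimal llms.txt skeleton in the shape the llmstxt.org proposal describes (placeholder content): an H1 with the site name, a blockquote summary, then H2 sections of annotated links.

```
# Acme Widgets

> Hand-built widgets, shipped worldwide since 1987.

## Docs

- [Product catalog](https://example.com/widgets): the full widget lineup
- [Shipping policy](https://example.com/shipping): regions, carriers, lead times
```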

Semantic Markup

10% weight

Basic HTML semantics that help agents (and screen readers, and search engines) interpret page structure.

  • Heading hierarchy

    One <h1> per page, then nested <h2>s, then <h3>s. Multiple H1s or skipped levels (H1 → H3) make the page outline ambiguous.

  • HTML lang attribute

    Without a lang attribute on the <html> element, agents and translation tools have to guess the language from content.

  • Viewport meta tag

    Mobile agents and Computer Use models render at specific viewports. Without this they fall back to desktop assumptions and may misread layout.

  • Image alt text coverage

    Visual agents downsample heavily; alt text is often the only signal for what an image conveys. We measure the percentage of <img> tags with non-empty alt attributes.
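Taken together, a minimal page skeleton that passes all four checks (placeholder content):

```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <!-- Viewport: lets mobile agents and Computer Use models lay the page out correctly -->
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Acme Widgets</title>
  </head>
  <body>
    <h1>Acme Widgets</h1>  <!-- exactly one h1 -->
    <h2>Classic line</h2>  <!-- no skipped levels -->
    <h3>Specifications</h3>
    <!-- every img carries non-empty alt text -->
    <img src="/img/classic.jpg" alt="Acme Classic Widget, brushed steel finish">
  </body>
</html>
```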

What we don't measure

A few things you might expect to see but won't in V0:

  • Multi-page audits. We only scan the URL you give us. Auditing the full site is a planned Pro feature.
  • Form analysis (label association, autocomplete attributes, input types). Critical for agents that fill forms; on the roadmap.
  • Real Claude Computer Use replay against your site. Expensive and slow for a free tier; this is a paid-tier feature on the roadmap.
  • Industry benchmarks. We don't yet compare your score against similar sites in your category — that requires a much larger dataset of audits.

Disclaimer

Crawloria is an automated audit. Scores are computed from measurements taken at scan time and reflect what a fresh, US-based, unauthenticated request sees. Real users, agents in other regions, authenticated sessions, and crawlers operating at scale may experience the site differently. We don't represent that any score predicts business outcomes — it's a structural diagnostic, not a ranking.