LLM Content Visibility Scanner

See your website through the eyes of an AI

The crawlers that feed LLMs like ChatGPT, Claude, and Perplexity (GPTBot, ClaudeBot, and PerplexityBot, respectively) don’t run JavaScript when they fetch web pages. They only see the raw HTML your server sends back, so content that depends on JavaScript to render is invisible to most AI crawlers. Google is the main exception: Googlebot renders JS, so Google’s AI Overviews and Gemini generally see more than the others. This tool scans your page’s raw HTML and identifies content gaps, missing metadata, and rendering issues that prevent LLM crawlers from seeing your content.

Heads up: it’s a fast triage tool, not a definitive audit — treat scores as directional. It can’t detect cloaking, some sites block the proxy and return stripped responses, and hybrid pages with skeletal SSR can score higher than they deserve. For deeper analysis, pair it with Search Console URL Inspection.

How This Works

This tool fetches the raw HTML response from your URL through a proxy. That approximates what non-rendering LLM crawlers (GPTBot, ClaudeBot, PerplexityBot) receive, although a site may respond differently to the proxy than to a real crawler. It then analyzes the HTML for:

  • Client-side rendering signals — empty root divs, SPA frameworks without SSR, minimal body content
  • JavaScript-dependent content patterns — lazy-loaded elements, dynamically-inserted content, client-side routing
  • Missing metadata — title tags, meta descriptions, Open Graph tags, structured data (JSON-LD)
  • Content accessibility — heading structure, image alt text, link crawlability, text-to-code ratio
  • Technical SEO signals — canonical tags, robots meta, hreflang, mobile viewport
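
As a rough illustration of the checks listed above, here is a minimal Python sketch that pulls a few of these signals out of a raw HTML string using only the standard library. It is not the scanner's actual implementation: the root ids in SPA_ROOT_IDS, the MIN_VISIBLE_CHARS threshold, and the names RawHtmlSignals and scan are assumptions made for this example.

```python
from html.parser import HTMLParser

# Illustrative assumptions, not the scanner's real values.
SPA_ROOT_IDS = {"root", "app", "__next", "__nuxt"}
SKIPPED_TAGS = {"script", "style", "noscript", "template"}
MIN_VISIBLE_CHARS = 200


class RawHtmlSignals(HTMLParser):
    """Collects a handful of visibility signals from raw HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.meta = {}               # meta name/property -> content
        self.has_json_ld = False
        self.has_canonical = False
        self.h1_count = 0
        self.images_missing_alt = 0  # alt attribute absent or empty
        self.visible_chars = 0       # text outside script/style/title
        self.spa_roots = []          # root-like ids found in the markup
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = a.get("name") or a.get("property")
            if key:
                self.meta[key.lower()] = a.get("content", "")
        elif tag == "script" and a.get("type", "").lower() == "application/ld+json":
            self.has_json_ld = True
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.has_canonical = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and not a.get("alt", "").strip():
            self.images_missing_alt += 1
        if a.get("id") in SPA_ROOT_IDS:
            self.spa_roots.append(a["id"])

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def scan(raw_html: str) -> dict:
    """Return a dict of signals for one raw HTML document."""
    p = RawHtmlSignals()
    p.feed(raw_html)
    p.close()
    thin = p.visible_chars < MIN_VISIBLE_CHARS
    return {
        "title": p.title,
        "has_meta_description": bool(p.meta.get("description", "").strip()),
        "has_open_graph": any(k.startswith("og:") for k in p.meta),
        "has_json_ld": p.has_json_ld,
        "has_canonical": p.has_canonical,
        "h1_count": p.h1_count,
        "images_missing_alt": p.images_missing_alt,
        "visible_chars": p.visible_chars,
        # Heuristic: a root-like container plus almost no visible text.
        "likely_client_rendered": thin and bool(p.spa_roots),
    }
```

Calling scan() on an empty SPA shell such as `<div id="root"></div>` followed by a script tag reports zero visible characters and sets likely_client_rendered to True. A server-rendered page that happens to use the same root id is not flagged, because the heuristic also requires the visible text to be thin.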

Unlike Googlebot, most LLM crawlers do not render JavaScript. Content loaded client-side via React, Vue, Angular, or AJAX calls after the initial page load is invisible to them. For maximum AI visibility, all important content should be present in the initial server response.
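
One quick way to check whether important content is in the initial server response is to fetch the page without a browser and search the raw markup for a few key phrases. The Python sketch below does that with the standard library only. The function names and the DEFAULT_USER_AGENT value are hypothetical placeholders (it is not a real crawler's User-Agent string), and sending a crawler-like User-Agent does not guarantee you receive what the real crawler would.

```python
import html
import re
import urllib.request

# Placeholder identifier for this example, NOT a real crawler's User-Agent.
DEFAULT_USER_AGENT = "raw-html-check/0.1 (example script)"


def build_request(url: str, user_agent: str = DEFAULT_USER_AGENT) -> urllib.request.Request:
    """Build a plain GET request; no JavaScript is executed at any point."""
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent, "Accept": "text/html"}
    )


def fetch_raw_html(url: str, user_agent: str = DEFAULT_USER_AGENT, timeout: float = 10.0) -> str:
    """Return the response body exactly as the server sent it, decoded to text."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def _normalize(text: str) -> str:
    # Decode entities, collapse whitespace, and lowercase for a loose match.
    return re.sub(r"\s+", " ", html.unescape(text)).strip().lower()


def missing_phrases(raw_html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do not appear anywhere in the raw markup.

    This is a substring search over the whole document, so a phrase split
    across tags is reported as missing, and a phrase that only appears
    inside an inline script or JSON blob is reported as present.
    """
    haystack = _normalize(raw_html)
    return [p for p in phrases if _normalize(p) not in haystack]
```

A typical use would be missing_phrases(fetch_raw_html(url), ["your product name", "a sentence from the main copy"]). Any phrase returned is a candidate for content that only appears after JavaScript runs.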

Limitations & accuracy notes

This scanner is most accurate for two cases:

  • Fully client-rendered pages (empty SPA shells with no real content) — reliably flagged as critical
  • Substantial server-rendered pages (blogs, docs, articles, Wikipedia-style content) — reliably scored high

It has known limitations in the middle band:

  • Hybrid pages (e.g., marketing sites with skeletal SSR + JS-loaded product details) may score artificially high if the page ships some hero copy in raw HTML but the substantive content is JS-rendered. A static-HTML scanner cannot distinguish “thin marketing copy” from “complete content.”
  • Cloaking detection (sites that serve different HTML to GPTBot vs. regular browsers) is not possible from a browser-only tool. Google Search Console’s URL Inspection shows what Googlebot receives, which is a useful cross-check, but it cannot show what GPTBot or other LLM crawlers are served.
  • Paywall detection covers common implementations (Piano, Tinypass, Zephr, Pico, Schema.org markup, NYT/Atlantic/Economist patterns) but bespoke or JS-rendered paywalls may not be detected.
  • Sites with bot protection: some sites detect and block automated fetches, returning a stripped page (no meta description, no structured data, minimal content) to proxy IPs while serving the full page to real browsers. If the scanner reports a near-empty page for a site you know has substantial content, this is likely what’s happening, and the scanner will surface a warning banner. To verify what real browsers see, view the page source directly in Chrome (Cmd + Option + U on macOS, Ctrl + U on Windows and Linux) and compare.
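
To make the Schema.org part of the paywall check concrete, here is a minimal Python sketch that looks for the isAccessibleForFree property set to false inside JSON-LD blocks in the raw HTML. It covers only that one signal, not the vendor-specific patterns listed above, and the names JsonLdExtractor and has_schema_paywall_markup are hypothetical; the scanner's real logic may differ.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def _is_false(value) -> bool:
    # Publishers write this as the boolean false or as the string "False".
    return value is False or (isinstance(value, str) and value.strip().lower() == "false")


def _declares_paywall(node) -> bool:
    """Recursively look for isAccessibleForFree = false anywhere in the data."""
    if isinstance(node, dict):
        if "isAccessibleForFree" in node and _is_false(node["isAccessibleForFree"]):
            return True
        return any(_declares_paywall(v) for v in node.values())
    if isinstance(node, list):
        return any(_declares_paywall(v) for v in node)
    return False


def has_schema_paywall_markup(raw_html: str) -> bool:
    parser = JsonLdExtractor()
    parser.feed(raw_html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # ignore malformed JSON-LD rather than failing the scan
        if _declares_paywall(data):
            return True
    return False
```

A False result only means no declared paywall markup was found in the raw HTML. As noted above, bespoke or JS-rendered paywalls would still slip past a check like this.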

For comprehensive analysis comparing raw HTML against fully rendered output (which catches the middle-band cases), a headless-browser approach is required. The scanner here is designed as a fast triage tool — treat scores as directional indicators of LLM visibility, not absolute measurements.
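
For readers who do go the headless-browser route, the comparison step itself can be small. The Python sketch below assumes you have already captured two strings, the raw server response and the DOM serialized after rendering in a headless browser of your choice, and estimates how much of the rendered text was already present in the raw HTML. The function names and the word-level metric are assumptions for illustration, not part of this scanner.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Gathers text nodes, skipping script, style, noscript, and template."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_words(html_text: str) -> list[str]:
    """Lowercased word tokens from the document's non-script text."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return re.findall(r"\w+", " ".join(parser.chunks).lower())


def raw_coverage(raw_html: str, rendered_html: str) -> float:
    """Fraction of rendered words that also occur somewhere in the raw HTML.

    1.0 means everything a browser shows is already in the server response;
    values near 0.0 suggest the content is mostly injected by JavaScript.
    """
    rendered = visible_words(rendered_html)
    if not rendered:
        return 1.0  # nothing rendered, so nothing is missing
    raw_vocab = set(visible_words(raw_html))
    return sum(1 for word in rendered if word in raw_vocab) / len(rendered)
```

A hybrid page that ships only hero copy in raw HTML would score low here even if a static scan rates it well, which is the middle-band case described above. Word overlap is a crude measure, so treat the number as directional too.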