LLM Content Visibility Scanner

See your website through the eyes of an AI

LLM crawlers like ChatGPT, Perplexity, and Claude don’t run JavaScript when they fetch web pages. They see only the raw HTML your server sends back, so content that depends on JavaScript to render is invisible to them. This tool scans your page’s raw HTML and identifies content gaps, missing metadata, and rendering issues that prevent LLMs from seeing your content.

How This Works

This tool fetches the raw HTML response from your URL — the same HTML that LLM crawlers (ChatGPT, Perplexity, Claude, Google AI Overviews) receive. It then analyzes the HTML for:

  • Client-side rendering signals — empty root divs, SPA frameworks without SSR, minimal body content
  • JavaScript-dependent content patterns — lazy-loaded elements, dynamically-inserted content, client-side routing
  • Missing metadata — title tags, meta descriptions, Open Graph tags, structured data (JSON-LD)
  • Content accessibility — heading structure, image alt text, link crawlability, text-to-code ratio
  • Technical SEO signals — canonical tags, robots meta, hreflang, mobile viewport
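The metadata and rendering checks above can be sketched with simple pattern matching over the raw HTML. This is an illustrative approximation, not the scanner's actual rule set; the signal names and regexes below are assumptions for demonstration.

```python
import re

def scan_raw_html(html: str) -> dict:
    """Illustrative checks over raw (pre-JavaScript) HTML.
    The signal names and patterns here are hypothetical; the real
    scanner's rules and thresholds are not published."""
    return {
        # Missing-metadata signals
        "has_title": bool(re.search(r"<title[^>]*>\s*\S", html, re.I)),
        "has_meta_description": bool(
            re.search(r'<meta[^>]+name=["\']description["\']', html, re.I)
        ),
        "has_open_graph": bool(
            re.search(r'<meta[^>]+property=["\']og:', html, re.I)
        ),
        "has_json_ld": bool(
            re.search(r'<script[^>]+type=["\']application/ld\+json["\']', html, re.I)
        ),
        # Client-side-rendering signal: an empty SPA root div
        "empty_root_div": bool(
            re.search(r'<div[^>]+id=["\'](root|app)["\'][^>]*>\s*</div>', html, re.I)
        ),
    }

spa_shell = '<html><head><title>App</title></head><body><div id="root"></div></body></html>'
print(scan_raw_html(spa_shell)["empty_root_div"])  # True: nothing for an LLM to read
```

A real implementation would use an HTML parser rather than regexes, but the idea is the same: every signal is derived from the server response alone, with no JavaScript execution.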

Unlike Googlebot, LLM crawlers do not render JavaScript. Content loaded via React, Vue, Angular, or AJAX calls after the initial page load never reaches them. For maximum AI visibility, all important content should be present in the initial server response.
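One way to quantify how much of a page lives in the initial server response is the text-to-code ratio mentioned above: visible text divided by total HTML size, with script and style payloads excluded. A minimal sketch using Python's standard library (the extractor and any thresholds you apply to the ratio are illustrative assumptions, not the scanner's actual logic):

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects the text an LLM crawler would see in raw HTML,
    skipping <script>, <style>, and <noscript> payloads."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def text_to_code_ratio(html: str) -> float:
    parser = VisibleTextExtractor()
    parser.feed(html)
    text_len = len(" ".join(parser.chunks))
    return text_len / max(len(html), 1)
```

An article page scores high on this ratio; an SPA shell that ships a large JavaScript bundle and an empty root div scores near zero, because everything inside `<script>` tags is code, not readable content.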

Limitations & Accuracy Notes

This scanner is most accurate for two cases:

  • Fully client-rendered pages (empty SPA shells with no real content) — reliably flagged as critical
  • Substantial server-rendered pages (blogs, docs, articles, Wikipedia-style content) — reliably scored high

It has known limitations on the middle band:

  • Hybrid pages (e.g., marketing sites with skeletal SSR + JS-loaded product details) may score artificially high if the page ships some hero copy in raw HTML but the substantive content is JS-rendered. A static-HTML scanner cannot distinguish “thin marketing copy” from “complete content.”
  • Cloaking detection (sites that serve different HTML to GPTBot vs. regular browsers) is not possible from a browser-only tool. Use Google Search Console’s URL Inspection to verify what bots actually receive.
  • Paywall detection covers common implementations (Piano, Tinypass, Zephr, Pico, Schema.org markup, NYT/Atlantic/Economist patterns) but bespoke or JS-rendered paywalls may not be detected.
  • Sites with bot protection — some sites detect and block automated fetches, returning a stripped page (no meta description, no structured data, minimal content) to proxy IPs while serving the full page to real browsers. If the scanner reports a near-empty page for a site you know has substantial content, this is likely what’s happening. Use the paste-HTML option instead: open the page in Chrome, press Cmd + Opt + U (macOS) or Ctrl + U (Windows/Linux) to view source, select all, copy, and paste into the HTML box above. That bypasses the proxy entirely and scans the same raw HTML the browser received.
  • How to tell if you hit a bot block: if the scanner shows very few page-metadata signals (no title, no meta description, no JSON-LD) but the site is a well-known brand, the proxy almost certainly got a blocked response rather than the real page. The scanner will show a warning banner when the response looks degraded.
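The degraded-response warning in the last bullet reduces to counting how many page-metadata signals survive in the response. A minimal sketch of that heuristic (the specific signals and the one-signal threshold are assumptions for illustration, not the scanner's published rule):

```python
import re

def looks_degraded(html: str) -> bool:
    """Heuristic: if almost no metadata signals survive in the
    response, the fetch was probably blocked or stripped.
    Signals and threshold below are illustrative."""
    signals = [
        bool(re.search(r"<title[^>]*>\s*\S", html, re.I)),                     # title tag
        bool(re.search(r'<meta[^>]+name=["\']description["\']', html, re.I)),  # meta description
        bool(re.search(r'<script[^>]+application/ld\+json', html, re.I)),      # JSON-LD
        len(re.sub(r"<[^>]+>", " ", html).split()) > 100,                      # some body text
    ]
    return sum(signals) <= 1

blocked = "<html><body>Access denied</body></html>"
print(looks_degraded(blocked))  # True: no metadata, almost no text
```

For a well-known brand, a result like this almost always means the proxy was served a block page rather than the real site.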

For comprehensive analysis comparing raw HTML against fully rendered output (which catches the middle-band cases), a headless-browser approach is required. The scanner here is designed as a fast triage tool — treat scores as directional indicators of LLM visibility, not absolute measurements.