Validation Engine


Overview

Cited's Validation Engine runs 62 deterministic checks against your site. Each check answers one question: "Is this best-practice GEO (generative engine optimization) condition in place?" Unlike the AI-generated recommendation pipeline, the engine is built from pure functions: no LLMs, no third-party APIs, no JavaScript execution. That makes it fast and reproducible: the same site state always produces the same verdicts, so you can trust a result without re-running it.

You'll see the engine's results in two places: the Validation panel on each recommendations page, and the Action Plan when you mark a recommendation-sourced action complete.

What the engine checks

The 62 checks are grouped into six categories:

Crawler access (17) — robots.txt rules for GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, and the rest of the major AI crawlers; meta robots tags; X-Robots-Tag headers; AI-blocking patterns.
Technical foundation (21) — sitemap presence and validity, HTTPS enforcement, www↔apex consistency, security headers, viewport meta, canonical tags, image alt coverage, response compression, HSTS max-age, time-to-first-byte, and IndexNow protocol.
Schema markup (14) — Organization, WebSite + SearchAction, Article + author for blog posts, Product on product pages, FAQPage, LocalBusiness for local-services businesses, SoftwareApplication for SaaS, sameAs link counts, server-rendered JSON-LD, syntax validity.
llms.txt family (5) — /llms.txt presence and format, /llms-full.txt, OpenAI plugin manifest, URL reachability for the entries you've published.
Social signals (1) — Open Graph required properties on the homepage.
Content (4) — privacy-policy and terms-of-service footer links, contact information presence, canonical-host integrity.

The engine deliberately doesn't cover things that need third-party APIs (Wikipedia, LinkedIn, Bing WMT), Core Web Vitals field data (PageSpeed Insights), or subjective grading (E-E-A-T, readability). Those belong to the AI recommendation pipeline.
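To make "no JavaScript execution" concrete, here is a minimal sketch of how a schema check could look for server-rendered JSON-LD using only static HTML parsing. It is an illustration, not the engine's actual code: the names JsonLdCollector and has_schema_type are hypothetical, and the sketch ignores nested structures such as @graph.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def has_schema_type(html: str, wanted: str) -> bool:
    """Return True if any JSON-LD block in the raw HTML declares @type == wanted."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON would be a separate syntax-validity finding
        items = doc if isinstance(doc, list) else [doc]
        for item in items:
            if not isinstance(item, dict):
                continue
            declared = item.get("@type")
            types = declared if isinstance(declared, list) else [declared]
            if wanted in types:
                return True
    return False
```

Because the function only reads the HTML string it is given, JSON-LD injected by client-side scripts would not be found, which is the behaviour a "server-rendered JSON-LD" check is meant to detect.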

Verdict semantics

Every check returns one of three verdicts:

Valid — the condition is in place. Shown as green ✓ in the Completed column.
Invalid — the condition is NOT in place. Shown as amber in the Outstanding column. These get turned into recommendation cards.
Couldn't verify — we couldn't determine the answer (missing input, fetch failed, or the check doesn't apply to your industry). Shown as grey in the Couldn't Verify column.

Couldn't verify is not a failure verdict. If your site is down or the cache is empty, almost every check will return Couldn't Verify until a fresh fetch succeeds.
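The three verdicts can be pictured as a small enumeration returned by a pure function. The sketch below uses an HSTS max-age check as the example. The names Verdict and check_hsts are hypothetical, and the one-year threshold is an assumption for illustration, not a documented engine value.

```python
import re
from enum import Enum
from typing import Mapping, Optional


class Verdict(Enum):
    VALID = "valid"
    INVALID = "invalid"
    COULD_NOT_VERIFY = "could_not_verify"


# Assumed threshold: one year in seconds. The engine's real minimum is not documented here.
MIN_MAX_AGE = 31_536_000


def check_hsts(headers: Optional[Mapping[str, str]]) -> Verdict:
    """Pure check: same headers in, same verdict out."""
    if headers is None:
        # No response was captured (site down, cache empty): we cannot say either way.
        return Verdict.COULD_NOT_VERIFY
    value = next(
        (v for k, v in headers.items() if k.lower() == "strict-transport-security"),
        None,
    )
    if value is None:
        return Verdict.INVALID
    match = re.search(r"max-age\s*=\s*\"?(\d+)", value, re.IGNORECASE)
    if match is None:
        return Verdict.INVALID
    return Verdict.VALID if int(match.group(1)) >= MIN_MAX_AGE else Verdict.INVALID
```

Note how a missing input (None) and a missing header (an empty mapping) lead to different verdicts: the first means nothing could be inspected, the second means the site was inspected and the condition is absent.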

Cache mode vs. Fresh mode

The Validation panel has a toggle:

Cache (default) — reads the latest cached site-state Cited has on file for your business. Loads instantly. Refreshes automatically after every audit and every "Refresh" click.
Fresh — re-fetches your site live (homepage, robots.txt, sitemap, llms.txt, ai.txt, ai-plugin.json, security probes) and re-runs every check against the live result. Adds ~1 second of latency.

Use Fresh after you've made a change you want to verify in real time. The cache will pick up the new state on the next read either way (the Refresh button wipes the cache and forces a re-fetch).
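As a rough mental model, the toggle decides where the site state comes from before the checks run. The sketch below assumes a hypothetical load_site_state function, a plain dictionary as the cache, and a caller-supplied fetch_live function. None of these names come from Cited itself.

```python
from typing import Callable, Dict, Optional

# Hypothetical shape: fetched artifacts keyed by name, e.g. {"robots_txt": "..."}.
SiteState = Dict[str, str]


def load_site_state(
    business_id: str,
    mode: str,
    cache: Dict[str, SiteState],
    fetch_live: Callable[[str], SiteState],
) -> Optional[SiteState]:
    """Return the site state the checks should run against."""
    if mode == "fresh":
        # Fresh mode bypasses the cache and fetches the live site.
        return fetch_live(business_id)
    if mode == "cache":
        # Cache mode never touches the network; None means nothing is on file yet.
        return cache.get(business_id)
    raise ValueError(f"unknown mode: {mode!r}")
```

In this model an empty cache returns None, which is the situation in which checks report Couldn't Verify rather than Invalid.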

How failed checks become recommendations

When you generate recommendations for a business, the engine runs first. Two things happen:

Pre-generation deduplication — any check that's already valid is removed from the AI's input. The LLM never sees "tell me to add Organization JSON-LD" if you already have it. This saves tokens and prevents duplicate cards.
Synthesis — every invalid check produces a recommendation card from a built-in template. The card carries the same priority/impact metadata the LLM uses, so it slots into the priority list naturally.

The result is a recommendation list that's a mix of LLM-generated insights (judgment-driven, content-strategy oriented) and engine-synthesised cards (deterministic, technical-foundation oriented).
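The two steps can be sketched as a single function that takes the engine's verdicts and returns both the trimmed LLM input and the engine-synthesised cards. Everything here is illustrative: the check names, the TEMPLATES table, the priority values, and plan_generation are assumptions. The sketch removes only valid checks from the LLM input, since that is all the description above specifies.

```python
from typing import Dict, List, Tuple

# Hypothetical built-in templates, keyed by check name.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "organization_jsonld": {"title": "Add Organization JSON-LD", "priority": "high"},
    "llms_txt_present": {"title": "Publish /llms.txt", "priority": "medium"},
}


def plan_generation(
    verdicts: Dict[str, str], candidate_topics: List[str]
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Split the work between the LLM and the built-in templates."""
    # Pre-generation deduplication: drop topics whose check already passed.
    already_valid = {name for name, verdict in verdicts.items() if verdict == "valid"}
    llm_input = [topic for topic in candidate_topics if topic not in already_valid]

    # Synthesis: one template-based card per invalid check.
    cards = [
        dict(TEMPLATES[name], source="engine", check=name)
        for name, verdict in verdicts.items()
        if verdict == "invalid" and name in TEMPLATES
    ]
    return llm_input, cards
```

Checks with a Couldn't Verify verdict produce no card in this sketch, because there is no confirmed problem to recommend a fix for.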

Common Questions

Why does a check say "Couldn't verify" when I know my site has the thing? The cache may not have the artifact yet. Click Fresh in the Validation panel to force a live re-fetch. If the verdict still says Couldn't Verify after that, the check probably depends on a field your site isn't exposing in a machine-readable form — file a support ticket and include the check name.

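The engine says my robots.txt blocks GPTBot but I don't think it does. Open https://yourdomain.com/robots.txt directly. Look for a User-agent: * group with Disallow: / and check whether a more specific User-agent: GPTBot group overrides it. A crawler obeys only the most specific group that matches its name, so a GPTBot group with Allow: / takes precedence wherever it appears in the file. If no such group exists, the wildcard rule applies and GPTBot is blocked. The engine reads robots.txt the same way Google's crawler does.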
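If you want to reproduce this check locally, Python's standard-library robots.txt parser is enough for the simple case above. This is not the engine's parser, and for files with overlapping Allow and Disallow paths inside one group it may disagree with Google's longest-match rule. The helper name is_allowed and the sample files are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Parse robots.txt text and ask whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Wildcard block with no GPTBot-specific group: GPTBot is blocked.
BLOCKED = """\
User-agent: *
Disallow: /
"""

# Same wildcard block, but a more specific GPTBot group overrides it.
OVERRIDDEN = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

print(is_allowed(BLOCKED, "GPTBot"))        # False
print(is_allowed(OVERRIDDEN, "GPTBot"))     # True
print(is_allowed(OVERRIDDEN, "ClaudeBot"))  # False: only GPTBot has its own group
```

To test your real file, paste its contents into a string and call is_allowed with each crawler name you care about.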

What's the difference between an audit and the Validation Engine? An audit asks AI assistants real questions about your industry and sees if they cite you. The Validation Engine inspects your site itself for the conditions that make AI assistants more likely to cite you. They're complementary: audits measure outcome, validation measures input.

Next Steps

Understand how recommendations are generated: Recommendations
