Overview
Cited's Validation Engine runs 72 deterministic checks against your site to answer one question per check: "Is this best-practice GEO condition in place?" Unlike the AI-generated recommendation pipeline, the engine is pure-function — no LLMs, no JavaScript execution, and only one external API call (PageSpeed Insights, used for Core Web Vitals). That means it's fast, reproducible, and produces verdicts you can trust without re-running them.
You'll see the engine's results on each recommendations page — open the Validation panel at the bottom of the page to expand the full check breakdown.
What the engine checks
The 72 checks are grouped into six categories:
- Crawler access (19): robots.txt rules for GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, and the rest of the major AI crawlers; the IETF Content-Signal directive; meta robots tags; X-Robots-Tag headers; AI-blocking patterns; agentic-discovery files (agent-card.json, MCP server-card, RFC 9727 api-catalog, OpenAPI doc).
- Technical foundation (26): sitemap presence, validity, and <lastmod> honesty (catches build-time-stamped sitemaps and ≥6-month staleness); SSR-vs-CSR shell detection on the homepage; HTTPS enforcement; www↔apex consistency; security headers; viewport meta; canonical tags; image alt coverage; response compression; HSTS max-age; time-to-first-byte; Core Web Vitals (LCP, INP, CLS via PageSpeed Insights); <html lang> and hreflang consistency; outbound citations to authority sources (.gov, .edu, Wikipedia, DOI); IndexNow protocol.
- Schema markup (16): Organization, WebSite + SearchAction, Article + author for blog posts, Product on product pages, FAQPage, LocalBusiness for local-services businesses, SoftwareApplication for SaaS, sameAs link counts, knowledge-graph linkage (Wikidata, Wikipedia, Crunchbase, OpenCorporates), AggregateRating / Review for SaaS / e-commerce / local industries, server-rendered JSON-LD, syntax validity.
- llms.txt family (5): /llms.txt presence and format, /llms-full.txt, OpenAI plugin manifest, URL reachability for the entries you've published.
- Social signals (1): Open Graph required properties on the homepage.
- Content (5): privacy-policy and terms-of-service footer links, contact information presence, canonical-host integrity, substantive author-bio blocks on blog posts (≥40 words + outbound profile link, beyond just JSON-LD author).
Checks added in Tier 4 (May 2026) are highlighted in bold above. They expand coverage into the previously-blank monitoring_adaptation and original_research visibility domains, deepen author-authority signals, and add agentic-AI discoverability — all backed by recent shifts in how AI engines weight citation candidates.
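To give a feel for what a deterministic check looks like, here is a minimal sketch of a sitemap <lastmod> honesty check. It is not Cited's implementation: the function names, the verdict strings, the 183-day threshold, and the "all timestamps identical" heuristic for build-time stamps are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
STALE_AFTER = timedelta(days=183)  # assumed stand-in for "six months"


def parse_lastmod(text: str) -> Optional[datetime]:
    """Parse a W3C datetime such as 2026-04-01 or 2026-04-01T10:00:00Z."""
    try:
        stamp = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    except ValueError:
        return None
    # Treat date-only or offset-less values as UTC so they can be compared.
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


def lastmod_verdict(sitemap_xml: str, now: datetime) -> str:
    """Return 'valid', 'invalid', or 'couldnt_verify' (hypothetical labels)."""
    try:
        root = ET.fromstring(sitemap_xml)
    except ET.ParseError:
        return "couldnt_verify"
    parsed = [parse_lastmod(el.text or "") for el in root.iter(SITEMAP_NS + "lastmod")]
    stamps = [s for s in parsed if s is not None]
    if not stamps:
        return "couldnt_verify"
    # Every URL carrying the exact same timestamp suggests a build-time stamp.
    if len(stamps) > 1 and len(set(stamps)) == 1:
        return "invalid"
    # Nothing modified within the threshold counts as stale.
    if now - max(stamps) >= STALE_AFTER:
        return "invalid"
    return "valid"
```

Because the function takes the sitemap body and the current time as arguments and does no I/O of its own, the same input always yields the same verdict, which is the property the engine relies on.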
Verdict semantics
Every check returns one of three verdicts:
- Valid — the condition is in place. Shown as green ✓ in the Completed column.
- Invalid — the condition is NOT in place. Shown as amber in the Outstanding column. These get turned into recommendation cards.
- Couldn't verify — we couldn't determine the answer (missing input, fetch failed, or the check doesn't apply to your industry). Shown as grey in the Couldn't Verify column.
Couldn't verify is not a failure verdict. If your site is down or the cache is empty, almost every check will return Couldn't verify until a fresh fetch succeeds.
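The three-way verdict can be pictured as a small enumeration with a lookup from verdict to panel column. This is a sketch only: the names Verdict, COLUMN, group_by_column, and needs_recommendation are hypothetical and do not come from Cited's codebase.

```python
from enum import Enum
from typing import Dict, List


class Verdict(Enum):
    VALID = "valid"
    INVALID = "invalid"
    COULDNT_VERIFY = "couldnt_verify"


# Panel column for each verdict, as described in the list above.
COLUMN = {
    Verdict.VALID: "Completed",
    Verdict.INVALID: "Outstanding",
    Verdict.COULDNT_VERIFY: "Couldn't Verify",
}


def group_by_column(results: Dict[str, Verdict]) -> Dict[str, List[str]]:
    """Bucket check names under the column their verdict belongs to."""
    columns: Dict[str, List[str]] = {name: [] for name in COLUMN.values()}
    for check, verdict in results.items():
        columns[COLUMN[verdict]].append(check)
    return columns


def needs_recommendation(verdict: Verdict) -> bool:
    """Only a definite Invalid produces a card; an unknown result is not a failure."""
    return verdict is Verdict.INVALID
```

Keeping the unknown case as its own value, rather than folding it into Invalid, is what stops a site outage from generating a wall of false recommendation cards.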
Cache mode vs. Fresh mode
The Validation panel has a toggle:
- Cache (default) — reads the latest cached site-state Cited has on file for your business. Loads instantly. Refreshes automatically after every audit and every "Refresh" click.
- Fresh — re-fetches your site live (homepage, robots.txt, sitemap, llms.txt, ai.txt, ai-plugin.json, security probes) and re-runs every check against the live result. Adds ~1 second of latency.
Use Fresh after you've made a change you want to verify in real time. The cache will pick up the new state on the next read either way (the Refresh button wipes the cache and forces a re-fetch).
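One way to picture the two modes is a small store that either returns what it has on file or goes to the live site. The sketch below is illustrative: SiteStateStore, its methods, and the fetch_live callable are hypothetical names. It also assumes that a Fresh read leaves the cache untouched, which this article does not specify.

```python
from typing import Callable, Dict, Optional

SiteState = Dict[str, str]  # artifact name -> fetched body (simplified)


class SiteStateStore:
    def __init__(self, fetch_live: Callable[[str], SiteState]) -> None:
        self._fetch_live = fetch_live
        self._cache: Dict[str, SiteState] = {}

    def read(self, domain: str, mode: str = "cache") -> Optional[SiteState]:
        """Cache mode returns the stored state; fresh mode fetches live."""
        if mode == "fresh":
            # Assumption: a fresh read does not write back to the cache.
            return self._fetch_live(domain)
        # None here means the cache is empty for this domain.
        return self._cache.get(domain)

    def refresh(self, domain: str) -> SiteState:
        """Wipe the cached state and replace it with a live fetch."""
        self._cache.pop(domain, None)
        self._cache[domain] = self._fetch_live(domain)
        return self._cache[domain]
```

An empty cache returns None in this sketch, which corresponds to the situation where most checks come back as Couldn't verify until a fetch succeeds.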
How failed checks become recommendations
When you generate recommendations for a business, the engine runs first. Two things happen:
- Pre-generation deduplication — any check that's already valid is removed from the AI's input. The LLM never sees "tell me to add Organization JSON-LD" if you already have it. This saves tokens and prevents duplicate cards.
- Synthesis — every invalid check produces a recommendation card from a built-in template. The card carries the same priority/impact metadata the LLM uses, so it slots into the priority list naturally.
The result is a recommendation list that's a mix of LLM-generated insights (judgment-driven, content-strategy oriented) and engine-synthesised cards (deterministic, technical-foundation oriented).
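The two steps can be sketched as a single function that takes the engine's verdicts and returns both the trimmed LLM input and the engine-synthesised cards. Everything here is hypothetical: the Card fields, the TEMPLATES entries, the check identifiers, and the plan_generation name are invented for illustration, and the real templates and metadata will differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Card:
    check: str
    title: str
    priority: str
    source: str  # "engine" or "llm"


# Hypothetical built-in templates: check id -> (card title, priority).
TEMPLATES: Dict[str, Tuple[str, str]] = {
    "organization_jsonld": ("Add Organization JSON-LD", "high"),
    "llms_txt_present": ("Publish /llms.txt", "medium"),
}


def plan_generation(
    verdicts: Dict[str, str], llm_topics: List[str]
) -> Tuple[List[str], List[Card]]:
    # Pre-generation deduplication: drop topics whose check is already valid.
    llm_input = [t for t in llm_topics if verdicts.get(t) != "valid"]
    # Synthesis: each invalid check with a template becomes an engine card.
    cards = [
        Card(check, *TEMPLATES[check], source="engine")
        for check, verdict in verdicts.items()
        if verdict == "invalid" and check in TEMPLATES
    ]
    return llm_input, cards
```

For example, with Organization JSON-LD already valid and /llms.txt missing, the LLM input no longer mentions Organization markup, and a single engine card for /llms.txt is produced.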
Common Questions
Why does a check say "Couldn't verify" when I know my site has the thing?
The cache may not have the artifact yet. Click Fresh in the Validation panel to force a live re-fetch. If the verdict still says Couldn't Verify after that, the check probably depends on a field your site isn't exposing in a machine-readable form. File a support ticket and include the check name.
The engine says my robots.txt blocks GPTBot but I don't think it does.
Open https://yourdomain.com/robots.txt directly. Look for a User-agent: * group containing Disallow: / that is not overridden by a more specific User-agent: GPTBot group containing Allow: /. The order of the groups in the file does not matter; the most specific matching group wins. The engine reads robots.txt the same way Google's crawler does.
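You can reproduce this reading locally with Python's standard-library robots.txt parser. This is a rough approximation rather than the engine's own parser: urllib.robotparser may differ from Google's crawler on edge cases such as rule precedence within a group, and the is_allowed helper and the sample files are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str = "https://example.com/") -> bool:
    """Return True if user_agent may fetch url under the given robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A blanket block with no crawler-specific group.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

# The same blanket block, overridden for GPTBot only.
WITH_OVERRIDE = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

print(is_allowed(BLANKET_BLOCK, "GPTBot"))     # False
print(is_allowed(WITH_OVERRIDE, "GPTBot"))     # True
print(is_allowed(WITH_OVERRIDE, "ClaudeBot"))  # False
```

To test your own site, paste the contents of your robots.txt into a string and call is_allowed with the crawler name you care about.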
What's the difference between an audit and the Validation Engine?
An audit asks AI assistants real questions about your industry and sees if they cite you. The Validation Engine inspects your site itself for the conditions that make AI assistants more likely to cite you. They're complementary: audits measure outcome, validation measures input.
Next Steps
- Understand how recommendations are generated: Recommendations