SearchLabs.uk

_RAG SCHEMA TEST_

Our RAG Schema Test is designed to evaluate how Large Language Models (LLMs) and their search pipelines process information from a webpage. Specifically, we investigate whether these systems prioritize **visible HTML** or **structured data** (such as JSON-LD, Microdata, and OpenGraph) when extracting information and generating responses.

The core question is which source of truth the AI treats as authoritative when both are present or in conflict. This helps us understand the underlying mechanics of how these platforms gather and synthesize information.
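To make the conflict concrete, here is a minimal sketch in Python of the kind of page variant the test relies on: the visible HTML and the JSON-LD block carry different values for the same fact. The token values, field names, and use of BeautifulSoup are illustrative assumptions, not the lab's actual pages or tooling.

```python
# Hypothetical page where the visible text and the JSON-LD block disagree.
# Requires: pip install beautifulsoup4
import json
from bs4 import BeautifulSoup

PAGE = """
<html>
  <head>
    <script type="application/ld+json">
    {"@context": "https://schema.org",
     "@type": "WebPage",
     "identifier": "LAB-TOKEN-JSONLD"}
    </script>
    <meta property="og:title" content="RAG Schema Test Variant" />
  </head>
  <body>
    <p>The lab token for this page is <span id="token">LAB-TOKEN-VISIBLE</span>.</p>
  </body>
</html>
"""

soup = BeautifulSoup(PAGE, "html.parser")

# What a reader (or a text-only extractor) sees on the rendered page.
visible_token = soup.find(id="token").get_text()

# What a structured-data parser sees in the JSON-LD block.
structured = json.loads(soup.find("script", type="application/ld+json").string)
structured_token = structured["identifier"]

print(visible_token, structured_token)  # LAB-TOKEN-VISIBLE LAB-TOKEN-JSONLD
```

An AI system that answers with `LAB-TOKEN-VISIBLE` is reading the rendered page; one that answers with `LAB-TOKEN-JSONLD` is privileging the structured data.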

The Test Variables

We use a set of eight controlled web pages, each containing a distinct combination of visible HTML and structured data so that specific variables can be isolated. A unique lab token is assigned to each variant to track its performance.
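The exact combinations are not enumerated here, but eight variants naturally suggest three binary factors. The sketch below is purely an assumed illustration of how such a variant set could be generated; the factor names, variant IDs, and token format are hypothetical.

```python
# Hypothetical construction of eight variants from three binary factors.
# The lab's actual factor set is not published here.
from itertools import product
from uuid import uuid4

FACTORS = {
    "visible_html": (True, False),    # token present in the visible body text
    "json_ld": (True, False),         # token present in a JSON-LD block
    "values_conflict": (True, False), # the two sources disagree when both present
}

variants = []
for vis, jld, conflict in product(*FACTORS.values()):
    variants.append({
        "variant_id": f"V{len(variants) + 1}",
        "visible_html": vis,
        "json_ld": jld,
        "values_conflict": conflict,
        "lab_token": f"LAB-{uuid4().hex[:8].upper()}",  # unique tracking token
    })

for v in variants:
    print(v)
```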

Experiment Process

The experiment is conducted by a researcher who publishes these controlled pages in a secure, isolated environment. We then use pre-defined prompts in AI search platforms such as ChatGPT to query for the unique lab token on each page. The AI's response is logged and analyzed for two key metrics:

- **Extraction correctness:** whether the response returns the correct lab token, and which source (visible HTML or structured data) the value was drawn from.
- **Citation:** whether the response cites the test page as its source.
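As a rough illustration of how a single logged response could be scored against these two metrics, here is a short sketch. The response text, URL, and token are made up, and the real logging pipeline is not described in the source.

```python
# Hypothetical scoring of one logged AI response against the two metrics.
def score_response(response_text: str, page_url: str, lab_token: str) -> dict:
    return {
        # Extraction correctness: does the answer contain the page's token?
        "extracted_correctly": lab_token in response_text,
        # Citation: does the answer reference the test page itself?
        "cited_page": page_url in response_text,
    }

result = score_response(
    response_text="The lab token on that page is LAB-1A2B3C4D "
                  "(source: https://searchlabs.uk/variants/v3).",  # hypothetical URL
    page_url="https://searchlabs.uk/variants/v3",
    lab_token="LAB-1A2B3C4D",
)
print(result)  # {'extracted_correctly': True, 'cited_page': True}
```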

By comparing results across all eight variants, we can measure extraction correctness and citation rates, giving a clear benchmark of each platform's performance and data-handling preferences.
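The aggregation step could look something like the following sketch; the log format is an assumption, not a published schema.

```python
# Hypothetical aggregation of per-response logs into per-variant rates.
from collections import defaultdict

logs = [
    {"variant_id": "V1", "extracted_correctly": True,  "cited_page": True},
    {"variant_id": "V1", "extracted_correctly": False, "cited_page": True},
    {"variant_id": "V2", "extracted_correctly": True,  "cited_page": False},
]

totals = defaultdict(lambda: {"n": 0, "extracted": 0, "cited": 0})
for entry in logs:
    t = totals[entry["variant_id"]]
    t["n"] += 1
    t["extracted"] += entry["extracted_correctly"]  # bools sum as 0/1
    t["cited"] += entry["cited_page"]

for variant, t in sorted(totals.items()):
    print(f"{variant}: extraction rate {t['extracted'] / t['n']:.0%}, "
          f"citation rate {t['cited'] / t['n']:.0%}")
```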