_RAG SCHEMA TEST_
Our RAG Schema Test is designed to evaluate how Large Language Models (LLMs) and their search pipelines process information from a webpage. Specifically, we investigate whether these systems prioritize **visible HTML** or **structured data** (such as JSON-LD, Microdata, and OpenGraph) when extracting information and generating responses.
The core question is which source of truth the AI relies on when both are present or in conflict. This helps us understand the underlying mechanics of how these platforms gather and synthesize information.
The Test Variables
We use a set of eight controlled web pages, each containing a unique combination of data to isolate specific variables. A unique lab token is assigned to each variant to track its performance.
- **Visible HTML Only:** Contains data visible to a human user.
- **Structured Data Only:** Contains data only in the source code, via JSON-LD, JSON-LD in the body, Microdata, or OpenGraph.
- **Hidden Data:** Data is present but is visually hidden using CSS.
- **Conflicting Data:** The visible HTML and the structured data contain different, conflicting information (see the sketch after this list).
- **Commented Data:** The information is placed within an HTML comment.
- **Agreeing Data:** Both the visible HTML and structured data contain the same information.
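To make the variants concrete, here is a minimal sketch of how a "Conflicting Data" page could be built. The token values, file name, and markup are illustrative placeholders, not the actual lab pages.

```python
# Minimal sketch of building the "conflicting data" variant.
# Token values, file names, and markup are illustrative assumptions.
import json
import secrets


def conflicting_page(visible_token: str, structured_token: str) -> str:
    """Return an HTML page whose visible text and JSON-LD disagree."""
    json_ld = {
        "@context": "https://schema.org",
        "@type": "Article",
        "identifier": structured_token,  # present only in the source code
    }
    return f"""<!DOCTYPE html>
<html>
  <head>
    <title>RAG Schema Test - conflicting variant</title>
    <script type="application/ld+json">{json.dumps(json_ld)}</script>
  </head>
  <body>
    <!-- The visible HTML carries a different token than the JSON-LD above -->
    <p>The lab token for this page is <strong>{visible_token}</strong>.</p>
  </body>
</html>"""


if __name__ == "__main__":
    # Each variant gets its own unique lab token so responses can be traced.
    html = conflicting_page(
        visible_token="LAB-" + secrets.token_hex(4).upper(),
        structured_token="LAB-" + secrets.token_hex(4).upper(),
    )
    with open("variant_conflicting.html", "w", encoding="utf-8") as fh:
        fh.write(html)
```

The other variants follow the same pattern, varying only where (and whether) the token appears: visible text, structured data, CSS-hidden elements, or HTML comments.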
Experiment Process
The experiment is conducted by a researcher who publishes these controlled pages in a secure, isolated environment. We then use pre-defined prompts in AI search platforms such as ChatGPT to query for the unique lab token on each page. Each response is logged and analyzed for two key metrics (a scoring sketch follows the list):
- **Extraction Correctness:** Did the AI successfully extract the correct token?
- **Citation Rates:** Did the AI reference the source page in its response?
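A minimal sketch of how a single logged response could be scored against those two metrics; the function names and the simple substring checks are assumptions, not the exact analysis code.

```python
# Minimal sketch of scoring one logged AI response.
# The dataclass fields and substring checks are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class ResponseScore:
    extracted_correctly: bool  # did the answer contain the expected lab token?
    cited_source: bool         # did the answer reference the page URL?


def score_response(answer: str, expected_token: str, page_url: str) -> ResponseScore:
    return ResponseScore(
        extracted_correctly=expected_token.lower() in answer.lower(),
        cited_source=page_url.lower() in answer.lower(),
    )


# Example with a hypothetical logged answer from one platform.
score = score_response(
    answer="The lab token on that page is LAB-7F3A91 (source: https://example.com/variant-1).",
    expected_token="LAB-7F3A91",
    page_url="https://example.com/variant-1",
)
print(score)  # ResponseScore(extracted_correctly=True, cited_source=True)
```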
By comparing extraction correctness and citation rates across all eight variants, we get a clear benchmark of each platform's performance and data-handling preferences.
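As a rough illustration of that comparison step, per-variant rates could be aggregated as below. The variant names and response rows are made up purely to show the computation; they are not experimental results.

```python
# Aggregating extraction correctness and citation rates per variant.
# The variant names and rows below are fabricated for illustration only.
from collections import defaultdict

# (variant, extracted_correctly, cited_source) per logged response
runs = [
    ("visible_html_only", True, True),
    ("visible_html_only", True, False),
    ("json_ld_only", True, True),
    ("json_ld_only", False, False),
    ("conflicting", False, True),
]

totals = defaultdict(lambda: {"n": 0, "extracted": 0, "cited": 0})
for variant, extracted, cited in runs:
    totals[variant]["n"] += 1
    totals[variant]["extracted"] += extracted
    totals[variant]["cited"] += cited

for variant, t in totals.items():
    print(f"{variant:>18}: extraction {t['extracted'] / t['n']:.0%}, "
          f"citation {t['cited'] / t['n']:.0%}")
```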