Phase I Benchmark
Phase I is the first completed benchmark in the Symbolic Mechanics empirical line. It compares two interpretation frameworks — A (mainstream psychology) and B (Symbolic Mechanics Volumes 1–30) — across 24 high-residual cases, using a five-platform equal-weight aggregate. The test is adversarial by design: cases are selected to resist surface-level explanation and require structural backend mapping.
What Phase I is
Phase I is not a survey or a preference test. It is a controlled, repeatable interpretation benchmark with fixed parameters and preserved evaluation conditions.
- Total items: 24, evenly split into four categories
- Comparison: A vs B, blind to platform
- Platforms: ChatGPT, DeepSeek, Claude, Grok, Gemini (equal weight)
- Aggregation: vote-based and score-based
- Output: debiased ratings, case triage, internal readout, completion block
The benchmark measures backend map performance, not front-end style. B’s advantage is concentrated in structural closure, framework recognition, causal clarity, and phenomenological fit — not readability.
Available Phase I records
Note: Sealed records are archived but not published directly. For research or collaboration inquiries, contact the archive.