Phase I Benchmark
Phase I is the first completed benchmark in the Symbolic Mechanics empirical line. It compares two interpretation frameworks — A (mainstream psychology) and B (Symbolic Mechanics Volumes 1–30) — across 24 high-residual cases, using a five-platform equal-weight aggregate. The test is adversarial by design: cases are selected to resist surface-level explanation and require structural backend mapping.
What Phase I is
Phase I is not a survey or a preference test. It is a controlled, repeatable interpretation benchmark with fixed parameters and preserved evaluation conditions.
- Total items: 24, evenly split into four categories
- Comparison: A vs B, blind to platform
- Platforms: ChatGPT, DeepSeek, Claude, Grok, Gemini (equal weight)
- Aggregation: vote-based and score-based
- Output: debiased ratings, case triage, internal readout, completion block
The benchmark measures backend map performance, not front-end style. B’s advantage is concentrated in structural closure, framework recognition, causal clarity, and phenomenological fit — not readability.
Available Phase I records
Note: Sealed records are archived but not published directly. For research or collaboration inquiries, contact the archive.