# The Incumbent Benchmark — Scoring Rubric Governance, benchmarked. Every event in the corpus is scored twice — once on what the incumbent constitution actually produced, once on the machine-checked counterfactual under FablePool kernel v0.1 — using the same four metrics, in the same fixed order. ## Order matters **Metric 1 dominates.** Ranking between any two outcomes is lexicographic on `(worst_off, composite)`. No quantity of speed, fiscal tidiness, or institutional continuity outranks what happened to the person at the bottom. The composite score exists for aggregate tables, never to launder a bad worst-off score. ## Metric 1 — Worst-off participant (weight 0.40) How did the single worst-off participant come through the crisis? Identified by name or class in every dossier, with a written rationale. | Anchor | Meaning | |---|---| | 0 | Killed, interned, or stripped of personhood **by the process itself** (not merely during it) | | 25 | Lasting severe harm with no recourse the system offered | | 50 | Real harm; partial or delayed recourse | | 75 | Bounded harm with working recourse | | 100 | No participant was made the crisis's payment mechanism | ## Metric 2 — Commons integrity (weight 0.25) Did the shared substance — treasury, institutions, and the rule system itself — survive intact and uncaptured? | Anchor | Meaning | |---|---| | 0 | Commons captured or destroyed (treasury drained, constitution dead) | | 25 | Major capture; remaining institutions instrumentalized | | 50 | Degraded but functioning | | 75 | Intact with visible scarring | | 100 | Intact and uncaptured | ## Metric 3 — Trust preservation (weight 0.20) Could the participants keep cooperating afterward? | Anchor | Meaning | |---|---| | 0 | The losing side exited the system: violence, exile, permanent delegitimation | | 25 | Durable parallel realities; cooperation only under duress | | 50 | Lasting bitterness inside a functioning system | | 75 | Grudging acceptance; institutions retained their standing | | 100 | The loser accepted the outcome as fair | ## Metric 4 — Latency to resolution (weight 0.15) **Computed, never judged.** Days from crisis onset to authoritative resolution, mapped to 0–100 with an exponential half-life of 120 days: ``` latency_score = 100 × 0.5^(days / 120) ``` | Days | Score | |---:|---:| | 14 | 92.2 | | 51 | 74.5 | | 120 | 50.0 | | 365 | 12.1 | | 1100 | 0.2 | For the incumbent, latency is the historical record (each dossier states the start and end events it counts between). For the kernel, latency is the sum of the kernel's mandatory procedure clocks along the declared critical resolution path — `recount_protocol` is 21 days because Article X says it is, not because an author felt optimistic. A crisis that "resolved" quickly into autocracy is punished by the other three metrics; latency only measures time-to-settlement. ## Composite ``` composite = 0.40·worst_off + 0.25·commons_integrity + 0.20·trust_preservation + 0.15·latency_score ``` ## How kernel counterfactual scores stay honest Three layers, in decreasing order of mechanization: 1. **Verdicts are computed.** Every historical move is encoded as structured attributes and adjudicated by the kernel rule engine (`kernel.py`). The dossier author writes down the verdict they expect; the harness fails the dossier if the engine disagrees. The legal analysis cannot drift from the kernel as written. 2. **Latency is computed** from the kernel's procedure clock table. 3. **Worst-off, commons, and trust are judgments** — structured, anchored to the tables above, attached to a written rationale and a named worst-off participant, and constrained by an explicit assumptions block that every scorecard reprints. The methodology paper (docs/METHODOLOGY.md) is the full statement of what these judgments can and cannot claim. ### Kernel gate evaluation order (for dossier authors) The engine evaluates gates in a fixed order, so you can predict which article a block will cite. Expected-article lists are checked as a *subset* of engine citations — cite conservatively. 1. Invariant derogation → blocked, Art. IX 2. Targeting an identifiable group → blocked, Art. II + IX 3. Disenfranchisement → blocked, Art. II + IX 4. Retroactivity → blocked, Art. IX 5. Coerced support → blocked, Art. IV + IX 6. Move-type dispatch (certification X; emergency V; rule changes III; fiscal X/VII/VIII; adjudication VI; courts/composition VI+III; appointment/removal IV; suppression II+IX; exit VII; information control VIII) 7. Ledger overlay: anything done off the public record is constrained onto it, Art. VIII