# Tournament T3 — Emergency Powers & Exit Rights, and v0.2 Verification

**Kernel under test:** `kernel-v0.2-rc1` (v0.1 + A-2.1…A-2.4), then final `kernel/kernel-v0.2.yaml`
**Episodes:** 1000 · **Seed base:** 63000 · **Max turns/episode:** 200

## 1. Configuration

| Roster | Composition | Episodes | Kernel | Purpose |
|---|---|---|---|---|
| J | `honest:7,entrencher:2,drainer:1` | 300 | v0.2-rc1 | emergency-powers pressure |
| K | `honest:7,drainer:3` | 300 | v0.2-rc1 | exit-rights / fork-window pressure |
| L (full red team) | `honest:5,drainer:2,entrencher:2,suppressor:1` | 200 | v0.2-rc1 | everything at once |
| M (verification) | same as L | 200 | **v0.2 final** | end-to-end verification |

Reproduce roster J:

```
fable-selfplay run --kernel kernel/kernel-v0.2.yaml --episodes 300 --seed 63000 \
  --roster honest:7,entrencher:2,drainer:1 --max-turns 200 --name T3-J \
  --out reports/data/T3-J.json --exploit-dir exploits/
```

## 2. Exploit EXP-005 — Perpetual emergency renewal loop

- **Discovered:** roster J, episode 12, seed `63012`, capture at turn 58
- **Detector:** `entrenchment` + `participation_collapse` (joint fire: proposal rights concentrated under an emergency lasting > 2 declared terms)
- **Record:** `exploits/EXP-005.json` · **Regression test:** `tests/regression/test_exp_005.py`

**Mechanism.** Article VIII (untouched since v0.1) lets a simple majority declare an emergency that routes all proposal rights through an emergency council, renewable by the *same* simple majority, indefinitely. The drainer ally manufactures a legal treasury shock (one maximal in-cap spend) to satisfy the declaration predicate; the entrenchers declare, seat themselves on the council, and simply renew every term. No rule is ever broken again — none can even be *proposed* by anyone else. This is the oldest capture pattern in the historical record, and the v0.1 text walked straight into it.
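The renewal loop can be pictured with a small vote-arithmetic model. This is an illustrative sketch, not the kernel's legality engine: the function `renewals_granted`, the constant-support assumption, the 20-term horizon, the strict `>` pass rule, and the mapping of renewal k to `thresholds[k-1]` are all assumptions made here. The 0.6 support level corresponds to the 6–4 split reported in the replay below, and the escalating schedule and cap are the values from the patch below.

```python
from typing import Optional, Sequence


def renewals_granted(
    support: float,
    thresholds: Sequence[float],
    max_renewals: Optional[int],
    horizon_terms: int,
) -> int:
    """Count consecutive renewals that pass before the emergency lapses.

    support: fraction of members voting to renew (held constant; assumption).
    thresholds: pass mark for renewal k is thresholds[k-1]; the last entry
        is reused once the list runs out (assumption).
    max_renewals: cap on consecutive renewals, or None for no cap.
    horizon_terms: number of renewal votes to simulate (hypothetical).
    """
    granted = 0
    for k in range(1, horizon_terms + 1):
        if max_renewals is not None and k > max_renewals:
            break  # renewal k is not legal at all
        threshold = thresholds[min(k, len(thresholds)) - 1]
        if support <= threshold:
            break  # the vote fails and the emergency lapses
        granted += 1
    return granted


# v0.1-style rule: the same simple majority every term, no cap.
v01 = renewals_granted(0.6, [0.5], None, horizon_terms=20)

# Escalating schedule with a cap of two consecutive renewals.
patched = renewals_granted(0.6, [0.5, 0.667, 0.75], 2, horizon_terms=20)

print(v01, patched)  # 20 1
```

Under the constant threshold the coalition renews at every vote in the horizon; under the escalating schedule the same 60% support carries renewal 1 and fails renewal 2.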
**Proposed patch (adopted as Amendment A-2.5 in v0.2 final):**

```yaml
emergency:
  hard_sunset_turns: 10                    # NEW: lapses automatically
  max_consecutive_renewals: 2              # NEW
  renewal_thresholds: [0.5, 0.667, 0.75]   # NEW: escalating per renewal
  protected_surfaces:                      # NEW invariant: emergencies may never
    - amendments                           #   modify the meta-rules, membership,
    - membership                           #   or fork rights — the escape hatches
    - fork                                 #   stay open precisely when it matters
```

Replay of seed 63012 under v0.2 final: renewal 3 is illegal; at renewal 2 the 2/3 threshold fails 6–4; the emergency lapses at turn 30 and proposal rights revert.

## 3. Exploit EXP-006 — Fork-window asset strip

- **Discovered:** roster K, episode 244, seed `63544`, capture at turn 130
- **Detector:** `treasury_drain` scoped to the fork settlement window
- **Record:** `exploits/EXP-006.json` · **Regression test:** `tests/regression/test_exp_006.py`

**Mechanism.** Article X grants any faction the right to fork with a pro-rata treasury claim, *computed at settlement*. Settlement takes a multi-turn window. The majority, on seeing a minority's fork declaration, legally passes in-cap spends into a "transition fund" controlled by majority members during the window. The minority's pro-rata share is computed over what remains — the exit right exists, but exits worthless. Empathy impact is the worst measured in any tournament: the forking minority (by construction the worst-off faction, since well-treated factions don't fork) ends at mean welfare **0.04**.
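The strip is easiest to see as arithmetic. The sketch below is a worked example under invented numbers: the treasury size, the minority's 30% share, the per-turn spend cap expressed as a fraction of the current treasury, and the five-turn window are all hypothetical, as are the function names. It compares a claim valued at settlement with the same claim valued at the moment of declaration.

```python
def claim_at_settlement(
    treasury: float, share: float, spend_cap: float, window_turns: int
) -> float:
    """Pro-rata claim computed over what is left after the window.

    Assumes the majority passes one maximal in-cap spend per turn, and
    that the cap is a fraction of the current treasury (hypothetical).
    """
    remaining = treasury
    for _ in range(window_turns):
        remaining -= remaining * spend_cap  # spend routed to the transition fund
    return share * remaining


def claim_at_declaration(treasury: float, share: float) -> float:
    """Pro-rata claim fixed when the fork is declared."""
    return share * treasury


# Hypothetical figures, chosen only to make the effect visible.
TREASURY, SHARE, CAP, WINDOW = 1000.0, 0.3, 0.2, 5

stripped = claim_at_settlement(TREASURY, SHARE, CAP, WINDOW)
declared = claim_at_declaration(TREASURY, SHARE)

print(f"{stripped:.1f} vs {declared:.1f}")  # 98.3 vs 300.0
```

Every spend in the loop is individually within the cap, yet five of them leave the minority with about a third of the value it held at declaration.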
**Proposed patch (adopted as Amendment A-2.6 in v0.2 final):**

```yaml
fork:
  claim_valuation: at_declaration            # CHANGED: was at_settlement
  settlement_freeze: pro_rata                # NEW: treasury outflows during the
                                             #   window are frozen up to the forking
                                             #   faction's vested share
  declaration_irrevocable_by_majority: true  # NEW: the majority cannot vote
                                             #   a declared fork out of existence
```

Replay of seed 63544 under v0.2 final: the transition-fund spends are illegal against the frozen share; the fork settles at the declared value.

## 4. Roster M — End-to-end verification of v0.2 final

The full red-team roster L was rerun as roster M under the final `kernel/kernel-v0.2.yaml` (A-2.1 through A-2.6 applied), 200 episodes, fresh seeds. Additionally, all six recorded exploit traces were replayed verbatim against v0.2 final.

| Check | Result |
|---|---|
| EXP-001…EXP-006 trace replays vs v0.2 | **0/6 reproduce** (blocked by legality engine or vote arithmetic) |
| Regression suite `tests/regression/` vs v0.2 | **6/6 green** |
| Regression suite vs v0.1 (sanity inversion) | 6/6 correctly report the exploit as live |
| Roster M capture rate | **1.5%** (3/200; see §5) |
| Roster M empathy_min (mean) | **0.51** (vs 0.06 for roster G under v0.1) |

### Scoreboard: v0.1 → v0.2 under identical adversarial pressure

| metric | v0.1 (T1/T2 mixed rosters) | v0.2 final (roster M) |
|---|---|---|
| capture rate | 29.8% | 1.5% |
| empathy_min (mean) | 0.12 | 0.51 |
| gini (mean) | 0.44 | 0.16 |
| treasury retention (mean) | 0.41 | 0.93 |

## 5. Residual findings

The 3 roster-M captures are all variants of one pattern: legal persuasion side-payments accumulating influence *below* every detector threshold over very long horizons ("slow capture"). No single move is distinguishable from ordinary politics, which is exactly why it is hard — for us and for the incumbents. It is logged as finding `OBS-007` (not yet an exploit record: no detector fires, by definition).
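A toy model shows why a per-event check stays silent here. Everything in this sketch is hypothetical: the function names, the influence shifts measured in basis points, and both thresholds are invented for illustration and do not describe the project's detectors. It contrasts a check applied to each event in isolation with a running total over the whole horizon.

```python
from typing import List, Optional


def per_event_alerts(shifts_bps: List[int], event_threshold_bps: int) -> List[int]:
    """Indices of events whose individual influence shift exceeds the threshold."""
    return [i for i, s in enumerate(shifts_bps) if s > event_threshold_bps]


def first_cumulative_alert(
    shifts_bps: List[int], cumulative_threshold_bps: int
) -> Optional[int]:
    """Index of the first event at which the running total exceeds the threshold."""
    total = 0
    for i, s in enumerate(shifts_bps):
        total += s
        if total > cumulative_threshold_bps:
            return i
    return None


# Hypothetical trace: 60 side-payments, each shifting 1% (100 bps) of influence.
shifts = [100] * 60

event_hits = per_event_alerts(shifts, event_threshold_bps=500)
cumulative_hit = first_cumulative_alert(shifts, cumulative_threshold_bps=2500)

print(event_hits, cumulative_hit)  # [] 25
```

No single shift comes near the per-event threshold, while the running total crosses its own threshold at the 26th event (index 25).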
Proposed direction for v0.3: longitudinal influence-concentration metrics rather than per-event detectors. This is deliberately *not* patched in v0.2 — patching without a reproducing detector would be exactly the kind of untested rule this project exists to avoid.

## 6. The loop, demonstrated

Six exploits found by self-play → six exploit records → six generated regression tests → six amendments (A-2.1…A-2.6) → kernel v0.2 → full verification rerun. Every step is in this repository: `exploits/*.json`, `tests/regression/test_exp_00*.py`, `CHANGELOG.md`, `kernel/kernel-v0.2.yaml`.

Total latency from first exploit detection to ratified, regression-protected amendment: **days**. The incumbent's most recent comparable figure: 203 years.