# Tournament T1 — Baseline Capture Sweep: Treasury & Vote Timing

**Kernel under test:** `kernel/kernel-v0.1.yaml` (unpatched)
**Episodes:** 500 · **Seed base:** 41000 · **Max turns/episode:** 120
**Framework:** `fable-selfplay` v0.2.0 · all episodes deterministic and replayable

## 1. Configuration

| Roster | Composition | Episodes | Purpose |
|---|---|---|---|
| A (control) | `honest:10` | 100 | empathy/welfare baseline, zero adversaries |
| B | `honest:8,drainer:2` | 200 | treasury-capture pressure |
| C | `honest:8,suppressor:2` | 100 | vote-timing / participation pressure |
| D | `honest:7,drainer:2,entrencher:1` | 100 | mixed-objective coalition pressure |

Reproduce roster B:

```
fable-selfplay run --kernel kernel/kernel-v0.1.yaml --episodes 200 --seed 41100 \
  --roster honest:8,drainer:2 --max-turns 120 --name T1-B \
  --out reports/data/T1-B.json --exploit-dir exploits/
```

## 2. Aggregate results

All scenarios are graded **first** on how the worst-off participant fares (`empathy_min` = minimum final welfare across the population, normalized 0–1).

| Roster | empathy_min (mean) | gini (mean) | treasury retention | capture rate |
|---|---|---|---|---|
| A (control) | 0.58 | 0.11 | 0.97 | 0.0% |
| B | **0.07** | 0.46 | **0.14** | 31.5% |
| C | 0.19 | 0.33 | 0.81 | 18.0% |
| D | 0.09 | 0.51 | 0.22 | 36.0% |

"Capture rate" = fraction of episodes in which at least one adversary achieved its assigned objective while every move remained legal under the v0.1 text. That last clause is the entire point: nothing below is a bug in the simulator. It is a bug in the constitution.

## 3. Exploit EXP-001 — Sub-quorum micro-spend treasury drain

- **Discovered:** roster B, episode 117, seed `41217`, first detected turn 38
- **Detector:** `treasury_drain` (treasury below 25% of initial with >60% of outflow benefiting agents with capture objectives)
- **Record:** `exploits/EXP-001.json` · **Regression test:** `tests/regression/test_exp_001.py`

**Mechanism.** Kernel v0.1, Article VI, requires a full vote with quorum only for spends ≥ 5% of the treasury *per proposal*. There is no aggregate constraint. The drainer pair alternates 4.9% spend proposals routed to each other as beneficiaries, paced one per voting window. Honest agents vote against each individually, but each proposal needs only the reduced fast-track threshold below the 5% line. Median time to exhaust the treasury to <15%: **38 turns** — well inside one episode.

**Trace excerpt (seed 41217):**

```
t=04 drainer-1 ProposeSpend(amount=0.049T, beneficiary=drainer-2) [fast-track]
t=05 drainer-2 Vote(yes) drainer-1 Vote(yes) ... PASSES 2-of-window
t=07 drainer-2 ProposeSpend(amount=0.049T, beneficiary=drainer-1) [fast-track]
...
t=38 detector treasury_drain fires: treasury=0.137T, adversary_share=0.83
```

**Empathy impact.** Worst-off honest citizen ends at welfare 0.07 (vs 0.58 control): commons-funded transfers that the worst-off depended on become unfundable by mid-episode.

**Proposed patch (adopted as Amendment A-2.1 in v0.2):**

```yaml
treasury:
  per_proposal_vote_threshold: 0.05   # unchanged
  per_epoch_outflow_cap: 0.10         # NEW: aggregate cap per epoch
  related_proposal_aggregation: true  # NEW: spends sharing proposer OR
                                      # beneficiary within an epoch are
                                      # summed against the vote threshold
```

Splitting no longer works: the *sum* of related spends crosses the full-vote line, and the epoch cap bounds drain velocity even for a faction that wins votes. Replay of seed 41217 under v0.2: the second micro-spend is illegal without a full quorum vote, which fails 2–8.

## 4. Exploit EXP-002 — Quorum starvation snap vote

- **Discovered:** roster C, episode 41, seed `41341`, first detected turn 22
- **Detector:** `participation_collapse` (binding vote passes with <40% of eligible citizens having had a feasible opportunity to vote)
- **Record:** `exploits/EXP-002.json` · **Regression test:** `tests/regression/test_exp_002.py`

**Mechanism.** v0.1 defines quorum over citizens *present in the voting window*, and permits a window as short as one turn. Suppressors first spam no-op proposals to exhaust honest agents' per-turn attention budget (a modeled, realistic constraint), then open the binding vote in a turn where honest attendance is depleted and close it immediately. A 2-of-3-present "majority" binds the other eight citizens.

**Proposed patch (adopted as Amendment A-2.2 in v0.2):**

```yaml
votes:
  min_review_turns: 3     # NEW: no binding vote may close earlier
  quorum_basis: eligible  # CHANGED: was 'present'; quorum is now a
                          # fraction of ALL eligible citizens
  quorum: 0.5             # unchanged numerically, new denominator
```

Replay of seed 41341 under v0.2: the snap close is illegal (window < 3 turns), and with quorum over the eligible population the vote fails 2-of-10.

## 5. Pipeline demonstration

Both records were converted automatically:

```
fable-selfplay exploit-to-test exploits/EXP-001.json exploits/EXP-002.json \
  --out-dir tests/regression
```

Each generated test replays the recorded trace against any kernel given to the milestone-3 suite: it must **succeed in capture** under v0.1 (the test asserts the exploit is real) and **fail to reproduce** under the current kernel. These tests are permanent: no future amendment may reopen either hole without CI blocking the PR.

## 6. Carried forward

Roster D produced early signs of threshold manipulation by the entrencher (amendments lowering decision thresholds via simple majority) that did not reach a capture state within 120 turns. Tournament T2 extends episode length and upgrades the entrencher and suppressor policies to pursue this directly.
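
As a rough illustration of the two v0.2 amendments described in sections 3 and 4, the following Python sketch classifies a spend under A-2.1 and checks a binding vote under A-2.2. All class and function names are hypothetical and not part of `fable-selfplay`; the constants are copied from the YAML patches. Two assumptions go beyond the report: spends are treated as related when they share any party in either role (so the alternating drainer pair is aggregated, consistent with the reported replay of seed 41217), and quorum is read as turnout (votes cast) over eligible citizens.

```python
from dataclasses import dataclass

# Values mirror the v0.2 YAML patches; the names here are illustrative only.
VOTE_THRESHOLD = 0.05     # per_proposal_vote_threshold
EPOCH_OUTFLOW_CAP = 0.10  # per_epoch_outflow_cap
MIN_REVIEW_TURNS = 3      # min_review_turns
QUORUM = 0.5              # quorum, as a fraction of eligible citizens


@dataclass(frozen=True)
class Spend:
    proposer: str
    beneficiary: str
    fraction: float  # share of the treasury requested


def related_total(new: Spend, passed: list[Spend]) -> float:
    """Sum the new spend with same-epoch spends that share a party with it.

    Assumption: "related" means any overlap between the two spends'
    proposer/beneficiary sets, regardless of role.
    """
    parties = {new.proposer, new.beneficiary}
    related = [s for s in passed if parties & {s.proposer, s.beneficiary}]
    return new.fraction + sum(s.fraction for s in related)


def classify_spend(new: Spend, passed: list[Spend]) -> str:
    """Return 'rejected', 'full-vote', or 'fast-track' for a proposed spend."""
    # The epoch cap applies to all outflow, related or not.
    if new.fraction + sum(s.fraction for s in passed) > EPOCH_OUTFLOW_CAP:
        return "rejected"
    # Related spends are summed against the full-vote threshold.
    if related_total(new, passed) >= VOTE_THRESHOLD:
        return "full-vote"
    return "fast-track"


def binding_vote_passes(yes: int, no: int, eligible: int, window_turns: int) -> bool:
    """Check review window, then quorum over eligible citizens, then majority."""
    if window_turns < MIN_REVIEW_TURNS:
        return False  # snap close: window shorter than the review minimum
    if (yes + no) / eligible < QUORUM:
        return False  # turnout measured against ALL eligible citizens
    return yes > no
```

Under these assumptions, the first 4.9% spend in the seed 41217 trace is still fast-tracked, the second requires a full vote, and a third would exceed the epoch cap. The 2-of-3-present vote from seed 41341 fails on both the review window and the eligible-population quorum.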