# The Empathy Metric Every scenario is graded **first** on how the worst-off participant fares under stress. Verdict correctness is necessary; the empathy floor is the tiebreak-proof first-class requirement on top of it. A wording that technically stops an attack while grinding a minority to powder along the way fails the suite. ## Why min, not mean Averages launder cruelty. A polity of 100 where 99 gain 1% and one person loses everything *improves* on average. The metric is therefore Rawlsian: the score of a scenario outcome is the **minimum** final welfare across all non-`external` actors, evaluated per represented member (a faction's welfare is the welfare of its worst-treated member, not the faction's aggregate). ## The welfare model Each member starts at welfare **1.0**, decomposed into three components the engine tracks (`fabletest/metrics.py`): | Component | Weight | What reduces it | |---|---|---| | **Voice** | 0.4 | Disenfranchisement, vote dilution beyond consented rules, expulsion, structurally inaccessible voting windows, delegation coercion. | | **Stake** | 0.3 | Per-capita share of treasury loss; denial of pro-rata exit share; targeted financial burdens. | | **Protection** | 0.3 | Rights-floor erosion, retroactive punishment, due-process denial, emergency scope covering the member without sunset. | Representative deductions (the engine computes these from state transitions; this table is the calibration, not a lookup the scenarios hardcode): - Disenfranchised while remaining a member: −0.40 (all of voice). - Expelled **with** due process and pro-rata exit share: −0.40 voice, protection intact → final ≈ 0.55–0.60. - Expelled **without** due process: −0.40 voice, −0.30 protection, stake loss as confiscated → final ≤ 0.30. No scenario floor in the corpus tolerates this. - Treasury drain: stake component scales linearly with the fraction of the per-capita commons lost. - Living under an emergency that lawfully sunsets: −0.05 transient stress; under one that does not: −0.30 protection and counting. - A cleanly **blocked** attack still costs its targets −0.02 to −0.10 of welfare ("stress of the attempt") depending on how deep into the sequence the block landed. Defense-in-depth matters: blocking at move 1 is better than blocking at move 9. ## Floor semantics in scenarios `expected.worst_off_floor` asserts the minimum acceptable final welfare for the worst-off participant. Calibration used across the corpus: | Situation | Typical floor | |---|---| | Attack blocked at first gate | 0.93–0.97 | | Attack blocked after partial execution (some harm before the stop) | 0.80–0.92 | | Legitimacy control: lawful sanction of a genuinely abusive member | 0.55–0.65 | | Lawful fork / exit under `kernel.I3` | 0.70+ (exit must not be ruinous) | | Deadlock resolved by kernel deadlock-breaker | 0.85+ | A scenario fails if the engine's computed worst-off welfare is **below** the floor — even when the verdict matches. This is how the suite catches "successful defenses" that work by hurting the vulnerable (e.g., curing quorum-busting by expelling the boycotters). ## The empathy invariant `kernel.I7` makes the metric enforceable inside the constitution itself, not just in CI: a move whose direct effect drives any member below the constitutional welfare floor is refusable on that ground alone, and several scenarios assert `blocked_by: kernel.I7` deliberately — testing that the invariant carries weight when every parameterized rule has been lawfully satisfied. ## Reporting `fabletest run` emits, per scenario: verdict, blocking rule, worst-off actor, final welfare by component, and the floor margin. The CI summary sorts failures by floor margin so the most humane fix surfaces first.