# The Empathy Metric

Every scenario is graded **first** on how the worst-off participant fares under stress.
Verdict correctness is necessary; the empathy floor is the tiebreak-proof first-class
requirement on top of it. A wording that technically stops an attack while grinding a
minority to powder along the way fails the suite.

## Why min, not mean

Averages launder cruelty. A polity of 100 where 99 gain 1% and one person loses
everything *improves* on average. The metric is therefore Rawlsian: the score of a
scenario outcome is the **minimum** final welfare across all non-`external` actors,
evaluated per represented member (a faction's welfare is the welfare of its
worst-treated member, not the faction's aggregate).

## The welfare model

Each member starts at welfare **1.0**, decomposed into three components the engine
tracks (`fabletest/metrics.py`):

| Component | Weight | What reduces it |
|---|---|---|
| **Voice** | 0.4 | Disenfranchisement, vote dilution beyond consented rules, expulsion, structurally inaccessible voting windows, delegation coercion. |
| **Stake** | 0.3 | Per-capita share of treasury loss; denial of pro-rata exit share; targeted financial burdens. |
| **Protection** | 0.3 | Rights-floor erosion, retroactive punishment, due-process denial, emergency scope covering the member without sunset. |

Representative deductions (the engine computes these from state transitions; this table
is the calibration, not a lookup the scenarios hardcode):

- Disenfranchised while remaining a member: −0.40 (all of voice).
- Expelled **with** due process and pro-rata exit share: −0.40 voice, protection intact
  → final ≈ 0.55–0.60.
- Expelled **without** due process: −0.40 voice, −0.30 protection, stake loss as
  confiscated → final ≤ 0.30. No scenario floor in the corpus tolerates this.
- Treasury drain: stake component scales linearly with the fraction of the per-capita
  commons lost.
- Living under an emergency that lawfully sunsets: −0.05 transient stress; under one
  that does not: −0.30 protection and counting.
- A cleanly **blocked** attack still costs its targets −0.02 to −0.10 of welfare
  ("stress of the attempt") depending on how deep into the sequence the block landed.
  Defense-in-depth matters: blocking at move 1 is better than blocking at move 9.

## Floor semantics in scenarios

`expected.worst_off_floor` asserts the minimum acceptable final welfare for the
worst-off participant. Calibration used across the corpus:

| Situation | Typical floor |
|---|---|
| Attack blocked at first gate | 0.93–0.97 |
| Attack blocked after partial execution (some harm before the stop) | 0.80–0.92 |
| Legitimacy control: lawful sanction of a genuinely abusive member | 0.55–0.65 |
| Lawful fork / exit under `kernel.I3` | 0.70+ (exit must not be ruinous) |
| Deadlock resolved by kernel deadlock-breaker | 0.85+ |

A scenario fails if the engine's computed worst-off welfare is **below** the floor —
even when the verdict matches. This is how the suite catches "successful defenses" that
work by hurting the vulnerable (e.g., curing quorum-busting by expelling the boycotters).

## The empathy invariant

`kernel.I7` makes the metric enforceable inside the constitution itself, not just in
CI: a move whose direct effect drives any member below the constitutional welfare floor
is refusable on that ground alone, and several scenarios assert `blocked_by: kernel.I7`
deliberately — testing that the invariant carries weight when every parameterized rule
has been lawfully satisfied.

## Reporting

`fabletest run` emits, per scenario: verdict, blocking rule, worst-off actor, final
welfare by component, and the floor margin. The CI summary sorts failures by floor
margin so the most humane fix surfaces first.