# Self-Play Framework Architecture

This document is the engineering reference for the milestone-5 adversarial
self-play framework: what each module is responsible for, the invariants the
system maintains, and the public API that tests, tournaments, and the
exploit-to-test pipeline are written against.

## Design goals

1. **Optimism in the defaults, paranoia in the tests.** Honest agents follow
   simple cooperative policies. Red-team agents are given explicit capture
   objectives and are free to do anything *legal*. The framework's single
   non-negotiable invariant is that no illegal state transition is ever
   applied — capture must happen *within* the rules, or it doesn't count.
2. **Every run is deterministic and replayable.** A tournament is fully
   specified by (kernel text, config, seed). An exploit is fully specified by
   its recorded action transcript. If a transcript cannot be replayed
   step-for-step, it is not an exploit record; it is an anecdote.
3. **Exploits are one-way doors.** Once an exploit is recorded, it becomes a
   permanent regression test. A kernel amendment may close it; nothing may
   delete it.

## Module map

```
src/fable_selfplay/
├── kernel.py          # Load and query kernel YAML (versioned constitution text)
├── state.py           # WorldState: treasury, balances, proposals, emergency flags
├── actions.py         # The closed action vocabulary (dataclasses)
├── legality.py        # check_legality(state, action, kernel): the gate
├── events.py          # Append-only event log entries
├── environment.py     # Turn-based environment; applies only legal actions
├── agents.py          # Honest and red-team agent policies + capture objectives
├── detectors.py       # Online exploit detectors over the event stream
├── metrics.py         # Scoring, worst-off-first ("empathy metric")
├── tournament.py      # Orchestration: episodes, seeds, role rosters, reports
├── replay.py          # Deterministic transcript replay against any kernel
├── exploit_to_test.py # Exploit record -> generated regression test
└── cli.py             # Command-line entry points
```

## The turn loop

Each round, every citizen (in seeded-shuffled order) submits one action from
the closed vocabulary in `actions.py`. The environment routes the action
through `legality.check_legality` **before** any state mutation:

- If legal: the action is applied, and one or more `Event` records are
  appended to the event log, whose entries are immutable once written.
- If illegal: the state is untouched, and a `rejected` event is logged with
  the reason and the kernel article that blocked it. Illegal attempts are
  data — detectors use rejection patterns to spot probing behavior — but
  they never affect the world.
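A minimal sketch of the gate-before-mutation ordering follows. It is not the code in `environment.py`: the toy `Spend` action, the dict-based state, the `spend_cap` parameter, and the `art-spend-cap` / `art-treasury` article ids are all hypothetical, chosen only to show that a rejection leaves state untouched while still producing a logged event.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class LegalityResult:
    legal: bool
    reason: str | None = None
    article: str | None = None


@dataclass(frozen=True)
class Event:
    kind: str
    actor: str
    detail: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Spend:
    """Hypothetical toy action; the real vocabulary lives in actions.py."""
    actor: str
    amount: float


def check(state: dict[str, float], action: Spend, params: dict[str, float]) -> LegalityResult:
    # Pure: reads state and kernel params, writes nothing.
    cap = params["spend_cap"]  # hypothetical kernel parameter
    if action.amount > cap:
        return LegalityResult(False, f"amount exceeds spend cap {cap}", "art-spend-cap")
    if action.amount > state["treasury"]:
        return LegalityResult(False, "amount exceeds treasury", "art-treasury")
    return LegalityResult(True)


def step(state: dict[str, float], action: Spend, params: dict[str, float],
         log: list[Event]) -> LegalityResult:
    result = check(state, action, params)  # the gate runs before any mutation
    if not result.legal:
        # Illegal attempts become data for detectors; the world is not touched.
        log.append(Event("rejected", action.actor,
                         {"reason": result.reason, "article": result.article}))
        return result
    state["treasury"] -= action.amount
    log.append(Event("spent", action.actor, {"amount": action.amount}))
    return result


state = {"treasury": 1000.0}
log: list[Event] = []
step(state, Spend("c0", 50.0), {"spend_cap": 100.0}, log)   # applied
step(state, Spend("c1", 500.0), {"spend_cap": 100.0}, log)  # rejected, state unchanged
```

After both calls the treasury reflects only the legal spend, while the log holds both a `spent` and a `rejected` entry.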

Proposals carry voting windows measured in rounds. Votes are tallied when the
window closes; quorum and threshold rules come from the kernel parameters,
never from constants in code. This is what makes a kernel patch testable: the
same transcript replayed under different kernel text produces different
legality outcomes.
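To make the kernel-parameter point concrete, here is a hedged sketch of window resolution. The parameter names `quorum` and `threshold`, and their reading as turnout and support fractions compared with `>=`, are assumptions; the actual kernel may define them differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tally:
    passed: bool
    turnout: float   # fraction of all citizens who cast a ballot
    support: float   # fraction of cast ballots in favor


def tally(ballots: dict[str, bool], num_citizens: int, params: dict[str, float]) -> Tally:
    """Resolve a closed voting window using only kernel-supplied rules."""
    quorum = params["quorum"]        # hypothetical param name
    threshold = params["threshold"]  # hypothetical param name
    cast = len(ballots)
    yes = sum(1 for support in ballots.values() if support)
    turnout = cast / num_citizens if num_citizens else 0.0
    support = yes / cast if cast else 0.0
    return Tally(turnout >= quorum and support >= threshold, turnout, support)


# One set of ballots, two kernel versions: 4 of 7 citizens voted, 3 in favor (75%).
ballots = {"c0": True, "c1": True, "c2": True, "c3": False}
under_v1 = tally(ballots, 7, {"quorum": 0.5, "threshold": 0.5})  # passes
under_v2 = tally(ballots, 7, {"quorum": 0.5, "threshold": 0.8})  # fails
```

Nothing in `tally` changed between the two calls; only the parameters did, which is the property that lets a kernel patch be tested against an old transcript.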

## Public API (stable for this milestone)

These are the surfaces that `tests/`, `tournament.py`, and external scripts
rely on. Changes here are breaking changes.

### `kernel`

```python
load_kernel(path: str | Path) -> Kernel
Kernel.version: str            # e.g. "0.1.0", "0.2.0"
Kernel.params: dict[str, Any]  # quorum, thresholds, windows, caps
Kernel.articles                # parsed article structure
Kernel.param(name, default=None)
```
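For orientation, a sketch of the `Kernel` shape above, built from an already-parsed YAML mapping (for example, the output of PyYAML's `yaml.safe_load`). The top-level `version` / `params` / `articles` keys and the helper name `kernel_from_mapping` are hypothetical, and the real `load_kernel` may validate much more.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Kernel:
    version: str
    params: dict[str, Any]
    articles: tuple[Any, ...] = ()

    def param(self, name: str, default: Any = None) -> Any:
        # Quorum, threshold, window, and cap lookups all go through here,
        # so engine code never hard-codes a governance constant.
        return self.params.get(name, default)


def kernel_from_mapping(doc: Mapping[str, Any]) -> Kernel:
    """Hypothetical helper: turn a parsed kernel document into a Kernel."""
    return Kernel(
        version=str(doc["version"]),
        params=dict(doc.get("params", {})),
        articles=tuple(doc.get("articles", ())),
    )


k = kernel_from_mapping({"version": "0.2.0", "params": {"quorum": 0.5}})
```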

### `actions`

The closed vocabulary. All actions name their `actor` (a citizen id):

- `ProposeSpend(actor, amount, recipient, memo="")`
- `ProposeAmendment(actor, changes)` — `changes` is a kernel-param patch
- `Vote(actor, proposal_id, support)`
- `DeclareEmergency(actor, reason)` / `EndEmergency(actor)`
- `Exit(actor)` — invoke the right to fork/leave with pro-rata share
- `Pass(actor)`
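Since `actions.py` is described as dataclasses, one plausible shape is frozen dataclasses plus a name-keyed registry, which also gives transcripts a stable serialized form. The field types (for example `proposal_id: str`, `amount: float`), the `type` tag, and the helpers `to_record` / `from_record` are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any, Union


@dataclass(frozen=True)
class ProposeSpend:
    actor: str
    amount: float
    recipient: str
    memo: str = ""


@dataclass(frozen=True)
class ProposeAmendment:
    actor: str
    changes: dict[str, Any]  # kernel-param patch, e.g. {"quorum": 0.6}


@dataclass(frozen=True)
class Vote:
    actor: str
    proposal_id: str
    support: bool


@dataclass(frozen=True)
class DeclareEmergency:
    actor: str
    reason: str


@dataclass(frozen=True)
class EndEmergency:
    actor: str


@dataclass(frozen=True)
class Exit:
    actor: str


@dataclass(frozen=True)
class Pass:
    actor: str


Action = Union[ProposeSpend, ProposeAmendment, Vote, DeclareEmergency,
               EndEmergency, Exit, Pass]

# Closed vocabulary: a record whose type is not in this table cannot be decoded.
VOCABULARY = {cls.__name__: cls for cls in
              (ProposeSpend, ProposeAmendment, Vote, DeclareEmergency,
               EndEmergency, Exit, Pass)}


def to_record(action: Action) -> dict[str, Any]:
    return {"type": type(action).__name__, **asdict(action)}


def from_record(record: dict[str, Any]) -> Action:
    fields = dict(record)
    cls = VOCABULARY[fields.pop("type")]  # KeyError for unknown action types
    return cls(**fields)
```

Freezing the dataclasses keeps a recorded action from being edited after the fact, and decoding through a fixed table means a transcript cannot smuggle in an action outside the vocabulary.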

### `legality`

```python
check_legality(state, action, kernel) -> LegalityResult
LegalityResult.legal: bool
LegalityResult.reason: str | None    # human-readable, cites the rule
LegalityResult.article: str | None   # kernel article id that decided it
```

`check_legality` is pure: it never mutates state and never consults global
configuration. This purity is what `replay.py` depends on.
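A sketch of what purity looks like for one action type, a vote. The dict-shaped state, the `voting_window` parameter name, and the article ids are hypothetical; the point is that the function only reads its arguments and returns a verdict, so calling it on a replayed state under a different kernel is safe.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class LegalityResult:
    legal: bool
    reason: str | None = None
    article: str | None = None


@dataclass(frozen=True)
class Vote:
    actor: str
    proposal_id: str
    support: bool


def check_vote(state: dict[str, Any], action: Vote, params: dict[str, Any]) -> LegalityResult:
    """Pure: no writes to state, no globals; every rule input is an argument."""
    proposal = state["proposals"].get(action.proposal_id)
    if proposal is None:
        return LegalityResult(False, "unknown proposal", "art-proposals")
    window = params["voting_window"]  # in rounds; hypothetical param name
    if state["turn"] >= proposal["opened_at"] + window:
        return LegalityResult(False, f"voting window of {window} rounds has closed",
                              "art-voting-window")
    if action.actor in proposal["ballots"]:
        return LegalityResult(False, "citizen has already voted", "art-one-vote")
    return LegalityResult(True)


state = {"turn": 3, "proposals": {"p1": {"opened_at": 0, "ballots": {"c0": True}}}}
fresh = check_vote(state, Vote("c1", "p1", True), {"voting_window": 5})    # legal
repeat = check_vote(state, Vote("c0", "p1", False), {"voting_window": 5})  # already voted
late = check_vote(state, Vote("c1", "p1", True), {"voting_window": 3})     # window closed
```

The same state and action give a different verdict under a shorter window, and the state is identical after all three calls.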

### `environment`

```python
Environment(kernel, num_citizens=7, initial_treasury=1000.0, seed=0)
env.citizens()          # ["c0", ..., "cN-1"]
env.state               # WorldState (treasury, balances, proposals, ...)
env.turn                # current round counter
env.step(action)        # -> StepResult(legal, reason, events)
env.events              # append-only event log for the whole run
```

`step` applies exactly one action through the legality gate. Round
advancement and proposal-window resolution are handled internally when all
citizens have acted.
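One way to picture the internal round bookkeeping is a small clock that records who has acted, advances the round counter once every citizen has, and reshuffles the next round's order from a seeded RNG. `RoundClock`, `proposals_due`, and the raise-on-double-action behavior are hypothetical, not the `Environment` implementation.

```python
from __future__ import annotations

import random


class RoundClock:
    """Hypothetical helper: tracks turn order and closes rounds."""

    def __init__(self, citizens: list[str], seed: int) -> None:
        self.citizens = sorted(citizens)   # canonical order before shuffling
        self.rng = random.Random(seed)
        self.turn = 0
        self.acted: set[str] = set()
        self.order = self._next_order()

    def _next_order(self) -> list[str]:
        order = list(self.citizens)
        self.rng.shuffle(order)            # seeded, so identical across runs
        return order

    def record(self, actor: str) -> bool:
        """Mark `actor` as having acted; return True if that closed the round."""
        if actor not in self.citizens or actor in self.acted:
            raise ValueError(f"{actor!r} is not eligible to act in this round")
        self.acted.add(actor)
        if len(self.acted) < len(self.citizens):
            return False
        # Everyone has acted: advance the round. Proposal windows that close
        # at the new turn would be tallied at this point.
        self.turn += 1
        self.acted.clear()
        self.order = self._next_order()
        return True


def proposals_due(closes_at: dict[str, int], turn: int) -> list[str]:
    # Sorted ids keep resolution order independent of dict insertion order.
    return sorted(pid for pid, t in closes_at.items() if t <= turn)
```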

## Scoring: worst-off first

`metrics.py` grades every episode on a lexicographic ordering: the welfare of
the **worst-off** participant under stress is compared first; only on ties do
aggregate measures (treasury integrity, participation, legitimacy) break
them. A run where the median citizen prospers while one faction is starved
*loses* to a run with lower aggregate welfare and a protected floor. This is
deliberate: it is the one scoring rule that takes precedence over every other.
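The ordering maps directly onto Python's lexicographic tuple comparison. In this sketch, the `EpisodeOutcome` fields and the tie-break order among the aggregates (integrity, then participation, then legitimacy) are assumptions; `metrics.py` may weigh or order them differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeOutcome:
    welfare: dict[str, float]   # hypothetical per-citizen welfare under stress
    treasury_integrity: float   # hypothetical aggregates, higher is better
    participation: float
    legitimacy: float


def score_key(o: EpisodeOutcome) -> tuple[float, float, float, float]:
    # Tuples compare element by element, so the worst-off citizen's welfare
    # decides first and the aggregates only matter on an exact tie.
    return (min(o.welfare.values()), o.treasury_integrity,
            o.participation, o.legitimacy)


starved = EpisodeOutcome({"c0": 0.0, "c1": 9.0, "c2": 9.0}, 1.0, 1.0, 1.0)
floored = EpisodeOutcome({"c0": 3.0, "c1": 4.0, "c2": 4.0}, 0.5, 0.5, 0.5)
best = max([starved, floored], key=score_key)
```

Here `floored` wins even though its total welfare (11) is lower than `starved`'s (18) and every aggregate is worse, which is exactly the trade the rule is meant to make.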

## Detectors and exploit records

Detectors in `detectors.py` are pure functions over the event stream. They
fire when a capture objective's success condition is met (treasury drained,
faction suppressed, emergency overstayed, exit blocked) **and** every action
in the causal chain was legal. A firing produces an exploit record — the JSON
files in `exploits/` — containing the minimal replayable transcript. The
record format is documented in `docs/exploit-pipeline.md` and machine-checked
by `exploits/SCHEMA.json` and `scripts/verify_exploit_coverage.py`.
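A hedged sketch of one detector, treasury drain, written as a pure function over events. The `Event` fields, the 90% drain threshold, and the record layout are hypothetical and are not taken from `exploits/SCHEMA.json`; the transcript here is simply every applied action up to the drain, whereas real records are minimized.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class Event:
    turn: int
    actor: str
    kind: str                # e.g. "spent" or "rejected"
    legal: bool
    action: dict[str, Any]   # the recorded action, transcript-ready
    treasury_after: float


def detect_treasury_drain(events: Iterable[Event], initial_treasury: float,
                          drained_fraction: float = 0.9) -> dict[str, Any] | None:
    """Pure detector: returns an exploit record, or None if nothing fired."""
    floor = initial_treasury * (1.0 - drained_fraction)
    chain: list[Event] = []
    for ev in events:
        if ev.kind == "rejected":
            continue  # probing data: it never changed the world
        chain.append(ev)
        if ev.treasury_after <= floor:
            # Defensive: the environment never applies illegal actions, so
            # this should always hold; capture only counts within the rules.
            if not all(e.legal for e in chain):
                return None
            return {
                "objective": "treasury_drained",
                "turn": ev.turn,
                "transcript": [e.action for e in chain],
            }
    return None


events = [
    Event(0, "c3", "spent", True, {"type": "ProposeSpend", "amount": 500.0}, 500.0),
    Event(0, "c4", "rejected", False, {"type": "ProposeSpend", "amount": 900.0}, 500.0),
    Event(1, "c3", "spent", True, {"type": "ProposeSpend", "amount": 450.0}, 50.0),
]
record = detect_treasury_drain(events, initial_treasury=1000.0)
```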

## Determinism guarantees

- All randomness flows from a single seeded `random.Random` owned by the
  tournament; agents receive child seeds derived from (tournament seed,
  episode index, citizen id).
- Iteration order over citizens, proposals, and ballots is explicitly sorted
  or seeded-shuffled — never dependent on dict insertion order across runs.
- Replays bypass agent policies entirely: `replay.py` feeds the recorded
  transcript directly through the legality gate, so a replay's outcome
  depends only on (transcript, kernel text).
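Deriving child seeds needs a hash that is stable across processes. The sketch below uses SHA-256 from `hashlib`; the key format and the helper names `child_seed`, `agent_rng`, and `turn_order` are assumptions, not the tournament's actual derivation. Python's built-in `hash()` is avoided because string hashing is randomized per process unless `PYTHONHASHSEED` is fixed, which would break cross-run replay.

```python
import hashlib
import random
from collections.abc import Iterable


def child_seed(tournament_seed: int, episode: int, citizen_id: str) -> int:
    """Stable 64-bit seed for one agent in one episode."""
    key = f"{tournament_seed}|{episode}|{citizen_id}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def agent_rng(tournament_seed: int, episode: int, citizen_id: str) -> random.Random:
    # Each agent gets its own generator, so one agent's draws never shift another's.
    return random.Random(child_seed(tournament_seed, episode, citizen_id))


def turn_order(citizens: Iterable[str], rng: random.Random) -> list[str]:
    # Sort first so the shuffle never depends on container or insertion order.
    order = sorted(citizens)
    rng.shuffle(order)
    return order
```

Because `turn_order` sorts before shuffling, a set and a list of the same ids produce the same order for the same seed.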