# The Exploit-to-Test Pipeline

This is the loop the whole milestone exists to demonstrate: an adversarial agent finds a legal path to a capture objective, the path becomes a permanent regression test, the kernel is amended to close it, and CI proves the amendment works, forever.

## Lifecycle of an exploit

```
tournament run (kernel vX)
        │
        ▼
detector fires ──────────► exploit record written: exploits/EXP-NNN.json
        │                    (minimal replayable transcript + metadata)
        ▼
exploit_to_test generates ─► tests/regression/test_exp_nnn.py
        │                    asserts: "this transcript reaches the capture
        │                    objective under kernel vX" (documents the hole)
        │                    and "it does NOT under the current kernel"
        ▼
kernel patch proposed ─────► PR touching kernel/ triggers amendment-gate.yml
        │                    gate replays ALL recorded exploits against the
        │                    proposed text; any reproduction blocks merge
        ▼
ratification vote (milestone-2 pipeline) ─► merge ─► CHANGELOG.md entry maps EXP id → amendment
```

## The exploit record

Each `exploits/EXP-NNN.json` is the canonical, immutable evidence. The contract (machine-checked by `exploits/SCHEMA.json` in strict mode, and by `scripts/verify_exploit_coverage.py` always):

- The record's `id` matches its filename (`EXP-001` ↔ `EXP-001.json`).
- It names the kernel version it was discovered against.
- It contains the replayable action transcript and the detector that fired.
- A matching regression test exists at `tests/regression/test_exp_nnn.py` and references the exploit id.
- `CHANGELOG.md` references the exploit id, mapping it to the amendment that closed it (or explicitly marking it open).

## Rules of the pipeline

1. **Exploit records are append-only.** Closing an exploit means amending the kernel, never editing or deleting the record. History is the test suite.
2. **A regression test must fail before the patch and pass after.** The generated test replays the transcript twice: once against the discovery kernel (asserting the exploit *did* work; this guards against the transcript rotting) and once against the current kernel (asserting it no longer does).
3. **The gate is global.** A patch for EXP-005 that quietly re-opens EXP-002 is caught because the amendment gate replays *every* recorded exploit, not just the one the PR claims to fix.
4. **No test, no exploit.** A detector firing without a replayable transcript is logged for investigation but produces no record. Reproducibility is the bar.

## Running the pipeline by hand

```bash
# Run a tournament; exploit records land in exploits/
python -m fable_selfplay.cli run --config configs/tournaments/T1.yaml

# Convert any new exploit records into regression tests
python -m fable_selfplay.cli generate-tests --exploits exploits/ --out tests/regression/

# Verify coverage: every exploit has a test, every test has an exploit,
# every exploit is accounted for in the changelog
python scripts/verify_exploit_coverage.py

# Replay one exploit against an arbitrary kernel (e.g., a draft patch)
python -m fable_selfplay.cli replay --exploit exploits/EXP-003.json --kernel kernel/kernel-v0.2.yaml
```

## What "closed" means

An exploit is closed when its transcript, replayed under the current kernel, hits a legality rejection before reaching the capture objective, and the rejection cites a specific kernel article. "The agents stopped trying" is not closed. "The detector stopped firing" is not closed. A cited rule blocking a recorded move is closed.
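The definition of "closed" and the double replay from rule 2 can be expressed as two small predicates. This is a minimal sketch, not the project's implementation: `ReplayResult` and its fields are hypothetical stand-ins for whatever the replay command actually returns, and the function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplayResult:
    """Hypothetical outcome of replaying one transcript under one kernel."""
    reached_objective: bool       # did the transcript reach the capture objective?
    rejected_at: Optional[int]    # index of the first rejected action, if any
    cited_article: Optional[str]  # kernel article cited by the rejection, if any


def is_closed(result: ReplayResult) -> bool:
    """Closed means: rejected before the objective, with a specific article cited."""
    return (
        not result.reached_objective
        and result.rejected_at is not None
        and bool(result.cited_article)
    )


def regression_holds(discovery: ReplayResult, current: ReplayResult) -> bool:
    """Rule 2: the exploit worked under the discovery kernel and is closed now."""
    return discovery.reached_objective and is_closed(current)
```

Under this sketch, a replay that merely fails to reach the objective without any rejection (the "detector stopped firing" case) is not counted as closed, because no rule was cited.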
## Current ledger

| Exploit | Discovered in | Capture objective   | Status under v0.2 |
|---------|---------------|---------------------|-------------------|
| EXP-001 | T1            | drain_treasury      | closed            |
| EXP-002 | T1            | quorum_timing       | closed            |
| EXP-003 | T2            | entrench_power      | closed            |
| EXP-004 | T2            | suppress_faction    | closed            |
| EXP-005 | T3            | perpetual_emergency | closed            |
| EXP-006 | T3            | block_exit          | closed            |

See `CHANGELOG.md` for the exploit-id → amendment mapping, and `reports/T*.md` for the full analyses.
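For readers who want to see how the exploit record contract could be checked for a single ledger entry, here is a rough sketch. It is not the contents of `scripts/verify_exploit_coverage.py`. The field names `kernel_version`, `transcript`, and `detector` are assumptions (only `id` is named in the contract), and `contract_violations` is a hypothetical helper that takes already-loaded data rather than reading the repository.

```python
from pathlib import Path
from typing import Optional

# Assumed field names; only `id` is named explicitly in the contract.
REQUIRED_FIELDS = ("id", "kernel_version", "transcript", "detector")


def expected_test_name(exploit_id: str) -> str:
    """Map EXP-001 to test_exp_001.py, following the test_exp_nnn.py pattern."""
    return "test_" + exploit_id.lower().replace("-", "_") + ".py"


def contract_violations(record_path: Path, record: dict,
                        test_source: Optional[str], changelog: str) -> list:
    """Return human-readable contract violations for one exploit record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing field: {field}")
    exploit_id = record.get("id", "")
    if exploit_id != record_path.stem:
        problems.append("id does not match filename")
    if test_source is None:
        problems.append(f"no regression test {expected_test_name(exploit_id)}")
    elif exploit_id not in test_source:
        problems.append("regression test does not reference the exploit id")
    if exploit_id not in changelog:
        problems.append("CHANGELOG.md does not reference the exploit id")
    return problems
```

An empty list means the record satisfies every clause of the contract as modeled here. Anything else names the clause that failed, which keeps the check usable as a CI message.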