# Confidence regression goldens

This directory holds the golden baseline for the confidence regression harness in `test_confidence_golden.py`.

## What is snapshotted

The harness runs the full default-deriver pipeline over the committed sample corpus (`data/samples/*.jsonl`) and records, for every **active** claim:

```
"subject|predicate|canonical-json(value)" -> confidence (rounded to 8 dp)
```

Keys are content-based (they do not include claim ids, operation ids, or signing keys), so the baseline is stable across machines and key material.

## Workflow

- **Generate or regenerate the baseline** (after intentional changes to a deriver heuristic or the confidence model):

  ```sh
  python scripts/update_goldens.py
  git add tests/regression/golden/confidence.golden.json
  git commit
  ```

  Review the diff before committing: every changed number is a calibration change you are signing off on.

- **No baseline yet?** The comparison test skips with instructions instead of failing; the structural tests (determinism, `(0, 1]` bounds, key uniqueness) still run.

The baseline is intentionally *not* hand-written: it must come from an actual pipeline run, then be committed.

## What a failure means

- **Claim-set drift** (claims added/removed vs. baseline): a deriver's heuristics now fire differently on identical evidence.
- **Confidence drift** (beyond `1e-6`): the calibration of an existing claim changed.

Both are legitimate outcomes of deliberate changes; the harness exists so they never happen *accidentally*.
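
To make the two failure modes concrete, here is a minimal sketch of how a snapshot could be keyed and compared against a baseline. It is not the harness's actual code. The claim dictionaries, the `active` field, and the helpers `golden_key`, `snapshot`, and `diff_snapshots` are all hypothetical, and "canonical JSON" is assumed here to mean sorted keys with compact separators. Only the key layout, the 8-decimal rounding, and the `1e-6` tolerance come from this README.

```python
import json

TOLERANCE = 1e-6  # confidence drift threshold named in this README


def golden_key(subject, predicate, value):
    """Build a content-based key: subject|predicate|canonical-json(value)."""
    # Assumption: canonical JSON = sorted keys, compact separators.
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return f"{subject}|{predicate}|{canonical}"


def snapshot(claims):
    """Map every active claim to its confidence rounded to 8 decimal places."""
    return {
        golden_key(c["subject"], c["predicate"], c["value"]): round(c["confidence"], 8)
        for c in claims
        if c.get("active", True)  # hypothetical flag; inactive claims are skipped
    }


def diff_snapshots(baseline, current, tol=TOLERANCE):
    """Separate claim-set drift (added/removed keys) from confidence drift."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    drifted = {
        key: (baseline[key], current[key])
        for key in sorted(baseline.keys() & current.keys())
        if abs(baseline[key] - current[key]) > tol
    }
    return {"added": added, "removed": removed, "drifted": drifted}


# Illustrative data only; these claims are not from the sample corpus.
baseline = {'alice|works_at|"Acme"': 0.9, "alice|age|30": 0.75}
claims = [
    {"subject": "alice", "predicate": "works_at", "value": "Acme", "confidence": 0.9000000001},
    {"subject": "alice", "predicate": "age", "value": 30, "confidence": 0.8},
    {"subject": "bob", "predicate": "works_at", "value": "Acme", "confidence": 0.6},
    {"subject": "bob", "predicate": "age", "value": 41, "confidence": 0.5, "active": False},
]
report = diff_snapshots(baseline, snapshot(claims))
```

In this sketch, the new `bob|works_at|"Acme"` claim shows up as claim-set drift, the change from `0.75` to `0.8` shows up as confidence drift, and the sub-tolerance change on the first claim is ignored. Because keys carry no ids or signing keys, the same comparison gives the same result on any machine.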