# Confidence regression goldens

This directory holds the golden baseline for the confidence regression
harness in `test_confidence_golden.py`.

## What is snapshotted

The harness runs the full default-deriver pipeline over the committed
sample corpus (`data/samples/*.jsonl`) and records, for every **active**
claim:

```
"subject|predicate|canonical-json(value)"  ->  confidence (rounded to 8 dp)
```

Keys are content-based — they do not include claim ids, operation ids, or
signing keys — so the baseline is stable across machines and key material.
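Here is a minimal sketch of how a content-based key and snapshot could be built. It assumes that claims are plain dicts with hypothetical `subject`, `predicate`, `value`, `confidence` and `active` fields. It also assumes that "canonical JSON" means sorted keys with compact separators. The harness's real claim type and canonicalization may differ.

```python
import json


def canonical_json(value):
    # Assumption: canonical JSON = sorted object keys, no insignificant whitespace.
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def golden_key(subject, predicate, value):
    # Content-only key: no claim id, operation id, or signing key is included.
    return f"{subject}|{predicate}|{canonical_json(value)}"


def snapshot(claims):
    """Map golden keys to confidences rounded to 8 decimal places.

    `claims` is an iterable of dicts with the hypothetical fields
    "subject", "predicate", "value", "confidence" and "active".
    """
    result = {}
    for claim in claims:
        if not claim["active"]:
            continue  # only active claims are snapshotted
        key = golden_key(claim["subject"], claim["predicate"], claim["value"])
        if key in result:
            # A dict would otherwise silently keep only the last duplicate.
            raise ValueError(f"duplicate golden key: {key}")
        result[key] = round(claim["confidence"], 8)
    return result
```

The key is built only from subject, predicate and value. As a result, two runs that mint different claim ids or sign with different keys still produce identical snapshots. Raising on a duplicate key, rather than letting the dict overwrite it, mirrors the key-uniqueness structural test.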

## Workflow

- **Generate or regenerate the baseline** (after intentional changes to a
  deriver heuristic or the confidence model):

  ```sh
  python scripts/update_goldens.py
  git add tests/regression/golden/confidence.golden.json
  git commit
  ```

  Review the diff before committing — every changed number is a calibration
  change you are signing off on.

- **No baseline yet?** The comparison test skips with instructions instead
  of failing; the structural tests (determinism, `(0, 1]` bounds, key
  uniqueness) still run. The baseline is intentionally *not* hand-written:
  it must come from an actual pipeline run, then be committed.
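The structural tests and the skip-when-missing behavior could look roughly like the following. This is a hypothetical sketch. `load_baseline`, `check_structure` and the `(key, confidence)` pair format are illustrative names, not the harness's actual API. Only the golden file path comes from the workflow above.

```python
import json
from pathlib import Path

# Path taken from the workflow above; the harness may resolve it differently.
GOLDEN_PATH = Path("tests/regression/golden/confidence.golden.json")


def load_baseline(path=GOLDEN_PATH):
    """Return the committed baseline dict, or None if none has been generated."""
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def check_structure(run_a, run_b):
    """Baseline-independent checks on two runs over the same corpus.

    Each run is a list of (key, confidence) pairs (hypothetical format),
    kept as a list so duplicate keys are still visible.
    """
    keys = [key for key, _ in run_a]
    # Key uniqueness: no two active claims may collapse to the same key.
    if len(keys) != len(set(keys)):
        raise AssertionError("duplicate golden keys")
    # Bounds: every confidence must lie in (0, 1].
    for key, conf in run_a:
        if not 0.0 < conf <= 1.0:
            raise AssertionError(f"confidence out of (0, 1] for {key!r}: {conf}")
    # Determinism: a second run must yield exactly the same pairs.
    if sorted(run_a) != sorted(run_b):
        raise AssertionError("pipeline output is not deterministic")
```

In a pytest suite, a `None` return from `load_baseline` would typically map to `pytest.skip(...)`, with the regeneration command in the message. `check_structure` would run regardless.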

## What a failure means

- **Claim-set drift** (claims added/removed vs. baseline): a deriver's
  heuristics now fire differently on identical evidence.
- **Confidence drift** (beyond `1e-6`): the calibration of an existing
  claim changed.
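The two failure modes map onto a simple comparison between the baseline and a fresh snapshot. The sketch below assumes both are `{key: confidence}` dicts and that "beyond `1e-6`" means an absolute difference. `compare` is a hypothetical helper, not part of the harness.

```python
def compare(baseline, current, tol=1e-6):
    """Classify differences between two {key: confidence} snapshots.

    Returns (added, removed, drifted), each a sorted list of keys:
    keys new in `current`, keys missing from `current`, and shared keys
    whose confidence moved by more than `tol` (absolute difference).
    """
    added = sorted(current.keys() - baseline.keys())      # claim-set drift
    removed = sorted(baseline.keys() - current.keys())    # claim-set drift
    drifted = sorted(                                     # confidence drift
        key
        for key in baseline.keys() & current.keys()
        if abs(current[key] - baseline[key]) > tol
    )
    return added, removed, drifted
```

Non-empty `added` or `removed` lists indicate claim-set drift, and a non-empty `drifted` list indicates confidence drift. Keys present on only one side are reported as set drift and are not also checked for confidence drift.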

Both are legitimate outcomes of deliberate changes — the harness exists so
they never happen *accidentally*.