# Audit Log Design

**Status:** Stable draft for MVP implementation
**Related:** [05-rbac-permissions.md](05-rbac-permissions.md), ADR-014

---

## 1. Purpose & non-goals

The audit log answers: *who did what, to which resource, when, from where, and why was it allowed?* It serves four audiences:

1. **Moderators/admins** investigating abuse, disputes, or compromised accounts.
2. **The community**, via a redacted public moderation transparency feed.
3. **Operators**, for security forensics.
4. **Compliance** (GDPR accountability, takedown records).

Non-goals: it is **not** an analytics event stream (use product telemetry), **not** an application debug log, and **not** a general undo mechanism (versioning provides content rollback).

## 2. Event model

Single append-only table `audit_events` (PostgreSQL), with a strict envelope:

```jsonc
{
  "id": "01J9Z3K8…",                  // ULID — time-ordered, unique
  "occurredAt": "2025-06-01T12:00:00.123Z",
  "category": "content",              // auth | content | review | moderation | admin | access | security
  "action": "content.version.published",
  "severity": "info",                 // info | notice | warning | critical
  "actor": {
    "type": "user",                   // user | system | api_token | anonymous
    "id": "usr_77…",
    "tokenId": null,
    "sessionId": "ses_…",             // opaque, for correlating a session's actions
    "ipHash": "b58c…",                // HMAC-SHA256(ip, rotating pepper) — see §5
    "userAgentFamily": "Firefox"
  },
  "target": { "type": "problem_version", "id": "pv_19c2…", "parent": {"type": "problem", "id": "prb_8f3a…"} },
  "decision": { "outcome": "allowed", "policy": "problems.can_publish", "reason": "maintainer" },
  "context": { "versionNumber": 5, "reviewId": "rev_…" },   // action-specific, schema'd per action
  "requestId": "req_…",               // correlates with OTel trace
  "prevHash": "9af1…",                // hash chain, §4
  "hash": "c44d…"
}
```

Design rules:

- **`action` is a closed enum**, registered in code with a per-action `context` schema; unregistered actions fail tests. This keeps the log queryable and prevents schema rot.
- `context` carries snapshots of *decision-relevant* values (e.g., old/new role on a grant) — never full content documents (versions already preserve content).
- **No raw PII in context.** IPs are HMAC-hashed, emails are referenced by user ID, and free-text justifications are allowed (they are already written for an audit audience) but flagged for the erasure pipeline (§5).
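
The closed-enum rule can be illustrated with a minimal sketch. Everything here is hypothetical: `ActionSpec`, `ACTION_REGISTRY`, and `validate_event` are illustrative names, the `admin.role.granted` spelling and its `oldRole`/`newRole` keys are assumptions, and a real per-action schema would likely check types as well as key names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSpec:
    category: str
    context_keys: frozenset  # assumption: a "schema" is just the exact key set


# Hypothetical registry. The first entry mirrors the sample event in section 2;
# the second entry's full action name and context keys are assumed.
ACTION_REGISTRY = {
    "content.version.published": ActionSpec(
        "content", frozenset({"versionNumber", "reviewId"})
    ),
    "admin.role.granted": ActionSpec("admin", frozenset({"oldRole", "newRole"})),
}


class UnregisteredAction(ValueError):
    pass


class InvalidContext(ValueError):
    pass


def validate_event(action: str, context: dict) -> ActionSpec:
    """Reject actions outside the registry and contexts that drift from the spec."""
    spec = ACTION_REGISTRY.get(action)
    if spec is None:
        raise UnregisteredAction(action)
    keys = set(context)
    missing = spec.context_keys - keys
    extra = keys - spec.context_keys
    if missing or extra:
        raise InvalidContext(
            f"{action}: missing={sorted(missing)} extra={sorted(extra)}"
        )
    return spec
```

Calling `validate_event` from the single insert path, plus one test that walks every action emitted by the codebase, is one way to make unregistered actions fail before they reach the table.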

## 3. What gets logged (canonical action catalog, MVP)

| Category | Actions (excerpt) |
|---|---|
| `auth` | login.success, login.failure, logout, password.changed, email.changed, mfa.enabled/disabled, oauth.linked, token.created/revoked, session.revoked_all |
| `content` | entity.created, version.submitted/withdrawn, version.published, version.retracted, entity.forked, entity.deleted, maintainer.added/removed, bundle.exported/imported |
| `review` | review.claimed/unclaimed, review.decision (with outcome), review.escalated, review.overridden |
| `moderation` | report.created/resolved, comment.hidden/restored, thread.locked, user.warned/suspended/banned, reviewer_scope.granted/revoked, retraction (with reason class) |
| `admin` | role.granted/revoked, settings.changed (old→new), feature_flag.changed, data_export.performed, impersonation.started/ended |
| `access` | privacy_sensitive.read (e.g., moderator viewing a user's IP history), audit_log.queried (yes — reads of the audit log are themselves audited at `access` level) |
| `security` | permission.denied (mutating endpoints), rate_limit.tripped, csrf.rejected, sandbox.watchdog_kill, code_runner.execution |

Routine learner reads (viewing a problem) are **not** audited — volume without forensic value; they live in telemetry with shorter retention.

## 4. Integrity: append-only + hash chain

- The application role has `INSERT` and `SELECT` only on `audit_events`; `UPDATE`/`DELETE` are revoked, and a trigger raises on update/delete attempts as defense-in-depth.
- Each event stores `hash = sha256(prevHash || canonical_json(event_without_hash))`, forming a per-instance chain. A nightly job verifies the chain and anchors the day's head hash by (a) writing it to the audit log itself, (b) shipping it to external object storage with object-lock (WORM), and (c) posting it in the public transparency feed — so even an operator cannot silently rewrite history without the discrepancy being detectable.
- Events are also streamed (Postgres logical decoding → worker) to compressed JSONL files in object storage within minutes, providing an off-box copy independent of database compromise.
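
The chain rule `hash = sha256(prevHash || canonical_json(event_without_hash))` and the nightly verification can be sketched as follows. This is a sketch under stated assumptions, not the production function: the canonical form (sorted keys, no whitespace, UTF-8), the all-zero `GENESIS` value, and the names `seal` and `verify_chain` are all illustrative. A real deployment might prefer a specified canonicalization such as RFC 8785.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumption: prevHash used for the first event of a chain


def canonical_json(event: dict) -> bytes:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    return json.dumps(
        event, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def seal(event: dict, prev_hash: str) -> dict:
    """Return a copy of the event with prevHash and hash filled in."""
    body = {k: v for k, v in event.items() if k != "hash"}
    body["prevHash"] = prev_hash
    digest = hashlib.sha256(
        prev_hash.encode("ascii") + canonical_json(body)
    ).hexdigest()
    return {**body, "hash": digest}


def verify_chain(events: list, genesis: str = GENESIS) -> Optional[int]:
    """Return the index of the first event that breaks the chain, or None."""
    prev = genesis
    for i, ev in enumerate(events):
        if ev.get("prevHash") != prev or seal(ev, prev)["hash"] != ev.get("hash"):
            return i
        prev = ev["hash"]
    return None
```

Under this sketch, the day's head hash to anchor is simply `events[-1]["hash"]` after `verify_chain` returns `None`.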

The hash chain is kept linear by a single-writer pattern: inserts go through a `log_audit_event()` SQL function that takes an advisory lock per instance shard, so writers to the same chain are serialized while transactions that do not write to that chain are unaffected. For mutating actions, the audit row is inserted **in the same transaction** as the action it records (no action without its audit row). Only high-volume `security`/`access` events, where some loss is tolerable, are written asynchronously.

## 5. Privacy, retention, erasure

| Class | Retention | Notes |
|---|---|---|
| `auth`, `security` | 1 year | then aggregated/dropped |
| `content`, `review` | indefinite | provenance is part of the OER record |
| `moderation`, `admin`, `access` | 5 years | accountability window |
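
The retention table can be expressed as data that a retention job consults. The sketch below is illustrative only: `RETENTION_YEARS`, `expires_at`, and `is_expired` are hypothetical names, and treating "1 year" as the same calendar date one year later (with 29 February rolling to 1 March) is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional

# Mirrors the table above; None means indefinite retention.
RETENTION_YEARS = {
    "auth": 1,
    "security": 1,
    "content": None,
    "review": None,
    "moderation": 5,
    "admin": 5,
    "access": 5,
}


def expires_at(category: str, occurred_at: datetime) -> Optional[datetime]:
    """Return when an event leaves its retention window, or None if it never does."""
    years = RETENTION_YEARS[category]
    if years is None:
        return None
    try:
        return occurred_at.replace(year=occurred_at.year + years)
    except ValueError:
        # 29 February landing in a non-leap year: assumed to roll to 1 March.
        return occurred_at.replace(year=occurred_at.year + years, month=3, day=1)


def is_expired(category: str, occurred_at: datetime, now: datetime) -> bool:
    deadline = expires_at(category, occurred_at)
    return deadline is not None and now >= deadline
```

For example, an `auth` event recorded on 2025-06-01 would leave its window on 2026-06-01 under these assumptions, while `content` events never expire.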

- **IP handling:** raw IPs are kept only in a separate short-lived store (30 days, moderator-access audited) for ban-evasion checks; audit events carry only the HMAC with a pepper rotated quarterly (old peppers retained sealed for the retention window so historical correlation remains possible under admin procedure).
- **GDPR erasure:** user-keyed events are *pseudonymized*, not deleted: `actor.id` is replaced by a stable opaque tombstone ID, and free-text fields associated with the actor are run through a redaction pass. Hash-chain integrity is preserved by storing redactions as **superseding redaction events** referencing the original event ID; verifiers treat an (event, redaction) pair as valid. The original encrypted payload is destroyed; only the envelope skeleton remains.
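
The IP handling above can be sketched with the standard library. The quarter-based key label, the `peppers` mapping, and the function names are assumptions for illustration; where sealed peppers actually live and how they are unsealed is outside this sketch.

```python
import hashlib
import hmac
import ipaddress
from datetime import date


def pepper_period(day: date) -> str:
    """Label the quarter whose pepper applies, e.g. '2025-Q2' (assumed scheme)."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def ip_hash(ip: str, pepper: bytes) -> str:
    # Normalize first so equivalent spellings of one address hash identically.
    normalized = str(ipaddress.ip_address(ip))
    return hmac.new(pepper, normalized.encode("ascii"), hashlib.sha256).hexdigest()


def ip_hash_for_event(ip: str, day: date, peppers: dict) -> str:
    """Hash an IP with the pepper of the quarter in which the event occurred."""
    return ip_hash(ip, peppers[pepper_period(day)])
```

Because the pepper changes each quarter, the same address produces unrelated hashes across quarters. Correlating across that boundary requires the retained old pepper, which is what keeps historical correlation behind the admin procedure.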

## 6. Query & UI

- Admin UI: filter by category/action/actor/target/time, full-text on justifications, "show this user's last 90 days", "show everything that touched this entity", session pivot (all actions in `sessionId`).
- Moderator view is a **filtered projection**: moderation + content categories only, no `auth`/`admin`, no IP hashes.
- **Public transparency feed** (`/transparency`): moderation outcomes (retractions, bans) with actor reduced to role ("a moderator"), target reduced to entity link or "a user account", reason class, and date. Builds community trust without doxxing anyone.
- Indexes: `(occurredAt)`, `(actor.id, occurredAt)`, `(target.type, target.id, occurredAt)`, `(category, action, occurredAt)`. The table is range-partitioned monthly by `occurredAt`, so retention enforcement for expiring categories is a partition drop (in PostgreSQL, `ALTER TABLE … DETACH PARTITION` followed by `DROP TABLE`; there is no `DROP PARTITION` statement), which is fast and vacuum-free. Expiring categories live in separate partitioned tables to allow differential retention.
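
The moderator projection and the public feed entry can be sketched as pure functions over the event envelope from §2. The full action names in `PUBLIC_ACTIONS`, the `reasonClass` context key, and the function names are assumptions; the document only states which fields each audience may see.

```python
from typing import Optional

MODERATOR_CATEGORIES = {"moderation", "content"}
# Assumed full action names for the outcomes the feed publishes.
PUBLIC_ACTIONS = {"moderation.user.banned", "moderation.retraction"}


def moderator_view(event: dict) -> Optional[dict]:
    """Filtered projection: allowed categories only, IP hash stripped."""
    if event["category"] not in MODERATOR_CATEGORIES:
        return None
    actor = {k: v for k, v in event["actor"].items() if k != "ipHash"}
    return {**event, "actor": actor}


def public_view(event: dict) -> Optional[dict]:
    """Reduce a moderation outcome to what /transparency shows."""
    if event["action"] not in PUBLIC_ACTIONS:
        return None
    target = event["target"]
    if target["type"] == "user":
        public_target = "a user account"  # never expose the user ID
    else:
        public_target = {"type": target["type"], "id": target["id"]}
    return {
        "date": event["occurredAt"][:10],  # date only, not the timestamp
        "actor": "a moderator",  # actor reduced to role
        "action": event["action"],
        "target": public_target,
        "reasonClass": event.get("context", {}).get("reasonClass"),
    }
```

Building the public entry from an explicit allow-list of fields, rather than deleting fields from the full event, means a field added to the envelope later stays private by default.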

## 7. Operational alerts driven by the audit stream

Real-time consumers (Redis stream fan-out) raise alerts on patterns:

- ≥ 5 `login.failure` for one account in 10 min → lock + notify user.
- Any `review.overridden`, `impersonation.started`, `data_export.performed` → admin channel ping.
- `permission.denied` bursts from one token → token quarantined pending review.
- Hash-chain verification failure → page on-call immediately (severity: critical).
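
As an illustration of the first rule, a sliding-window counter over `login.failure` events might look like the sketch below. The class name and the choice to reset after firing are assumptions, and the consumer plumbing (reading from the Redis stream, locking the account) is omitted. Timestamps are assumed to arrive in non-decreasing order per account.

```python
from collections import defaultdict, deque


class LoginFailureDetector:
    """Fires when an account reaches `threshold` failures within the window."""

    def __init__(self, threshold: int = 5, window_seconds: float = 600.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # account_id -> failure timestamps

    def observe(self, account_id: str, ts: float) -> bool:
        """Record one failure at time `ts` (seconds); return True if the rule fires."""
        q = self._failures[account_id]
        q.append(ts)
        # Drop failures that have fallen out of the window.
        while q and q[0] <= ts - self.window_seconds:
            q.popleft()
        if len(q) >= self.threshold:
            q.clear()  # assumption: reset so each later failure does not re-alert
            return True
        return False
```

The other burst-style rule (`permission.denied` per token) could reuse the same structure keyed by token ID, with its own threshold and window.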