# Audit Log Design

**Status:** Stable draft for MVP implementation

**Related:** [05-rbac-permissions.md](05-rbac-permissions.md), ADR-014

---

## 1. Purpose & non-goals

The audit log answers: *who did what, to which resource, when, from where, and why was it allowed?* It serves four audiences:

1. **Moderators/admins** investigating abuse, disputes, or compromised accounts.
2. **The community**, via a redacted public moderation transparency feed.
3. **Operators**, for security forensics.
4. **Compliance** (GDPR accountability, takedown records).

Non-goals: it is **not** an analytics event stream (use product telemetry), **not** an application debug log, and **not** a general undo mechanism (versioning provides content rollback).

## 2. Event model

Single append-only table `audit_events` (PostgreSQL), with a strict envelope:

```json
{
  "id": "01J9Z3K8…",                      // ULID: time-ordered, unique
  "occurredAt": "2025-06-01T12:00:00.123Z",
  "category": "content",                  // auth | content | review | moderation | admin | access | security
  "action": "content.version.published",
  "severity": "info",                     // info | notice | warning | critical
  "actor": {
    "type": "user",                       // user | system | api_token | anonymous
    "id": "usr_77…",
    "tokenId": null,
    "sessionId": "ses_…",                 // opaque, for correlating a session's actions
    "ipHash": "b58c…",                    // HMAC-SHA256(ip, rotating pepper), see §5
    "userAgentFamily": "Firefox"
  },
  "target": {
    "type": "problem_version",
    "id": "pv_19c2…",
    "parent": {"type": "problem", "id": "prb_8f3a…"}
  },
  "decision": {
    "outcome": "allowed",
    "policy": "problems.can_publish",
    "reason": "maintainer"
  },
  "context": { "versionNumber": 5, "reviewId": "rev_…" },  // action-specific, schema'd per action
  "requestId": "req_…",                   // correlates with OTel trace
  "prevHash": "9af1…",                    // hash chain, §4
  "hash": "c44d…"
}
```

Design rules:

- **`action` is a closed enum**, registered in code with a per-action `context` schema; unregistered actions fail tests. This keeps the log queryable and prevents schema rot.
- `context` carries snapshots of *decision-relevant* values (e.g., old/new role on a grant), never full content documents (versions already preserve content).
- **No raw PII in context.** IPs are HMAC-hashed; emails referenced by user ID; free-text justifications are allowed (they're already written for an audit audience) but flagged for the erasure pipeline (§5).

## 3. What gets logged (canonical action catalog, MVP)

| Category | Actions (excerpt) |
|---|---|
| `auth` | login.success, login.failure, logout, password.changed, email.changed, mfa.enabled/disabled, oauth.linked, token.created/revoked, session.revoked_all |
| `content` | entity.created, version.submitted/withdrawn, version.published, version.retracted, entity.forked, entity.deleted, maintainer.added/removed, bundle.exported/imported |
| `review` | review.claimed/unclaimed, review.decision (with outcome), review.escalated, review.overridden |
| `moderation` | report.created/resolved, comment.hidden/restored, thread.locked, user.warned/suspended/banned, reviewer_scope.granted/revoked, retraction (with reason class) |
| `admin` | role.granted/revoked, settings.changed (old→new), feature_flag.changed, data_export.performed, impersonation.started/ended |
| `access` | privacy_sensitive.read (e.g., moderator viewing a user's IP history), audit_log.queried (yes, reads of the audit log are themselves audited at `access` level) |
| `security` | permission.denied (mutating endpoints), rate_limit.tripped, csrf.rejected, sandbox.watchdog_kill, code_runner.execution |

Routine learner reads (viewing a problem) are **not** audited: they add volume without forensic value, so they live in telemetry with shorter retention.

## 4. Integrity: append-only + hash chain

- The application role has `INSERT` and `SELECT` only on `audit_events`; `UPDATE`/`DELETE` are revoked, and a trigger raises on update/delete attempts as defense-in-depth.
- Each event stores `hash = sha256(prevHash || canonical_json(event_without_hash))`, forming a per-instance chain. A nightly job verifies the chain and anchors the day's head hash by (a) writing it to the audit log itself, (b) shipping it to external object storage with object-lock (WORM), and (c) posting it in the public transparency feed, so even an operator cannot silently rewrite history without the discrepancy being detectable.
- Events are also streamed (Postgres logical decoding → worker) to compressed JSONL files in object storage within minutes, providing an off-box copy independent of database compromise.

The hash chain is sequenced via a dedicated single-writer pattern: inserts go through a `log_audit_event()` SQL function that takes an advisory lock per instance shard, keeping the chain linear without serializing unrelated transactions.

Audit insertion happens **in the same transaction** as the action it records for mutating actions (no action without its audit row), and asynchronously only for high-volume `security`/`access` events where loss tolerance is acceptable.

## 5. Privacy, retention, erasure

| Class | Retention | Notes |
|---|---|---|
| `auth`, `security` | 1 year | then aggregated/dropped |
| `content`, `review` | indefinite | provenance is part of the OER record |
| `moderation`, `admin`, `access` | 5 years | accountability window |

- **IP handling:** raw IPs are kept only in a separate short-lived store (30 days, moderator-access audited) for ban-evasion checks; audit events carry only the HMAC with a pepper rotated quarterly (old peppers retained sealed for the retention window so historical correlation remains possible under admin procedure).
- **GDPR erasure:** user-keyed events are *pseudonymized*, not deleted: `actor.id` is replaced by a stable opaque tombstone ID, and free-text fields associated with the actor pass through a redaction pass.
  Hash-chain integrity is preserved by storing redactions as **superseding redaction events** referencing the original event ID; verifiers treat an (event, redaction) pair as valid. The original encrypted payload is destroyed; only the envelope skeleton remains.

## 6. Query & UI

- Admin UI: filter by category/action/actor/target/time, full-text on justifications, "show this user's last 90 days", "show everything that touched this entity", session pivot (all actions in `sessionId`).
- Moderator view is a **filtered projection**: moderation + content categories only, no `auth`/`admin`, no IP hashes.
- **Public transparency feed** (`/transparency`): moderation outcomes (retractions, bans) with actor reduced to role ("a moderator"), target reduced to entity link or "a user account", reason class, and date. Builds community trust without doxxing anyone.
- Indexes: `(occurredAt)`, `(actor.id, occurredAt)`, `(target.type, target.id, occurredAt)`, `(category, action, occurredAt)`. The table is range-partitioned monthly by `occurredAt`, so retention enforcement for expiring categories is a partition drop (in PostgreSQL, `DROP TABLE` on the expired partition, optionally after `ALTER TABLE ... DETACH PARTITION`), which is fast and vacuum-free. Expiring categories live in separate partitioned tables to allow differential retention.

## 7. Operational alerts driven by the audit stream

Real-time consumers (Redis stream fan-out) raise alerts on patterns:

- ≥ 5 `login.failure` for one account in 10 min → lock + notify user.
- Any `review.overridden`, `impersonation.started`, `data_export.performed` → admin channel ping.
- `permission.denied` bursts from one token → token quarantined pending review.
- Hash-chain verification failure → page on-call immediately (severity: critical).
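
The chain check behind the last alert (the nightly verification described in §4) could look roughly like the following. This is a minimal Python sketch under stated assumptions, not the implementation: `canonical_json` is assumed to mean sorted keys with compact separators, the first event's `prevHash` is assumed to be 64 zeros, `prevHash` is concatenated as its hex string, and the helper names `event_hash`, `append_event`, and `verify_chain` are hypothetical. Redaction pairs (§5) are not handled here.

```python
import hashlib
import json

# Assumption: the first event in a chain links to an all-zero hash.
GENESIS = "0" * 64


def canonical_json(event: dict) -> bytes:
    """Assumed canonical form: sorted keys, compact separators, UTF-8."""
    text = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def event_hash(prev_hash: str, event: dict) -> str:
    """Compute sha256(prevHash || canonical_json(event_without_hash)) as hex."""
    body = {k: v for k, v in event.items() if k != "hash"}
    return hashlib.sha256(prev_hash.encode("ascii") + canonical_json(body)).hexdigest()


def append_event(chain: list, fields: dict) -> dict:
    """Link a new event to the current chain head and append it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    event = dict(fields, prevHash=prev)
    event["hash"] = event_hash(prev, event)
    chain.append(event)
    return event


def verify_chain(chain: list):
    """Return the index of the first event that breaks the chain, or None."""
    prev = GENESIS
    for index, event in enumerate(chain):
        if event.get("prevHash") != prev or event.get("hash") != event_hash(prev, event):
            return index
        prev = event["hash"]
    return None
```

With this sketch, changing any field of a stored event makes `verify_chain` return that event's index, which is the condition that would page on-call. The head hash that the nightly job anchors externally would be `chain[-1]["hash"]`.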