# ADR 0013: Append-Only Audit Log in PostgreSQL with Hash Chaining

- **Status:** Accepted
- **Date:** 2025-01-15
- **Deciders:** Core architecture team
- **Related:** docs/architecture/08-audit-log.md, ADR 0003 (PostgreSQL), ADR 0014 (RBAC enforcement)

## Context

The platform handles community trust at scale: moderation actions, review approvals, role grants, content takedowns, and forks with attribution obligations. Disputes are inevitable ("my problem was rejected unfairly," "who deleted this course?", "this moderator abused their powers"). We need an audit trail that is:

1. **Complete** for security- and governance-relevant actions (the event catalog in docs/architecture/08).
2. **Tamper-evident** — even admins and operators should not be able to silently rewrite history; if they do, it must be detectable.
3. **Queryable** by moderators/admins (filter by actor, target, action, time range) without a data-warehouse project.
4. **Cheap to operate** for small self-hosted instances.
5. **Privacy-compatible** — GDPR-style erasure requests must be satisfiable without destroying the integrity of the log.

Options evaluated:

1. **Plain `AuditLog` table** in PostgreSQL, written by application code.
2. **Plain table + database-level hardening + hash chaining** (PostgreSQL, application-written, integrity-verifiable).
3. **Database triggers** auto-capturing all row changes (e.g. `pgaudit`-style or trigger-based shadow tables).
4. **External append-only store** (Kafka topic, object-storage WORM logs, or a dedicated audit service).

## Decision

We implement the audit log as an **application-written, append-only PostgreSQL table with per-row hash chaining**, per the schema in docs/architecture/08. Key decisions:

1. **Application-level event semantics, not row-diff capture.** Audit entries record *domain events* (`review.approved`, `role.granted`, `problem.takedown`) with actor, target, reason, and a structured `context` JSONB — not raw row diffs. Triggers capture *what* changed but not *why* or *who at the application level*; domain events are what disputes are adjudicated on. (Row-level change history for content is already covered by the immutable version model, ADR 0007.)
2. **Append-only enforcement in the database**, not just convention:
   - The application's primary database role is granted `INSERT` and `SELECT` on `audit_log` but **not `UPDATE` or `DELETE`**.
   - A belt-and-braces trigger raises an exception on any `UPDATE`/`DELETE` attempt, so even a misconfigured superuser session in a migration cannot casually mutate rows.
3. **Hash chaining for tamper evidence.** Each row stores `entry_hash = SHA-256(prev_hash || canonical_json(entry_fields))` where `prev_hash` is the previous row's hash (genesis rows use a fixed sentinel). Chaining is per-instance and sequential, serialized via the insert path described in docs/architecture/08. A `verify_audit_chain` management command walks the chain and reports the first divergence; a periodic Celery `batch` task verifies recent segments and publishes **chain checkpoints** (latest `entry_hash` + row count) to two external sinks: the structured application log stream and, when configured, an object-storage write-once file (ADR 0018). An attacker who can rewrite the table *and* both checkpoint sinks is outside our threat model.
4. **Synchronous writes in the same transaction** as the action being audited. An action that cannot be audited does not happen — the audit insert failing rolls back the action. This is deliberate: for governance events (low volume, high importance), consistency beats throughput. High-volume *behavioral* telemetry (page views, attempt streams) is explicitly **not** audit-log material and goes to ordinary analytics/event tables.
5. **Privacy via pseudonymization, not deletion.** Audit rows reference actors by immutable user ID. Account erasure removes/anonymizes the PII in the `users`/`profiles` tables; the audit log retains the opaque ID. The `context` JSONB must never contain free-form PII (enforced by a reviewed allowlist of context-builder helpers — application code cannot write arbitrary dicts into `context`). Where an event inherently involves text that may contain PII (e.g. a report reason), the log stores a reference to the moderated object, not a copy.
6. **Retention & archival:** default retention is indefinite for governance events. A configurable archival job exports rows older than N months to compressed JSONL in object storage (with chain hashes intact) and is allowed to drop them from the hot table **only** when archival mode is explicitly enabled — small instances just keep everything.

## Alternatives Considered

- **Plain table without hardening:** trivially mutable by any admin or buggy migration; fails tamper-evidence outright.
- **Trigger-based full change capture:** captures mechanism, not intent; explodes in volume; entangles audit semantics with schema evolution (every table change risks breaking triggers); and still mutable without the hardening we'd add anyway.
- **External append-only store (Kafka/WORM/dedicated service):** strongest immutability story, but adds a service category small instances cannot run, breaks the same-transaction consistency guarantee (point 4), and makes moderator-facing querying harder. The object-storage **checkpoint** mechanism gives us most of the external-anchor benefit at near-zero cost.

## Consequences

**Positive**
- Tamper-evident with detection tooling, while remaining a normal queryable PostgreSQL table — moderation UIs can filter/paginate it with ordinary ORM queries against indexed columns (`actor_id`, `target_type`+`target_id`, `action`, `created_at`).
- Same-transaction writes make the log trustworthy as a *complete* record of governance actions.
- Works identically on a hobbyist VPS and a large deployment.

**Negative / Accepted risks**
- Hash chaining serializes inserts of audit rows (per docs/architecture/08, via advisory lock). Governance events are low-volume (orders of magnitude below attempts), so this is acceptable; if it ever isn't, sharding the chain by event category is the documented escape hatch.
- Synchronous auditing adds one insert + one hash computation to governance actions. Negligible.
- True deletion-resistance against a hostile database superuser is impossible in-database; we provide *detection* (checkpoints), not *prevention*, and document that boundary honestly.

**Follow-ups**
- Implement the context-builder allowlist and `verify_audit_chain` command in the backend milestone.
- Add "audit chain verified" age as a monitored metric (ADR 0020).