# ADR-0003: PostgreSQL as the Single Source-of-Truth Datastore

- **Status:** Accepted
- **Date:** 2024-06-01
- **Deciders:** FablePool core team
- **Related:** ADR-0007 (versioning), ADR-0011 (Meilisearch), ADR-0013
  (audit log), `docs/architecture/02-data-model.md`

## Context

FablePool's data is deeply relational (users ↔ roles ↔ content ↔ versions ↔
reviews ↔ attempts) but also contains large structured documents: problem
and course version payloads are JSON conforming to the schemas in
`docs/schemas/`. We need:

- strong transactional guarantees across workflow transitions (e.g.
  "publish version N, retire version N-1, write audit record" must be
  atomic);
- efficient storage and querying of JSON payloads (filter problems by
  answer type, schema version, widget usage);
- full-text search good enough as a fallback when the search service is
  down;
- a single store the small ops team can actually run, back up, and restore.

## Decision

**PostgreSQL (16+) is the sole source of truth.** All entities from
`02-data-model.md` live in Postgres. Specifically:

1. **Version payloads as `JSONB`** on `ProblemVersion.content` and
   `CourseVersion.content`, validated against the JSON Schemas at the
   application layer before write, with a `CHECK` on `schema_version` and
   GIN indexes on the keys we filter by (`answer.type`, `widgets[].id`).
2. **Workflow state transitions inside transactions** with `SELECT … FOR
   UPDATE` on the parent entity row, so concurrent review actions cannot
   race (see `06-review-workflow.md` §"Concurrency").
3. **Audit log as an append-only Postgres table** (details in ADR-0013) —
   no separate event store in the MVP.
4. **Meilisearch is a disposable projection** (ADR-0011): it can be rebuilt
   from Postgres at any time and is never written to as a source of truth.
5. **Constraints over conventions:** foreign keys ON everywhere, partial
   unique indexes for invariants like "at most one published version per
   problem" (`UNIQUE (problem_id) WHERE status = 'published'`).

## Alternatives Considered

- **MongoDB / document store** for version payloads. JSONB gives us 90% of
  the document ergonomics while keeping joins, transactions, and one backup
  story. Running two databases doubles ops burden for a volunteer-friendly
  project. Rejected.
- **MySQL/MariaDB.** Viable, but Postgres's JSONB indexing, partial unique
  indexes, transactional DDL, and `tsvector` fallback search are direct
  fits for our requirements. Rejected.
- **SQLite for small self-hosts.** Attractive for tiny deployments, but
  concurrent-writer limitations conflict with attempt logging and Celery
  workers. We do not promise SQLite support; Docker Compose makes Postgres
  cheap even for small installs. Rejected.
- **Event sourcing as primary model.** Our versioning model already gives
  us immutable history where it matters (content); full event sourcing adds
  large complexity for marginal benefit. Rejected (revisit only if audit
  requirements outgrow ADR-0013).

## Consequences

- ✅ One database to operate, back up (`pg_dump` + WAL archiving), and
  restore; all invariants enforceable in one place.
- ✅ Search index and caches are rebuildable projections — losing them is an
  inconvenience, not data loss.
- ⚠️ Large JSONB payloads mean we must watch TOAST/row sizes; payloads above
  1 MB are rejected at the API layer (large assets belong in object storage,
  ADR-0018).
- ⚠️ Application-layer JSON Schema validation must be airtight, since the
  database cannot fully validate document shape. Validation is centralized
  in one module and covered by schema round-trip tests in CI.