# ADR-0003: PostgreSQL as the Single Source-of-Truth Datastore - **Status:** Accepted - **Date:** 2024-06-01 - **Deciders:** FablePool core team - **Related:** ADR-0007 (versioning), ADR-0011 (Meilisearch), ADR-0013 (audit log), `docs/architecture/02-data-model.md` ## Context FablePool's data is deeply relational (users ↔ roles ↔ content ↔ versions ↔ reviews ↔ attempts) but also contains large structured documents: problem and course version payloads are JSON conforming to the schemas in `docs/schemas/`. We need: - strong transactional guarantees across workflow transitions (e.g. "publish version N, retire version N-1, write audit record" must be atomic); - efficient storage and querying of JSON payloads (filter problems by answer type, schema version, widget usage); - full-text search good enough as a fallback when the search service is down; - a single store the small ops team can actually run, back up, and restore. ## Decision **PostgreSQL (16+) is the sole source of truth.** All entities from `02-data-model.md` live in Postgres. Specifically: 1. **Version payloads as `JSONB`** on `ProblemVersion.content` and `CourseVersion.content`, validated against the JSON Schemas at the application layer before write, with a `CHECK` on `schema_version` and GIN indexes on the keys we filter by (`answer.type`, `widgets[].id`). 2. **Workflow state transitions inside transactions** with `SELECT … FOR UPDATE` on the parent entity row, so concurrent review actions cannot race (see `06-review-workflow.md` §"Concurrency"). 3. **Audit log as an append-only Postgres table** (details in ADR-0013) — no separate event store in the MVP. 4. **Meilisearch is a disposable projection** (ADR-0011): it can be rebuilt from Postgres at any time and is never written to as a source of truth. 5. **Constraints over conventions:** foreign keys ON everywhere, partial unique indexes for invariants like "at most one published version per problem" (`UNIQUE (problem_id) WHERE status = 'published'`). ## Alternatives Considered - **MongoDB / document store** for version payloads. JSONB gives us 90% of the document ergonomics while keeping joins, transactions, and one backup story. Running two databases doubles ops burden for a volunteer-friendly project. Rejected. - **MySQL/MariaDB.** Viable, but Postgres's JSONB indexing, partial unique indexes, transactional DDL, and `tsvector` fallback search are direct fits for our requirements. Rejected. - **SQLite for small self-hosts.** Attractive for tiny deployments, but concurrent-writer limitations conflict with attempt logging and Celery workers. We do not promise SQLite support; Docker Compose makes Postgres cheap even for small installs. Rejected. - **Event sourcing as primary model.** Our versioning model already gives us immutable history where it matters (content); full event sourcing adds large complexity for marginal benefit. Rejected (revisit only if audit requirements outgrow ADR-0013). ## Consequences - ✅ One database to operate, back up (`pg_dump` + WAL archiving), and restore; all invariants enforceable in one place. - ✅ Search index and caches are rebuildable projections — losing them is an inconvenience, not data loss. - ⚠️ Large JSONB payloads mean we must watch TOAST/row sizes; payloads above 1 MB are rejected at the API layer (large assets belong in object storage, ADR-0018). - ⚠️ Application-layer JSON Schema validation must be airtight, since the database cannot fully validate document shape. Validation is centralized in one module and covered by schema round-trip tests in CI.