# ADR 0014: Centralized Policy Layer for RBAC Enforcement

- **Status:** Accepted
- **Date:** 2025-01-15
- **Deciders:** Core architecture team
- **Related:** docs/architecture/05-rbac-permissions.md, ADR 0001 (Django/DRF), ADR 0013 (audit log)

## Context

The permission model (docs/architecture/05) defines five roles — learner, contributor, reviewer, moderator, admin — with a matrix of capabilities that is **not purely role-based**: many rules are relational ("authors may edit their own drafts," "reviewers may not review their own submissions," "moderators may lock threads but not edit content"), state-dependent ("published versions are immutable for everyone"), and occasionally object-scoped (per-topic reviewer assignments are a likely post-MVP extension).

The enforcement question: *where and how do permission checks live* so that they are consistent, testable, auditable, and impossible to forget?

Failure modes we must design against:

- **Scattered checks**: `if request.user.is_staff` sprinkled through views drifts from the documented matrix and rots.
- **Missing checks on new endpoints**: the default must be deny, not "whatever the developer remembered."
- **Frontend/backend divergence**: the UI hides a button while the API still allows the action (or vice versa).
- **Unauditable decisions**: when a moderator's action is disputed, we need to know what policy allowed it.

Options evaluated:

1. **DRF per-view permission classes only** — idiomatic, but logic fragments across dozens of classes and duplicates into querysets.
2. **Centralized policy module ("policy layer")** — one Python module per domain area defines predicates (`can_edit_problem(actor, problem) -> Decision`); views, serializers, and services all call the same predicates.
3. **External policy engine (OPA/Cedar/Casbin)** — policies as data, evaluated by an engine.
4. **Database row-level security (PostgreSQL RLS).**

## Decision

We implement a **centralized policy layer** in the Django backend, surfaced through thin DRF adapters. Concretely:

1. **One source of truth: `apps/authz/policies/`.** Each domain (`problems.py`, `courses.py`, `reviews.py`, `moderation.py`, `discussions.py`, `admin.py`) exposes pure functions of the form:
   ```python
   def can_submit_for_review(actor: Actor, problem: Problem) -> Decision: ...
   ```
   `Decision` is a small value object: `allowed: bool`, `code: str` (machine-readable reason, e.g. `"not_author"`, `"version_not_draft"`, `"self_review_forbidden"`), and `message: str`. Policies are pure (no I/O beyond the objects passed in), making them trivially unit-testable. **The permission matrix in docs/architecture/05 is mirrored 1:1 by a test suite** that instantiates every (role, action, state) cell and asserts the policy result — the doc and the code cannot silently diverge.
2. **Default deny.** A project-level DRF permission class requires every view to declare its policy mapping explicitly; CI includes a checker that fails if any registered route lacks one. Anonymous access is an explicit grant (`AllowAnonymous` on public read endpoints), never a fallthrough.
3. **Three enforcement surfaces, one predicate:**
   - **View layer:** DRF permission classes adapt `Decision` to 403 responses, returning `code` in the error body so clients can react programmatically.
   - **Queryset scoping:** for list endpoints, each policy module provides a matching scope function (e.g. `visible_problems(actor) -> QuerySet`) so unauthorized objects are filtered out rather than merely 403'd on access, which prevents existence leaks of drafts and unpublished content.
   - **Service layer:** state-transition services (publish, approve, takedown) re-check the policy before acting, so internal callers (Celery tasks, management commands) cannot bypass enforcement by skipping the view.
4. **Roles are per-user grants stored in the database** (`user_roles` join table, not Django's `is_staff`/groups), with role *grants and revocations* always audited (ADR 0013). Role checks inside policies go through a cached `actor.roles` snapshot loaded once per request.
5. **Capability exposure to the frontend.** Serializers for problems, courses, reviews, and threads include a `permissions` object (e.g. `{"can_edit": true, "can_submit_for_review": false, ...}`) computed from the same predicates. The Next.js UI renders affordances from this object and **never re-implements policy logic** — the frontend's role is display, the backend's is enforcement.
6. **Denials of sensitive actions are observable:** policy denials on moderation/admin endpoints emit a structured log event carrying the decision `code` and are counted as a metric (ADR 0020). They are not written to the audit log, which records actions taken, not actions refused; repeated-denial anomaly detection is handled in logging instead.
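
To make items 1 and 5 concrete, here is a minimal sketch of the `Decision` value object, one predicate, and the capability flags a serializer could expose. The `Actor` and `Problem` shapes, the `ALLOW` constant, the `deny` helper, the `"missing_role"` code, and `problem_permissions` are hypothetical stand-ins chosen for illustration; the real models are Django ORM objects, and the rule body below is an example, not the documented matrix.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Result of a policy check: the outcome plus a machine-readable reason."""
    allowed: bool
    code: str
    message: str = ""


ALLOW = Decision(True, "ok")  # hypothetical shared "allowed" value


def deny(code: str, message: str = "") -> Decision:
    """Hypothetical helper for building a denial."""
    return Decision(False, code, message)


@dataclass(frozen=True)
class Actor:
    # Stand-in for the per-request actor with its cached role snapshot.
    user_id: int
    roles: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Problem:
    # Stand-in for the ORM model; only the fields this rule reads.
    author_id: int
    version_state: str  # assumed values: "draft", "published"


def can_submit_for_review(actor: Actor, problem: Problem) -> Decision:
    """Illustrative rule: a contributor may submit only their own draft."""
    if "contributor" not in actor.roles:
        return deny("missing_role", "Contributor role required.")  # assumed code
    if problem.author_id != actor.user_id:
        return deny("not_author", "Only the author may submit this problem.")
    if problem.version_state != "draft":
        return deny("version_not_draft", "Only draft versions can be submitted.")
    return ALLOW


def problem_permissions(actor: Actor, problem: Problem) -> dict[str, bool]:
    """Capability flags for the serializer, derived from the same predicate."""
    return {"can_submit_for_review": can_submit_for_review(actor, problem).allowed}
```

Because the predicate touches only the objects passed in, the same function can back a DRF permission class (mapping a denied `Decision` to a 403 that carries `code`), a service-layer re-check, and the serializer's `permissions` object without any of them restating the rule.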

## Alternatives Considered

- **DRF permissions only:** fine for trivial CRUD; our relational/state-dependent rules would force duplication between `has_object_permission` and queryset filters in every viewset, with no single artifact matching the documented matrix.
- **OPA/Cedar/Casbin:** real benefits (policy-as-data, side-loading), but they bring another runtime dependency for self-hosters, a second language (Rego/Cedar) that raises the contributor barrier, and awkward access to ORM state for relational rules (we'd serialize half the object graph into engine input). Our policy needs are rich but not *dynamic*: policies change with code releases, not at runtime, so an engine's main advantage would go unused. Because callers depend only on the predicate interface, migrating to an engine remains possible if multi-tenant/custom-policy needs emerge.
- **PostgreSQL RLS:** excellent defense-in-depth for read scoping, but cannot express action semantics ("may submit for review"), complicates connection pooling (per-request role switching), and hides logic from application tests. We note RLS as a *possible future addition* for draft-visibility hardening, layered under — not replacing — the policy layer.

## Consequences

**Positive**
- One testable artifact mirrors the documented matrix; the matrix-conformance test suite makes permission regressions a CI failure, not a security incident.
- Queryset scoping kills the "drafts leak through list endpoints" class of bugs structurally.
- The serializer `permissions` object eliminates frontend/backend capability drift.
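
A rough sketch of how a matrix-conformance check could work: parse the documented table into cells, then compare each cell against the policy result. The table layout (`Role | Action | State | Allowed`), the `Cell` type, and the toy `can_edit` rule are assumptions for illustration; the actual format of the table in docs/architecture/05 is not specified here.

```python
from __future__ import annotations

from typing import Callable, NamedTuple

# Hypothetical excerpt in an assumed table layout.
SAMPLE_MATRIX = """
| Role        | Action | State     | Allowed |
|-------------|--------|-----------|---------|
| contributor | edit   | draft     | yes     |
| contributor | edit   | published | no      |
| admin       | edit   | published | no      |
"""


class Cell(NamedTuple):
    role: str
    action: str
    state: str
    allowed: bool


def parse_matrix(markdown: str) -> list[Cell]:
    """Parse a pipe table with columns Role | Action | State | Allowed."""
    rows = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    cells = []
    for row in rows[2:]:  # skip the header row and the |---| separator row
        role, action, state, allowed = (c.strip() for c in row.strip("|").split("|"))
        cells.append(Cell(role, action, state, allowed.lower() == "yes"))
    return cells


def find_mismatches(
    cells: list[Cell], policy: Callable[[str, str, str], bool]
) -> list[Cell]:
    """Return every cell where the policy disagrees with the documented matrix."""
    return [c for c in cells if policy(c.role, c.action, c.state) != c.allowed]


def can_edit(role: str, action: str, state: str) -> bool:
    # Toy rule for the sample: published versions are immutable for everyone.
    return action == "edit" and state == "draft" and role in {"contributor", "admin"}
```

In a real suite each parsed cell would more likely become its own parametrized test case (for example via `pytest.mark.parametrize`), so a failure names the exact (role, action, state) cell that diverged.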

**Negative / Accepted risks**
- Per-object `permissions` computation adds work to serialization; mitigated by the predicates being pure in-memory checks over already-fetched objects, and by omitting the `permissions` object on bulk list endpoints where the UI doesn't need it.
- Centralization makes `apps/authz` a high-traffic module for contributors; mitigated with CODEOWNERS review requirements on that path.
- Pure-Python policies cannot be changed without a deploy. Accepted for MVP; a requirement for runtime policy changes is recorded as the trigger condition for revisiting a policy engine.

**Follow-ups**
- Backend milestone: implement `Decision`, the route-coverage CI checker, and the matrix-conformance test generator that parses docs/architecture/05's table.
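
The route-coverage checker could be approached roughly as follows. This is a sketch under stated assumptions: the `policy_map` attribute name, the example view classes, and the plain `routes` dictionary are hypothetical, and in the real project the route-to-view mapping would be collected from Django's URL configuration rather than written by hand.

```python
from __future__ import annotations


class AllowAnonymous:
    """Marker for an explicit anonymous grant on public read endpoints."""


class ProblemDetailView:
    # Hypothetical declaration: HTTP method -> policy predicate name.
    policy_map = {"PUT": "can_edit_problem"}


class PublicCatalogView:
    # Anonymous access is granted explicitly, never by fallthrough.
    policy_map = {"GET": AllowAnonymous}


class ForgottenView:
    pass  # declares no policy_map, so the check should flag it


def routes_missing_policy(routes: dict[str, type]) -> list[str]:
    """Return the route patterns whose view declares no non-empty policy_map."""
    missing = []
    for pattern, view in routes.items():
        if not getattr(view, "policy_map", None):
            missing.append(pattern)
    return sorted(missing)
```

A CI step would call `routes_missing_policy` on the full route table and exit non-zero if the returned list is not empty, which turns a forgotten declaration into a build failure.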