# 02 — Data Model This document defines every persistent entity, its key fields, and its relationships. The implementation milestone must map these 1:1 to Django models. Conventions: - All tables have `id` (UUIDv7 — time-ordered, index-friendly), `created_at`, `updated_at` unless noted. - Soft deletes (`deleted_at`) only where content must remain referenceable (discussion comments, problems); hard deletes elsewhere, with audit log records preserving the tombstone. - Content documents are stored as **JSONB** validated against the schemas in `schemas/` at write time (see doc 03). ## 1. Entity-relationship diagram ```mermaid erDiagram User ||--|| Profile : has User }o--o{ Role : "via UserRole (scoped)" User ||--o{ ProblemAttempt : makes User ||--o{ Enrollment : has User ||--o{ Bookmark : saves User ||--o{ Notification : receives User ||--o{ Vote : casts User ||--o{ Report : files User ||--o{ AuditLog : "actor of" Problem ||--o{ ProblemVersion : "versions" Problem }o--|| User : "owned by" Problem }o--o| Problem : "forked from (item)" ProblemVersion }o--o| ProblemVersion : "parent / forked-from (version)" ProblemVersion ||--o{ Hint : contains ProblemVersion ||--o{ Solution : contains Problem }o--o{ Tag : tagged Problem }o--|| Topic : "primary topic" Problem }o--o{ Problem : "prerequisite of" ProblemVersion ||--o{ Review : "reviewed in" Review ||--o{ ReviewComment : has Review }o--|| User : "by reviewer" Course ||--o{ CourseVersion : versions Course }o--|| User : "owned by" Course }o--o| Course : "forked from" CourseVersion ||--o{ Module : contains Module ||--o{ Lesson : contains Lesson }o--o{ Problem : "references (pinned to version)" Course }o--o{ Tag : tagged CourseVersion ||--o{ Review : "reviewed in" Enrollment }o--|| Course : "in" Enrollment ||--o{ Progress : tracks Progress }o--o| Lesson : "for" Progress }o--o| Problem : "for" DiscussionThread }o--o| Problem : "attached to" DiscussionThread }o--o| Course : "attached to" DiscussionThread ||--o{ DiscussionComment : has DiscussionComment }o--o| DiscussionComment : "reply to" Vote }o--|| User : by Report }o--|| User : "filed by" Notification }o--|| User : for MediaAsset }o--|| User : "uploaded by" WidgetPackage ||--o{ WidgetVersion : versions WidgetVersion ||--o{ Review : "reviewed in" Tag }o--o| Topic : "belongs to" Topic }o--o| Topic : "child of" ``` ## 2. Identity & access ### User | Field | Type | Notes | |---|---|---| | `username` | citext unique | public handle; rename allowed, old handle reserved 90 days | | `email` | citext unique | verified flag; never exposed via API | | `password_hash` | text | Argon2id; null if OAuth-only | | `status` | enum | `active`, `suspended`, `banned`, `deactivated` | | `email_verified_at`, `last_active_at` | timestamptz | | | `locale`, `timezone` | text | drive i18n + streak rollover bucket | ### Profile (1:1 with User) `display_name`, `bio` (sanitized markdown), `avatar_media_id → MediaAsset`, `links jsonb` (validated URL list), `reputation int` (denormalized; recomputed by job from `ReputationEvent`), `pronouns`, `preferred_subjects text[]`, visibility settings (`show_activity bool`, `show_progress bool`). ### Role / UserRole `Role` is a small fixed table (`learner`, `contributor`, `reviewer`, `moderator`, `admin`). `UserRole(user, role, scope_topic_id nullable, granted_by, granted_at, expires_at nullable)` — roles can be **topic-scoped** (e.g., reviewer for Physics only). Everyone is implicitly a learner; see doc 05. ### ReputationEvent (append-only) `user_id, kind` (`problem_published`, `review_completed`, `solution_upvoted`, `flag_upheld_against`, …)`, points int, subject_type, subject_id, created_at`. Reputation totals are always recomputable from this ledger. ## 3. Content: problems ### Problem (identity object — almost no content) | Field | Type | Notes | |---|---|---| | `slug` | citext unique | human URL; redirects kept on rename | | `owner_id` | FK User | current maintainer (transferable) | | `primary_topic_id` | FK Topic | | | `published_version_id` | FK ProblemVersion nullable | **the** single source of learner-visible content | | `head_version_id` | FK ProblemVersion | latest draft-line version | | `forked_from_problem_id` | FK Problem nullable | item-level fork pointer | | `forked_from_version_id` | FK ProblemVersion nullable | exact version forked | | `visibility` | enum | `public`, `unlisted`, `private_draft` | | `is_locked` | bool | moderator content-freeze | | `license` | text | constant `CC-BY-SA-4.0` (field exists for future import compat) | | `deleted_at` | timestamptz nullable | | ### ProblemVersion (immutable after creation) | Field | Type | Notes | |---|---|---| | `problem_id` | FK | | | `number` | int | monotonic per problem, unique together | | `parent_version_id` | FK self nullable | version DAG (linear within an item, cross-item via fork) | | `state` | enum | `draft`, `submitted`, `in_review`, `changes_requested`, `accepted`, `rejected`, `published`, `superseded`, `withdrawn` (doc 06) | | `document` | jsonb | full content doc per `schemas/problem.schema.json` | | `document_schema_version` | int | for migration of old documents | | `answer_spec` | jsonb | **stored separately** from `document` so the API can serve `document` without ever risking answer leakage | | `changelog` | text | required non-empty on submit | | `authored_by` | FK User | author of this version (≠ owner for forks/edits) | | `content_hash` | text | SHA-256 of canonical JSON (document + answer_spec); used for dedupe, plagiarism, integrity | | `difficulty` | smallint 1–5 | | | `estimated_minutes` | smallint | | | `published_at`, `superseded_at` | timestamptz nullable | | Immutability is enforced at three layers: model `save()` guard, DB trigger rejecting `UPDATE` on `document`/`answer_spec` once `state != 'draft'` (drafts may be replaced only by *new* version rows after first submit), and the audit log. ### Hint / Solution Stored both inside `document` (canonical, versioned) **and** as projected rows for querying/permissions: - `Hint(problem_version_id, ordinal, body_mdx, penalty_percent smallint)` - `Solution(problem_version_id, ordinal, body_mdx, kind enum('official','community'), authored_by)` — community solutions attach to the *published version* and go through lightweight review (doc 06 §8). ### ProblemAttempt (append-only, high volume) `user_id, problem_id, problem_version_id` (the exact version attempted), `answer_payload jsonb` (normalized submitted answer), `is_correct bool`, `score numeric(5,4)` (partial credit for ordering/matching/code), `hints_used smallint`, `time_spent_ms int`, `gave_up bool`, `runner_result jsonb nullable` (code challenges: per-testcase pass/fail, resource usage — never raw runner internals), `client enum('web','api')`, `created_at`. Partitioned by month. ### Tag / Topic - `Topic(slug, name_i18n jsonb, parent_id nullable, ordinal)` — curated tree, staff-managed (e.g., *Math → Number Theory → Modular Arithmetic*). - `Tag(slug, name, topic_id nullable, status enum('active','merged','banned'), merged_into_id nullable, created_by)` — folksonomy, anyone can create; moderators merge/ban. `ProblemTag` and `CourseTag` join tables. - `ProblemPrerequisite(problem_id, prerequisite_problem_id, strength enum('required','recommended'))` with cycle check on write. ## 4. Content: courses ### Course (identity) — mirrors Problem `slug, owner_id, primary_topic_id, published_version_id, head_version_id, forked_from_course_id, forked_from_version_id, visibility, is_locked, deleted_at`. ### CourseVersion (immutable) `course_id, number, parent_version_id, state` (same machine as ProblemVersion), `document jsonb` (per `schemas/course.schema.json` — contains the full module/lesson structure), `changelog, authored_by, content_hash, estimated_hours`. ### Module / Lesson (projection tables) The course `document` is canonical; `Module` and `Lesson` rows are **write-time projections** for joins, progress FKs, and search — regenerated whenever a version is created: - `Module(course_version_id, ordinal, title, slug)` - `Lesson(module_id, ordinal, title, slug, kind enum('lesson','problem_set','quiz','project'), body_ref text)` — `body_ref` points into the document. - `LessonProblem(lesson_id, ordinal, problem_id, pinned_problem_version_id nullable, required bool)` — courses reference problems **pinned to a version** by default; `pinned=null` means "track published" (doc 04 §6). ### Enrollment / Progress - `Enrollment(user_id, course_id, course_version_id, status enum('active','completed','abandoned'), started_at, completed_at, unique(user_id, course_id))`. Learners stay on their enrolled version until they accept an upgrade prompt. - `Progress(user_id, subject_type enum('lesson','problem','module','course'), subject_id, state enum('not_started','in_progress','completed','mastered'), score numeric, attempts int, srs_ease numeric, srs_interval_days numeric, srs_due_at timestamptz, last_activity_at)` — unique on `(user_id, subject_type, subject_id)`. SRS fields used only for `problem` rows. - `Streak(user_id, current_days int, longest_days int, last_qualifying_date date, freezes_available smallint)` — qualifying activity = ≥1 graded attempt or lesson completion per local day. ## 5. Review & community ### Review `subject_type enum('problem_version','course_version','widget_version','solution')`, `subject_id`, `reviewer_id`, `state enum('open','approved','changes_requested','rejected','withdrawn','stale')`, `decision_note text`, `checklist jsonb` (the structured review rubric — correctness, originality, accessibility, difficulty calibration), `decided_at`. Constraint: reviewer ≠ version author and reviewer ≠ item owner (DB-level check via trigger; also enforced in the permission layer). ### ReviewComment `review_id, author_id, body_mdx, anchor jsonb nullable` (`{path: "document.statement", range: [start, end]}` — anchored to the MDX source of the version under review, stable because versions are immutable), `resolved_by nullable, resolved_at`. ### DiscussionThread / DiscussionComment - `DiscussionThread(subject_type enum('problem','course','lesson'), subject_id, title, kind enum('question','explanation','feedback','general'), is_locked, is_pinned, spoiler bool)` — `spoiler=true` threads are hidden until the learner solves the problem. - `DiscussionComment(thread_id, author_id, parent_id nullable, body_mdx, deleted_at, edited_at)` — max depth 3. ### Vote `user_id, subject_type enum('problem','solution','discussion_comment','hint')`, `subject_id, value smallint in (-1, 1)`, unique on `(user_id, subject_type, subject_id)`. Downvotes on problems require a reason category. Aggregates cached on subjects, recomputable. ### Bookmark `user_id, subject_type enum('problem','course','lesson','thread'), subject_id, collection_id nullable → BookmarkCollection(user_id, name, is_private)`. ## 6. Moderation & safety ### Report `reporter_id, subject_type` (any user-generated content type)`, subject_id, category enum('spam','plagiarism','incorrect_content','harassment','copyright','off_topic','other')`, `details text, state enum('open','triaged','resolved_upheld','resolved_dismissed','duplicate')`, `assigned_to nullable, resolution_note, resolved_by, resolved_at`, `plagiarism_evidence jsonb nullable` (similarity scan output: matched `content_hash`es, n-gram overlap scores, external-source URLs). ### ModerationAction `actor_id, action enum('hide_content','restore_content','lock','warn_user','suspend_user','ban_user','merge_tags','redact_pii')`, `subject_type, subject_id, reason text, report_id nullable, expires_at nullable`. Every action also emits an AuditLog entry (doc 08); this table is the *operational* record, the audit log is the *forensic* one. ## 7. Platform plumbing ### Notification `user_id, kind, payload jsonb, subject_type, subject_id, read_at nullable, emailed_at nullable`. Fan-out by Celery; per-kind user preferences in `NotificationPreference(user_id, kind, in_app bool, email bool)`. ### MediaAsset `uploader_id, kind enum('image','audio','attachment'), s3_key, content_hash (sha-256, dedupe), mime, bytes, width, height, alt_text_default, status enum('processing','ready','quarantined')`, `renditions jsonb` (AVIF/WebP/thumb keys). Uploads via presigned POST; processing strips EXIF and sanitizes SVGs (or rasterizes them — see doc 07 §7). ### WidgetPackage / WidgetVersion - `WidgetPackage(slug, owner_id, title, description, status enum('active','deprecated','banned'))` - `WidgetVersion(package_id, semver, state` (review state machine)`, manifest jsonb` (per `schemas/widget-manifest.schema.json`)`, bundle_s3_key, bundle_hash sha256, bundle_bytes, reviewed_capabilities text[], published_at)` — bundles are content-addressed and immutable (doc 07). ### AuditLog See doc 08 for the full design. Summary row shape: `seq bigserial, actor_id nullable, actor_role_snapshot text[], action, subject_type, subject_id, before_hash, after_hash, metadata jsonb, ip_trunc, request_id, prev_row_hash, row_hash, created_at`. Append-only, hash-chained, partitioned by month, `REVOKE UPDATE/DELETE` from the app DB role. ### OutboxSearch `subject_type, subject_id, op enum('upsert','delete'), enqueued_at, processed_at nullable, attempts int` — the transactional outbox of doc 01 §3.3. ### OERExport / OERImport `course_id, course_version_id, requested_by, s3_key, manifest jsonb, state, checksum` — tracked jobs for the bundle format in `schemas/oer-export.schema.json`. ## 8. Indexing & integrity highlights - `ProblemVersion`: unique `(problem_id, number)`; partial index on `state='published'`; GIN on `document` paths used by staff filtering. - `ProblemAttempt`: BRIN on `created_at` per partition; index `(user_id, problem_id, created_at desc)`. - `Progress`: index `(user_id, srs_due_at)` partial `WHERE srs_due_at IS NOT NULL`. - Exactly **one** published version per item enforced by partial unique index `UNIQUE(problem_id) WHERE state='published'` (and the same for courses). - All `subject_type/subject_id` polymorphic pairs are constrained by check constraints listing allowed types, and validated in the service layer (Django generic FKs are *not* used; explicit pairs keep queries indexable).