# Versioning & Forking Model

**Status:** Stable draft for MVP implementation

**Related:** [02-data-model.md](02-data-model.md), [03-content-format.md](03-content-format.md), [06-review-workflow.md](06-review-workflow.md), ADR-007, ADR-013

---

## 1. Goals

1. **Every published artifact is a reviewed, immutable version.** Learners never see content that has drifted since review.
2. **Full history with rollback.** Any prior published version can be restored without data loss.
3. **Forkable & remixable.** Any user may fork published content into their own draft, with mandatory attribution chains, per CC BY-SA 4.0.
4. **Auditable provenance.** Given any version on the platform, we can answer: who wrote it, what changed, when it was reviewed, and what it was derived from.
5. **Cheap storage.** Versions are content-addressed and deduplicated; identical payloads are stored once.

## 2. Core concepts

### 2.1 Two-layer model: `Entity` + `EntityVersion`

Each versionable artifact (Problem, Course) is split into:

| Layer | Mutable? | Holds |
|---|---|---|
| **Entity** (`Problem`, `Course`) | Yes | Stable identity (UUID + slug), owner, fork lineage pointers, current pointers, lifecycle status, aggregate stats |
| **Version** (`ProblemVersion`, `CourseVersion`) | **No (append-only)** | Full content document (JSON, per the content format spec), version number, changelog entry, content hash, author, review linkage |

The Entity carries three pointers:

```
Problem
├── draft_version_id      → the single editable working copy (nullable)
├── published_version_id  → what learners see (nullable until first publish)
└── latest_version_id     → most recent version of any state (for editors)
```

**Invariant V1:** `published_version_id` may only ever point to a version whose review state is `accepted`. This is enforced at the database level with a trigger/check on the publish transaction, not just in application code.

**Invariant V2:** A version row, once its state leaves `draft`, is immutable.
Edits to a submitted/accepted/published version always create a *new* version row. Draft versions may be updated in place (autosave) to avoid version-row explosion during authoring.

### 2.2 Version numbering

- Versions are numbered with a monotonically increasing integer per entity: `v1, v2, v3, …` (column `version_number`, unique with `entity_id`).
- We additionally expose a human-facing **semantic-ish label** derived automatically:
  - **major** bump when the answer specification, problem type, or grading logic changes (attempt history becomes non-comparable);
  - **minor** bump otherwise (prose fixes, hint improvements, tag changes).
- The major/minor classification is computed by diffing the canonical JSON document (see §5) and stored as `change_class ∈ {major, minor}` on the version. It drives downstream behavior:
  - **major** → existing `ProblemAttempt` rows keep their `version_id` but are excluded from the entity-level success-rate statistics; spaced-repetition scheduling for affected learners is reset to "review soon".
  - **minor** → statistics carry forward.

### 2.3 Content addressing & deduplication

Every version stores:

```
content_hash = sha256(canonical_json(document))
```

`canonical_json` is RFC 8785 (JCS) canonicalization of the content document **excluding** volatile metadata fields (`createdAt`, `authorId`, `versionNumber`).

Properties:

- Re-submitting an unchanged document is detected (`content_hash` matches parent) and rejected with a friendly "no changes" error.
- The hash is included in the export format (§8) so an imported OER bundle can be verified.
- Large embedded media is never inlined; documents reference `MediaAsset` IDs, and media assets are themselves content-addressed in object storage (`assets/{sha256}/{filename}`), so forks and versions share media bytes automatically.

## 3. Version lifecycle

```mermaid
stateDiagram-v2
    [*] --> draft: create / fork / "edit published"
    draft --> draft: autosave (in-place)
    draft --> submitted: author submits
    submitted --> in_review: reviewer claims
    in_review --> changes_requested: review outcome
    changes_requested --> draft: author revises (NEW version row,\nparent = this version)
    in_review --> accepted: review quorum met
    accepted --> published: publish transaction
    submitted --> withdrawn: author withdraws
    in_review --> rejected: review outcome
    published --> superseded: newer version published
    published --> retracted: moderator action
```

Notes:

- The review states live on the version (`ProblemVersion.state`); the *entity* status is derived (`Problem.status = published` iff `published_version_id IS NOT NULL` and not retracted).
- `superseded` versions remain readable at their permalink (`/problems/{slug}/v/{n}`) for attribution and attempt-history integrity, with a banner linking to the current version.
- `retracted` is a moderation action (copyright, plagiarism, dangerous content). Retraction tombstones the rendered content but **retains the row** for audit; the audit log records the reason (see [08-audit-log.md](08-audit-log.md)).

### 3.1 Editing published content

When a contributor with edit rights clicks "Edit" on a published problem:

1. If `draft_version_id` is null, a new draft version is created as a deep copy of the published version, with `parent_version_id = published_version_id`.
2. If a draft already exists, the editor opens it (one active draft per entity per branch; MVP has a single branch, see §10 Future work).
3. The published version is untouched and continues serving learners until the new draft survives review and is published.

### 3.2 Publish transaction

Publishing is a single serializable transaction:

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
  -- Preconditions checked with row locks: the target must be accepted.
  SELECT ... FROM problem_versions
    WHERE id = :vid AND state = 'accepted' FOR UPDATE;
  -- Supersede the currently published version (matches no row on first publish).
  UPDATE problem_versions SET state = 'superseded'
    WHERE id = (SELECT published_version_id FROM problems WHERE id = :pid);
  -- Promote the accepted version.
  UPDATE problem_versions SET state = 'published', published_at = now()
    WHERE id = :vid;
  -- Repoint the entity and clear its working draft.
  UPDATE problems SET published_version_id = :vid, draft_version_id = NULL
    WHERE id = :pid;
  INSERT INTO audit_log (...); -- publish event
COMMIT;
```

Post-commit (async, idempotent): reindex in Meilisearch, invalidate render cache, notify subscribers/watchers, recompute course integrity if the problem is embedded in lessons (§6).

## 4. Rollback

Rollback = publish an older accepted version again. We deliberately do **not** mutate history.

Procedure (`POST /api/v1/problems/{id}/rollback` with `target_version`):

1. Permission check: entity maintainer + (moderator approval **or** reviewer quorum if the target differs by a `major` change class); see the RBAC matrix.
2. A new version row `v(n+1)` is created whose content is byte-identical to the target version, `parent_version_id = current published`, `rollback_of = target_version_id`, changelog auto-filled: *"Rollback to v3: "*.
3. Because the target was already accepted and content is hash-identical, the new version is **fast-tracked**: it inherits `state = accepted` without a new review cycle (Invariant V1 still holds: the content *was* reviewed). This fast track is recorded in the audit log.
4. Normal publish transaction.

This keeps `version_number` strictly increasing and makes "what was live on date X" answerable from `published_at` ranges alone.

## 5. Diffs & changelogs

- Every non-draft version requires a human-written changelog entry (≥ 10 chars, enforced).
- Machine diffs are computed on demand from the canonical JSON using a structural JSON diff (we diff at the block level of the content document, so the review UI shows "Statement block changed / Hint 2 added / answer tolerance 0.01 → 0.05" rather than a raw text diff).
  MDX `body` fields within a block are diffed as text with word-level granularity.
- Diffs are not stored; they are derivable and cached in Redis keyed by `(hash_a, hash_b)`.

## 6. Course versioning & the pinning rule

Courses reference problems and lessons. The key design decision:

> **A published `CourseVersion` pins exact `ProblemVersion` IDs**, not floating entity references.

Rationale: a course was reviewed *as a whole*; silently swapping a problem under it would violate the "published = reviewed" guarantee.

Mechanics:

- While editing a course draft, lesson blocks reference problems by **entity ID** with `track: "latest_published"` (default) or `pin: `.
- At course **submit** time, all `latest_published` references are resolved and frozen into concrete version IDs inside the `CourseVersion` document. Reviewers review the frozen set.
- When a pinned problem later publishes a new version:
  - `minor` change class → a background job offers course maintainers a one-click **"refresh pins"** action that creates a course draft with updated pins. Because content is otherwise unchanged and the constituent versions are accepted, the resulting course version is fast-track eligible (the same rule as rollback, §4 step 3).
  - `major` change class → refresh requires full course review.
- If a pinned problem is **retracted**, the course is flagged `integrity: degraded`, maintainers and moderators are notified, and the lesson renders an accessible placeholder ("This problem was removed: ") until a fixed course version ships. Courses are never auto-unpublished by upstream retraction except for legal takedowns.

## 7. Forking & attribution

### 7.1 What forking creates

`POST /api/v1/problems/{id}/fork` (or via UI):

1. Creates a **new Entity** owned by the forker, with:
   - `forked_from_entity_id` = source entity
   - `forked_from_version_id` = the *specific published version* forked (critical: the source may change later)
2. Creates a draft version whose content is a deep copy, with an injected `attribution` block (see §7.2).
3. Increments `fork_count` on the source and records a `content.forked` audit event.

Forks then live an independent life: their versions are reviewed and published on their own. There is **no automatic sync** from upstream (MVP); the UI shows "upstream has N newer versions" with a structural diff view to help manual incorporation.

**Fork preconditions:** only `published` versions of public entities can be forked. Drafts and retracted content cannot. Retracting an upstream does *not* retract forks (BY-SA permits derivatives), but plagiarism/copyright retractions cascade a moderation review onto direct forks of the retracted version.

### 7.2 Attribution chain (license-critical)

CC BY-SA 4.0 requires attribution and share-alike. The content document carries a machine-readable `attribution` object whose `chain` array is append-only:

```json
"attribution": {
  "license": "CC-BY-SA-4.0",
  "chain": [
    {
      "entityId": "prb_8f3a…",
      "versionId": "pv_19c2…",
      "title": "Modular Arithmetic Warm-up",
      "authors": [{"userId": "usr_77…", "displayName": "ada"}],
      "url": "https://fablepool.example/problems/modular-arithmetic-warmup/v/4",
      "forkedAt": "2025-06-01T12:00:00Z"
    }
  ]
}
```

Rules:

- The chain is **server-managed and immutable from the editor**: the content editor cannot remove or alter chain entries, and the API rejects documents whose chain doesn't extend the parent's chain. This is validated against the fork lineage in the database, not trusted from the client.
- Renderers display the chain in a standard "Derived from…" footer on every published derivative.
- Display names are snapshotted at fork time (so renames don't break attribution) but link to live profiles; users who exercise erasure (GDPR) have their snapshot replaced by "a FablePool contributor" + stable opaque ID, which satisfies BY-SA's allowance for reasonable attribution.
- Exports (§8) include the full chain.
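
The "chain must extend the parent's chain" rule above can be read as a prefix check. The following is a minimal Python sketch under that assumption, not the platform's actual validator: the function name `chain_extends_parent` is hypothetical, entries are compared as plain dictionaries, and in a real deployment the parent chain would be loaded from the fork lineage in the database rather than passed in by the caller.

```python
from typing import Any

# A chain entry as it appears in the `attribution.chain` array.
ChainEntry = dict[str, Any]


def chain_extends_parent(
    parent_chain: list[ChainEntry],
    submitted_chain: list[ChainEntry],
) -> bool:
    """Return True if the submitted chain keeps every parent entry.

    Assumption: "extends" means the parent's entries appear unchanged,
    in the same order, as a prefix of the submitted chain. New entries
    may only be appended after them.
    """
    if len(submitted_chain) < len(parent_chain):
        return False  # at least one entry was removed
    # Any edit to an existing entry makes the prefix comparison fail.
    return submitted_chain[: len(parent_chain)] == parent_chain


# Hypothetical example data (IDs are illustrative only).
parent = [{"entityId": "prb_a", "versionId": "pv_1", "title": "Source"}]
forked = parent + [{"entityId": "prb_b", "versionId": "pv_2", "title": "Fork"}]
tampered = [{"entityId": "prb_a", "versionId": "pv_1", "title": "Renamed"}]

print(chain_extends_parent(parent, forked))    # appended entry: accepted
print(chain_extends_parent(parent, tampered))  # altered entry: rejected
print(chain_extends_parent(parent, []))        # removed entry: rejected
```

A prefix comparison is deliberately strict: it rejects reordering as well as removal or alteration, which matches the append-only wording but may be tighter than the real API needs.
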
### 7.3 Remixing across entities

A course may embed forked problems freely. A problem that *combines* material from multiple sources lists multiple roots in its chain; the editor offers "import block from…" which appends the source to the chain automatically. Manual chain additions (for content imported from outside FablePool, e.g., another CC BY-SA OER) are allowed via a dedicated "external attribution" entry type with `url` + `license` fields, validated for license compatibility (only BY-SA-compatible licenses accepted: CC0, CC-BY, CC-BY-SA).

## 8. Import / export of versioned OER bundles

Export (`GET /api/v1/courses/{id}/export?version=n`) produces a zip bundle:

```
course-bundle/
├── manifest.json          # bundle format version, content hashes, license, attribution
├── course.json            # the CourseVersion document
├── problems/
│   ├── prb_8f3a….json     # each pinned ProblemVersion document
│   └── …
├── assets/
│   └── {sha256}.{ext}     # referenced media, content-addressed
└── LICENSE.txt            # CC BY-SA 4.0 full text
```

- `manifest.json` lists every file with its sha256, enabling integrity verification on import.
- Import creates new entities with the bundle's attribution chains preserved and extended (an import is modeled as a fork from an external source). Imported content enters the normal `draft → review → publish` pipeline: **imports are never auto-published.**
- The bundle format is itself versioned (`"bundleFormat": 1`); importers must reject unknown major format versions.

## 9. Storage & performance notes

- Version documents are stored in a `JSONB` column; PostgreSQL TOAST compression handles prose well. At an estimated 30 KB average document and 10 versions/problem, 100k problems ≈ 30 GB, which is acceptable for MVP without delta encoding. Delta storage is deliberately deferred (ADR-013) because it complicates rollback and export for marginal savings.
- Hot path (render published version) touches one indexed row: `problem_versions WHERE id = problems.published_version_id`, cached in Redis keyed by `content_hash` (cache never goes stale for immutable content; invalidation is only pointer-level).

## 10. Future work (explicitly out of MVP scope)

- Multiple named branches per entity (translation branches will be the first need; see [09-accessibility-i18n-bandwidth.md](09-accessibility-i18n-bandwidth.md)).
- Upstream→fork merge assistance (3-way structural merge).
- Delta-encoded version storage.
- Cross-instance federation of forks (ActivityPub-style), kept in mind in the ID scheme (globally unique UUIDs, absolute URLs in attribution chains) but not built.
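
Returning to the content-addressing rule in §2.3, the sketch below shows how `content_hash` could be computed and why two versions that differ only in volatile metadata hash identically. It is an illustration under stated assumptions, not the implementation: it treats the volatile fields as top-level keys, and it uses `json.dumps` with sorted keys and compact separators as a stand-in for RFC 8785. That stand-in is only an approximation (number formatting and key ordering for some Unicode strings differ from JCS), so a real implementation would use a dedicated JCS library.

```python
import hashlib
import json

# Assumption: the volatile metadata fields named in §2.3 live at the
# top level of the content document.
VOLATILE_FIELDS = {"createdAt", "authorId", "versionNumber"}


def canonical_json(document: dict) -> bytes:
    """Approximate canonical form: drop volatile fields, sort keys,
    strip insignificant whitespace. NOT a full RFC 8785 implementation."""
    stable = {k: v for k, v in document.items() if k not in VOLATILE_FIELDS}
    text = json.dumps(
        stable, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def content_hash(document: dict) -> str:
    """Hex-encoded SHA-256 of the canonical form."""
    return hashlib.sha256(canonical_json(document)).hexdigest()


def is_unchanged(parent_hash: str, document: dict) -> bool:
    """True when a submission should get the "no changes" error."""
    return content_hash(document) == parent_hash


# Hypothetical documents: same content, different volatile metadata.
v1 = {"title": "Warm-up", "blocks": ["statement"], "versionNumber": 1,
      "authorId": "usr_1", "createdAt": "2025-06-01T12:00:00Z"}
v2 = {"createdAt": "2025-07-01T09:00:00Z", "authorId": "usr_2",
      "versionNumber": 2, "blocks": ["statement"], "title": "Warm-up"}

print(is_unchanged(content_hash(v1), v2))  # identical content
print(is_unchanged(content_hash(v1), {**v2, "title": "Warm-up II"}))
```

Because key order and volatile fields do not affect the digest, the same function can back both the "no changes" check and the integrity hashes listed in an export manifest.
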