# 03 — Content Storage Format Specification

This document specifies the structured format for all educational content: problems, hints, solutions, courses, and widget references. The format is **MDX-in-JSON**: a JSON envelope (validated by the schemas in `schemas/`) whose prose fields contain a constrained MDX dialect. Raw HTML is **never** stored or rendered.

Design goals, in priority order:

1. **Safety** — content is untrusted input; the format must be sanitizable and renderable without script execution in the main origin.
2. **Portability** — documents must survive export/import (OER bundles), forking across instances, and offline rendering.
3. **Diff-ability** — reviewers need meaningful diffs between versions.
4. **Accessibility** — the format forces alt text, captions, and plain-language fallbacks at the schema level, not as an afterthought.

## 1. The MDX dialect ("PolyMDX")

PolyMDX is CommonMark + GFM tables/strikethrough/task-lists, plus:

- **Math:** `$...$` inline and `$$...$$` block (KaTeX). No raw `\( \)`.
- **A fixed registry of trusted components** (capitalized JSX tags). Anything not in the registry fails validation at save time.
- **No** raw HTML, no `import`/`export` statements, no JS expressions (`{...}` attribute values are limited to JSON literals), no `