# ADR 0018: S3-Compatible Object Storage (MinIO in Dev, Any S3 API in Prod) - **Status:** Accepted - **Date:** 2025-01-15 - **Deciders:** Core architecture team - **Related:** docs/architecture/03 (MediaAssets), docs/architecture/09 (low-bandwidth), ADR 0009 (widget bundles), ADR 0013 (audit checkpoints/archives) ## Context The platform stores several classes of binary/blob data that do not belong in PostgreSQL: - **MediaAssets:** images, diagrams (SVG), audio for content; uploaded by contributors, referenced from MDX documents by asset ID. - **Widget bundles:** built, content-hashed JS/HTML bundles served to the sandbox origin (ADR 0009). - **OER export bundles:** versioned course export archives (zip/tar of MDX+JSON+assets). - **Operational artifacts:** audit-log checkpoints and archives (ADR 0013), database backup dumps. Requirements: works identically for a laptop dev environment, a single-VPS self-host, and a cloud deployment; supports direct-to-storage uploads (large files shouldn't stream through Django workers); cacheable/CDN-frontable public reads for the low-bandwidth strategy; and no vendor lock-in (mission-level requirement for an open platform). Options: local filesystem volumes; S3-compatible object storage (the S3 API as the contract); provider-specific SDK abstractions per backend. ## Decision We standardize on the **S3 API as the storage contract**, with these specifics: 1. **The S3 protocol is the interface; any conforming implementation is a valid backend.** Dev and default self-host: **MinIO** (single container in Compose). Production options documented and tested: AWS S3, Cloudflare R2, Backblaze B2, or self-hosted MinIO/Garage/SeaweedFS. The application is configured purely by endpoint/credentials/bucket settings — no provider-conditional code. 2. **Application integration via `django-storages[s3]` + `boto3`** for server-side operations, and **presigned URLs** for client-direct transfer: - **Uploads:** the API issues a presigned PUT (after validating content type, size limit, and the actor's permission to attach assets), client uploads directly to storage, then confirms; the backend verifies object existence + size + content hash before activating the MediaAsset record. Django workers never proxy file bytes. - **Private reads** (e.g. unpublished-content assets, export bundles): short-lived presigned GETs. - **Public reads** (assets of published content, widget bundles): served through a public bucket path behind the CDN/reverse-proxy with long-lived immutable cache headers — enabled by content-addressing (next point). 3. **Content-addressed keys for immutability:** object keys embed the SHA-256 of contents (`media/.`, `widgets//...`). Consequences: dedup for free, `Cache-Control: public, max-age=31536000, immutable` is *correct*, version rollbacks need no asset gymnastics (old versions reference old hashes, which still exist), and the audit/versioning story stays clean. A reference-counting GC task (Celery `batch`) deletes objects unreferenced by any version after a grace period — never eagerly, because immutable versions may reference them forever. 4. **Bucket layout:** four buckets (or prefixes, where the backend lacks cheap multi-bucket): `media` (public-read for published), `widgets` (public-read, served only via the sandbox origin), `exports` (private, presigned), `ops` (private: backups, audit archives, checkpoints; **object-lock/WORM enabled where the backend supports it** for the audit-checkpoint guarantees of ADR 0013). 5. **Image pipeline for low bandwidth (docs/architecture/09):** on upload confirmation, a Celery task generates WebP variants at standard widths (320/640/1280) stored under the asset's hash prefix; MDX image rendering emits `srcset` over the variants. SVGs are sanitized (whitelist-based, via a vetted sanitizer) at upload — SVG is an XSS vector and is treated as untrusted input. 6. **Self-host minimal mode:** for the smallest installs, the docs describe running MinIO on the same VPS; we deliberately do **not** maintain a filesystem-storage code path — one storage interface keeps the codebase honest, and a single MinIO container is cheap. ## Alternatives Considered - **Local filesystem (Django `FileSystemStorage`):** simplest day-one, but: no presigned direct uploads (bytes through workers), divergent dev/prod behavior the moment anyone deploys to cloud, backup/replication left entirely to the operator, and a second code path to test forever. Rejected. - **Provider-native SDKs with a storage abstraction layer:** we'd be writing our own S3-API equivalent abstraction; the S3 protocol already *is* the abstraction, with massive implementation availability. Rejected. - **Database-stored blobs (bytea/LO):** bloats backups, hammers the DB with media traffic, defeats CDN caching. Rejected without much ceremony. ## Consequences **Positive** - Identical storage semantics from laptop to cloud; zero provider lock-in; direct uploads keep API workers lean; content-addressing makes aggressive immutable caching safe and aligns perfectly with the immutable-version model (ADR 0007). - WORM-capable `ops` bucket strengthens audit-log tamper evidence. **Negative / Accepted risks** - Presigned-URL flows are more moving parts than form uploads (issue → upload → confirm); mitigated by a single well-tested upload helper in the frontend and an orphan-sweeper task for never-confirmed uploads. - "S3-compatible" is a spectrum; some backends lack object-lock or have eventual-consistency quirks. We document a **conformance checklist** (presigned PUT/GET, conditional requests, object-lock optional) and CI runs the storage test suite against MinIO as the reference implementation. - Reference-counting GC must be conservative; bugs there destroy content assets. Grace period + dry-run mode + deletion audit entries are mandatory. **Follow-ups** - Backend milestone: MediaAsset model with hash verification, presign endpoints, SVG sanitizer, variant pipeline, GC task with dry-run. - Targeted dependency versions: `django-storages ^1.14`, `boto3 ^1.34` (maintainers should re-verify presign parameter names against resolved versions).