# 00 — Overview, Goals, and Technology Selection

## 1. Problem statement

Self-hosted music servers (Navidrome, Airsonic, Jellyfin) assume the music library is a local filesystem. Users who keep their collections in object storage or on WebDAV shares (Nextcloud, Hetzner Storage Box, rclone-served remotes) must either sync the library locally or interpose a FUSE mount; both approaches are fragile and operationally heavy.

This project streams **directly from S3 and WebDAV**, treating remote storage as the source of truth, with a disciplined local cache for metadata, artwork, hot audio segments, and transcodes.

## 2. Goals (in scope across all milestones)

- G1. Library sources on **S3-compatible** object storage (AWS S3, MinIO, Backblaze B2 S3 API, Cloudflare R2, Wasabi) and **WebDAV** (RFC 4918), including WebDAV servers behind HTTP Basic/Digest auth and self-signed TLS.
- G2. **Subsonic API compatibility** (target: Subsonic API **1.16.1**, the level Navidrome implements) so existing clients work unmodified.
- G3. A **native REST API** (cleaner, token-based, cursor-paginated) used by the bundled web UI and the Android app.
- G4. **Chromecast**: custom receiver app (CAF v3) + sender support in web UI and Android app, including casting transcoded streams and queue handoff.
- G5. **Android app**: browse, search, stream, offline downloads, Cast sender, Android Auto media browser surface.
- G6. **Recommendation engine**: "radio mode" that auto-enqueues the next track from the current track plus the user's last *N* plays (*N* user-configurable, default 10, range 1–100).
- G7. **Transcoding** via ffmpeg with per-player/per-user profiles (opus/mp3/aac, bitrate caps), and raw passthrough when the client supports the source codec.
- G8. **Multi-user** with per-user libraries, playlists, play history, favourites, and ratings; admin role for server and source management.

## 3. Non-goals (explicitly out of scope)

- NG1. Podcast/audiobook-specific features (chapters, episode feeds).
  The data model does not preclude them, but no milestone delivers them.
- NG2. Music *purchasing* or any DRM handling.
- NG3. Multi-node horizontal scaling of a single server instance. Target scale is the self-hosting range: ≤ 50 concurrent streams, ≤ 1M tracks. The design keeps state in PostgreSQL/SQLite + an object cache so clustering is not foreclosed, but v1 is not engineered for it.
- NG4. Federated/social features.
- NG5. iOS app (Subsonic-compatible iOS clients cover this; revisit later).

## 4. Glossary

| Term | Meaning |
|---|---|
| **Source** | A configured remote storage endpoint (one S3 bucket+prefix, or one WebDAV URL+path). |
| **Library** | A logical music collection backed by exactly one Source. Users are granted access per-library. |
| **Object key** | Driver-native identifier of a file within a Source (S3 key or WebDAV href path). |
| **Media file** | A scanned audio file row; belongs to a Library; carries the object key, ETag/mtime, and extracted tags. |
| **Track / Album / Artist** | Canonical musical entities derived from media files during scanning. |
| **Scan** | The process of listing a Source, diffing against the DB, and (re)extracting tags for new/changed objects. |
| **Player** | A registered playback endpoint for a user (web session, Android app instance, Subsonic client, Cast device), carrying a transcoding profile. |
| **Radio mode** | Auto-enqueue driven by the recommendation engine. |
| **Window (N)** | The number of most-recent plays the recommendation engine considers. |

## 5. Technology selections (normative for implementation milestones)

| Concern | Selection | Rationale (non-normative) |
|---|---|---|
| Server language | **Go ≥ 1.22** | Single static binary (self-hosting ergonomics), mature S3/WebDAV/ffmpeg ecosystems, same niche as Navidrome. |
| HTTP router | `chi` v5 | Stdlib-compatible, middleware-friendly, no framework lock-in. |
| Database | **SQLite (default)** and **PostgreSQL ≥ 14** behind one schema | SQLite for zero-config self-hosting; Postgres for larger installs. Migrations via `golang-migrate`. All SQL in this package is provided in both dialects. |
| S3 client | AWS SDK for Go v2 (`aws-sdk-go-v2`) | Path-style + virtual-host addressing, custom endpoints for MinIO/R2, presigning, `ListObjectsV2` pagination. |
| WebDAV client | `golang.org/x/net/webdav` types + hand-rolled client over `net/http` | Existing Go WebDAV *clients* are thin; PROPFIND/partial-GET needs are narrow and specified fully in `03-storage-abstraction.md`. |
| Tag extraction | `dhowden/tag` for common formats; ffprobe fallback | Pure-Go fast path; ffprobe covers exotic containers and gives duration/bitrate authoritatively. |
| Transcoding | **ffmpeg** ≥ 6 invoked as a subprocess | Universally available, no cgo. |
| Web UI | React 18 + TypeScript, Vite | Delivered in a later milestone; API contract here is what matters. |
| Android | Kotlin, Media3 (ExoPlayer), Cast SDK v3, Jetpack Compose | See `08-android-app.md`. |
| Cast receiver | CAF Receiver SDK v3 (HTML/JS) | See `06-chromecast.md`. |
| Recommendation features | Tag/metadata features + optional audio embedding via ffmpeg-decoded PCM and an ONNX model (`onnxruntime` via subprocess sidecar) | Two-phase rollout, see `07-recommendation-engine.md`. |

## 6. Quality attributes & budgets

These are acceptance thresholds that future milestones are tested against.

- **Time-to-first-byte (cached stream):** ≤ 150 ms server-side.
- **Time-to-first-byte (cold S3, same region):** ≤ 600 ms server-side (one ranged GET, no full-object fetch before responding).
- **Seek latency:** a seek MUST translate to a single ranged read against cache or backend; no re-download from offset 0.
- **Scan throughput:** ≥ 50 objects/sec metadata-diff on S3 (listing-driven, tag fetch only for new/changed objects); WebDAV bounded by PROPFIND depth strategy in `03-storage-abstraction.md`.
- **Memory:** steady-state RSS ≤ 256 MiB at 10 concurrent transcoded streams (excluding ffmpeg children).
- **Recommendation latency:** next-track decision ≤ 100 ms p95 against a 500k-track library (precomputed features, in-memory ANN index).

## 7. Document conventions

- All Mermaid diagrams are authoritative for component boundaries; prose elaborates them.
- All identifiers in the data model are **UUIDv7** (time-ordered) stored as `TEXT` in SQLite and `uuid` in PostgreSQL, generated server-side.
- All timestamps are UTC, stored as `TIMESTAMPTZ` (Postgres) / ISO-8601 `TEXT` (SQLite), serialized in APIs as RFC 3339.