# 04 — Caching & Transcoding Strategy

Status: **Normative**. Defines every cache tier, its keys, sizing, and eviction; the ffmpeg transcoding pipeline including seek semantics; and the HTTP caching contract of the streaming endpoints. `docs/03-storage-abstraction.md` is a prerequisite read.

---

## 1. Why a cache exists at all

FablePool streams from *remote* object stores. Three latencies dominate UX:

1. **First-byte latency on play** — S3/WebDAV TTFB (50–600 ms) plus, for transcoded plays, ffmpeg spin-up.
2. **Seek latency** — fine for direct streams (range request), expensive for transcodes (no random access into a stream that doesn't exist yet).
3. **Scan cost** — solved by range reads (doc 03 §5), not caching.

Backend egress also costs real money on AWS/B2/Wasabi. The cache design's goal: **each remote byte is fetched at most once per "hot" period**, and the recommendation engine's prefetch (§8) hides remote latency entirely for auto-played tracks.

## 2. Cache tiers (overview)

| Tier | Contents | Store | Key | Default budget |
|------|----------|-------|-----|----------------|
| T1 Metadata | tags, albums, artists | Postgres/SQLite (not a cache — source-of-truth mirror) | — | — |
| T2 Artwork | resized cover images | disk, content-addressed | `art:{media_version}:{size}` | 2 GiB |
| T3 Audio (origin) | verbatim remote audio bytes, **chunked** | disk | `aud:{library_id}:{path_hash}:{version}/{chunk_no}` | 20 GiB |
| T4 Transcode | completed transcode outputs | disk | `tc:{media_version}:{codec}:{bitrate}:{offset_bucket}` | 20 GiB |
| T5 Hot index | LRU bookkeeping, in-flight dedup | RAM (+ persisted index file) | — | ~64 MiB |

All disk tiers live under one configurable `cache_dir`, each in its own subdirectory, each with an independent size budget and an **LRU-with-cost** eviction policy (§5).

`media_version` = the storage `Entry.Version` token, so a changed remote file automatically misses every tier — no invalidation protocol needed.
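
The key schemes in the tier table can be sketched as small Go helpers. This is a minimal illustration rather than the project's implementation: the function names (`artKey`, `audioChunkKey`, `transcodeKey`) and the use of plain strings for `media_version`, `library_id`, and `path_hash` are assumptions. It shows why a changed `Entry.Version` misses every tier: the version token is part of each key.

```go
package main

import "fmt"

// Hypothetical helpers. The key layouts follow the tier table; the Go
// types chosen for each field are assumptions made for this sketch.

// artKey builds a T2 key: art:{media_version}:{size}.
// size is a string so that "original" fits next to "64", "160", and so on.
func artKey(mediaVersion, size string) string {
	return fmt.Sprintf("art:%s:%s", mediaVersion, size)
}

// audioChunkKey builds a T3 key:
// aud:{library_id}:{path_hash}:{version}/{chunk_no}.
func audioChunkKey(libraryID, pathHash, version string, chunkNo int64) string {
	return fmt.Sprintf("aud:%s:%s:%s/%d", libraryID, pathHash, version, chunkNo)
}

// transcodeKey builds a T4 key:
// tc:{media_version}:{codec}:{bitrate}:{offset_bucket}.
func transcodeKey(mediaVersion, codec string, bitrateKbps, offsetBucket int) string {
	return fmt.Sprintf("tc:%s:%s:%d:%d", mediaVersion, codec, bitrateKbps, offsetBucket)
}

func main() {
	// The same track at two versions yields disjoint keys in every tier.
	for _, v := range []string{"v1", "v2"} {
		fmt.Println(artKey(v, "300"))
		fmt.Println(audioChunkKey("lib1", "ab12", v, 0))
		fmt.Println(transcodeKey(v, "opus", 128, 0))
	}
}
```

Because no key can be shared across versions, stale entries are never served; they simply age out through normal eviction.
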

## 3. T2 — Artwork cache

- Source: embedded art (`artwork_ref` offset/length range-read from the backend) or sidecar files (`cover.jpg` etc., preference order `cover > folder > front > album > `).
- Resizing: server-side via `golang.org/x/image/draw` (CatmullRom) to the fixed ladder **64, 160, 300, 600, original**; output WebP (quality 82) via `github.com/HugoSmits86/nativewebp`, falling back to JPEG — exact encoder choice may be revisited at implementation; the *contract* is the size ladder and key scheme.
- Served at `GET /api/v1/artwork/{id}?size=…` and Subsonic `getCoverArt` with `ETag: "{key}"`, `Cache-Control: public, max-age=31536000, immutable` (keys are content-versioned, so immutable is safe).

## 4. T3 — Audio origin cache (chunked)

The origin cache is the single funnel for **all** server-mediated reads of remote audio (direct streaming of WebDAV-backed files, transcode input, over-budget scanner reads).

- **Chunk size: 4 MiB.** A read of `[offset, offset+n)` maps to chunk numbers `offset/4Mi … (offset+n-1)/4Mi`. Missing chunks are fetched with *one coalesced* range request when contiguous.
- **Read-ahead:** while a stream is consuming chunk *k*, the cache prefetches `k+1 … k+3` (12 MiB ahead) in the background — enough to ride out backend hiccups at any realistic audio bitrate.
- **In-flight deduplication:** a `singleflight` map keyed by chunk key ensures N concurrent listeners of one track produce one backend fetch per chunk.
- **Integrity:** each chunk file carries an xxhash64 footer; corrupt chunks are treated as misses.
- Whole-file pinning: the scanner's full-download path (doc 03 §5.2) writes all chunks and *pins* them for 1 h so the likely subsequent first-play is free.
- Backends with `ranges=false` (doc 03 §4.3) stream the full file into chunks sequentially; readers block on the chunk they need (download is sequential, playback is sequential — only seeks pay).
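
The chunk arithmetic and request coalescing described for T3 can be sketched as follows. This is an illustrative sketch under stated assumptions: `chunkSpan`, `coalesce`, and `byteRange` are hypothetical names, the list of missing chunks is assumed to be sorted ascending, and clamping the last range to the real file size is left to the caller.

```go
package main

import "fmt"

// chunkSize is the 4 MiB chunk size from the text.
const chunkSize int64 = 4 << 20

// byteRange is a half-open byte interval [Start, End).
type byteRange struct{ Start, End int64 }

// chunkSpan returns the first and last chunk numbers touched by a read of
// [offset, offset+n). It assumes offset >= 0 and n > 0.
func chunkSpan(offset, n int64) (first, last int64) {
	return offset / chunkSize, (offset + n - 1) / chunkSize
}

// coalesce turns an ascending list of missing chunk numbers into one byte
// range per contiguous run, so each run needs a single range request.
func coalesce(missing []int64) []byteRange {
	var out []byteRange
	for i := 0; i < len(missing); {
		j := i
		// Extend the run while the next chunk number is adjacent.
		for j+1 < len(missing) && missing[j+1] == missing[j]+1 {
			j++
		}
		out = append(out, byteRange{missing[i] * chunkSize, (missing[j] + 1) * chunkSize})
		i = j + 1
	}
	return out
}

func main() {
	first, last := chunkSpan(6<<20, 5<<20) // read 5 MiB starting at 6 MiB
	fmt.Println(first, last)               // chunks 1 through 2
	fmt.Println(coalesce([]int64{1, 2, 3, 7}))
}
```

In a fuller implementation, each returned range would presumably be fetched under the `singleflight` key of its chunks so that concurrent listeners share the request.
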

## 5. Eviction — LRU-with-cost

Per disk tier: in-RAM LRU list of keys with sizes (rebuilt from an index file + directory walk at startup). When an insert pushes the tier over budget, evict from the cold end until under budget, skipping pinned and in-use entries. Two refinements:

1. **Cost-aware tiebreak:** within the coldest 10% of entries, evict cheapest-to-recreate first — origin chunks (re-fetchable in one request) before transcode outputs (CPU + full re-read to recreate).
2. **TTL floor:** nothing is evicted within 10 min of creation (protects a just-prefetched next track from a concurrent large scan's cache pressure).

Budgets are admin-configurable; the admin UI shows per-tier hit rates (metrics: `cache_hits_total{tier=…}` etc., per doc 01 observability).

## 6. Streaming endpoint semantics (direct / no transcode)

`GET /api/v1/stream/{trackId}` (and Subsonic `stream` with `format=raw` or no conversion needed):

- **S3-backed + `CapPresign`:** respond `302 Found` to a presigned URL (TTL 15 min). The client then talks to S3 directly, including Range — the server is out of the data path. (Clients that can't follow redirects cross-origin — none of ours — could send `?noRedirect=1` to force proxy mode.)
- **WebDAV-backed, or presign disabled:** proxy mode through T3:
  - Full `Range` support: single-range satisfied with `206` + `Content-Range`; multi-range requests are answered with the first range only (a trade-off that RFC 9110 §14.2 permits; no audio client sends multi-range).
  - `ETag: "{media_version}"`, `Accept-Ranges: bytes`, `Cache-Control: private, max-age=0, must-revalidate` (the resource is auth-gated; version changes flow through the ETag).
  - `If-Range`/`If-None-Match` honored against `media_version`.
  - `Content-Type` from the scanner's detected format (never trusted from the backend).
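
The "first range only" rule for proxy mode can be sketched as a small parser. This is a simplified illustration, not the server's actual handler: `firstRange` and `contentRange` are hypothetical names, only the lowercase `bytes` unit is recognized, and the caller is assumed to choose between `200`, `206`, and `416` based on the result.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// firstRange resolves the first range of a Range header against a resource
// of the given size and returns inclusive start/end offsets. ok is false
// when the header is absent, malformed, or cannot be satisfied.
func firstRange(header string, size int64) (start, end int64, ok bool) {
	if size <= 0 || !strings.HasPrefix(header, "bytes=") {
		return 0, 0, false
	}
	spec := strings.TrimPrefix(header, "bytes=")
	// Multi-range request: keep only the first range, ignore the rest.
	first, _, _ := strings.Cut(spec, ",")
	lo, hi, found := strings.Cut(strings.TrimSpace(first), "-")
	if !found {
		return 0, 0, false
	}
	switch {
	case lo == "": // suffix form "-N": the last N bytes
		n, err := strconv.ParseInt(hi, 10, 64)
		if err != nil || n <= 0 {
			return 0, 0, false
		}
		if n > size {
			n = size
		}
		return size - n, size - 1, true
	case hi == "": // open-ended form "N-": from N to the end
		s, err := strconv.ParseInt(lo, 10, 64)
		if err != nil || s < 0 || s >= size {
			return 0, 0, false
		}
		return s, size - 1, true
	default: // closed form "N-M", with M clamped to the last byte
		s, err1 := strconv.ParseInt(lo, 10, 64)
		e, err2 := strconv.ParseInt(hi, 10, 64)
		if err1 != nil || err2 != nil || s < 0 || e < s || s >= size {
			return 0, 0, false
		}
		if e >= size {
			e = size - 1
		}
		return s, e, true
	}
}

// contentRange formats the Content-Range value for a 206 response.
func contentRange(start, end, size int64) string {
	return fmt.Sprintf("bytes %d-%d/%d", start, end, size)
}

func main() {
	if s, e, ok := firstRange("bytes=0-9, 20-29", 1000); ok {
		fmt.Println(contentRange(s, e, 1000))
	}
}
```

The resolved offsets map directly onto a T3 read of `[start, end+1)`, so range handling and chunk fetching stay decoupled.
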

## 7. Transcoding pipeline

### 7.1 When we transcode

Decision per request: `(source codec, client-declared support, maxBitRate param, user/server policy)`:

- Web player: FLAC/ALAC/APE/WV/DSF → transcode (browsers can't reliably play them); MP3/AAC/Opus/Vorbis → direct unless `maxBitRate` forces down.
- Subsonic clients: honor the protocol's `format`/`maxBitRate` params.
- Chromecast: receiver supports MP3/AAC/Opus/Vorbis/FLAC (≤96 kHz/24-bit) — matrix in `docs/06-chromecast.md`; only exotic codecs and high-rate FLAC transcode.
- Android app: plays everything ExoPlayer does (≈ everything we host) — transcodes only for user "data saver" bitrate caps.

### 7.2 Profiles (normative)

| Profile id | Codec | Container | Bitrates offered |
|------------|-------|-----------|------------------|
| `opus` | libopus | Ogg | 64 / 96 / 128 / 160 (default 128) |
| `mp3` | libmp3lame | MP3 | 128 / 192 / 256 / 320 (default 192) |
| `aac` | aac (native) | ADTS | 128 / 192 / 256 (default 192) |

Default profile order when the client states no preference: `opus > aac > mp3` filtered by client capability. Sample rate: preserve up to 48 kHz; downsample above (Opus mandates 48 kHz anyway). Channel layout: preserve stereo/mono; downmix >2ch.

### 7.3 Process model

ffmpeg (≥ 6.x required at runtime; presence + codec support verified at server startup, missing ffmpeg disables transcode features with a clear admin warning) is invoked as a subprocess per job:

```
ffmpeg -hide_banner -loglevel error \
  [-ss {seekSeconds}] -i pipe:0 \
  -map 0:a:0 -vn \
  -c:a {codec} -b:a {bitrate}k [-ar 48000] [-ac 2] \
  -f {container} pipe:1
```

- **stdin** is fed from the T3 origin cache reader (so transcode input bytes are cached/deduped like any other read); **stdout** streams to the HTTP response *and* simultaneously to a T4 cache file (tee). If the client disconnects, the job continues to completion **iff** ≥ 50% was already produced, else it's aborted and the partial T4 file deleted.
- Concurrency: semaphore `max_transcodes` (default `runtime.NumCPU()/2`, min 1). Excess requests queue 10 s, then `503` with `Retry-After: 5`.
- Watchdog: no stdout progress for 30 s → kill, `ErrBackend` to client.

### 7.4 Seeking within transcodes — offset buckets

A transcoded stream has no byte-addressable random access, so:

- The streaming endpoint accepts **`timeOffset` (seconds)** for transcoded playback (Subsonic's `timeOffset` param; our REST API mirrors it). Byte `Range` on a transcoded response is rejected with `416` unless the full T4 object already exists (then it's a plain file serve with full Range support).
- `timeOffset` is bucketed to 10-second granularity for cache efficiency: `offset_bucket = floor(timeOffset/10)*10`, and ffmpeg gets `-ss {offset_bucket}` (input seeking on the cached origin file is sample-accurate enough at bucket granularity; clients display position as `offset_bucket + elapsed`).
- T4 key includes `offset_bucket`; the `offset_bucket=0` object is the only one eligible for long retention — nonzero buckets get a 1 h TTL cap (they're seek debris).
- `Content-Length` is **omitted** for live transcodes (chunked transfer / HTTP/2 DATA frames). Subsonic clients that need a length estimate get `X-Content-Duration` and the Subsonic `estimateContentLength=true` behavior (`bitrate/8 * duration` bytes, with bitrate in bit/s and duration in seconds, marked inexact).

### 7.5 T4 reuse

Request for `(track, codec, bitrate, bucket=0)` with a complete T4 object → plain file response (full Range support, `ETag`, immutable-style caching as in §3 but `private`). In-progress T4 objects support **tail-following reads**: a second listener attaches to the growing file rather than spawning a second ffmpeg (reader blocks at EOF until writer signals progress or completion).
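
The bucketing rule from §7.4, the length estimate, and the way the bucket feeds the §7.3 command line can be sketched together. This is a rough sketch under assumptions: the function names are hypothetical, `timeOffset` is taken as whole seconds, negative offsets are clamped to 0, bitrates are in kbit/s with 1 kbit = 1000 bits, `-ss` is omitted for bucket 0, and the optional `-ar`/`-ac` flags are left out because the text does not state when they apply.

```go
package main

import (
	"fmt"
	"strconv"
)

const bucketSeconds = 10

// offsetBucket implements offset_bucket = floor(timeOffset/10)*10.
// Clamping negative input to 0 is an assumption of this sketch.
func offsetBucket(timeOffset int) int {
	if timeOffset < 0 {
		return 0
	}
	return timeOffset / bucketSeconds * bucketSeconds
}

// estimateContentLength returns bitrate/8 * duration in bytes.
// The result is inexact by design; real encoder output will differ.
func estimateContentLength(bitrateKbps, durationSeconds int) int64 {
	return int64(bitrateKbps) * 1000 / 8 * int64(durationSeconds)
}

// ffmpegArgs fills in the command template for one job.
func ffmpegArgs(codec, container string, bitrateKbps, bucket int) []string {
	args := []string{"-hide_banner", "-loglevel", "error"}
	if bucket > 0 {
		args = append(args, "-ss", strconv.Itoa(bucket))
	}
	return append(args,
		"-i", "pipe:0",
		"-map", "0:a:0", "-vn",
		"-c:a", codec, "-b:a", strconv.Itoa(bitrateKbps)+"k",
		"-f", container, "pipe:1",
	)
}

func main() {
	bucket := offsetBucket(137) // a seek to 2:17 lands in the 130 s bucket
	fmt.Println(bucket)
	fmt.Println(ffmpegArgs("libopus", "ogg", 128, bucket))
	fmt.Println(estimateContentLength(128, 240))
}
```

Every seek inside the same 10-second window maps to the same T4 key, which is what makes nonzero buckets cacheable at all.
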

## 8. Prefetch of the predicted next track

When the recommendation engine (doc 07) emits a "next track" prediction with score ≥ its confidence floor and the current track passes 50% played:

- Direct-play case with presign: nothing to do server-side (client receives the presigned URL in the queue payload just-in-time).
- Proxy/transcode case: warm T3 chunks `0..3` (16 MiB) and, if the session's negotiated profile requires transcoding, start the `bucket=0` transcode **paused at 8 MiB of output** (ffmpeg stdout backpressure does the pausing naturally — we just stop reading).

Cost cap: at most 1 speculative job per active session, evictable instantly if the user picks a different track.

## 9. Client-side caching contract (summary table)

| Response | Cache-Control | ETag | Range |
|----------|---------------|------|-------|
| Artwork | `public, max-age=31536000, immutable` | content key | yes |
| Direct stream (proxy) | `private, max-age=0, must-revalidate` | media_version | yes |
| Direct stream (presigned 302) | n/a (S3 serves) | S3's | yes (S3) |
| Live transcode | `no-store` | none | no (use `timeOffset`) |
| Completed transcode (T4 hit) | `private, max-age=3600` | tc key | yes |
| JSON API | `no-store` | — | — |
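
Returning to the prefetch rules in §8, the decision can be sketched as a pure function. This is a hedged illustration only: `planPrefetch`, `prefetchPlan`, and `playMode` are hypothetical names, "passes 50%" is read as strictly greater than 0.5, and the per-session cap of one speculative job is assumed to be enforced by the caller.

```go
package main

import "fmt"

const (
	chunkSize    int64 = 4 << 20 // T3 chunk size
	warmChunks         = 4       // chunks 0..3, i.e. 16 MiB
	pauseAtBytes int64 = 8 << 20 // stop reading ffmpeg stdout at this point
)

type playMode int

const (
	directPresigned playMode = iota // client fetches from S3 itself
	proxied                         // direct stream through T3
	transcoded                      // negotiated profile needs ffmpeg
)

// prefetchPlan lists the speculative work for the predicted next track.
type prefetchPlan struct {
	WarmChunks     []int64 // T3 chunk numbers to fetch ahead of time
	StartTranscode bool    // start the bucket=0 transcode
	PauseAtBytes   int64   // output size at which reading stops
}

// planPrefetch returns the plan and whether any server-side work is needed.
func planPrefetch(score, floor, playedFraction float64, mode playMode) (prefetchPlan, bool) {
	// Gate: confident prediction and current track past the halfway mark.
	if score < floor || playedFraction <= 0.5 {
		return prefetchPlan{}, false
	}
	// Presigned direct play needs nothing from the server.
	if mode == directPresigned {
		return prefetchPlan{}, false
	}
	plan := prefetchPlan{}
	for c := int64(0); c < warmChunks; c++ {
		plan.WarmChunks = append(plan.WarmChunks, c)
	}
	if mode == transcoded {
		plan.StartTranscode = true
		plan.PauseAtBytes = pauseAtBytes
	}
	return plan, true
}

func main() {
	plan, ok := planPrefetch(0.9, 0.7, 0.6, transcoded)
	fmt.Println(ok, plan.WarmChunks, plan.StartTranscode)
	fmt.Println(int64(warmChunks)*chunkSize, "bytes warmed")
}
```

Keeping the decision free of I/O makes the thresholds easy to unit-test separately from the cache and the ffmpeg process handling.
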