# 03 — Storage Abstraction Layer

Status: **Normative**. This document defines the `StorageDriver` contract that all backends (S3, WebDAV, and any future driver) must implement, plus the backend-specific designs and the library scanner's tag-reading plan. Server code in later milestones MUST conform to the interfaces and semantics here; deviations require a documented amendment to this file.

Reference signatures are given in Go (the server language chosen in `docs/01-architecture.md`), but the contract is language-agnostic: semantics, error taxonomy, and pagination rules are what later milestones are held to.

---

## 1. Design goals

1. **Read-mostly, remote-first.** FablePool never owns the music files. The source of truth is the user's S3 bucket or WebDAV share. The server only reads; the single optional write path (playlist export as `.m3u8`) is explicitly feature-flagged and off by default.
2. **Byte-range native.** Every driver must support partial reads. Streaming, tag-reading, and seek-while-transcoding all depend on cheap range access.
3. **Pagination-safe listing.** Libraries with 500k+ objects must scan without unbounded memory. Listing is cursor-based end to end.
4. **Capability discovery, not lowest common denominator.** Drivers advertise capabilities (e.g. presigned URLs, server-side `If-None-Match`); callers branch on capabilities rather than on driver identity.
5. **Deterministic change detection.** Every entry exposes a *version token* (ETag or `Last-Modified`+size composite) so incremental scans are cheap.

---

## 2. Core types

```go
// Package storage defines the backend-agnostic contract.
package storage

import (
	"context"
	"io"
	"time"
)

// EntryKind discriminates listing results.
type EntryKind int

const (
	KindFile EntryKind = iota
	KindDir            // WebDAV collections; synthesized for S3 prefixes
)

// Entry is one object/collection returned by List or Stat.
type Entry struct {
	// Path is the driver-relative path, always forward-slash separated,
	// never starting with "/", e.g. "Albums/Kraftwerk/Autobahn/01 Autobahn.flac".
	Path string
	Kind EntryKind
	// Size in bytes. -1 if unknown (some WebDAV servers omit
	// getcontentlength on collections; never -1 for files we will stream).
	Size int64
	// Version is an opaque change-detection token.
	// S3: the ETag (quotes stripped). WebDAV: getetag if present, else
	// sha1(lastmodified + ":" + size) computed by the driver.
	Version string
	// ModTime is best-effort; zero value if the backend omits it.
	ModTime time.Time
	// ContentType as reported by the backend ("" if unknown). Drivers MUST NOT
	// sniff content; the scanner decides format from extension + magic bytes.
	ContentType string
}

// Capability flags advertised by a driver instance (may vary per endpoint:
// e.g. a MinIO endpoint without a public hostname cannot do useful presigning).
type Capability uint32

const (
	CapRangeRead Capability = 1 << iota // mandatory; drivers without it are rejected at config time
	CapPresign                          // can mint client-direct URLs
	CapCondGet                          // honors If-None-Match / If-Modified-Since
	CapWrite                            // playlist export only
)

// ListPage is one page of a listing.
type ListPage struct {
	Entries    []Entry
	NextCursor string // "" means end of listing
}

// ListOptions controls traversal.
type ListOptions struct {
	// Prefix scopes the listing ("" = root of the configured base path).
	Prefix string
	// Recursive: true = full subtree (S3 native; WebDAV via iterative BFS,
	// see §4.2). false = single level.
	Recursive bool
	// PageSize is a hint; drivers may return fewer. Hard cap 1000.
	PageSize int
	// Cursor resumes a prior page ("" = start).
	Cursor string
}
```

### 2.1 The driver interface

```go
type Driver interface {
	// Capabilities is constant for the lifetime of the driver instance.
	Capabilities() Capability

	// Stat returns metadata for a single path.
	// Errors: ErrNotFound, ErrAuth, ErrBackend.
	Stat(ctx context.Context, path string) (Entry, error)

	// List returns one page. Cursors are opaque, driver-defined, and MUST
	// remain valid for at least 15 minutes of wall time (S3 continuation
	// tokens satisfy this; the WebDAV driver persists its BFS frontier,
	// see §4.2.3).
	List(ctx context.Context, opt ListOptions) (ListPage, error)

	// Open returns a reader for bytes [offset, offset+length).
	// length == -1 means "to EOF". Implementations MUST translate this to a
	// single HTTP range request — never a full GET that is then discarded.
	// The returned ReadCloser must be closed by the caller; Close MUST drain
	// or abort the underlying connection promptly (no lingering sockets).
	// Errors: ErrNotFound, ErrAuth, ErrRangeUnsupported, ErrBackend.
	Open(ctx context.Context, path string, offset, length int64) (io.ReadCloser, error)

	// Presign mints a URL a client (browser, Chromecast receiver, Android
	// app) can GET directly, valid for ttl. Only callable if CapPresign.
	// The URL MUST support Range requests when fetched.
	Presign(ctx context.Context, path string, ttl time.Duration) (string, error)

	// VersionOf is a cheap version check: HEAD (S3) or Depth:0 PROPFIND
	// (WebDAV). Used by incremental scans and cache validation.
	VersionOf(ctx context.Context, path string) (string, error)

	// Put writes a small object (playlist export). Only if CapWrite.
	Put(ctx context.Context, path string, contentType string, body io.Reader, size int64) error
}
```

### 2.2 Error taxonomy

All driver errors wrap exactly one sentinel; callers branch with `errors.Is`.

| Sentinel | Meaning | Caller behavior |
|----------------------|-------------------------------------------|-----------------|
| `ErrNotFound` | 404 / S3 NoSuchKey | Mark track missing (soft-delete after 3 consecutive scans, see §5.4) |
| `ErrAuth` | 401/403 / S3 AccessDenied | Mark **library** errored; alert owner; stop scan |
| `ErrRangeUnsupported`| Backend ignored `Range` (returned 200) | WebDAV only; fall back per §4.3 |
| `ErrThrottled` | 429 / S3 SlowDown / 503 | Retry with backoff (§6) |
| `ErrBackend` | Anything else (5xx, network, malformed) | Retry (§6); after exhaustion mark item errored |

Drivers MUST map raw backend errors into this taxonomy; raw errors are preserved in the wrap chain for logging only.

### 2.3 Path rules (normative)

- Driver-relative, `/`-separated, no leading `/`, no `.`/`..` segments. The driver validates and rejects (`ErrBackend` with `invalid path`) — this is the path-traversal defense line; see `docs/05-auth-and-security.md` §7.
- Unicode is passed through as UTF-8. S3 keys are byte strings — store the exact key bytes from `ListObjectsV2` in the DB (`media_file.storage_path`, `docs/02-data-model.md`), never a re-normalized form. WebDAV hrefs are percent-decoded once on ingest and re-encoded (RFC 3986, per segment) on request.
- Each library row carries a `base_path`; the driver prepends it. Application code above the driver never sees the base path.

---

## 3. S3 driver

Targets the S3 REST API as implemented by AWS S3, MinIO, Backblaze B2 (S3 API), Wasabi, and Cloudflare R2. Implementation library: **AWS SDK for Go v2** (`github.com/aws/aws-sdk-go-v2`), modules `config`, `credentials`, `service/s3`, plus `feature/s3/manager` *not* used (no multipart downloads — we stream ranges ourselves).

### 3.1 Configuration (per library)

| Field | Notes |
|-------------------|-------|
| `endpoint` | Optional; empty = AWS. Set for MinIO/R2/B2. |
| `region` | Required by SigV4 even for non-AWS (use `us-east-1` default). |
| `bucket` | Required. |
| `base_path` | Key prefix, may be `""`. |
| `access_key_id` / `secret_access_key` | Stored encrypted; see `docs/05-auth-and-security.md` §6. |
| `force_path_style`| Default **true** when `endpoint` is set (MinIO needs it), false for AWS. |
| `presign_enabled` | Default true; owner can disable (e.g. credentials are an assumed role the owner doesn't want minting public URLs). Drives `CapPresign`. |

### 3.2 Listing & pagination

- `List` maps to **`ListObjectsV2`**:
  - `Prefix` = `base_path + opt.Prefix`.
  - `Recursive=true` → no `Delimiter`; flat key stream. This is the scanner's mode.
  - `Recursive=false` → `Delimiter="/"`; `CommonPrefixes` become `KindDir` entries (used only by the library-setup browser UI).
  - `MaxKeys` = `min(opt.PageSize, 1000)`.
- Cursor = the raw `NextContinuationToken` (opaque to callers, as required).
- Keys ending in `/` with size 0 (console-created "folders") are dropped.
- `Entry.Version` = ETag with surrounding quotes stripped. Note: multipart-uploaded objects have non-MD5 ETags — we treat ETags as *opaque version tokens only*, never as content hashes.

### 3.3 Range reads

`Open(path, offset, length)` → `GetObject` with:

```
Range: bytes={offset}-                     // length == -1
Range: bytes={offset}-{offset+length-1}    // otherwise
```

- Expect `206 Partial Content`. A `200` from an S3-compatible endpoint that ignored `Range` is a misconfiguration → `ErrRangeUnsupported` (and the endpoint is flagged in the library health status).
- `416 Requested Range Not Satisfiable` → `ErrBackend` wrapping a typed `RangeError{Offset, Size}`; the streaming layer translates this to HTTP 416.
- `Open(path, 0, -1)` is the canonical "full object" read; still issued **with** `Range: bytes=0-` so the response is uniformly 206 (simplifies the reader).

### 3.4 Presigned URLs

- `s3.PresignClient.PresignGetObject` with `Expires = ttl`.
- TTL policy: streaming URLs **15 min**, artwork **24 h** (values from `docs/04-caching-and-transcoding.md` §7). Never embed presigned URLs in database rows or logs.
- Presigned GETs honor `Range` from the client because the signature does not cover the `Range` header — this is what makes direct-to-client seek and Chromecast scrubbing work (`docs/06-chromecast.md`).
- R2/B2/MinIO caveat: presigning requires the configured `endpoint` to be reachable by *clients*, not just the server. The library health check (§7) performs a server-side fetch of a freshly minted URL with `Range: bytes=0-0` and disables `CapPresign` (with a UI warning) on failure.

### 3.5 Conditional GET

S3 honors `If-None-Match` on `GetObject`/`HeadObject` → driver advertises `CapCondGet`. `VersionOf` = `HeadObject`, returning the ETag.

---

## 4. WebDAV driver

Targets RFC 4918 class-1 servers: Nextcloud/ownCloud, Apache mod_dav, nginx dav module, rclone serve webdav, SFTPGo, Synology. Implemented over `net/http` directly (no third-party DAV client) with a hand-rolled, namespace-aware PROPFIND XML parser via `encoding/xml` — DAV servers are too inconsistent to trust a generic client library's assumptions.

### 4.1 Configuration (per library)

| Field | Notes |
|--------------|-------|
| `base_url` | e.g. `https://cloud.example.com/remote.php/dav/files/alice/Music/`. HTTPS strongly recommended; plain HTTP requires an explicit `allow_insecure` flag. |
| `username` / `password` | Basic auth (over TLS) and Digest auth (RFC 7616) both supported; auth scheme auto-detected from the first `401` challenge and cached per driver instance. |
| `verify_tls` | Default true; `false` only with explicit owner opt-in (self-hosted Synology et al.). |

### 4.2 Traversal: PROPFIND

#### 4.2.1 Request shape

`Recursive=false` (single level) issues:

```
PROPFIND {base_url}{prefix} HTTP/1.1
Depth: 1
Content-Type: application/xml; charset=utf-8
```

We request a **named prop set** (never `allprop`) — Nextcloud's `allprop` responses are large and slow on big folders.

#### 4.2.2 Response handling (normative quirks list)

- Accept `207 Multi-Status`; anything else → error taxonomy mapping.
- The response for the requested collection itself appears as one of the `<D:response>` elements — drop it (compare canonicalized hrefs).
- **Href canonicalization:** servers return hrefs that are absolute paths, absolute URLs, or (Nextcloud) paths including the DAV root prefix. Canonicalize: parse as URL-reference, take the path, percent-decode, strip the base URL's path prefix. If the href doesn't start with the base prefix, the entry is dropped and a scan warning is recorded.
- `resourcetype` containing `<D:collection/>` → `KindDir` (trailing slash on href is *not* trusted as the signal — some servers omit it).
- Per-prop `<D:status>` of 404 inside a propstat block: treat that prop as absent, not an error.
- `getlastmodified` is RFC 1123; tolerate RFC 850 and asctime (`http.ParseTime`). `getetag` may be weak (`W/"..."`) — strip the weak prefix; it's an opaque version token for us.
- `Entry.Version` = etag if present, else `sha1(lastmodifiedUnix + ":" + size)`.

`Depth: infinity` is **never** used: Nextcloud disables it by default, Apache caps it, and unbounded responses can't be paginated.

#### 4.2.3 Recursive listing = iterative BFS with persistent frontier

WebDAV has no native pagination, so the driver synthesizes cursor semantics:

1. Maintain a FIFO frontier of collection paths, seeded with `opt.Prefix`.
2. Per `List` call: pop collections and issue `Depth: 1` PROPFINDs (up to 4 concurrently) until ≥ `PageSize` file entries are gathered or the frontier is empty. Discovered sub-collections are pushed onto the frontier.
3. Cursor = a random 128-bit token keyed into a **server-side cursor store** (table `scan_cursor`: token, library_id, JSON frontier + carry-over entries, `expires_at = now()+30min`). This satisfies the ≥15-min cursor validity rule without shipping a potentially huge frontier to the caller. Expired cursors return `ErrBackend("cursor expired")`; the scanner restarts that library scan.
4. Cycle guard: a visited-set of canonical paths capped at 1M entries; symlinked loops (seen on mod_dav over symlinked trees) terminate with a scan warning.
5. Depth cap: 64 levels (configurable), beyond which subtrees are skipped with a warning.

### 4.3 Partial GET

`Open(path, offset, length)` → `GET` with the same `Range` header forms as §3.3.

- `206` → wrap body. Verify `Content-Range` start equals `offset`; mismatch → `ErrBackend`.
- `200` (server ignored Range): per-path fallback decision:
  - If `offset == 0`: use the 200 body, wrap it in a `LimitReader(length)`, and **record `ranges=false` for this library** (sticky until next full scan).
  - If `offset > 0`: return `ErrRangeUnsupported`. Callers then choose:
    - **Scanner** (tag reads, §5): fetch from 0 through the needed window via a discard-prefix reader, but only if the needed end offset ≤ 4 MiB; otherwise download the whole file into the audio cache and read locally.
    - **Streamer**: pull the full file through the audio cache (`docs/04-caching-and-transcoding.md` §4) and serve ranges from disk.

  Libraries with `ranges=false` get a persistent UI warning since first-play latency degrades.
- Probe at library-creation time: `GET` the first listed file with `Range: bytes=0-0`; result seeds the `ranges` flag.
- `HEAD` support is also probed; `VersionOf` uses `HEAD`, falling back to `Depth: 0` PROPFIND if the server rejects HEAD (some nginx dav configs do).

### 4.4 Presigning

Plain WebDAV has no presign mechanism → `CapPresign` is **not** advertised. Clients stream WebDAV-backed tracks through the server's `/api/v1/stream` endpoint (which itself supports Range; see `docs/04-caching-and-transcoding.md` §6). The capability split is exactly why callers must branch on `CapPresign`, not on backend type. (A future enhancement — Nextcloud share-link minting via OCS API — is out of scope and noted in `docs/09-milestone-map.md`.)

---

## 5. Library scanner & tag-reading plan

The scanner turns a `Driver` listing into `media_file` / `album` / `artist` rows (`docs/02-data-model.md`). It is range-read-driven: for typical libraries it reads **< 1%** of audio bytes.

### 5.1 Pipeline

```mermaid
flowchart LR
    L[Lister<br/>Driver.List pages] --> D{Diff vs DB<br/>by path+Version}
    D -->|unchanged| SKIP[skip]
    D -->|new / changed| Q[(tag-read queue)]
    Q --> W1[Tag worker 1]
    Q --> W2[Tag worker ...k]
    W1 --> M[Metadata normalizer]
    W2 --> M
    M --> U[(DB upsert<br/>+ search index)]
    U --> A[Artwork resolver]
```

- Lister runs single-threaded per library (pagination order preserved); tag workers default to 4 per library, global cap 16 per server (configurable; WebDAV libraries default to 2 to be polite to Nextcloud).
- Audio file selection: extension allowlist `mp3 flac ogg oga opus m4a m4b aac wav aif aiff wma ape wv dsf` plus `m3u m3u8 pls` (playlist import) and `jpg jpeg png webp gif` (artwork candidates, path-recorded only — bytes fetched lazily).

### 5.2 Tag reading via byte ranges (normative byte budgets)

Per format, the scanner fetches the *minimum* windows needed. Parser library: **`github.com/dhowden/tag`** for frame decoding, fed by a `seekable remote reader` adapter that maps `Seek`+`Read` onto `Driver.Open` range calls with a 64 KiB read-ahead buffer and a per-file budget. If a file exceeds its budget, it is parsed from a full cached download instead (counted in scan stats).

| Format | Read plan | Typical bytes |
|--------|-----------|---------------|
| **MP3 / ID3v2** | Bytes `0–9` → ID3v2 header → tag size (syncsafe int). Fetch `0–(10+size)`. Cap: if size > 2 MiB (huge embedded art), fetch `0–256 KiB` for text frames and record `APIC` offset for lazy artwork fetch. Then last `128 B` for ID3v1 fallback. Duration: parse first MPEG frame header after the tag (+ Xing/VBRI header if present) → bitrate/VBR table; for headerless VBR, estimate from size and flag `duration_estimated`. | 8–80 KiB |
| **FLAC** | Bytes `0–65535`; walk METADATA_BLOCKs (STREAMINFO gives exact duration + sample rate + MD5; VORBIS_COMMENT gives tags). If blocks extend past 64 KiB (big PICTURE first), fetch continuation ranges block-by-block, skipping PICTURE bodies (record offset+length for lazy fetch). | 16–64 KiB |
| **Ogg Vorbis / Opus** | First `64 KiB` (ident + comment headers). Duration needs the **last** granule position: fetch final `64 KiB`, scan backwards for last `OggS` capture pattern, read granulepos; divide by sample rate (Opus: 48 kHz fixed, minus pre-skip). | ~128 KiB |
| **M4A/MP4 (AAC/ALAC)** | Bytes `0–16` → first atom. If `moov` precedes `mdat` (most taggers): walk atoms with targeted ranges (`moov` is usually ≤ 1 MiB). If `mdat` first ("non-faststart"): read trailing `1 MiB` and locate `moov` from the end; if not found, full download. `mvhd` → duration; `ilst` → tags; `covr` offset recorded for lazy artwork. | 64 KiB–1 MiB |
| **WAV/AIFF** | RIFF/FORM chunk walk from byte 0; `fmt `+`LIST INFO`/`id3 ` chunk ranges only. Duration from `data` chunk size ÷ byte rate. | ≤ 64 KiB |
| **WavPack/APE/DSF/WMA** | First `256 KiB` + last `64 KiB` (APEv2 tags live at EOF). Anything unparsed → full download path. | ≤ 320 KiB |

Hard per-file range-read budget: **4 MiB**; over budget → full-file path via the audio cache (the downloaded copy is *retained* in cache, so the first play is then free).

### 5.3 Normalization rules

- Tag precedence: format-native (Vorbis comment / ilst / ID3v2.4 > v2.3 > v1).
- Multi-value artists: split on `\x00` (ID3v2.4), `;`, ` / ` (configurable per library; default `\x00` and `;` only — `/` breaks "AC/DC").
- `albumartist` absent → first track artist; `va`/"Various Artists" detection when > 60% of an album's tracks disagree on artist.
- Album grouping key: `MUSICBRAINZ_ALBUMID` if present, else `lower(albumartist) + "‖" + lower(album) + "‖" + parentDir`.
- ReplayGain / R128 tags captured into `media_file.rg_track_gain` etc. — consumed by clients and by the recommendation feature extractor (`docs/07-recommendation-engine.md` §4).
- Embedded artwork: never stored in the DB; `(path, offset, length, mime)` recorded in `media_file.artwork_ref`, fetched lazily into the artwork cache.
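
The multi-value artist rule can be illustrated with a short sketch. `splitArtists` is a hypothetical helper, and trimming whitespace and dropping empty parts are assumptions on top of the rule as written; the default separator set (`\x00` and `;`) is the one given above.

```go
package main

import (
	"fmt"
	"strings"
)

// splitArtists splits a raw artist tag on each separator in turn, trims
// surrounding whitespace, and drops empty parts. Empty separators are ignored.
func splitArtists(raw string, seps []string) []string {
	parts := []string{raw}
	for _, sep := range seps {
		if sep == "" {
			continue
		}
		var next []string
		for _, p := range parts {
			next = append(next, strings.Split(p, sep)...)
		}
		parts = next
	}
	out := parts[:0] // filter in place; writes never overtake reads
	for _, p := range parts {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	defaults := []string{"\x00", ";"} // per-library default from §5.3

	// "/" is not a default separator, so "AC/DC" survives intact.
	fmt.Printf("%q\n", splitArtists("AC/DC; Motörhead", defaults))
	fmt.Printf("%q\n", splitArtists("Kraftwerk\x00Neu!", defaults))

	// A library that opts in to " / " gets it as an extra separator.
	fmt.Printf("%q\n", splitArtists("Brian Eno / David Byrne", append(defaults, " / ")))
}
```

Because the opt-in separator is ` / ` with surrounding spaces, "AC/DC" still stays whole even in libraries that enable it.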

### 5.4 Incremental & repair scans

- **Incremental scan** (default, scheduled + on-demand): full listing pass, diff `(path, Version)` against DB. Unchanged → touch `last_seen_at` only. New/changed → tag-read. Missing from listing → increment `missing_count`; at 3 consecutive misses, soft-delete (`deleted_at` set; play history and playlist references preserved per `docs/02-data-model.md` §6).
- **Deep scan** (manual): ignores Version tokens, re-reads all tags.
- Listing is resumable: scanner checkpoints `(library_id, cursor, page_no)` every page, so a server restart resumes mid-scan (S3 tokens and the WebDAV cursor store both honor the 15-min validity floor; older checkpoints restart the scan from the top, which is safe because the diff is idempotent).

---

## 6. Retries, timeouts, concurrency (applies to all drivers)

| Operation | Timeout | Retries |
|------------------|---------|---------|
| `Stat`/`VersionOf` | 10 s | 3, exp backoff 250 ms base, jitter, cap 5 s |
| `List` page | 30 s | 3 (same schedule) |
| `Open` connect/first-byte | 15 s | 3 — but **only before any body byte is delivered**; mid-stream failures surface to the caller, which re-`Open`s at `offset + bytesRead` (the streaming layer does this transparently up to 2 times per request) |
| `Presign` | local op | n/a |

- Retry only on `ErrThrottled` and `ErrBackend`-network; never on `ErrAuth`/`ErrNotFound`.
- Per-library concurrency limiter (semaphore): default 8 concurrent backend requests for S3, 4 for WebDAV. Streaming holds a slot for connection setup only, not for the duration of the stream.
- All drivers share an instrumented `http.Client` (connection pooling, per-request metrics: backend latency, bytes, status — exported per `docs/01-architecture.md` §observability).

## 7. Library health checks

On creation and every 6 h, per library: `List` one page; `Open` first file with `bytes=0-0`; if `CapPresign`, mint + fetch a presigned URL (`bytes=0-0`). Results land in `library.health_status` (`ok | degraded | error`) with a human-readable detail string surfaced in the admin UI and the `GET /api/v1/libraries/{id}` response.