# 03 — Storage Abstraction Layer
Status: **Normative**. This document defines the `StorageDriver` contract that all
backends (S3, WebDAV, and any future driver) must implement, plus the
backend-specific designs and the library scanner's tag-reading plan. Server code in
later milestones MUST conform to the interfaces and semantics here; deviations
require a documented amendment to this file.
Reference signatures are given in Go (the server language chosen in
`docs/01-architecture.md`), but the contract is language-agnostic: semantics, error
taxonomy, and pagination rules are what later milestones are held to.
---
## 1. Design goals
1. **Read-mostly, remote-first.** FablePool never owns the music files. The source
of truth is the user's S3 bucket or WebDAV share. The server only reads;
the single optional write path (playlist export as `.m3u8`) is explicitly
feature-flagged and off by default.
2. **Byte-range native.** Every driver must support partial reads. Streaming,
tag-reading, and seek-while-transcoding all depend on cheap range access.
3. **Pagination-safe listing.** Libraries with 500k+ objects must scan without
unbounded memory. Listing is cursor-based end to end.
4. **Capability discovery, not lowest common denominator.** Drivers advertise
capabilities (e.g. presigned URLs, server-side `If-None-Match`); callers branch
on capabilities rather than on driver identity.
5. **Deterministic change detection.** Every entry exposes a *version token*
(ETag or `Last-Modified`+size composite) so incremental scans are cheap.
---
## 2. Core types
```go
// Package storage defines the backend-agnostic contract.
package storage
import (
	"context"
	"io"
	"time"
)
// EntryKind discriminates listing results.
type EntryKind int
const (
	KindFile EntryKind = iota
	KindDir            // WebDAV collections; synthesized for S3 prefixes
)
// Entry is one object/collection returned by List or Stat.
type Entry struct {
	// Path is the driver-relative path, always forward-slash separated,
	// never starting with "/", e.g. "Albums/Kraftwerk/Autobahn/01 Autobahn.flac".
	Path string
	Kind EntryKind
	// Size in bytes. -1 if unknown (some WebDAV servers omit
	// getcontentlength on collections; never -1 for files we will stream).
	Size int64
	// Version is an opaque change-detection token.
	// S3: the ETag (quotes stripped). WebDAV: getetag if present, else
	// sha1(lastmodified + ":" + size) computed by the driver.
	Version string
	// ModTime is best-effort; zero value if the backend omits it.
	ModTime time.Time
	// ContentType as reported by the backend ("" if unknown). Drivers MUST NOT
	// sniff content; the scanner decides format from extension + magic bytes.
	ContentType string
}
// Capability flags advertised by a driver instance (may vary per endpoint:
// e.g. a MinIO endpoint without a public hostname cannot do useful presigning).
type Capability uint32
const (
	CapRangeRead Capability = 1 << iota // mandatory; drivers without it are rejected at config time
	CapPresign                          // can mint client-direct URLs
	CapCondGet                          // honors If-None-Match / If-Modified-Since
	CapWrite                            // playlist export only
)
// ListPage is one page of a listing.
type ListPage struct {
	Entries    []Entry
	NextCursor string // "" means end of listing
}
// ListOptions controls traversal.
type ListOptions struct {
	// Prefix scopes the listing ("" = root of the configured base path).
	Prefix string
	// Recursive: true = full subtree (S3 native; WebDAV via iterative BFS,
	// see §4.2). false = single level.
	Recursive bool
	// PageSize is a hint; drivers may return fewer. Hard cap 1000.
	PageSize int
	// Cursor resumes a prior page ("" = start).
	Cursor string
}
```
### 2.1 The driver interface
```go
type Driver interface {
	// Capabilities is constant for the lifetime of the driver instance.
	Capabilities() Capability
	// Stat returns metadata for a single path.
	// Errors: ErrNotFound, ErrAuth, ErrBackend.
	Stat(ctx context.Context, path string) (Entry, error)
	// List returns one page. Cursors are opaque, driver-defined, and MUST
	// remain valid for at least 15 minutes of wall time (S3 continuation
	// tokens satisfy this; the WebDAV driver persists its BFS frontier,
	// see §4.2.3).
	List(ctx context.Context, opt ListOptions) (ListPage, error)
	// Open returns a reader for bytes [offset, offset+length).
	// length == -1 means "to EOF". Implementations MUST translate this to a
	// single HTTP range request, never a full GET that is then discarded.
	// The returned ReadCloser must be closed by the caller; Close MUST drain
	// or abort the underlying connection promptly (no lingering sockets).
	// Errors: ErrNotFound, ErrAuth, ErrRangeUnsupported, ErrBackend.
	Open(ctx context.Context, path string, offset, length int64) (io.ReadCloser, error)
	// Presign mints a URL a client (browser, Chromecast receiver, Android
	// app) can GET directly, valid for ttl. Only callable if CapPresign.
	// The URL MUST support Range requests when fetched.
	Presign(ctx context.Context, path string, ttl time.Duration) (string, error)
	// VersionOf is a cheap version check: HEAD (S3) or Depth:0 PROPFIND
	// (WebDAV). Used by incremental scans and cache validation.
	VersionOf(ctx context.Context, path string) (string, error)
	// Put writes a small object (playlist export). Only if CapWrite.
	Put(ctx context.Context, path string, contentType string, body io.Reader, size int64) error
}
```
### 2.2 Error taxonomy
All driver errors wrap exactly one sentinel; callers branch with `errors.Is`.
| Sentinel | Meaning | Caller behavior |
|----------------------|-------------------------------------------|-----------------|
| `ErrNotFound` | 404 / S3 NoSuchKey | Mark track missing (soft-delete after 3 consecutive scans, see §5.4) |
| `ErrAuth` | 401/403 / S3 AccessDenied | Mark **library** errored; alert owner; stop scan |
| `ErrRangeUnsupported`| Backend ignored `Range` (returned 200) | WebDAV only; fall back per §4.3 |
| `ErrThrottled` | 429 / S3 SlowDown / 503 | Retry with backoff (§6) |
| `ErrBackend` | Anything else (5xx, network, malformed) | Retry (§6); after exhaustion mark item errored |
Drivers MUST map raw backend errors into this taxonomy; raw errors are preserved
in the wrap chain for logging only.
### 2.3 Path rules (normative)
- Driver-relative, `/`-separated, no leading `/`, no `.`/`..` segments. The
driver validates and rejects (`ErrBackend` with `invalid path`) — this is the
path-traversal defense line; see `docs/05-auth-and-security.md` §7.
- Unicode is passed through as UTF-8. S3 keys are byte strings — store the exact
key bytes from `ListObjectsV2` in the DB (`media_file.storage_path`,
`docs/02-data-model.md`), never a re-normalized form. WebDAV hrefs are
percent-decoded once on ingest and re-encoded (RFC 3986, per segment) on request.
- Each library row carries a `base_path`; the driver prepends it. Application
code above the driver never sees the base path.
---
## 3. S3 driver
Targets the S3 REST API as implemented by AWS S3, MinIO, Backblaze B2 (S3 API),
Wasabi, and Cloudflare R2. Implementation library: **AWS SDK for Go v2**
(`github.com/aws/aws-sdk-go-v2`), modules `config`, `credentials`, and
`service/s3`. `feature/s3/manager` is deliberately *not* used: there are no
multipart downloads, because we stream ranges ourselves.
### 3.1 Configuration (per library)
| Field | Notes |
|-------------------|-------|
| `endpoint` | Optional; empty = AWS. Set for MinIO/R2/B2. |
| `region` | Required by SigV4 even for non-AWS (use `us-east-1` default). |
| `bucket` | Required. |
| `base_path` | Key prefix, may be `""`. |
| `access_key_id` / `secret_access_key` | Stored encrypted; see `docs/05-auth-and-security.md` §6. |
| `force_path_style`| Default **true** when `endpoint` is set (MinIO needs it), false for AWS. |
| `presign_enabled` | Default true; owner can disable (e.g. credentials are an assumed role the owner doesn't want minting public URLs). Drives `CapPresign`. |
### 3.2 Listing & pagination
- `List` maps to **`ListObjectsV2`**:
  - `Prefix` = `base_path + opt.Prefix`.
  - `Recursive=true` → no `Delimiter`; flat key stream. This is the scanner's mode.
  - `Recursive=false` → `Delimiter="/"`; `CommonPrefixes` become `KindDir`
    entries (used only by the library-setup browser UI).
  - `MaxKeys` = `min(opt.PageSize, 1000)`.
  - Cursor = the raw `NextContinuationToken` (opaque to callers, as required).
- Keys ending in `/` with size 0 (console-created "folders") are dropped.
- `Entry.Version` = ETag with surrounding quotes stripped. Note: multipart-uploaded
objects have non-MD5 ETags — we treat ETags as *opaque version tokens only*,
never as content hashes.
### 3.3 Range reads
`Open(path, offset, length)` → `GetObject` with:
```
Range: bytes={offset}- // length == -1
Range: bytes={offset}-{offset+length-1} // otherwise
```
- Expect `206 Partial Content`. A `200` from an S3-compatible endpoint that
ignored `Range` is a misconfiguration → `ErrRangeUnsupported` (and the
endpoint is flagged in the library health status).
- `416 Range Not Satisfiable` → `ErrBackend` wrapping a typed
`RangeError{Offset, Size}`; the streaming layer translates this to HTTP 416.
- `Open(path, 0, -1)` is the canonical "full object" read; still issued **with**
`Range: bytes=0-` so the response is uniformly 206 (simplifies the reader).
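The two header forms translate directly into a small helper. `rangeHeader` is a hypothetical name, and rejecting `length == 0` is an assumption of this sketch: the contract above does not define zero-length reads, and `bytes=N-(N-1)` would not be a valid range.

```go
package main

import "fmt"

// rangeHeader renders the Range header value for Open(path, offset, length).
// length == -1 means "to EOF"; otherwise the end position is inclusive.
func rangeHeader(offset, length int64) (string, error) {
	if offset < 0 || length == 0 || length < -1 {
		return "", fmt.Errorf("invalid range: offset=%d length=%d", offset, length)
	}
	if length == -1 {
		return fmt.Sprintf("bytes=%d-", offset), nil
	}
	return fmt.Sprintf("bytes=%d-%d", offset, offset+length-1), nil
}
```

For example, `rangeHeader(100, 50)` yields `bytes=100-149`, and the canonical full read `rangeHeader(0, -1)` yields `bytes=0-`.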
### 3.4 Presigned URLs
- `s3.PresignClient.PresignGetObject` with `Expires = ttl`.
- TTL policy: streaming URLs **15 min**, artwork **24 h** (values from
`docs/04-caching-and-transcoding.md` §7). Never embed presigned URLs in
database rows or logs.
- Presigned GETs honor `Range` from the client because the signature does not
cover the `Range` header — this is what makes direct-to-client seek and
Chromecast scrubbing work (`docs/06-chromecast.md`).
- R2/B2/MinIO caveat: presigning requires the configured `endpoint` to be
reachable by *clients*, not just the server. The library health check (§7)
performs a server-side fetch of a freshly minted URL with `Range: bytes=0-0`
and disables `CapPresign` (with a UI warning) on failure.
### 3.5 Conditional GET
S3 honors `If-None-Match` on `GetObject`/`HeadObject` → driver advertises
`CapCondGet`. `VersionOf` = `HeadObject`, returning the ETag.
---
## 4. WebDAV driver
Targets RFC 4918 class-1 servers: Nextcloud/ownCloud, Apache mod_dav, nginx
dav module, rclone serve webdav, SFTPGo, Synology. Implemented over
`net/http` directly (no third-party DAV client) with a hand-rolled, namespace-aware
PROPFIND XML parser via `encoding/xml` — DAV servers are too inconsistent to trust
a generic client library's assumptions.
### 4.1 Configuration (per library)
| Field | Notes |
|--------------|-------|
| `base_url` | e.g. `https://cloud.example.com/remote.php/dav/files/alice/Music/`. HTTPS strongly recommended; plain HTTP requires an explicit `allow_insecure` flag. |
| `username` / `password` | Basic auth (over TLS) and Digest auth (RFC 7616) both supported; auth scheme auto-detected from the first `401` challenge and cached per driver instance. |
| `verify_tls` | Default true; `false` only with explicit owner opt-in (self-hosted Synology et al.). |
### 4.2 Traversal: PROPFIND
#### 4.2.1 Request shape
`Recursive=false` (single level) issues:
```
PROPFIND {base_url}{prefix} HTTP/1.1
Depth: 1
Content-Type: application/xml; charset=utf-8
```
We request a **named prop set** (never `allprop`) — Nextcloud's `allprop` responses
are large and slow on big folders.
#### 4.2.2 Response handling (normative quirks list)
- Accept `207 Multi-Status`; anything else → error taxonomy mapping.
- The response for the requested collection itself appears as one of the
`<D:response>` elements; drop it (compare canonicalized hrefs).
- **Href canonicalization:** servers return hrefs that are absolute paths,
absolute URLs, or (Nextcloud) paths including the DAV root prefix.
Canonicalize: parse as URL-reference, take the path, percent-decode,
strip the base URL's path prefix. If the href doesn't start with the base
prefix, the entry is dropped and a scan warning is recorded.
- `resourcetype` containing `<D:collection/>` → `KindDir` (trailing slash on
href is *not* trusted as the signal, since some servers omit it).
- Per-prop `<D:status>` of 404 inside a propstat block: treat that prop as
absent, not an error.
- `getlastmodified` is RFC 1123; tolerate RFC 850 and asctime
(`http.ParseTime`). `getetag` may be weak (`W/"..."`) — strip the weak
prefix; it's an opaque version token for us.
- `Entry.Version` = etag if present, else `sha1(lastmodifiedUnix + ":" + size)`.
`Depth: infinity` is **never** used: Nextcloud disables it by default, Apache
caps it, and unbounded responses can't be paginated.
#### 4.2.3 Recursive listing = iterative BFS with persistent frontier
WebDAV has no native pagination, so the driver synthesizes cursor semantics:
1. Maintain a FIFO frontier of collection paths, seeded with `opt.Prefix`.
2. Per `List` call: pop collections and issue `Depth: 1` PROPFINDs (up to 4
concurrently) until ≥ `PageSize` file entries are gathered or the frontier
is empty. Discovered sub-collections are pushed onto the frontier.
3. Cursor = a random 128-bit token keyed into a **server-side cursor store**
(table `scan_cursor`: token, library_id, JSON frontier + carry-over entries,
`expires_at = now()+30min`). This satisfies the ≥15-min cursor validity rule
without shipping a potentially huge frontier to the caller. Expired cursors
return `ErrBackend("cursor expired")`; the scanner restarts that library scan.
4. Cycle guard: a visited-set of canonical paths capped at 1M entries; symlinked
loops (seen on mod_dav over symlinked trees) terminate with a scan warning.
5. Depth cap: 64 levels (configurable), beyond which subtrees are skipped with
a warning.
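The page-building loop in steps 1, 2, and 4 can be sketched with an in-memory frontier. This is a simplified illustration: `frontier`, `lister`, `newFrontier`, and `nextPage` are hypothetical names, the `lister` callback stands in for a `Depth: 1` PROPFIND, and concurrency, the depth cap, the visited-set size cap, and persistence to `scan_cursor` are all omitted.

```go
package main

// frontier is the state a cursor would persist between List calls.
type frontier struct {
	Queue []string        // FIFO of collection paths still to list
	Carry []string        // file entries gathered beyond the last page
	Seen  map[string]bool // cycle guard over canonical collection paths
}

// lister stands in for one Depth:1 PROPFIND: it returns the files and
// sub-collections directly under dir.
type lister func(dir string) (files, dirs []string)

// newFrontier seeds the frontier with the listing prefix.
func newFrontier(prefix string) *frontier {
	return &frontier{Queue: []string{prefix}, Seen: map[string]bool{prefix: true}}
}

// nextPage pops collections until at least pageSize files are gathered or the
// frontier is empty. Surplus files are carried over to the next call.
func nextPage(f *frontier, list lister, pageSize int) (page []string, done bool) {
	page, f.Carry = f.Carry, nil
	for len(page) < pageSize && len(f.Queue) > 0 {
		dir := f.Queue[0]
		f.Queue = f.Queue[1:]
		files, dirs := list(dir)
		page = append(page, files...)
		for _, d := range dirs {
			if !f.Seen[d] { // skip collections already queued or visited
				f.Seen[d] = true
				f.Queue = append(f.Queue, d)
			}
		}
	}
	if len(page) > pageSize {
		f.Carry = append([]string(nil), page[pageSize:]...)
		page = page[:pageSize]
	}
	return page, len(f.Queue) == 0 && len(f.Carry) == 0
}
```

When `done` is false, the driver would serialize the frontier under a fresh random token and return that token as `NextCursor`.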
### 4.3 Partial GET
`Open(path, offset, length)` → `GET` with the same `Range` header forms as §3.3.
- `206` → wrap body. Verify `Content-Range` start equals `offset`; mismatch →
`ErrBackend`.
- `200` (server ignored Range): per-path fallback decision:
  - If `offset == 0`: use the 200 body, wrapped in `io.LimitReader(body, length)`
    when `length >= 0` (for `length == -1` the body is used as-is, because a
    negative limit would yield EOF immediately), and **record `ranges=false`
    for this library** (sticky until next full scan).
  - If `offset > 0`: return `ErrRangeUnsupported`. Callers then choose:
    - **Scanner** (tag reads, §5): fetch from 0 through the needed window via
      a discard-prefix reader, but only if the needed end offset ≤ 4 MiB;
      otherwise download the whole file into the audio cache and read locally.
    - **Streamer**: pull the full file through the audio cache
      (`docs/04-caching-and-transcoding.md` §4) and serve ranges from disk.
  Libraries with `ranges=false` get a persistent UI warning since first-play
  latency degrades.
- Probe at library-creation time: `GET` the first listed file with
`Range: bytes=0-0`; result seeds the `ranges` flag.
- `HEAD` support is also probed; `VersionOf` uses `HEAD`, falling back to
`Depth: 0` PROPFIND if the server rejects HEAD (some nginx dav configs do).
### 4.4 Presigning
Plain WebDAV has no presign mechanism → `CapPresign` is **not** advertised.
Clients stream WebDAV-backed tracks through the server's `/api/v1/stream`
endpoint (which itself supports Range; see `docs/04-caching-and-transcoding.md`
§6). The capability split is exactly why callers must branch on `CapPresign`,
not on backend type. (A future enhancement — Nextcloud share-link minting via
OCS API — is out of scope and noted in `docs/09-milestone-map.md`.)
---
## 5. Library scanner & tag-reading plan
The scanner turns a `Driver` listing into `media_file` / `album` / `artist` rows
(`docs/02-data-model.md`). It is range-read-driven: for typical libraries it
reads **< 1%** of audio bytes.
### 5.1 Pipeline
```mermaid
flowchart LR
L[Lister<br/>Driver.List pages] --> D{Diff vs DB<br/>by path+Version}
D -->|unchanged| SKIP[skip]
D -->|new / changed| Q[(tag-read queue)]
Q --> W1[Tag worker 1]
Q --> W2[Tag worker ...k]
W1 --> M[Metadata normalizer]
W2 --> M
M --> U[(DB upsert<br/>+ search index)]
U --> A[Artwork resolver]
```
- Lister runs single-threaded per library (pagination order preserved);
tag workers default to 4 per library, global cap 16 per server
(configurable; WebDAV libraries default to 2 to be polite to Nextcloud).
- Audio file selection: extension allowlist
`mp3 flac ogg oga opus m4a m4b aac wav aif aiff wma ape wv dsf` plus
`m3u m3u8 pls` (playlist import) and `jpg jpeg png webp gif` (artwork
candidates, path-recorded only — bytes fetched lazily).
### 5.2 Tag reading via byte ranges (normative byte budgets)
Per format, the scanner fetches the *minimum* windows needed. Parser library:
**`github.com/dhowden/tag`** for frame decoding, fed by a `seekable remote
reader` adapter that maps `Seek`+`Read` onto `Driver.Open` range calls with a
64 KiB read-ahead buffer and a per-file budget. If a file exceeds its budget,
it is parsed from a full cached download instead (counted in scan stats).
| Format | Read plan | Typical bytes |
|--------|-----------|---------------|
| **MP3 / ID3v2** | Bytes `0–9` → ID3v2 header → tag size (syncsafe int). Fetch `0–(9+size)` (the 10-byte header plus `size` tag bytes). Cap: if size > 2 MiB (huge embedded art), fetch `0–256 KiB` for text frames and record `APIC` offset for lazy artwork fetch. Then last `128 B` for ID3v1 fallback. Duration: parse first MPEG frame header after the tag (+ Xing/VBRI header if present) → bitrate/VBR table; for headerless VBR, estimate from size and flag `duration_estimated`. | 8–80 KiB |
| **FLAC** | Bytes `0–65535`; walk METADATA_BLOCKs (STREAMINFO gives exact duration + sample rate + MD5; VORBIS_COMMENT gives tags). If blocks extend past 64 KiB (big PICTURE first), fetch continuation ranges block-by-block, skipping PICTURE bodies (record offset+length for lazy fetch). | 16–64 KiB |
| **Ogg Vorbis / Opus** | First `64 KiB` (ident + comment headers). Duration needs the **last** granule position: fetch final `64 KiB`, scan backwards for last `OggS` capture pattern, read granulepos; divide by sample rate (Opus: 48 kHz fixed, minus pre-skip). | ~128 KiB |
| **M4A/MP4 (AAC/ALAC)** | Bytes `0–15` → first atom header (8 bytes, or 16 with a 64-bit extended size). If `moov` precedes `mdat` (most taggers): walk atoms with targeted ranges (`moov` is usually ≤ 1 MiB). If `mdat` first ("non-faststart"): read trailing `1 MiB` and locate `moov` from the end; if not found, full download. `mvhd` → duration; `ilst` → tags; `covr` offset recorded for lazy artwork. | 64 KiB–1 MiB |
| **WAV/AIFF** | RIFF/FORM chunk walk from byte 0; `fmt `+`LIST INFO`/`id3 ` chunk ranges only. Duration from `data` chunk size ÷ byte rate. | ≤ 64 KiB |
| **WavPack/APE/DSF/WMA** | First `256 KiB` + last `64 KiB` (APEv2 tags live at EOF). Anything unparsed → full download path. | ≤ 320 KiB |
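As a worked example of the first step in the MP3 row, the syncsafe size in the 10-byte ID3v2 header decodes as four 7-bit groups. `id3v2FetchSize` is a hypothetical helper name; adding 10 bytes when the v2.4 footer flag is set follows the ID3v2.4 header layout and is not spelled out in the table.

```go
package main

import "errors"

// id3v2FetchSize parses the 10-byte ID3v2 header and returns how many bytes,
// counted from offset 0, hold the whole tag (header + body + optional footer).
func id3v2FetchSize(h []byte) (int64, error) {
	if len(h) < 10 || string(h[:3]) != "ID3" {
		return 0, errors.New("id3v2: no header")
	}
	var size int64
	for _, b := range h[6:10] {
		if b&0x80 != 0 {
			return 0, errors.New("id3v2: size is not syncsafe")
		}
		size = size<<7 | int64(b) // each byte contributes 7 bits
	}
	total := 10 + size
	if h[5]&0x10 != 0 { // footer flag (ID3v2.4)
		total += 10
	}
	return total, nil
}
```

For size bytes `00 00 02 01` the tag body is 2·128 + 1 = 257 bytes, so the scanner would fetch the first 267 bytes.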
Hard per-file range-read budget: **4 MiB**; over budget → full-file path via the
audio cache (the downloaded copy is *retained* in cache, so the first play is
then free).
### 5.3 Normalization rules
- Tag precedence: the format-native tag wins (Vorbis comment, `ilst`); among
ID3 versions, ID3v2.4 > v2.3 > v1.
- Multi-value artists: split on `\x00` (ID3v2.4), `;`, ` / ` (configurable
per library; default `\x00` and `;` only — `/` breaks "AC/DC").
- `albumartist` absent → first track artist; `va`/"Various Artists" detection
when > 60% of an album's tracks disagree on artist.
- Album grouping key: `MUSICBRAINZ_ALBUMID` if present, else
`lower(albumartist) + "‖" + lower(album) + "‖" + parentDir`.
- ReplayGain / R128 tags captured into `media_file.rg_track_gain` etc. —
consumed by clients and by the recommendation feature extractor
(`docs/07-recommendation-engine.md` §4).
- Embedded artwork: never stored in the DB; `(path, offset, length, mime)`
recorded in `media_file.artwork_ref`, fetched lazily into the artwork cache.
### 5.4 Incremental & repair scans
- **Incremental scan** (default, scheduled + on-demand): full listing pass,
diff `(path, Version)` against DB. Unchanged → touch `last_seen_at` only.
New/changed → tag-read. Missing from listing → increment `missing_count`;
at 3 consecutive misses, soft-delete (`deleted_at` set; play history and
playlist references preserved per `docs/02-data-model.md` §6).
- **Deep scan** (manual): ignores Version tokens, re-reads all tags.
- Listing is resumable: scanner checkpoints `(library_id, cursor, page_no)`
every page, so a server restart resumes mid-scan (S3 tokens and the WebDAV
cursor store both honor the 15-min validity floor; older checkpoints restart
the scan from the top, which is safe because the diff is idempotent).
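The incremental diff and the three-miss soft-delete rule can be sketched over in-memory maps. This is illustrative only: `row` and `applyListing` are hypothetical, a real scanner would diff page by page against the database instead of holding the whole listing, and clearing `Deleted` when a path reappears is an assumption that the text above does not settle.

```go
package main

import "sort"

// row is a minimal stand-in for the media_file columns the diff touches.
type row struct {
	Version      string
	MissingCount int
	Deleted      bool
}

// applyListing diffs one complete listing (path → Version) against db and
// returns the paths needing a tag read and the paths soft-deleted by this scan.
func applyListing(db map[string]*row, listed map[string]string) (tagRead, softDeleted []string) {
	for p, v := range listed {
		r, ok := db[p]
		switch {
		case !ok: // new file
			db[p] = &row{Version: v}
			tagRead = append(tagRead, p)
		case r.Version != v: // changed file
			r.Version, r.MissingCount, r.Deleted = v, 0, false
			tagRead = append(tagRead, p)
		default: // unchanged: only last_seen_at would be touched
			r.MissingCount, r.Deleted = 0, false
		}
	}
	for p, r := range db {
		if _, ok := listed[p]; ok || r.Deleted {
			continue
		}
		r.MissingCount++
		if r.MissingCount >= 3 { // third consecutive miss
			r.Deleted = true
			softDeleted = append(softDeleted, p)
		}
	}
	sort.Strings(tagRead)
	sort.Strings(softDeleted)
	return tagRead, softDeleted
}
```

Running the same listing twice produces no further tag reads, which is the idempotence that makes restarting a scan from the top safe.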
---
## 6. Retries, timeouts, concurrency (applies to all drivers)
| Operation | Timeout | Retries |
|------------------|---------|---------|
| `Stat`/`VersionOf` | 10 s | 3, exp backoff 250 ms base, jitter, cap 5 s |
| `List` page | 30 s | 3 (same schedule) |
| `Open` connect/first-byte | 15 s | 3 — but **only before any body byte is delivered**; mid-stream failures surface to the caller, which re-`Open`s at `offset + bytesRead` (the streaming layer does this transparently up to 2 times per request) |
| `Presign` | local op | n/a |
- Retry only on `ErrThrottled` and on `ErrBackend` errors caused by network failures; never on `ErrAuth` or `ErrNotFound`.
- Per-library concurrency limiter (semaphore): default 8 concurrent backend
requests for S3, 4 for WebDAV. Streaming holds a slot for connection setup
only, not for the duration of the stream.
- All drivers share an instrumented `http.Client` (connection pooling,
per-request metrics: backend latency, bytes, status — exported per
`docs/01-architecture.md` §observability).
---
## 7. Library health checks
On creation and every 6 h, per library: `List` one page; `Open` first file with
`bytes=0-0`; if `CapPresign`, mint + fetch a presigned URL (`bytes=0-0`).
Results land in `library.health_status` (`ok | degraded | error`) with a
human-readable detail string surfaced in the admin UI and the
`GET /api/v1/libraries/{id}` response.