# 07 — Recommendation Engine: Auto-Play Next Design

Status: **Normative design contract** for FablePool's recommendation engine.

Audience: server team, ML-adjacent contributors, client teams (consumers of the API surface in §8).

The headline feature: when auto-play is on and the queue runs out, FablePool picks the next song using the **last song and the user's last N played songs**, where **N is user-configurable**. This document specifies the whole pipeline: feature extraction, candidate generation, scoring, configuration, storage, and the API.

---

## 1. Design Principles

1. **Local-first, no external services.** Everything runs inside the FablePool server against the user's own library. No calls to third-party recommendation APIs. (Optional enrichment from MusicBrainz/ListenBrainz tags is a *later-milestone* plugin point, not part of the core path.)
2. **Cheap at request time.** Picking the next track must complete in **< 100 ms p95** on a 50k-track library on modest hardware (4-core, no GPU). All heavy work happens at scan/analysis time.
3. **Deterministic-ish but not boring.** Scoring is deterministic; final selection samples from the top-K with temperature, so two plays of the same album don't always lead to the identical next song.
4. **Explainable.** Every recommendation response can carry a human-readable `reasons[]` (used by clients' "Why this song?" UI and by debugging).
5. **Degrades gracefully.** If audio analysis hasn't run yet (fresh library), the engine falls back to metadata-only scoring; if even metadata is thin, it falls back to library-popularity + shuffle. There is never a state where auto-play silently does nothing while tracks exist.

---

## 2. Pipeline Overview

```mermaid
flowchart LR
    subgraph Offline["Offline (scan / analysis workers)"]
        SC[Library scanner<br/>doc 03] --> META[Metadata features<br/>tags, genre, year, artist]
        SC --> AF[Audio analysis worker<br/>decodes middle 90s via ffmpeg]
        AF --> FV[(track_features table<br/>doc 02 §8)]
        META --> FV
        FV --> ANN[ANN index build<br/>per-library HNSW, in-process]
    end
    subgraph Online["Online (per request, <100ms)"]
        H[(play_history)] --> CTX[Context builder<br/>last-N window, decay weights]
        CTX --> CAND[Candidate generation<br/>ANN top-200 ∪ metadata buckets]
        CAND --> SCORE[Scorer<br/>weighted linear blend]
        SCORE --> FILT[Filters<br/>anti-repeat, dedupe, user rules]
        FILT --> PICK[Top-K temperature sampling]
        PICK --> OUT[next track + reasons]
    end
    ANN -.loaded at startup.-> CAND
```

---

## 3. Feature Extraction (offline)

Runs as a background job pool after the library scanner registers/updates a track (job framework per doc 01 §6). Two tiers:

### 3.1 Tier 1 — Metadata features (always available, instant)

Extracted from tags during scan; no audio decode needed.

| Feature | Source | Encoding |
|---|---|---|
| `artist_id`, `album_artist_id` | tags | categorical (ID match) |
| `genres[]` | tags, split on `;`/`,`, normalized lowercase | multi-hot over library genre vocabulary |
| `year` | tags | numeric, bucketed by half-decade for matching |
| `duration_ms` | container | numeric |
| `track_gain` (ReplayGain/R128 if present) | tags | numeric, proxy for loudness when Tier 2 absent |
| `label`, `composer` | tags | categorical (optional) |

### 3.2 Tier 2 — Audio features (analysis worker)

The analysis worker decodes a deterministic sample of the file — **the middle 90 seconds** (to avoid intro/outro bias), mono, 22.05 kHz — through the server's ffmpeg transcode path (reusing the byte-range reader from doc 03, so analysis of S3/WebDAV files streams only the needed bytes when the container allows; otherwise it reads sequentially up to the needed point).
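
As a rough illustration of the "middle 90 seconds" rule, the window offsets could be derived from the track duration as below. This is a minimal sketch: the function name is hypothetical, and analysing tracks shorter than 90 s in full is an assumption, since the text above does not say how short files are handled. The resulting values are the kind of thing one would hand to ffmpeg's `-ss` (seek) and `-t` (duration) options.

```python
def analysis_window(duration_s: float, window_s: float = 90.0) -> tuple[float, float]:
    """Return (start_s, length_s) of a window centred in the track.

    Assumption (not stated in the design): tracks shorter than the
    window are analysed in full, starting at 0.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    length = min(window_s, duration_s)
    start = (duration_s - length) / 2.0  # equal margin before and after
    return start, length


# A 4-minute track is sampled from 1:15 to 2:45.
start, length = analysis_window(240.0)
```

Because the window depends only on the duration, re-analysing the same file always samples the same audio, which is what makes the sample deterministic.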

From the PCM it computes, with our own DSP code (Rust, `rustfft`-based — no Essentia dependency, to keep the build simple):

| Feature | Dim | Method (summary) |
|---|---|---|
| `tempo_bpm` | 1 | onset-strength autocorrelation (spectral-flux onsets, comb scoring 60–180 BPM, octave correction) |
| `energy` | 1 | mean RMS, dB-scaled, normalized 0–1 over library |
| `dynamic_complexity` | 1 | stddev of frame RMS |
| `spectral_centroid_mean` | 1 | brightness proxy |
| `spectral_rolloff_mean` | 1 | 85 % rolloff |
| `zero_crossing_rate` | 1 | percussiveness/noisiness proxy |
| `mfcc_mean[1..13]`, `mfcc_var[1..13]` | 26 | 13 MFCCs over 2048-sample frames, hop 512; mean+variance pooling |
| `chroma_mean[12]` | 12 | 12-bin chroma; coarse key/harmony proxy |
| `key`, `mode` | 2 | Krumhansl–Schmuckler template correlation on pooled chroma |

All of these except `key` and `mode` are concatenated and **L2-normalized into a 44-dim "timbre vector"** (6 + 26 + 12 dims: `tempo`, `energy`, complexity, centroid, rolloff, and ZCR scaled to comparable ranges via library-wide robust z-scoring — median/IQR — then the MFCC and chroma blocks). `key` and `mode` are stored as separate columns. Stored in `track_features` (doc 02 §8) as:

```sql
track_features(
  track_id        PK/FK,
  feature_version INT,          -- bump invalidates & triggers re-analysis
  timbre_vec      BYTEA,        -- 44 × f32, little-endian
  tempo_bpm       REAL,
  energy          REAL,
  key_idx         SMALLINT,     -- 0-11, -1 unknown
  mode            SMALLINT,     -- 1 major, 0 minor, -1 unknown
  analyzed_at     TIMESTAMPTZ
)
```

`feature_version` starts at 1; any change to the extraction algorithm bumps it and the analysis workers lazily re-process (oldest-version-first, rate-limited so re-analysis never starves transcoding workers — shared job-pool priority classes per doc 01 §6.3).

### 3.3 ANN index

Per-library in-process **HNSW** index over the timbre vectors (M=16, efConstruction=200, efSearch=64), rebuilt incrementally on track add/remove and snapshotted to disk in the cache directory so server restart doesn't trigger a full rebuild.
50k tracks × 44 f32 ≈ 9 MB of vectors — RAM is a non-issue; this is why we choose in-process ANN over a vector-DB dependency. If the index is unavailable (corrupt snapshot, mid-rebuild), candidate generation falls back to metadata buckets only (§5) — the engine never blocks on the index.

---

## 4. Listening Context (the last-N window)

### 4.1 User configuration

Per-user settings (doc 02 §7 `user_settings`, exposed in §8 API):

| Setting | Default | Range | Meaning |
|---|---|---|---|
| `autoplay.enabled` | `true` | bool | master switch |
| `autoplay.window_n` | `10` | 1–100 | **how many recent plays form the context** (the "X last played songs" requirement) |
| `autoplay.last_song_weight` | `0.5` | 0–1 | share of context weight pinned to the *most recent* song; the remaining `1 − w` is spread over the rest of the window with exponential decay |
| `autoplay.decay_half_life` | `5` | 1–50 | in plays; decay rate for the rest of the window |
| `autoplay.exploration` | `0.3` | 0–1 | temperature for top-K sampling (§6.3) |
| `autoplay.avoid_repeat_minutes` | `120` | 0–1440 | anti-repeat horizon (§6.2) |
| `autoplay.same_artist_penalty` | `0.15` | 0–1 | discourage long same-artist runs |
| `autoplay.scope` | `library` | `library`\|`queue_source` | recommend across the whole library, or stay inside the playlist/album the queue came from until exhausted |

### 4.2 Context vector

Let plays `p₀` (most recent) … `p_{N−1}` be the last N **completed or ≥30 s** plays (skips shorter than 30 s are excluded from the positive context but recorded as **negative signals**, §6.2). Weights:

```
dᵢ = 0.5^(i / decay_half_life)
w₀ = last_song_weight
wᵢ = (1 − last_song_weight) · dᵢ / Σⱼ dⱼ    for i = 1..N−1, with the sum over j = 1..N−1
```

Context timbre vector: `c = Σ wᵢ · v(pᵢ)`, re-L2-normalized.

Context metadata profile: weighted multiset of artists, genres, year-buckets with the same weights.

Tracks missing Tier-2 vectors contribute only to the metadata profile; the timbre context renormalizes over available vectors.
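
The weighting scheme above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, vectors are plain Python lists, and a play without a Tier-2 vector is represented as `None` (an assumption about representation, not part of the contract).

```python
import math
from typing import Optional, Sequence


def context_weights(n: int, last_song_weight: float = 0.5,
                    decay_half_life: float = 5.0) -> list[float]:
    """Weights w_0..w_{n-1} for the last-N window, most recent play first."""
    if n < 1:
        raise ValueError("window must contain at least one play")
    # d_i = 0.5^(i / half_life) for the older plays i = 1..n-1
    decay = [0.5 ** (i / decay_half_life) for i in range(1, n)]
    total = sum(decay)
    rest = [(1.0 - last_song_weight) * d / total for d in decay]
    return [last_song_weight] + rest


def context_vector(weights: Sequence[float],
                   vectors: Sequence[Optional[Sequence[float]]]) -> Optional[list[float]]:
    """Weighted sum of the available timbre vectors, re-L2-normalized.

    Plays without a vector (None) are skipped; the final normalization
    makes that equivalent to renormalizing over the available vectors.
    """
    acc: Optional[list[float]] = None
    for w, v in zip(weights, vectors):
        if v is None:
            continue
        if acc is None:
            acc = [0.0] * len(v)
        for k, x in enumerate(v):
            acc[k] += w * x
    if acc is None:
        return None  # no Tier-2 data anywhere in the window
    norm = math.sqrt(sum(x * x for x in acc))
    return [x / norm for x in acc] if norm > 0 else None


weights = context_weights(10)  # defaults from §4.1
```

With the defaults, the most recent play carries half of the total weight, and each older play's weight halves every five positions further back in the window.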

Edge cases:

- **History shorter than N:** use what exists; if zero history, skip to fallback ladder (§7).
- The window is computed per **user**, across devices/clients (history is server-side, doc 02 §6), so handing off from phone to Cast keeps context.

---

## 5. Candidate Generation

The union of the following pools, capped at 500 candidates before scoring:

1. **ANN neighbors:** top-200 by cosine similarity to context vector `c` (skipped if no Tier-2 data yet).
2. **Metadata buckets:** up to 100 tracks sharing an artist with the window, up to 100 sharing a top-3 context genre, weighted-random within bucket by library play-count (popularity prior).
3. **Queue-source pool:** if `autoplay.scope = queue_source`, the unplayed remainder of the source playlist/album *replaces* (1)+(2) until it is exhausted, after which scope silently widens to `library` (clients show "Continuing with similar songs").
4. **Exploration pool:** 25 uniformly random never-played tracks (gives the exploration term in §6.1 something to surface; keeps recommendations from collapsing onto the played subset).

Excluded outright before scoring: tracks in the current queue, tracks the user has explicitly disliked (`track_reactions.reaction = 'dislike'`, doc 02 §9), and tracks whose backend is currently offline (storage health flag, doc 03 §8 — recommending an unplayable track is a bug, not a degradation).

---

## 6. Scoring

### 6.1 Score function

For candidate `t`:

```
score(t) =  w_sim   · sim_timbre(c, v(t))     // cosine, 0 if no vector
          + w_meta  · meta_affinity(t)        // §6.1.1
          + w_tempo · tempo_compat(t)         // §6.1.2
          + w_nov   · novelty(t)              // §6.1.3
          + w_pop   · popularity_prior(t)     // log-scaled user+library plays, normalized
          − penalties(t)                      // §6.2
```

Default weights (server config, hot-reloadable; per-user override is a later milestone): `w_sim=0.40, w_meta=0.25, w_tempo=0.10, w_nov=0.15, w_pop=0.10`.
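
To make the blend concrete, here is a minimal sketch of the weighted sum with the default weights. It also covers the redistribution rule for candidates without a timbre vector, which the next paragraph describes. The function names and the dictionary-based component representation are hypothetical, and each component is assumed to be pre-computed in the range 0 to 1.

```python
DEFAULT_WEIGHTS = {"sim": 0.40, "meta": 0.25, "tempo": 0.10, "nov": 0.15, "pop": 0.10}


def effective_weights(has_timbre_vector: bool,
                      weights: dict[str, float] = DEFAULT_WEIGHTS) -> dict[str, float]:
    """Return the weights to use for one candidate.

    Without a timbre vector, w_sim is set to 0 and its mass is shared
    between w_meta and w_tempo in proportion to their original sizes.
    """
    w = dict(weights)
    if has_timbre_vector:
        return w
    spare, w["sim"] = w["sim"], 0.0
    base = weights["meta"] + weights["tempo"]
    w["meta"] += spare * weights["meta"] / base
    w["tempo"] += spare * weights["tempo"] / base
    return w


def score(components: dict[str, float], penalties: float = 0.0,
          has_timbre_vector: bool = True) -> float:
    """Weighted linear blend minus penalties; missing components count as 0."""
    w = effective_weights(has_timbre_vector)
    return sum(w[name] * components.get(name, 0.0) for name in w) - penalties


# A strong timbre match by the same artist as the current track
# (same_artist_penalty default 0.15).
s = score({"sim": 0.9, "meta": 0.6, "tempo": 0.8, "nov": 0.2, "pop": 0.5},
          penalties=0.15)
```

Because the redistributed weights still sum to 1, scores of candidates with and without a timbre vector stay on a comparable scale.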

When the candidate has no timbre vector, `w_sim`'s mass is redistributed proportionally to `w_meta` and `w_tempo` (so metadata-only libraries still get sane rankings rather than uniformly losing 0.4).

**6.1.1 `meta_affinity`:** weighted Jaccard-style overlap between the candidate's {artist, album-artist, genres, year-bucket} and the context metadata profile; artist match capped so a single dominant artist can't pin the score (cap = 0.6 of the meta term).

**6.1.2 `tempo_compat`:** `exp(−(Δbpm_oct / 12)²)` where `Δbpm_oct` is the BPM distance after octave-folding (half/double-time equivalence), vs the context's weighted mean BPM.

**6.1.3 `novelty`:** `1 / (1 + user_play_count(t))`, scaled by recency — a track unplayed for 90+ days counts as fully novel even if historically played a lot.

### 6.2 Penalties & hard filters

| Rule | Type | Effect |
|---|---|---|
| Played within `avoid_repeat_minutes` | **hard filter** | excluded |
| Same track as any of last 3 window entries | **hard filter** | excluded (even beyond the time horizon) |
| Same artist as current track | soft | −`same_artist_penalty`, doubled if the last 2 plays were already that artist (breaks artist ruts) |
| Same album as current track | soft | −0.05 (unless `scope=queue_source` on that album) |
| Skip signal: user skipped this track <30 s within last 14 days | soft | −0.25 per recent skip, capped −0.5 |
| Disliked artist (≥3 dislikes for the artist) | soft | −0.3 |

### 6.3 Selection

Take top-K (K=10) by score; sample one with softmax temperature `τ = 0.05 + 0.45 · exploration` over the scores. `exploration = 0` ⇒ τ=0.05 ≈ argmax; `exploration = 1` ⇒ broad sampling. The full ranked top-K is returned when the client asks for a *radio batch* (§8.2) rather than a single next track.
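
The selection step can be illustrated with a small softmax sampler. This is a sketch under stated assumptions: the function names are hypothetical, candidates are `(track_id, score)` pairs, and the random source is injectable so tests can seed it.

```python
import math
import random
from typing import Optional


def selection_probabilities(scores: list[float], exploration: float) -> list[float]:
    """Softmax over scores with tau = 0.05 + 0.45 * exploration."""
    tau = 0.05 + 0.45 * exploration
    best = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    raw = [math.exp((s - best) / tau) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


def pick_next(scored: list[tuple[str, float]], exploration: float,
              k: int = 10, rng: Optional[random.Random] = None) -> Optional[str]:
    """Sample one track id from the top-k candidates by score."""
    top = sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
    if not top:
        return None
    probs = selection_probabilities([s for _, s in top], exploration)
    rng = rng or random.Random()
    return rng.choices([t for t, _ in top], weights=probs, k=1)[0]


candidates = [("trk_a", 0.82), ("trk_b", 0.79), ("trk_c", 0.40)]
choice = pick_next(candidates, exploration=0.3, rng=random.Random(7))
```

Note that at `exploration = 0` the temperature is 0.05 rather than 0, so close scores can still swap places; a snapshot test would presumably also want a fixed RNG seed or a clear score gap.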

### 6.4 Reasons

Each returned track carries machine+human reasons, e.g.:

```json
"reasons": [
  {"code": "TIMBRE_SIMILAR", "detail": "Sounds similar to 'Song A'", "weight": 0.31},
  {"code": "GENRE_MATCH", "detail": "Shares genre 'shoegaze' with 4 of your last 10 plays", "weight": 0.18},
  {"code": "NOVELTY", "detail": "You haven't played this in 6 months", "weight": 0.12}
]
```

---

## 7. Fallback Ladder

Evaluated top-down; first applicable level wins:

1. **Full:** Tier-2 vectors for ≥ 30 % of library and ≥ 1 play in window → full pipeline.
2. **Metadata-only:** any play history → §5 buckets + §6 without `w_sim`.
3. **Popularity shuffle:** no usable history (new user) → weighted-random by library-wide play count, else uniform random; anti-repeat filters still apply.
4. **Empty library:** `next = null` (clients stop playback gracefully; Cast receiver gets `NEXT_ITEM_RESPONSE {item:null}`, doc 06).

The active level is reported in responses as `"strategy"` so clients/tests can assert behavior.

---

## 8. API Surface

Full schemas in `api/openapi.yaml`; summarized contract:

### 8.1 `POST /api/v1/recommendations/next`

Request:

```json
{
  "windowN": 10,
  "queueTrackIds": ["trk_a", "trk_b"],
  "queueSourceId": "pl_44d1",
  "overrides": { "exploration": 0.5 }
}
```

`windowN` and `overrides` are optional per-request overrides of the user's stored settings (clients pass the user's live UI value; nothing is persisted by this call).

Response: one track (full track object) + `score`, `strategy`, `reasons[]`, or `{"track": null, "strategy": "empty_library"}`.

### 8.2 `POST /api/v1/recommendations/radio`

Same inputs + `"count": 1–50`; returns a ranked batch (used for "Start radio from this song" — the seed song is passed as a synthetic single-entry window via `"seedTrackId"`).

### 8.3 Settings

`GET/PATCH /api/v1/me/settings/autoplay` — the §4.1 settings object, validated server-side against the stated ranges.
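
Server-side validation against the §4.1 ranges might look like the sketch below. The payload shape is an assumption: it uses the §4.1 setting names as flat keys, whereas the real schema lives in `api/openapi.yaml`. The function name and error strings are likewise hypothetical.

```python
# Inclusive ranges copied from the §4.1 settings table.
NUMERIC_RANGES = {
    "autoplay.window_n": (1, 100),
    "autoplay.last_song_weight": (0, 1),
    "autoplay.decay_half_life": (1, 50),
    "autoplay.exploration": (0, 1),
    "autoplay.avoid_repeat_minutes": (0, 1440),
    "autoplay.same_artist_penalty": (0, 1),
}
SCOPES = {"library", "queue_source"}


def validate_autoplay_patch(patch: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the patch is valid."""
    errors = []
    for key, value in patch.items():
        if key == "autoplay.enabled":
            if not isinstance(value, bool):
                errors.append(f"{key}: expected a boolean")
        elif key == "autoplay.scope":
            if not isinstance(value, str) or value not in SCOPES:
                errors.append(f"{key}: expected one of {sorted(SCOPES)}")
        elif key in NUMERIC_RANGES:
            lo, hi = NUMERIC_RANGES[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{key}: expected a number")
            elif not lo <= value <= hi:
                errors.append(f"{key}: {value} is outside {lo}..{hi}")
        else:
            errors.append(f"{key}: unknown setting")
    return errors


problems = validate_autoplay_patch({"autoplay.window_n": 250, "autoplay.scope": "library"})
```

Collecting every error rather than failing on the first lets a client highlight all invalid fields in one round trip; whether the real endpoint does so is not specified here.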

### 8.4 Feedback

`POST /api/v1/tracks/{id}/reaction` `{"reaction": "like"|"dislike"|null}` — feeds §6.2. Skips are inferred from history events (a `POST /history` with `completed:false, positionMs<30000`), not a separate endpoint.

### 8.5 Subsonic compatibility

Subsonic's `getSimilarSongs` / `getSimilarSongs2` (`id`, `count`) are mapped onto §8.2 with the given song as seed — so existing Subsonic clients (DSub, Symfonium, etc.) get FablePool recommendations for free.

---

## 9. Performance & Sizing

| Stage | Budget (50k tracks, 4-core) |
|---|---|
| Context build (DB read of last N plays + vectors) | ≤ 10 ms (history indexed by `(user_id, played_at DESC)`) |
| ANN top-200 | ≤ 5 ms (HNSW efSearch=64) |
| Metadata buckets (indexed queries) | ≤ 20 ms |
| Scoring ≤ 500 candidates | ≤ 5 ms (pure in-memory arithmetic) |
| Total p95 | **< 100 ms** |
| Tier-2 analysis throughput | ≥ 4 tracks/s/core target (90 s @ 22.05 kHz mono decode + FFTs); a 10k-track library analyzes overnight on a NAS |

Analysis is strictly lower job priority than streaming/transcoding (doc 01 §6.3) and is pausable via admin API.

---

## 10. Evaluation & Testing (engineering acceptance, not research)

- **Golden tests:** fixture library (~200 CC-licensed clips) with precomputed feature vectors checked into test data; scoring unit tests assert ranking invariants (e.g. same-genre+similar-tempo track outranks opposite-genre track given a clean context).
- **Property tests:** anti-repeat hard filters can never be violated; output track is always playable (backend online and not excluded by the dislike filter).
- **Determinism harness:** with `exploration=0` and a fixed history, the next track is deterministic → snapshot tests.
- **Offline replay metric (CI, informational):** hit-rate@10 — replay real (donated, anonymized) listening sessions, check how often the actually played next song appears in our top-10. Tracked over time, not gated.
- **Latency gate:** integration test asserts p95 < 100 ms on the 50k synthetic library fixture.
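
The offline replay metric from the list above can be expressed compactly. In this sketch, `recommend` is a hypothetical stand-in for the engine (any callable that maps a play history to a ranked list of track ids), and a session is simply an ordered list of track ids; neither representation is prescribed by this document.

```python
from typing import Callable, Sequence


def hit_rate_at_k(sessions: Sequence[Sequence[str]],
                  recommend: Callable[[Sequence[str]], Sequence[str]],
                  k: int = 10) -> float:
    """Fraction of replayed transitions whose actual next track is in the top-k."""
    hits = 0
    trials = 0
    for plays in sessions:
        # Replay the session: predict play i from plays 0..i-1.
        for i in range(1, len(plays)):
            history, actual = plays[:i], plays[i]
            top_k = list(recommend(history))[:k]
            trials += 1
            if actual in top_k:
                hits += 1
    return hits / trials if trials else 0.0


# Toy recommender that always suggests "b": it hits a->b and misses b->c.
rate = hit_rate_at_k([["a", "b", "c"]], lambda history: ["b"])
```

Since the metric is informational rather than gating, a run like this would typically log the value per build so trends are visible over time.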