# 07 — Recommendation Engine: Auto-Play Next Design
Status: **Normative design contract** for FablePool's recommendation engine.
Audience: server team, ML-adjacent contributors, client teams (consumers of
the API surface in §8).
The headline feature: when auto-play is on and the queue runs out, FablePool
picks the next song using the **last song and the user's last N played songs**,
where **N is user-configurable**. This document specifies the whole pipeline:
feature extraction, candidate generation, scoring, configuration,
storage, and the API.
---
## 1. Design Principles
1. **Local-first, no external services.** Everything runs inside the FablePool
server against the user's own library. No calls to third-party
recommendation APIs. (Optional enrichment from MusicBrainz/ListenBrainz tags
is a *later-milestone* plugin point, not part of the core path.)
2. **Cheap at request time.** Picking the next track must complete in
**< 100 ms p95** on a 50k-track library on modest hardware (4-core,
no GPU). All heavy work happens at scan/analysis time.
3. **Deterministic-ish but not boring.** Scoring is deterministic; final
selection samples from the top-K with temperature, so two plays of the same
album don't always lead to the identical next song.
4. **Explainable.** Every recommendation response can carry a human-readable
`reasons[]` (used by clients' "Why this song?" UI and by debugging).
5. **Degrades gracefully.** If audio analysis hasn't run yet (fresh library),
the engine falls back to metadata-only scoring; if even metadata is thin,
it falls back to library-popularity + shuffle. There is never a state where
auto-play silently does nothing while tracks exist.
---
## 2. Pipeline Overview
```mermaid
flowchart LR
  subgraph Offline["Offline (scan / analysis workers)"]
    SC["Library scanner<br/>doc 03"] --> META["Metadata features<br/>tags, genre, year, artist"]
    SC --> AF["Audio analysis worker<br/>decodes middle 90 s via ffmpeg"]
    AF --> FV[("track_features table<br/>doc 02 §8")]
    META --> FV
    FV --> ANN["ANN index build<br/>per-library HNSW, in-process"]
  end
  subgraph Online["Online (per request, < 100 ms)"]
    H[(play_history)] --> CTX["Context builder<br/>last-N window, decay weights"]
    CTX --> CAND["Candidate generation<br/>ANN top-200 ∪ metadata buckets"]
    CAND --> SCORE["Scorer<br/>weighted linear blend"]
    SCORE --> FILT["Filters<br/>anti-repeat, dedupe, user rules"]
    FILT --> PICK["Top-K temperature sampling"]
    PICK --> OUT["next track + reasons"]
  end
  ANN -. loaded at startup .-> CAND
```
---
## 3. Feature Extraction (offline)
Runs as a background job pool after the library scanner registers/updates a
track (job framework per doc 01 §6). Two tiers:
### 3.1 Tier 1 — Metadata features (always available, instant)
Extracted from tags during scan; no audio decode needed.
| Feature | Source | Encoding |
|---|---|---|
| `artist_id`, `album_artist_id` | tags | categorical (ID match) |
| `genres[]` | tags, split on `;`/`,`, normalized lowercase | multi-hot over library genre vocabulary |
| `year` | tags | numeric, bucketed by half-decade for matching |
| `duration_ms` | container | numeric |
| `track_gain` (ReplayGain/R128 if present) | tags | numeric, proxy for loudness when Tier 2 absent |
| `label`, `composer` | tags | categorical (optional) |
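
The genre and year encodings in the table can be illustrated with a minimal Python sketch. The function names are hypothetical, and the half-decade boundaries (1995–1999, 2000–2004, and so on) are an assumption; the table only states that years are bucketed by half-decade.

```python
import re


def parse_genres(tag_value: str) -> list[str]:
    """Split a raw genre tag on ';' or ',' and normalize to lowercase."""
    genres: list[str] = []
    for part in re.split(r"[;,]", tag_value):
        name = part.strip().lower()
        # Drop empty fragments and duplicates, keeping first-seen order.
        if name and name not in genres:
            genres.append(name)
    return genres


def multi_hot(genres: list[str], vocabulary: list[str]) -> list[int]:
    """Encode genres as a 0/1 vector over the library genre vocabulary."""
    present = set(genres)
    return [1 if g in present else 0 for g in vocabulary]


def half_decade_bucket(year: int) -> int:
    """Map a year to the first year of its half-decade (assumed boundaries)."""
    return year - (year % 5)
```

For example, `parse_genres("Shoegaze; Dream Pop,shoegaze")` yields two genres, and two tracks from 1996 and 1998 land in the same year bucket.
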
### 3.2 Tier 2 — Audio features (analysis worker)
The analysis worker decodes a deterministic sample of the file — **the middle
90 seconds** (skip intro/outro bias), mono, 22.05 kHz — through the server's
ffmpeg transcode path (reusing the byte-range reader from doc 03, so analysis
of S3/WebDAV files streams only the needed bytes when the container allows;
otherwise it reads sequentially up to the needed point). From the PCM it
computes, with our own DSP code (Rust, `rustfft`-based — no Essentia
dependency, to keep the build simple):
| Feature | Dim | Method (summary) |
|---|---|---|
| `tempo_bpm` | 1 | onset-strength autocorrelation (spectral-flux onsets, comb scoring 60–180 BPM, octave correction) |
| `energy` | 1 | mean RMS, dB-scaled, normalized 0–1 over library |
| `dynamic_complexity` | 1 | stddev of frame RMS |
| `spectral_centroid_mean` | 1 | brightness proxy |
| `spectral_rolloff_mean` | 1 | 85 % rolloff |
| `zero_crossing_rate` | 1 | percussiveness/noisiness proxy |
| `mfcc_mean[1..13]`, `mfcc_var[1..13]` | 26 | 13 MFCCs over 2048-sample frames, hop 512; mean+variance pooling |
| `chroma_mean[12]` | 12 | 12-bin chroma; coarse key/harmony proxy |
| `key`, `mode` | 2 | Krumhansl–Schmuckler template correlation on pooled chroma |
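All features except `key` and `mode` are concatenated and **L2-normalized into
a 44-dim "timbre vector"** (6 scalars + 26 MFCC statistics + 12 chroma bins).
`tempo`, `energy`, complexity, centroid, rolloff, and ZCR are first scaled to
comparable ranges via library-wide robust z-scoring (median/IQR), then the MFCC
and chroma blocks are appended. `key` and `mode` are kept as separate columns.
Stored in `track_features` (doc 02 §8) as: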
```sql
track_features(
track_id PK/FK,
feature_version INT, -- bump invalidates & triggers re-analysis
timbre_vec BYTEA, -- 44 × f32, little-endian
tempo_bpm REAL,
energy REAL,
key_idx SMALLINT, -- 0-11, -1 unknown
mode SMALLINT, -- 1 major, 0 minor, -1 unknown
analyzed_at TIMESTAMPTZ
)
```
`feature_version` starts at 1; any change to the extraction algorithm bumps it
and the analysis workers lazily re-process (oldest-version-first, rate-limited
so re-analysis never starves transcoding workers — shared job-pool priority
classes per doc 01 §6.3).
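
As an illustration of the vector assembly and the `timbre_vec` storage layout, here is a minimal Python sketch (the production DSP code is Rust, so this is illustrative only). The helper names are hypothetical, and the quartile method used by `statistics.quantiles` is an assumption; the spec only says "median/IQR".

```python
import math
import struct
from statistics import median, quantiles


def robust_z(value, library_values):
    """Robust z-score: (x - median) / IQR over the library's values."""
    q1, _, q3 = quantiles(library_values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        return 0.0  # degenerate library: every track has the same value
    return (value - median(library_values)) / iqr


def build_timbre_vec(scalars_z, mfcc_block, chroma_block):
    """Concatenate 6 scaled scalars + 26 MFCC stats + 12 chroma bins, then L2-normalize."""
    vec = list(scalars_z) + list(mfcc_block) + list(chroma_block)
    if len(vec) != 44:
        raise ValueError(f"expected 44 dims, got {len(vec)}")
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return vec
    return [x / norm for x in vec]


def pack_timbre_vec(vec):
    """Serialize as 44 little-endian f32 values (176 bytes) for the BYTEA column."""
    return struct.pack("<44f", *vec)


def unpack_timbre_vec(blob):
    """Inverse of pack_timbre_vec."""
    return list(struct.unpack("<44f", blob))
```

Round-tripping through `pack_timbre_vec` loses precision down to f32, so comparisons after unpacking should use a tolerance rather than equality.
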
### 3.3 ANN index
Per-library in-process **HNSW** index over the timbre vectors
(M=16, efConstruction=200, efSearch=64), rebuilt incrementally on track
add/remove and snapshotted to disk in the cache directory so server restart
doesn't trigger a full rebuild. 50k tracks × 44 dims × 4 bytes (f32) ≈ 8.8 MB of raw vectors, so RAM is
a non-issue; this is why we choose in-process ANN over a vector-DB dependency.
If the index is unavailable (corrupt snapshot, mid-rebuild), candidate
generation falls back to metadata buckets only (§5) — the engine never blocks
on the index.
---
## 4. Listening Context (the last-N window)
### 4.1 User configuration
Per-user settings (doc 02 §7 `user_settings`, exposed in §8 API):
| Setting | Default | Range | Meaning |
|---|---|---|---|
| `autoplay.enabled` | `true` | bool | master switch |
| `autoplay.window_n` | `10` | 1–100 | **how many recent plays form the context** (the "X last played songs" requirement) |
| `autoplay.last_song_weight` | `0.5` | 0–1 | share of context weight pinned to the *most recent* song; the remaining `1 − w` is spread over the rest of the window with exponential decay |
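| `autoplay.decay_half_life` | `5` | 1–50 | half-life, in plays, of the exponential decay applied to the rest of the window |
| `autoplay.exploration` | `0.3` | 0–1 | sampling breadth for top-K selection; mapped to softmax temperature `τ = 0.05 + 0.45 · exploration` (§6.3) |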
| `autoplay.avoid_repeat_minutes` | `120` | 0–1440 | anti-repeat horizon (§6.2) |
| `autoplay.same_artist_penalty` | `0.15` | 0–1 | discourage long same-artist runs |
| `autoplay.scope` | `library` | `library`\|`queue_source` | recommend across the whole library, or stay inside the playlist/album the queue came from until exhausted |
### 4.2 Context vector
Let plays `p₀` (most recent) … `p_{N−1}` be the last N **completed or ≥30 s**
plays (skips shorter than 30 s are excluded from the positive context but
recorded as **negative signals**, §6.2). Weights:
```
w₀ = last_song_weight
wᵢ = (1 − last_song_weight) · dᵢ / Σⱼ dⱼ for i = 1..N−1
dᵢ = 0.5^(i / decay_half_life)
```
Context timbre vector: `c = Σ wᵢ · v(pᵢ)`, re-L2-normalized. Context metadata
profile: weighted multiset of artists, genres, year-buckets with the same
weights. Tracks missing Tier-2 vectors contribute only to the metadata
profile; the timbre context renormalizes over available vectors.
Edge cases:
- **History shorter than N:** use what exists; if zero history, skip to
fallback ladder (§7).
- The window is computed per **user**, across devices/clients (history is
server-side, doc 02 §6), so handing off from phone to Cast keeps context.
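
The weighting scheme and context vector from §4.2 can be sketched in Python as follows. Function names are hypothetical. Treating a window of one play as carrying only `last_song_weight` is an assumption; it is harmless for the timbre vector because of the final re-normalization.

```python
import math


def context_weights(n, last_song_weight=0.5, decay_half_life=5.0):
    """Weights w_0..w_{n-1} for the last-n window, most recent play first."""
    if n <= 0:
        return []
    if n == 1:
        return [last_song_weight]
    # d_i = 0.5^(i / half_life) for i = 1..n-1, normalized to share (1 - w_0).
    decay = [0.5 ** (i / decay_half_life) for i in range(1, n)]
    total = sum(decay)
    rest = [(1.0 - last_song_weight) * d / total for d in decay]
    return [last_song_weight] + rest


def context_vector(weights, vectors):
    """Weighted sum of timbre vectors, re-L2-normalized.

    vectors[i] is None for plays without Tier-2 data; those are skipped,
    which renormalizes the context over the available vectors.
    """
    pairs = [(w, v) for w, v in zip(weights, vectors) if v is not None]
    if not pairs:
        return None
    dim = len(pairs[0][1])
    acc = [0.0] * dim
    for w, v in pairs:
        for k in range(dim):
            acc[k] += w * v[k]
    norm = math.sqrt(sum(x * x for x in acc))
    return [x / norm for x in acc] if norm > 0 else None
```

With the defaults and a full 10-play window, the most recent song holds half the weight and the remaining nine plays share the other half with monotonically decreasing weights.
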
---
## 5. Candidate Generation
Union of, capped at 500 candidates before scoring:
1. **ANN neighbors:** top-200 by cosine similarity to context vector `c`
(skipped if no Tier-2 data yet).
2. **Metadata buckets:** up to 100 tracks sharing an artist with the window,
up to 100 sharing a top-3 context genre, weighted-random within bucket by
library play-count (popularity prior).
3. **Queue-source pool:** if `autoplay.scope = queue_source`, the unplayed
remainder of the source playlist/album *replaces* (1)+(2) until it is
exhausted, after which scope automatically widens to `library` (clients show
"Continuing with similar songs").
4. **Exploration pool:** 25 uniformly random never-played tracks (gives the
novelty term in §6.1.3 something to surface; keeps recommendations from
collapsing onto the played subset).
Excluded outright before scoring: tracks in the current queue, tracks the user
has explicitly disliked (`track_reactions.reaction = 'dislike'`, doc 02 §9),
and tracks whose backend is currently offline (storage health flag, doc 03
§8 — recommending an unplayable track is a bug, not a degradation).
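
A minimal Python sketch of the union, exclusion, and cap logic for the `library` scope. The function name is hypothetical and the pool priority order (ANN first, exploration last) is an assumption; the spec does not say which pool is trimmed first when the cap is hit. With the stated pool sizes (200 + 100 + 100 + 25 = 425) the 500 cap is not reached.

```python
def build_candidates(ann_neighbors, artist_bucket, genre_bucket,
                     exploration_pool, excluded, cap=500):
    """Union the pools in order, drop duplicates and excluded tracks, cap the result."""
    pools = [
        ann_neighbors[:200],     # top-200 ANN neighbors
        artist_bucket[:100],     # up to 100 sharing an artist with the window
        genre_bucket[:100],      # up to 100 sharing a top-3 context genre
        exploration_pool[:25],   # 25 random never-played tracks
    ]
    # Seeding `seen` with the exclusions filters queue, disliked, and offline tracks.
    seen = set(excluded)
    candidates = []
    for pool in pools:
        for track_id in pool:
            if track_id in seen:
                continue
            seen.add(track_id)
            candidates.append(track_id)
            if len(candidates) >= cap:
                return candidates
    return candidates
```
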
---
## 6. Scoring
### 6.1 Score function
For candidate `t`:
```
score(t) = w_sim · sim_timbre(c, v(t)) // cosine, 0 if no vector
+ w_meta · meta_affinity(t) // §6.1.1
+ w_tempo · tempo_compat(t) // §6.1.2
+ w_nov · novelty(t) // §6.1.3
+ w_pop · popularity_prior(t) // log-scaled user+library plays, normalized
− penalties(t) // §6.2
```
Default weights (server config, hot-reloadable; per-user override is a later
milestone): `w_sim=0.40, w_meta=0.25, w_tempo=0.10, w_nov=0.15, w_pop=0.10`.
When the candidate has no timbre vector, `w_sim`'s mass is redistributed
proportionally to `w_meta` and `w_tempo` (so metadata-only libraries still get
sane rankings rather than uniformly losing 0.4).
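
The blend and the weight redistribution can be sketched in Python as follows. The dictionary keys and function names are hypothetical; the numbers are the default weights listed above.

```python
DEFAULT_WEIGHTS = {"sim": 0.40, "meta": 0.25, "tempo": 0.10, "nov": 0.15, "pop": 0.10}


def effective_weights(weights, has_timbre_vector):
    """Move w_sim's mass onto w_meta and w_tempo, proportionally, when no vector exists."""
    out = dict(weights)
    if has_timbre_vector:
        return out
    base = weights["meta"] + weights["tempo"]
    if base > 0:
        out["meta"] += weights["sim"] * weights["meta"] / base
        out["tempo"] += weights["sim"] * weights["tempo"] / base
    out["sim"] = 0.0
    return out


def score(terms, weights=DEFAULT_WEIGHTS, penalties=0.0):
    """Weighted linear blend minus penalties; terms["sim"] is None without a vector."""
    w = effective_weights(weights, terms.get("sim") is not None)
    sim = terms.get("sim") or 0.0
    return (w["sim"] * sim
            + w["meta"] * terms["meta"]
            + w["tempo"] * terms["tempo"]
            + w["nov"] * terms["nov"]
            + w["pop"] * terms["pop"]
            - penalties)
```

With the defaults, a vector-less candidate is scored with `w_meta ≈ 0.536` and `w_tempo ≈ 0.214`; the 2.5:1 ratio between them is preserved and the weights still sum to 1.
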
**6.1.1 `meta_affinity`:** weighted Jaccard-style overlap between the
candidate's {artist, album-artist, genres, year-bucket} and the context
metadata profile; artist match capped so a single dominant artist can't pin
the score (cap = 0.6 of the meta term).
**6.1.2 `tempo_compat`:** `exp(−(Δbpm_oct / 12)²)` where `Δbpm_oct` is the
BPM distance after octave-folding (half/double-time equivalence), vs the
context's weighted mean BPM.
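
A minimal Python sketch of `tempo_compat`. Interpreting octave-folding as "the smallest distance when the candidate is read at half, normal, or double time" is an assumption; the function names are hypothetical.

```python
import math


def folded_bpm_distance(candidate_bpm, context_bpm):
    """Smallest |delta| with the candidate read at half, normal, or double time."""
    return min(abs(candidate_bpm * factor - context_bpm) for factor in (0.5, 1.0, 2.0))


def tempo_compat(candidate_bpm, context_bpm):
    """exp(-(delta_bpm_oct / 12)^2): 1.0 for a perfect match, decaying with distance."""
    delta = folded_bpm_distance(candidate_bpm, context_bpm)
    return math.exp(-((delta / 12.0) ** 2))
```

Under this reading, a 60 BPM candidate against a 120 BPM context scores 1.0 (double-time match), while a 12 BPM gap scores `exp(−1) ≈ 0.37`.
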
**6.1.3 `novelty`:** `1 / (1 + user_play_count(t))`, scaled by recency — a
track unplayed for 90+ days counts as fully novel even if historically played
a lot.
### 6.2 Penalties & hard filters
| Rule | Type | Effect |
|---|---|---|
| Played within `avoid_repeat_minutes` | **hard filter** | excluded |
| Same track as any of last 3 window entries | **hard filter** | excluded (even beyond the time horizon) |
| Same artist as current track | soft | −`same_artist_penalty`, doubled if the last 2 plays were already that artist (breaks artist ruts) |
| Same album as current track | soft | −0.05 (unless `scope=queue_source` on that album) |
| Skip signal: user skipped this track <30 s within last 14 days | soft | −0.25 per recent skip, capped −0.5 |
| Disliked artist (≥3 dislikes for the artist) | soft | −0.3 |
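
The rules in the table can be sketched in Python as follows. Function and field names (`artist_id`, `album_id`, `scope_album_id`) are hypothetical, and reading "within `avoid_repeat_minutes`" as a strict `<` comparison is an assumption.

```python
def passes_hard_filters(track_id, minutes_since_last_play, last_three_ids,
                        avoid_repeat_minutes=120):
    """False if the track is in the last 3 window entries or was played too recently."""
    if track_id in last_three_ids:
        return False
    if minutes_since_last_play is not None and minutes_since_last_play < avoid_repeat_minutes:
        return False
    return True


def soft_penalties(candidate, current, last_two_artists, recent_skips,
                   artist_dislikes, same_artist_penalty=0.15, scope_album_id=None):
    """Sum of the soft penalties; the caller subtracts this from the score."""
    total = 0.0
    if candidate["artist_id"] == current["artist_id"]:
        # Doubled when the last 2 plays were already by that artist.
        in_rut = len(last_two_artists) == 2 and all(
            a == candidate["artist_id"] for a in last_two_artists)
        total += same_artist_penalty * (2 if in_rut else 1)
    same_album = candidate["album_id"] == current["album_id"]
    if same_album and scope_album_id != candidate["album_id"]:
        total += 0.05
    total += min(0.25 * recent_skips, 0.5)  # capped at 0.5
    if artist_dislikes >= 3:
        total += 0.3
    return total
```
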
### 6.3 Selection
Take top-K (K=10) by score; sample one with softmax temperature
`τ = 0.05 + 0.45 · exploration` over the scores. `exploration = 0` ⇒ τ = 0.05 ≈
argmax; `exploration = 1` ⇒ τ = 0.50, broad sampling. The full ranked top-K is returned
when the client asks for a *radio batch* (§8.2) rather than a single next
track.
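
A minimal Python sketch of the selection step, assuming candidates arrive as `(track_id, score)` pairs. The function names and the injectable `rng` parameter are hypothetical.

```python
import math
import random


def temperature(exploration):
    """Map the user's exploration setting (0..1) to a softmax temperature."""
    return 0.05 + 0.45 * exploration


def softmax_probs(scores, tau):
    """Numerically stable softmax of scores / tau."""
    peak = max(scores)
    exps = [math.exp((s - peak) / tau) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next(scored, exploration=0.3, k=10, rng=random):
    """Sample one track id from the top-k (track_id, score) pairs by softmax."""
    top = sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
    probs = softmax_probs([s for _, s in top], temperature(exploration))
    return rng.choices([t for t, _ in top], weights=probs, k=1)[0]
```

At τ = 0.05, a score gap of 0.4 gives the leader more than 99.9 % of the probability mass, but a gap of 0.01 leaves the runner-up with a substantial chance. Passing a seeded `random.Random` as `rng` makes the draw reproducible.
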
### 6.4 Reasons
Each returned track carries machine+human reasons, e.g.:
```json
"reasons": [
{"code": "TIMBRE_SIMILAR", "detail": "Sounds similar to 'Song A'", "weight": 0.31},
{"code": "GENRE_MATCH", "detail": "Shares genre 'shoegaze' with 4 of your last 10 plays", "weight": 0.18},
{"code": "NOVELTY", "detail": "You haven't played this in 6 months", "weight": 0.12}
]
```
---
## 7. Fallback Ladder
Evaluated top-down; first applicable level wins:
1. **Full:** Tier-2 vectors for ≥ 30 % of library and ≥ 1 play in window →
full pipeline.
2. **Metadata-only:** any play history → §5 buckets + §6 without `w_sim`.
3. **Popularity shuffle:** no usable history (new user) → weighted-random by
library-wide play count, else uniform random; anti-repeat filters still
apply.
4. **Empty library:** `next = null` (clients stop playback gracefully; Cast
receiver gets `NEXT_ITEM_RESPONSE {item:null}`, doc 06).
The active level is reported in responses as `"strategy"` so clients/tests can
assert behavior.
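
The ladder can be sketched as a small Python decision function. Only `"empty_library"` appears in this document as a `strategy` value; the other three strings, the function name, and the split between "plays in window" and "any history" as separate inputs are assumptions.

```python
def choose_strategy(library_size, tier2_count, plays_in_window, has_any_history):
    """Return the first applicable fallback level as a strategy string."""
    # Levels 1-3 all need tracks to exist, so checking emptiness first
    # gives the same result as the top-down ladder.
    if library_size == 0:
        return "empty_library"
    # Integer comparison avoids float rounding at the 30 % boundary.
    has_enough_vectors = tier2_count * 100 >= 30 * library_size
    if has_enough_vectors and plays_in_window >= 1:
        return "full"
    if has_any_history:
        return "metadata_only"
    return "popularity_shuffle"
```
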
---
## 8. API Surface
Full schemas in `api/openapi.yaml`; summarized contract:
### 8.1 `POST /api/v1/recommendations/next`
Request:
```json
{
"windowN": 10,
"queueTrackIds": ["trk_a", "trk_b"],
"queueSourceId": "pl_44d1",
"overrides": { "exploration": 0.5 }
}
```
`windowN` and `overrides` are optional per-request overrides of the user's
stored settings (clients pass the user's live UI value; nothing is persisted
by this call). Response: one track (full track object) + `score`, `strategy`,
`reasons[]`, or `{"track": null, "strategy": "empty_library"}`.
### 8.2 `POST /api/v1/recommendations/radio`
Same inputs + `"count": 1–50`; returns a ranked batch (used for "Start radio
from this song" — the seed song is passed as a synthetic single-entry window
via `"seedTrackId"`).
### 8.3 Settings
`GET/PATCH /api/v1/me/settings/autoplay` — the §4.1 settings object, validated
server-side against the stated ranges.
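
Server-side range validation for a `PATCH` body could look like the following Python sketch. The JSON field names (setting names without the `autoplay.` prefix) and the error-message format are assumptions; the authoritative schema is `api/openapi.yaml`. Integer-only constraints are not checked here.

```python
AUTOPLAY_RANGES = {
    "window_n": (1, 100),
    "last_song_weight": (0.0, 1.0),
    "decay_half_life": (1, 50),
    "exploration": (0.0, 1.0),
    "avoid_repeat_minutes": (0, 1440),
    "same_artist_penalty": (0.0, 1.0),
}
SCOPES = {"library", "queue_source"}


def validate_autoplay_patch(patch):
    """Return a list of error strings; an empty list means the patch is acceptable."""
    errors = []
    for key, value in patch.items():
        if key == "enabled":
            if not isinstance(value, bool):
                errors.append("enabled: expected a boolean")
        elif key == "scope":
            if value not in SCOPES:
                errors.append("scope: expected 'library' or 'queue_source'")
        elif key in AUTOPLAY_RANGES:
            low, high = AUTOPLAY_RANGES[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not (low <= value <= high):
                errors.append(f"{key}: expected a number in [{low}, {high}]")
        else:
            errors.append(f"{key}: unknown setting")
    return errors
```
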
### 8.4 Feedback
`POST /api/v1/tracks/{id}/reaction` `{"reaction": "like"|"dislike"|null}` —
feeds §6.2. Skips are inferred from history events (a `POST /history` with
`completed:false, positionMs<30000`), not a separate endpoint.
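
The skip inference can be sketched in Python as follows. The event fields `completed` and `positionMs` come from the text above; the function name and the return labels are hypothetical.

```python
SKIP_THRESHOLD_MS = 30_000


def classify_play(event):
    """Return 'context' for plays that enter the last-N window, else 'skip'."""
    if event.get("completed") or event.get("positionMs", 0) >= SKIP_THRESHOLD_MS:
        return "context"
    return "skip"
```

A play abandoned at 12 s is classified as a skip (a negative signal for §6.2), while one abandoned at exactly 30 s still counts toward the context window.
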
### 8.5 Subsonic compatibility
Subsonic's `getSimilarSongs` / `getSimilarSongs2` (`id`, `count`) are mapped
onto §8.2 with the given song as seed — so existing Subsonic clients
(DSub, Symfonium, etc.) get FablePool recommendations for free.
---
## 9. Performance & Sizing
| Stage | Budget (50k tracks, 4-core) |
|---|---|
| Context build (DB read of last N plays + vectors) | ≤ 10 ms (history indexed by `(user_id, played_at DESC)`) |
| ANN top-200 | ≤ 5 ms (HNSW efSearch=64) |
| Metadata buckets (indexed queries) | ≤ 20 ms |
| Scoring ≤ 500 candidates | ≤ 5 ms (pure in-memory arithmetic) |
| Total p95 | **< 100 ms** |
| Tier-2 analysis throughput | ≥ 4 tracks/s/core target (90 s @ 22.05 kHz mono decode + FFTs); at that rate a 10k-track library takes about 2,500 s (≈ 42 min) on one core, comfortably within an overnight run on a NAS |
Analysis is strictly lower job priority than streaming/transcoding (doc 01
§6.3) and is pausable via admin API.
---
## 10. Evaluation & Testing (engineering acceptance, not research)
- **Golden tests:** fixture library (~200 CC-licensed clips) with
precomputed feature vectors checked into test data; scoring unit tests
assert ranking invariants (e.g. same-genre+similar-tempo track outranks
opposite-genre track given a clean context).
- **Property tests:** anti-repeat hard filters can never be violated; output
track is always playable (backend online, not disliked-hard-filtered).
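- **Determinism harness:** with `exploration=0`, a fixed history, and a fixed
RNG seed, the next track is deterministic → snapshot tests. (The seed is
needed because τ = 0.05 is only approximately argmax and the §5 buckets sample
randomly.)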
- **Offline replay metric (CI, informational):** hit-rate@10 — replay real
(donated, anonymized) listening sessions, check how often the actually-
played next song appears in our top-10. Tracked over time, not gated.
- **Latency gate:** integration test asserts p95 < 100 ms on the 50k synthetic
library fixture.
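
The offline replay metric can be sketched in Python as follows. The `recommend(history, k)` callable stands in for the engine's ranked top-K output and is hypothetical, as is the session representation (a list of track ids in play order).

```python
def hit_rate_at_k(sessions, recommend, k=10):
    """Fraction of replayed transitions whose actual next track is in the top-k.

    sessions: list of play sequences (track ids, oldest first).
    recommend: callable (history, k) -> ranked list of track ids.
    """
    hits = 0
    trials = 0
    for plays in sessions:
        for i in range(1, len(plays)):
            top_k = recommend(plays[:i], k)[:k]
            if plays[i] in top_k:
                hits += 1
            trials += 1
    return hits / trials if trials else 0.0
```

Each session of length `m` contributes `m − 1` trials, so long sessions dominate the metric unless the caller resamples or weights them.
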