# ADR 0011: Meilisearch as the MVP Search Engine

- **Status:** Accepted
- **Date:** 2025-01-15
- **Deciders:** Core architecture team
- **Related:** ADR 0003 (PostgreSQL), ADR 0012 (Redis + Celery), docs/architecture/10-api-design.md

## Context

Search is a first-class feature of the platform, not an afterthought. Learners must be able to find problems and courses by:

- Full-text query across title, statement, hints, and solution prose (published versions only).
- Faceted filters: topic, tags, difficulty, problem type, author, language, status, prerequisites.
- Typo-tolerant matching ("pythagoraen" → "Pythagorean") because our audience includes non-native English speakers and learners typing on mobile.
- Relevance ranking that blends textual relevance with community signals (votes, attempt counts, acceptance ratio).

We considered four options:

1. **PostgreSQL full-text search (`tsvector` / `pg_trgm`)**: no extra infrastructure; queries live alongside transactional data.
2. **Meilisearch**: lightweight, single-binary search engine with typo tolerance, faceting, and ranking rules out of the box.
3. **OpenSearch / Elasticsearch**: industry-standard distributed search; richest feature set, heaviest operational cost.
4. **Typesense**: similar profile to Meilisearch; comparable features, slightly different licensing and API ergonomics.

### Constraints that shaped the decision

- We are an open-source project run by volunteers; **operational simplicity matters more than scale headroom** for the MVP. Many self-hosters will run the whole platform on a single small VPS.
- Faceted filtering and typo tolerance are hard requirements; PostgreSQL FTS handles neither well without significant custom work (trigram similarity tuning, manual facet aggregation queries, no real ranking-rule control).
- Search must index **only published versions** of content (per ADR 0007's immutability rules), so the indexing pipeline is a projection, not the source of truth. Losing the index must never lose data; rebuilding from PostgreSQL must always be possible.
- Multilingual content (docs/architecture/09): the engine should handle non-Latin scripts reasonably without per-language analyzer configuration burden.

## Decision

We adopt **Meilisearch (v1.x)** as the MVP search engine, with the following architecture:

1. **PostgreSQL remains the single source of truth.** Meilisearch indexes are disposable projections. A management command `rebuild_search_index` re-derives every index from the database; runbooks treat index loss as a non-incident.
2. **Two primary indexes:** `problems` and `courses`. Each document is built from the latest *published* version, denormalized to include: title, statement plain-text extraction (MDX stripped of widget bodies), tags, topic path, difficulty, problem type, author handle, language, vote score, attempt count, acceptance rate, and published timestamp. Draft/in-review content is never indexed.
3. **Asynchronous indexing via Celery** (ADR 0012). The `ContentPublished`, `ContentUnpublished`, and `VersionRolledBack` domain events enqueue index-update tasks. Indexing is idempotent (documents keyed by stable entity ID, not version ID) so retries are safe.
4. **Ranking rules:** Meilisearch defaults (`words`, `typo`, `proximity`, `attribute`, `sort`, `exactness`) plus a custom descending sort attribute `quality_score`, a precomputed blend of normalized vote score and acceptance-rate signal recomputed nightly by a Celery beat task. We deliberately keep the blend formula in application code, not in Meilisearch settings, so it is testable and auditable.
5. **Facets:** `tags`, `topic`, `difficulty`, `problem_type`, `language`, `author`. The frontend search page issues a single Meilisearch query with `facets` requested, via a thin backend proxy endpoint (`GET /api/v1/search`). The frontend never talks to Meilisearch directly, so the search API key never leaves the server and we can enforce visibility rules and rate limits centrally.
6. **API keys:** the backend uses a scoped key with `search` + `documents` + `settings` actions on the two indexes only; the master key lives solely in deployment secrets and is used by migration/rebuild tooling.

### Migration path to OpenSearch

We accept that a large deployment may eventually need OpenSearch (aggregation analytics, vector search at scale, multi-node resilience). To keep that door open:

- All search access goes through a `SearchBackend` interface in the Django service layer (`search()`, `index_document()`, `delete_document()`, `rebuild()`), with `MeilisearchBackend` as the sole MVP implementation.
- Index documents are defined as serializers in application code, engine-agnostic.
- The public `/api/v1/search` contract exposes our own response shape, never Meilisearch's raw response, so the engine can change without an API break.

## Alternatives Considered

| Option | Why rejected (for MVP) |
|---|---|
| PostgreSQL FTS | No typo tolerance; faceting requires hand-rolled aggregate queries per filter; relevance tuning is primitive; multilingual support requires per-language `tsvector` configs. Acceptable as a degraded fallback, not as the primary experience. |
| OpenSearch/Elasticsearch | JVM heap requirements (realistically 1–2 GB minimum) are hostile to small self-hosted deployments; cluster operations (shards, ILM, snapshots) are a volunteer-ops burden disproportionate to MVP needs. |
| Typesense | Genuinely competitive. We chose Meilisearch for its larger community mindshare in our reference stacks, simpler ranking-rule model, and the team's prior operational experience. This was a close call, recorded so future maintainers know Typesense was not overlooked. |

## Consequences

**Positive**

- Typo tolerance, faceting, highlighting, and sane relevance with near-zero tuning.
- Single static binary / single container; ~100 MB RAM footprint at MVP scale; trivially included in Docker Compose.
- Index rebuilds from PostgreSQL keep disaster recovery simple.

**Negative / Accepted risks**

- Meilisearch is single-node; no replication in our deployment tier. Mitigated by the disposable-index posture: availability of search degrades, but data is never lost.
- No fine-grained per-document ACLs in the engine; mitigated by indexing only public/published content and proxying all queries through the backend.
- A second piece of stateful-ish infrastructure to run. Accepted; the deployment docs mark Meilisearch as optional, with a PostgreSQL-`ILIKE` degraded mode behind a settings flag for ultra-minimal installs.

**Follow-ups**

- Define the `quality_score` blend formula and nightly recompute task in the backend milestone.
- Add `rebuild_search_index` to the operations runbook with expected runtimes per 10k documents.
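
The first follow-up leaves the `quality_score` formula open. As input to that discussion, here is a minimal sketch of one possible blend, not the agreed formula. Everything specific in it is an assumption made for illustration: the function names, the `tanh` squashing of the vote score, the use of a Wilson lower bound to dampen acceptance ratios backed by few attempts, and the 0.6/0.4 weighting.

```python
import math


def wilson_lower_bound(accepted: int, attempts: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for an acceptance ratio.

    Assumption: we want 1 accepted out of 1 attempt to rank below
    90 accepted out of 100 attempts, so small samples are dampened.
    """
    if attempts <= 0:
        return 0.0
    p = accepted / attempts
    denom = 1 + z * z / attempts
    centre = p + z * z / (2 * attempts)
    margin = z * math.sqrt(p * (1 - p) / attempts + z * z / (4 * attempts * attempts))
    return max(0.0, (centre - margin) / denom)


def normalized_votes(vote_score: int, scale: float = 20.0) -> float:
    """Squash a signed vote score into (0, 1); a score of 0 maps to 0.5.

    `scale` is a hypothetical tuning knob: larger values flatten the curve.
    """
    return 0.5 * (1.0 + math.tanh(vote_score / scale))


def quality_score(
    vote_score: int, accepted: int, attempts: int, vote_weight: float = 0.6
) -> float:
    """Blend both signals into a single value in [0, 1] for descending sort."""
    votes = normalized_votes(vote_score)
    acceptance = wilson_lower_bound(accepted, attempts)
    return round(vote_weight * votes + (1 - vote_weight) * acceptance, 4)


if __name__ == "__main__":
    # A well-voted problem with a solid track record vs. a brand-new one.
    print(quality_score(vote_score=35, accepted=90, attempts=100))
    print(quality_score(vote_score=0, accepted=0, attempts=0))
```

Because the functions are pure and live in application code, the nightly task would only need to call `quality_score` per document and push the result into the index, which keeps the formula unit-testable as the Decision section intends.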
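
The `SearchBackend` seam described under "Migration path to OpenSearch" could look roughly like the following. Only the four method names come from this ADR; the signatures, the `SearchResult` shape, and the `InMemoryBackend` test double are assumptions for illustration. The in-memory class is a naive substring matcher for unit tests, not a Meilisearch client.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Iterable, Mapping


@dataclass
class SearchResult:
    """Engine-agnostic response shape (field names are hypothetical)."""
    hits: list[dict[str, Any]]
    total: int
    facets: dict[str, dict[str, int]] = field(default_factory=dict)


class SearchBackend(ABC):
    """Contract every engine implementation would satisfy."""

    @abstractmethod
    def search(self, index: str, query: str,
               filters: Mapping[str, Any] | None = None) -> SearchResult: ...

    @abstractmethod
    def index_document(self, index: str, document: Mapping[str, Any]) -> None: ...

    @abstractmethod
    def delete_document(self, index: str, entity_id: Any) -> None: ...

    @abstractmethod
    def rebuild(self, index: str, documents: Iterable[Mapping[str, Any]]) -> None: ...


class InMemoryBackend(SearchBackend):
    """Test double: case-insensitive substring match on `title` only."""

    def __init__(self) -> None:
        self._indexes: dict[str, dict[str, dict[str, Any]]] = {}

    def index_document(self, index, document):
        # Keyed by stable entity ID, so re-indexing the same entity
        # overwrites the old document and retries stay idempotent.
        self._indexes.setdefault(index, {})[str(document["id"])] = dict(document)

    def delete_document(self, index, entity_id):
        # Deleting a missing document is a no-op, which keeps retries safe.
        self._indexes.get(index, {}).pop(str(entity_id), None)

    def rebuild(self, index, documents):
        # Drop the whole index and re-derive it from the source of truth.
        self._indexes[index] = {}
        for doc in documents:
            self.index_document(index, doc)

    def search(self, index, query, filters=None):
        needle = query.lower()
        hits = [
            doc for doc in self._indexes.get(index, {}).values()
            if needle in str(doc.get("title", "")).lower()
            and all(doc.get(k) == v for k, v in (filters or {}).items())
        ]
        return SearchResult(hits=hits, total=len(hits))
```

A fake like this lets the proxy endpoint and the Celery indexing tasks be tested without a running search engine, and an eventual OpenSearch implementation would be written against the same four methods.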