# Non-Goals for v1

Lagoon v1 is deliberately narrow. This document lists what Lagoon **does not** try to do in its first major release, and why. Treating these as explicit non-goals keeps the core small, auditable, and honest about its operational envelope.

Several items here are revisited in the [roadmap](roadmap.md) as candidates for later releases; others are permanent non-goals. If you need one of these capabilities today, Lagoon is probably the wrong tool for that workload, and we would rather tell you that up front than have you discover it in production.

---

## 1. Not a distributed consensus system

**Non-goal:** multi-writer clusters, Raft/Paxos replication, leader election, or any coordination protocol between compute nodes.

**Why:** Lagoon's durability story is object storage, not a quorum of nodes. A single writer per namespace (enforced by conditional writes on the manifest) plus any number of stateless readers covers the target workloads (RAG corpora, search indexes, embedding stores) without the operational weight of a consensus layer. Multi-node *read* scale-out works in v1 because readers only consume immutable objects; multi-node *write* scale-out is a roadmap item (per-namespace writer leases), not a v1 feature.

**Consequence you should plan for:** concurrent writers to the same namespace will see conditional-write conflicts; one of them loses and must retry. This is documented behavior, not a bug.

## 2. Not a strongly consistent multi-node database

**Non-goal:** linearizable reads across a fleet of query nodes.

**Why:** readers cache manifests and segments. A read served by a node with a stale cached manifest can lag the latest commit by up to the manifest refresh interval. Single-node deployments get read-your-writes (the writer invalidates its own caches on commit); multi-node deployments get bounded-staleness reads.
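
The bounded-staleness behavior described above can be pictured with a minimal sketch. It assumes a reader that re-fetches its manifest only once a fixed refresh interval has elapsed. `Manifest`, `CachedManifestReader`, the `version` field, and `demo` are hypothetical names chosen for illustration; they are not Lagoon's actual types or API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Manifest:
    version: int  # hypothetical: a monotonically increasing commit number


class CachedManifestReader:
    """Illustrative reader that re-fetches the manifest at most once per interval."""

    def __init__(
        self,
        fetch: Callable[[], Manifest],
        refresh_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._refresh_interval_s = refresh_interval_s
        self._clock = clock
        self._cached: Optional[Manifest] = None
        self._fetched_at = 0.0

    def current(self) -> Manifest:
        now = self._clock()
        expired = now - self._fetched_at >= self._refresh_interval_s
        if self._cached is None or expired:
            self._cached = self._fetch()
            self._fetched_at = now
        return self._cached

    def invalidate(self) -> None:
        # What a writer would do to its own cache on commit (read-your-writes).
        self._cached = None


def demo() -> list:
    store = {"version": 1}   # stands in for the manifest in object storage
    fake_now = [0.0]         # injected clock so the example is deterministic
    reader = CachedManifestReader(
        fetch=lambda: Manifest(store["version"]),
        refresh_interval_s=5.0,
        clock=lambda: fake_now[0],
    )
    seen = [reader.current().version]      # first read fetches the manifest
    store["version"] = 2                   # another node commits
    fake_now[0] = 3.0
    seen.append(reader.current().version)  # inside the interval: stale read
    fake_now[0] = 5.0
    seen.append(reader.current().version)  # interval elapsed: fresh read
    return seen
```

In this sketch the staleness of any read is bounded by `refresh_interval_s`, and calling `invalidate()` after a local commit is what gives the single-node case its read-your-writes behavior.
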
The exact guarantees are specified in the [architecture guide](architecture.md); we will not paper over them with marketing language.

## 3. Not an OLTP or relational database

**Non-goal:** SQL, joins, secondary unique constraints, multi-document transactions, foreign keys, or interactive row-level mutation at OLTP rates.

**Why:** the storage layout (append-only WAL + immutable segments + periodic compaction) is optimized for batched ingest and read-heavy search. Single-document patches work, but they are implemented as WAL appends and are not cheap at high per-document mutation rates. If your workload is "update one row ten thousand times per second," use an OLTP database and sync into Lagoon.

The only transactional unit in v1 is a **single batch write to a single namespace** (atomic via the manifest commit). There are no cross-namespace transactions.

## 4. Not an in-memory graph-index engine

**Non-goal:** HNSW or other graph ANN indexes that require the full graph resident in RAM, and the single-digit-millisecond p99s they enable.

**Why:** graph indexes fight the object-storage-native design: they are expensive to build incrementally, awkward to page from cold storage, and force memory provisioning proportional to corpus size. Lagoon's IVF/centroid index is chosen precisely because its posting lists are independently fetchable objects that cache well on SSD.

The honest trade: warm IVF queries land in the low tens of milliseconds at our tested scales (see the [benchmark guide](benchmark-guide.md)), not the sub-millisecond range that RAM-resident graph engines advertise. We will not claim otherwise.

## 5. Not an embedding service

**Non-goal:** generating embeddings, bundling models, or depending on any embedding vendor.

**Why:** embedding models churn monthly and pricing/licensing varies wildly. Lagoon stores and searches vectors; producing them is your pipeline's job.
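
As a rough sketch of what "your pipeline's job" can look like, the snippet below attaches vectors to documents on the client side before anything is sent to a vector store. Everything here is an assumption made for illustration: `EmbeddingProvider`, `HashEmbedder`, `build_upsert_rows`, and the row shape are hypothetical and are not Lagoon's API or the demo code. The toy hash-based embedder only stands in for a real model and carries no semantic meaning.

```python
import hashlib
import math
from typing import Protocol


class EmbeddingProvider(Protocol):
    """Hypothetical interface: anything that turns text into a vector."""

    def embed(self, text: str) -> list: ...


class HashEmbedder:
    """Toy embedder: hashes tokens into a fixed-size, L2-normalized count vector."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def embed(self, text: str) -> list:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        # Empty input has no tokens, so the zero vector is returned unchanged.
        return [x / norm for x in vec] if norm > 0 else vec


def build_upsert_rows(docs: dict, provider: EmbeddingProvider) -> list:
    """Compute vectors client-side; the row layout is illustrative only."""
    return [
        {"id": doc_id, "vector": provider.embed(text), "text": text}
        for doc_id, text in docs.items()
    ]


rows = build_upsert_rows(
    {"doc-1": "object storage native search", "doc-2": "vector search"},
    HashEmbedder(dim=8),
)
```

Swapping `HashEmbedder` for a real model only requires another class with the same `embed` method; the store never needs to know which one produced the vectors.
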
The demos ship a pluggable provider shim (`demos/common/embeddings.py`) with hash-based, sentence-transformers, and OpenAI-compatible backends purely as a convenience; none of them is a dependency of the server, and the server never makes outbound network calls to embedding APIs.

## 6. Not a real-time streaming system

**Non-goal:** sub-second ingest-to-visible latency guarantees, change-data-capture feeds, or subscription/notification APIs.

**Why:** writes are durable on WAL commit, but visibility of *indexed* (ANN / inverted-index) search over new data depends on background indexing, which runs on a configurable interval. Freshly written documents are still queryable via the exact/scan paths before indexing completes, so correctness is preserved, but if your SLA is "indexed and searchable within 200 ms of write," v1 does not promise that.

## 7. Not a multi-tenant SaaS control plane

**Non-goal:** billing, usage metering for invoicing, per-tenant noisy-neighbor isolation guarantees, self-serve signup, or a hosted dashboard.

**Why:** v1 ships the primitives a platform team needs (organizations, projects, namespaces, API keys with admin/writer/reader roles, optional rate limits and quotas, audit logs) and stops there. Building a SaaS on top of Lagoon is a supported use case; *being* the SaaS is not.

## 8. Not a security appliance

**Non-goal:** built-in TLS termination, mTLS between components, per-document ACLs, row-level security, or client-side encryption.

**Why:** v1's security model is API keys + roles at the namespace boundary, deployed behind your reverse proxy/service mesh for transport security, with encryption-at-rest delegated to the object-storage provider (SSE-S3 / SSE-KMS; see the [deployment guide](deployment.md#encryption-at-rest)). Per-document authorization belongs in your application layer in v1. The threat model and hardening guidance in the deployment guide state exactly what is and is not covered.

## 9. Not exhaustive query-language coverage

Specific query features that are **out of scope for v1**:

- Regular-expression filters (only `Eq`/`NotEq`/range/`In`/`ContainsAny`/prefix-style string matching are supported).
- Aggregations, facets, and group-by.
- Geo-spatial queries.
- Fuzzy/typo-tolerant full-text matching (BM25 over exact tokens only; stemming is per-namespace configurable, edit-distance matching is not).
- Learned/re-ranking models inside the engine. Hybrid fusion (weighted-sum and RRF) is the ceiling; cross-encoder re-ranking happens in your application, as the RAG demo shows.
- Vector quantization (PQ/SQ) for the v1 on-disk format. Vectors are stored as float32; the segment format reserves an encoding byte so quantization can arrive without a format break.

## 10. Not a benchmark-marketing project

**Non-goal:** claiming performance parity with Turbopuffer, Elasticsearch, Qdrant, or any other system we have not benchmarked head-to-head under published, reproducible conditions.

**Why:** Lagoon is *inspired by* the object-storage-native architecture that Turbopuffer described publicly, but it is an independent clean-room implementation, and we have no access to those systems' internals or comparable test environments. All performance numbers we publish come from the bundled benchmark suite, with hardware, dataset, and configuration fully disclosed (see [benchmarks/results/TEMPLATE.md](../benchmarks/results/TEMPLATE.md)). Numbers without a reproduction recipe do not get published.

Pull requests that add unverifiable comparison claims to the docs will be declined.

## 11. Other explicit exclusions

| Excluded in v1 | Status |
|---|---|
| GPU acceleration for distance computation | Permanent non-goal for the core; viable as an external plugin |
| Windows as a supported server platform | Roadmap "maybe": clients and CLI work; the server is tested on Linux/macOS only |
| Embedded/library mode (linking the engine into your process) | Roadmap candidate; the storage crates are structured to allow it, but the supported surface is the HTTP API |
| Schema migration tooling | Documents are schemaless; attribute-index definitions can be changed and rebuilt, but there is no general migration framework |
| Cross-region replication | Use your object store's bucket replication; Lagoon does not coordinate it and does not guarantee read consistency across replicated regions |
| Plugin/extension API stability | Internal Rust APIs may break between minor versions until 1.0; only the HTTP API and storage format carry compatibility promises |

---

## How to read this document

A non-goal is not a value judgment about the feature; most of these are excellent features in systems designed for them. It is a statement that v1 will not attempt them, so that what v1 *does* ship (durable object-storage-native storage, IVF vector search, BM25, hybrid fusion, filters, branching, and a tight cache hierarchy) ships solid, tested, and honestly documented.

When a non-goal graduates to a goal, it moves to the [roadmap](roadmap.md) with a design sketch, and this document is updated in the same pull request.