# Lagoon benchmark suite

Reproducible benchmarks for Lagoon covering:

| Benchmark | Script | Measures |
|---|---|---|
| Ingest | `bench_ingest.py` | upsert throughput (docs/s, MB/s), per-batch latency, index catch-up time |
| Recall | `bench_recall.py` | recall@1/10/100 of ANN (IVF) vs exact kNN, plus latency of both modes |
| Latency | `bench_latency.py` | cold vs warm latency for vector, BM25, hybrid (RRF), filtered vector |
| Cache | `bench_cache.py` | memory/disk cache hit rates and object-store request counts per workload |
| All | `run_all.py` | orchestrates the above and renders `results/SUMMARY.md` |

Read **[docs/benchmark-guide.md](../docs/benchmark-guide.md)** for methodology, definitions (what "cold" means, how recall is computed), and the honest reporting policy before publishing any numbers.

## Quickstart

```bash
# 1. Start Lagoon (filesystem backend for a smoke run, MinIO for realistic I/O)
docker compose up -d   # from the repo root

# 2. Install benchmark deps
cd benchmarks
pip install -r requirements.txt

# 3. Run everything (≈50k docs, 256-d vectors)
export LAGOON_URL=http://localhost:8484
export LAGOON_API_KEY=dev-key
python run_all.py --docs 50000 --dim 256

# 4. Collect cold samples (requires a server restart per sample)
docker compose restart api
python bench_latency.py --phase cold --namespace bench-ingest --dim 256
python run_all.py --summary-only

# 5. Read results
cat results/SUMMARY.md
```

All datasets are generated deterministically by `datagen.py` (seeded Gaussian mixture vectors + Zipf-distributed text), so runs are comparable across machines and versions. Results are written as JSON per benchmark plus a rendered Markdown summary; nothing is published automatically.
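
To illustrate what "generated deterministically" can look like, here is a minimal sketch of a seeded generator producing Gaussian-mixture vectors and Zipf-distributed text. It is not the code in `datagen.py`: the function name `make_dataset`, its parameters, the cluster count, and the `w<rank>` vocabulary are all assumptions made for illustration, and it uses only the Python standard library.

```python
import random


def make_dataset(n_docs, dim, seed=0, n_clusters=8, vocab_size=1000, words_per_doc=20):
    """Hypothetical generator: seeded Gaussian-mixture vectors plus Zipf-like text."""
    # A private RNG instance, so the output depends only on `seed`
    # and not on the state of the global `random` module.
    rng = random.Random(seed)

    # Cluster centres of the Gaussian mixture.
    centres = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_clusters)]

    # Zipf-like weights: the word at rank r is drawn with probability
    # proportional to 1 / r.
    vocab = [f"w{r}" for r in range(1, vocab_size + 1)]
    weights = [1.0 / r for r in range(1, vocab_size + 1)]

    docs = []
    for doc_id in range(n_docs):
        centre = rng.choice(centres)
        # Sample around the chosen centre with a small standard deviation.
        vector = [c + rng.gauss(0.0, 0.1) for c in centre]
        text = " ".join(rng.choices(vocab, weights=weights, k=words_per_doc))
        docs.append({"id": doc_id, "vector": vector, "text": text})
    return docs


if __name__ == "__main__":
    first = make_dataset(n_docs=3, dim=4, seed=42)
    second = make_dataset(n_docs=3, dim=4, seed=42)
    print(first == second)  # same seed, same data
```

Calling the function twice with the same seed returns identical documents, which is the property that makes two benchmark runs comparable; changing the seed gives a different but equally reproducible dataset.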