# 04 — Data Source Evaluation & Selection

## 1. Selection principles

Per product principle **P5 (no lock-in)** and **NFR-COST/NFR-OPS**, every external data source must satisfy:

- **C1 — Free for our scale:** usable at 10k MAU and on self-hostable instances without payment. API keys are acceptable only if free tiers are unmetered or generous and the source is substitutable.
- **C2 — Open license:** data redistributable under an open license compatible with serving it to users (attribution requirements acceptable).
- **C3 — Substitutable:** at least one documented alternative behind the same internal abstraction.
- **C4 — Adequate quality:** global coverage; for wind, ≤ 3 h temporal and ≤ ~11 km spatial resolution; gust data available.
- **C5 — Stable & maintained:** operated by an institution or sustainable project, versioned API.

Each candidate is scored ✓ (meets), ◐ (partial), ✗ (fails).

---

## 2. Map rendering library

| Candidate | C1 | C2 | C3 | C4 | C5 | Notes |
|---|---|---|---|---|---|---|
| **MapLibre GL JS** | ✓ | ✓ BSD-3 | ✓ | ✓ | ✓ | Vector tiles, WebGL, smooth zoom/rotate, data-driven styling, built-in clustering. Community fork of Mapbox GL v1; active. |
| Leaflet | ✓ | ✓ BSD-2 | ✓ | ◐ | ✓ | Rock-solid, tiny, raster-first. Vector styling and 10k-marker performance weaker; overlays (wind field) harder at 60 fps. |
| OpenLayers | ✓ | ✓ BSD-2 | ✓ | ✓ | ✓ | Very capable, heavier API surface and bundle; steeper contributor learning curve. |
| Mapbox GL JS v2+ | ✗ | ✗ proprietary | — | ✓ | ✓ | License requires Mapbox account/billing. Excluded on C1/C2. |

**Decision: MapLibre GL JS** (target major version **^4**). Rationale: vector tiles keep bandwidth low for field use, data-driven marker styling lets us re-color hundreds of score badges client-side instantly (F1 §5), and built-in clustering covers F1 §3.
**Leaflet (^1.9) is the documented fallback** (C3): the frontend's map layer is wrapped in a thin `MapAdapter` interface (init, markers, clusters, fitBounds, events) so a Leaflet implementation can be swapped in for low-end-device builds or by forks.

## 3. Map tiles & geocoding

| Candidate | C1 | C2 | C3 | C4 | C5 | Notes |
|---|---|---|---|---|---|---|
| **OpenStreetMap data via OpenFreeMap / self-hosted vector tiles** | ✓ | ✓ ODbL | ✓ | ✓ | ✓ | OpenFreeMap serves free vector tiles from OSM with no key; same schema self-hostable via Planetiler/tilemaker. |
| OSM raster tile servers (tile.openstreetmap.org) | ◐ | ✓ | ✓ | ◐ | ✓ | Usage policy prohibits heavy app use; acceptable only for dev and the Leaflet fallback at tiny scale. |
| MapTiler / Stadia free tiers | ◐ | ◐ | ✓ | ✓ | ✓ | Free tiers metered + key required; fine as *optional* configured alternatives, never the default. |
| Protomaps (PMTiles) | ✓ | ✓ | ✓ | ✓ | ✓ | Single-file tiles on object storage; excellent self-host option. |

**Decision:** Default to **OpenFreeMap vector tiles** (no key, OSM-derived) with the tile URL **configurable** at deploy time; document **Protomaps/PMTiles self-hosting** and **Planetiler-generated tiles** as first-class alternatives.

Satellite layer: **Esri World Imagery is excluded** (terms); use **Sentinel-2 cloudless by EOX** (CC-BY) where licensing fits, otherwise omit satellite in default deployments and offer terrain hillshade from open DEM tiles instead. OSM attribution is rendered permanently (F1 acceptance criteria).

**Geocoding:** **Nominatim** public API for place search (C1 with strict usage policy: ≤ 1 req/s, proper User-Agent, results not stored). These limits are enforced client-side by debounce and server-side by a proxy rate limit. Self-hosters with heavy traffic can point the same interface at a self-hosted Nominatim or **Photon** instance. Reverse geocoding (spot address hints in the editor) uses the same proxy.

## 4. Elevation / terrain (supporting data)

Spot creation auto-fills elevation from **Open-Meteo's elevation API** (free, no key) with **Open-Elevation** (self-hostable) as substitute. Used for display and future terrain-exposure heuristics; not score-critical in v1.

---

## 5. Weather & wind providers — the core evaluation

Requirements recap: current conditions + hourly forecast ≥ 7 days; fields: 10 m wind speed, **gusts**, direction, precipitation (amount + probability), temperature, cloud/weather code; global coverage; free; attribution-friendly.

| Candidate | C1 | C2 | C3 | C4 | C5 | Notes |
|---|---|---|---|---|---|---|
| **Open-Meteo** | ✓ no key, free non-commercial; self-hostable (AGPL) | ✓ CC-BY 4.0 data | ✓ | ✓ global; multi-model (ICON, GFS, ECMWF-based best_match); hourly 7–16 d; gusts ✓ | ✓ | Best overall fit. JSON, bulk multi-point queries, historical + forecast in one API family. |
| **NOAA NWS API (api.weather.gov)** | ✓ free, no key (User-Agent required) | ✓ US-Gov public domain | ✓ | ◐ US-only; high-quality gridpoint forecasts incl. gusts; observations from ASOS/METAR stations | ✓ | Excellent US accuracy + *observed* (not modeled) current conditions from real stations. |
| Met Norway Locationforecast (api.met.no) | ✓ free, UA required | ✓ CC-BY | ✓ | ✓ global, strong in Europe; gusts available in some products | ✓ | Strong substitute candidate; terms require proper attribution + caching. |
| DWD Open Data (raw ICON) | ✓ | ✓ | ✓ | ◐ raw GRIB; needs our own processing pipeline | ✓ | Too heavy for v1; viable for a future self-processing milestone. |
| OpenWeatherMap | ◐ keyed, metered free tier | ✗ restrictive | ✓ | ✓ | ✓ | Fails C2 and smells of lock-in. Excluded as default. |
| Windy API | ✗ paid | ✗ | — | ✓ | ✓ | Excluded. |

**Decision: Open-Meteo as primary provider, NOAA NWS as secondary for US spots, Met Norway documented as tertiary substitute.**

- **Open-Meteo** (`api.open-meteo.com/v1/forecast`) supplies both "current" (model analysis) and hourly forecast globally with one API shape. Target fields: `wind_speed_10m`, `wind_gusts_10m`, `wind_direction_10m`, `temperature_2m`, `precipitation`, `precipitation_probability`, `weather_code`, `cloud_cover`. We pin `wind_speed_unit` and `timezone` per request. Its AGPL-licensed server is self-hostable, which is the ultimate lock-in escape hatch.
- **NOAA NWS** adds *observed* station data (`/stations/{id}/observations/latest`) and gridpoint forecasts for US spots, where Persona Maya/Dev accuracy matters most. Enabled per-deployment by config; spots outside NWS coverage automatically use Open-Meteo.

### Provider abstraction (C3 enforcement)

The backend defines a `WeatherProvider` trait/interface:

```
get_current(lat, lon) -> CurrentConditions {
    wind_speed_ms, wind_gust_ms, wind_dir_deg,
    temp_c, precip_mm, precip_prob_pct,
    weather_code, cloud_pct,
    observed_at, source: ProviderId, model: String
}
get_hourly(lat, lon, hours: u16) -> Vec
```

All internal units are **SI (m/s, °C, mm)**; unit conversion is a presentation concern. A `CompositeProvider` routes by spot location and configured priority list, with per-provider circuit breakers and the staleness contract from F3. Every stored reading records `(provider, model)` for attribution (F6 §5) and later accuracy auditing.

### Quota & caching math (sanity check vs. NFR-COST)

Weather-cell deduplication (F3 §2: spots within ~5 km share a cell) + 10-min current cache: a deployment tracking 5,000 active spots collapses to ≈ 1,500 cells ⇒ ≤ 9,000 upstream calls/hour worst-case (1,500 cells × 6 refreshes/hour), but viewport-driven laziness (only cells with recent viewers refresh) empirically cuts this ~90%.
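
The arithmetic above can be reproduced in a few lines of Python. This is a rough sizing sketch, not project code: the function names are hypothetical, the 10-minute cache, 1,500 cells, ~90% lazy-refresh reduction and ~10k calls/day figures are the ones quoted in this section, and it assumes load is spread evenly over 24 hours, which real traffic will not be. Under those assumptions the lazily refreshed load for 1,500 cells still lands above the ~10k/day guidance, and roughly 700 cells is the size that fits.

```python
import math


def upstream_calls_per_hour(cells: int, cache_minutes: int,
                            lazy_reduction_pct: int = 0) -> int:
    """Upstream refresh calls per hour for `cells` deduplicated weather cells.

    `lazy_reduction_pct` is the share of refreshes skipped because no viewer
    recently looked at the cell (0 = worst case, every cell always refreshes).
    """
    refreshes_per_hour = math.ceil(60 / cache_minutes)  # 10-min cache -> 6
    worst_case = cells * refreshes_per_hour
    return math.ceil(worst_case * (100 - lazy_reduction_pct) / 100)


def max_cells_for_daily_budget(daily_budget: int, cache_minutes: int,
                               lazy_reduction_pct: int = 0) -> int:
    """Largest cell count whose daily refresh load stays within `daily_budget`."""
    per_cell_per_day = (24 * math.ceil(60 / cache_minutes)
                        * (100 - lazy_reduction_pct) / 100)
    if per_cell_per_day == 0:
        raise ValueError("a 100% reduction means no upstream calls at all")
    return math.floor(daily_budget / per_cell_per_day)


# Figures quoted in this section (assumptions for sizing, not measurements).
CELLS = 1_500
CACHE_MINUTES = 10
LAZY_REDUCTION_PCT = 90
FREE_GUIDANCE_PER_DAY = 10_000

worst = upstream_calls_per_hour(CELLS, CACHE_MINUTES)
lazy = upstream_calls_per_hour(CELLS, CACHE_MINUTES, LAZY_REDUCTION_PCT)
lazy_per_day = lazy * 24

print(worst)         # 9000 calls/hour worst case
print(lazy)          # 900 calls/hour with ~90% lazy-refresh reduction
print(lazy_per_day)  # 21600 calls/day, above the ~10k/day guidance
print(max_cells_for_daily_budget(FREE_GUIDANCE_PER_DAY, CACHE_MINUTES,
                                 LAZY_REDUCTION_PCT))  # 694 cells fit
```
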
Open-Meteo's free non-commercial guidance (~10k calls/day) fits a hobby instance; the flagship instance either stays within fair use via lazy refresh or self-hosts the Open-Meteo stack. This math is documented so self-hosters can size their instances.

---

## 6. Spot seed data

Initial spot database bootstrapped from **OpenStreetMap** extracts (Overpass queries) for candidate features: `leisure=park` + large open area heuristics, `natural=beach`, known `sport=kitesurfing` tagged nodes/ways, and kite-club-published public lists where licenses permit. All imports are flagged `provenance=osm-import`, carry ODbL attribution (F8 §4), enter as **unverified** status, and require a community verification edit before they show a "verified" check. The community database itself is published under **ODbL** (see charter) so it can flow back to the commons.

## 7. Risk register

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Open-Meteo fair-use tightens | Med | High | Provider abstraction + Met Norway substitute + self-host Open-Meteo (AGPL) |
| OpenFreeMap sunset | Low | Med | Tile URL configurable; PMTiles self-host path documented from day one |
| Nominatim policy violation at scale | Med | Low | Server proxy rate limit; Photon self-host path |
| NWS API instability | Med | Low | NWS is enhancement-only; automatic fallback to Open-Meteo |
| OSM-import seeds low quality | High | Med | Unverified status + moderation + verification workflow (F8) |
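
Several mitigations in the risk register (automatic fallback from NWS to Open-Meteo, provider substitution) rest on the `CompositeProvider` routing described in §5. The following is a minimal Python sketch of priority-ordered routing with a per-provider failure counter, offered as an illustration only. The backend language is not fixed by this document, and every name here (`Breaker`, `OutOfCoverage`, `ProviderError`, the stub fetch functions, the bounding box and the readings they return) is hypothetical. A real circuit breaker would also need a cooldown or half-open state and the F3 staleness contract, both omitted here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CurrentConditions:
    wind_speed_ms: float
    wind_gust_ms: float
    wind_dir_deg: float
    source: str  # provider id, kept for attribution; other fields omitted


class ProviderError(Exception):
    """Upstream failure (timeout, 5xx, malformed payload)."""


class OutOfCoverage(ProviderError):
    """The provider does not serve this location; not counted as a failure."""


@dataclass
class Breaker:
    """Opens after `threshold` consecutive failures (no cooldown in this sketch)."""
    threshold: int = 3
    failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1


Fetch = Callable[[float, float], CurrentConditions]


class CompositeProvider:
    """Tries providers in configured priority order, skipping tripped ones."""

    def __init__(self, providers: List[Tuple[str, Fetch]]) -> None:
        self.providers = providers
        self.breakers: Dict[str, Breaker] = {name: Breaker() for name, _ in providers}

    def get_current(self, lat: float, lon: float) -> CurrentConditions:
        for name, fetch in self.providers:
            breaker = self.breakers[name]
            if breaker.is_open:
                continue
            try:
                reading = fetch(lat, lon)
            except OutOfCoverage:
                continue  # fall through without penalising the provider
            except ProviderError:
                breaker.record(ok=False)
                continue
            breaker.record(ok=True)
            return reading
        raise ProviderError("all providers failed, tripped, or out of coverage")


# Stub providers with made-up readings, standing in for real HTTP clients.
def nws(lat: float, lon: float) -> CurrentConditions:
    # Crude illustrative bounding box, not the real NWS coverage area.
    if not (24.0 <= lat <= 50.0 and -125.0 <= lon <= -66.0):
        raise OutOfCoverage("outside NWS coverage")
    return CurrentConditions(5.0, 8.0, 270.0, source="nws")


def open_meteo(lat: float, lon: float) -> CurrentConditions:
    return CurrentConditions(4.5, 7.0, 260.0, source="open-meteo")


composite = CompositeProvider([("nws", nws), ("open-meteo", open_meteo)])
print(composite.get_current(40.7, -74.0).source)  # nws
print(composite.get_current(52.5, 13.4).source)   # open-meteo
```

Treating out-of-coverage separately from upstream failure keeps a provider like NWS from tripping its breaker simply because most spots in a global deployment lie outside its area.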