# Database Schema **Document:** 06 — Database Schema **Status:** Approved for Milestone 1 **Companion file:** `db/schema.sql` (authoritative DDL) --- ## 1. Engine Choice **PostgreSQL 15+ with PostGIS 3.3+.** Why: - **Geospatial is the core domain.** Spot lookup is "spots within viewport / radius, ordered by distance" — PostGIS `geography` columns with GiST indexes solve this natively and correctly (great-circle math, not flat-earth approximations). - **Open source, no lock-in** — mirrors the project charter. Runs identically on a laptop, a $5 VPS, and managed Postgres anywhere. - **One database for everything in v1.** Forecast caching uses a plain table with TTL semantics rather than introducing Redis; we accept slightly higher read latency in exchange for one fewer moving part. Revisit if cache QPS demands it (see `docs/08-architecture.md` §7). Conventions used throughout: - Primary keys are `uuid` (generated with `gen_random_uuid()` from `pgcrypto`) — safe to expose in URLs, no enumeration leaks. - All timestamps are `timestamptz` (UTC). `created_at`/`updated_at` on every mutable table; `updated_at` maintained by trigger. - Soft state via `status` enums rather than row deletion where moderation history matters (spots, reviews). - Names: `snake_case`, plural table names. --- ## 2. Entity-Relationship Overview ``` users ──< spots (created_by) users ──< spot_reviews >── spots users ──< session_reports >── spots users ──< favorites >── spots users ──< moderation_reports >── (spots | spot_reviews) spots ──< spot_wind_sectors spots ──< spot_obstacles spots ──< spot_photos spots ──< forecast_cache (by location grid-snap, see §5) scoring_profiles (lookup; referenced by API, soft FK from clients) schema_migrations (housekeeping) ``` --- ## 3. Tables ### 3.1 `users` Minimal account model; OAuth-first (no password column in v1 — auth via OAuth providers per `docs/08-architecture.md` §5; an optional email magic-link flow uses `auth_tokens`). | Column | Type | Notes | |---|---|---| | `id` | uuid PK | | | `display_name` | text NOT NULL | 3–40 chars, enforced by CHECK | | `email` | citext UNIQUE | nullable for OAuth-only accounts that don't share email | | `oauth_provider` | text | e.g. `github`, `google`; nullable | | `oauth_subject` | text | provider's stable user id; `(oauth_provider, oauth_subject)` UNIQUE | | `role` | `user_role` enum | `user` \| `moderator` \| `admin`, default `user` | | `preferred_units` | `unit_system` enum | `metric` \| `imperial` \| `nautical`, default `metric` | | `default_profile` | text | default kite profile key, FK → `scoring_profiles.key` | | `created_at` / `updated_at` | timestamptz | | ### 3.2 `spots` The heart of the app. A spot is a curated, community-submitted flying location — **not** every OSM park (those are rendered as a hint layer client-side; only places someone vouched for become spots). | Column | Type | Notes | |---|---|---| | `id` | uuid PK | | | `name` | text NOT NULL | 3–120 chars | | `description` | text | markdown, ≤ 5000 chars | | `location` | `geography(Point, 4326)` NOT NULL | launch point | | `area` | `geography(Polygon, 4326)` | optional flyable-area outline | | `spot_type` | `spot_type` enum | `park` \| `beach` \| `hill` \| `field` \| `water` \| `other` | | `is_water_launch` | boolean NOT NULL default false | drives kitesurf scoring path | | `shore_bearing_deg` | smallint | 0–359; required when `is_water_launch` (CHECK) | | `suitable_profiles` | text[] | profile keys the community deems suitable | | `surface` | text | grass, sand, gravel… free text from OSM `surface` tag | | `access_notes` | text | parking, paths, fees | | `hazards_notes` | text | free-text safety notes complementing structured obstacles | | `osm_id` | bigint | optional provenance link to OSM element | | `osm_type` | text | `node` \| `way` \| `relation`, CHECK-constrained | | `country_code` | char(2) | reverse-geocoded ISO-3166-1, for browse/filter | | `status` | `spot_status` enum | `pending` \| `published` \| `flagged` \| `archived` | | `created_by` | uuid FK → users | SET NULL on user deletion (content survives, attribution removed — GDPR-friendly) | | `created_at` / `updated_at` | timestamptz | | Indexes: GiST on `location` (the viewport query), GiST on `area`, btree on `(status, country_code)`, GIN on `suitable_profiles`, trigram GIN on `name` for search (`pg_trgm`). ### 3.3 `spot_wind_sectors` Good-wind direction sectors per spot (rubric §5). Multiple rows per spot; sector may wrap north (e.g. 330→30, modeled simply as `from_deg > to_deg` meaning wrap). | Column | Type | Notes | |---|---|---| | `id` | uuid PK | | | `spot_id` | uuid FK → spots ON DELETE CASCADE | | | `from_deg` / `to_deg` | smallint NOT NULL | 0–359 each | | `note` | text | e.g. "clean off the lake" | ### 3.4 `spot_obstacles` Structured obstacle tags consumed directly by the scoring engine (rubric §7). | Column | Type | Notes | |---|---|---| | `id` | uuid PK | | | `spot_id` | uuid FK → spots ON DELETE CASCADE | | | `tag` | `obstacle_tag` enum | the exact tag vocabulary from rubric §7 | | `bearing_deg` | smallint | nullable; where the obstacle sits relative to launch (enables direction-aware penalties) | | `source` | `data_source` enum | `osm` \| `community` \| `curated` | | `verified_by` | uuid FK → users | moderator who confirmed; nullable | | `created_at` | timestamptz | | UNIQUE `(spot_id, tag, bearing_deg)` to prevent duplicate tagging. ### 3.5 `spot_photos` | Column | Type | Notes | |---|---|---| | `id` | uuid PK | | | `spot_id` | uuid FK ON DELETE CASCADE | | | `uploaded_by` | uuid FK → users SET NULL | | | `storage_key` | text NOT NULL | object-store key; URLs are derived, never stored | | `caption` | text | ≤ 280 chars | | `width_px` / `height_px` | integer | for layout-shift-free rendering | | `license` | text NOT NULL default `CC-BY-SA-4.0` | uploads are CC-BY-SA per CONTRIBUTING | | `status` | `content_status` enum | `pending` \| `published` \| `removed` | | `created_at` | timestamptz | | ### 3.6 `spot_reviews` Star rating + text, one per user per spot (UNIQUE). | Column | Type | Notes | |---|---|---| | `id` | uuid PK | | | `spot_id` / `user_id` | FKs | `(spot_id, user_id)` UNIQUE; CASCADE on spot, CASCADE on user (reviews are personal data) | | `rating` | smallint NOT NULL CHECK 1–5 | | | `body` | text | ≤ 2000 chars | | `status` | `content_status` enum | | | `created_at` / `updated_at` | timestamptz | | A materialized-style denormalization (`spots.rating_avg`, `spots.rating_count`) is maintained by trigger so map pins can show ratings without a join-aggregate per tile. ### 3.7 `session_reports` "I flew here at time X, conditions were really Y" — the calibration loop for the rubric (§10) and social proof on spot pages. | Column | Type | Notes | |---|---|---| | `id` | uuid PK | | | `spot_id` / `user_id` | FKs, CASCADE | | | `flown_at` | timestamptz NOT NULL | | | `profile_key` | text NOT NULL FK → scoring_profiles | what they flew | | `experienced_rating` | smallint CHECK 1–5 | "how was it really" | | `predicted_score` | smallint | snapshot of our score at that hour (null if unavailable) | | `observed_wind_ms` | numeric(4,1) | optional user-observed wind | | `notes` | text ≤ 1000 | | | `created_at` | timestamptz | | Index on `(spot_id, flown_at DESC)` for spot timelines; on `(profile_key, experienced_rating)` for calibration queries. ### 3.8 `favorites` Pure join table: `(user_id, spot_id)` composite PK, both CASCADE, `created_at`. ### 3.9 `forecast_cache` Server-side cache of Open-Meteo (primary) / NOAA (US fallback) hourly data. **Keyed by grid-snapped coordinates** (0.05° ≈ 5 km snap), so nearby spots share one upstream fetch — this is what keeps us comfortably inside Open-Meteo's free-tier fair use. | Column | Type | Notes | |---|---|---| | `id` | uuid PK | | | `grid_lat` / `grid_lon` | numeric(7,4) NOT NULL | snapped; UNIQUE together with `provider` | | `provider` | `data_source_weather` enum | `open_meteo` \| `noaa` | | `payload` | jsonb NOT NULL | normalized hourly arrays (see API doc): time, wind_speed, gusts, direction, precip, precip_prob, cape, weather_code, temp | | `fetched_at` | timestamptz NOT NULL | | | `expires_at` | timestamptz NOT NULL | fetched_at + 30 min (current+48 h) / + 3 h (extended) | Index on `expires_at` for the sweeper job. `jsonb` (not normalized rows) is deliberate: the payload is read-whole, write-whole, never queried by field. ### 3.10 `scoring_profiles` Rubric constants as data (rubric §10). Seeded in `schema.sql`. | Column | Type | Notes | |---|---|---| | `key` | text PK | `single_line`, `sport_dual`, `power_traction`, `kitesurf` | | `display_name` | text NOT NULL | | | `w_min_ms` / `w_ideal_lo_ms` / `w_ideal_hi_ms` / `w_max_ms` | numeric(4,1) NOT NULL | trapezoid, CHECK ordered | | `gust_tolerant` | boolean NOT NULL | the +0.10 gust bonus class | | `precip_floor` | numeric(3,2) NOT NULL default 0 | kitesurf's 0.60 floor | | `is_water_sport` | boolean NOT NULL | enables offshore gate | | `rubric_version` | text NOT NULL | | | `sort_order` | smallint NOT NULL | UI ordering | ### 3.11 `moderation_reports` | Column | Type | Notes | |---|---|---| | `id` | uuid PK | | | `reporter_id` | uuid FK → users SET NULL | | | `target_type` | `report_target` enum | `spot` \| `review` \| `photo` | | `target_id` | uuid NOT NULL | polymorphic; integrity enforced in application layer (documented trade-off) | | `reason` | `report_reason` enum | `inaccurate` \| `unsafe` \| `spam` \| `inappropriate` \| `duplicate` \| `other` | | `details` | text ≤ 1000 | | | `status` | `report_status` enum | `open` \| `resolved` \| `dismissed` | | `resolved_by` | uuid FK → users | | | `created_at` / `resolved_at` | timestamptz | | ### 3.12 `auth_tokens` Magic-link / refresh tokens. `token_hash` (sha256, never the raw token), `purpose` enum (`magic_link` | `refresh`), `user_id` FK CASCADE, `expires_at`, `used_at`. Index on `(token_hash)`. --- ## 4. Key Query Patterns the Schema Must Serve | Query | Mechanism | |---|---| | Spots in viewport bbox | `ST_Intersects(location, ST_MakeEnvelope(...)::geography)` via GiST | | Spots near me, by distance | `ORDER BY location <-> point` KNN on GiST index | | Spot detail + sectors + obstacles + photos | 4 indexed single-table reads (or one query with lateral joins) | | Forecast for spot | snap coords → `forecast_cache` lookup → upstream fetch on miss | | Score per hour | computed in app layer from cache payload + spot rows; **never stored per hour** (cheap to compute, expensive to invalidate) | | Search by name | trigram GIN on `spots.name` | | Calibration export | `session_reports JOIN scoring_profiles`, indexed | ## 5. Why forecast keys are grid-snapped, not spot FKs Two spots 800 m apart get identical Open-Meteo answers (its models are ≥ 1 km resolution). Keying cache rows by spot would multiply upstream calls and storage by spot density. Snapping to 0.05° collapses a whole beach town into one cache row. The snap function lives in SQL (`db/schema.sql`, `snap_grid()`) so any consumer language agrees on keys. ## 6. Migrations & Lifecycle - `db/schema.sql` is the **v1 baseline**; from milestone 2 onward, changes ship as ordered migration files (`db/migrations/NNNN_description.sql`) recorded in `schema_migrations`. No down-migrations are required; restores come from backups. - A scheduled job (architecture §6) deletes expired `forecast_cache` rows and `auth_tokens` past expiry. - User deletion: `users` row deleted → reviews/favorites/session reports cascade (personal data), spots/photos keep content with `NULL` attribution. Matches GDPR erasure expectations while preserving community value.